Install Qwen3-14B with vLLM Locally



AI Summary

Installation of the Qwen3 Model Using vLLM

  1. Overview
    • Video focuses on installing the 14-billion-parameter variant of Qwen3 using vLLM.
    • Developed by Alibaba, Qwen3 is known for its reasoning, math, coding, and multilingual capabilities.
    • Sponsored by Mast Compute, which provided the VM; runs on an NVIDIA H100 GPU.
  2. Setting Up the Environment
    • Create a virtual environment with Conda.
    • Install vLLM version >= 0.8.4, required for Qwen3 support (setup commands sketched below).
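A minimal sketch of this setup, assuming a working Conda install; the environment name and Python version are illustrative choices, not from the video:

```bash
# Create and activate an isolated environment (name and Python version are assumptions)
conda create -n qwen3 python=3.11 -y
conda activate qwen3

# Qwen3 support requires vLLM 0.8.4 or newer
pip install "vllm>=0.8.4"
```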
  3. Logging into Hugging Face
    • Log in with your free Hugging Face read token (see the command below).
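One way to do this from the shell, using the huggingface-cli that ships with vLLM's dependencies:

```bash
# Paste a read-scoped token from https://huggingface.co/settings/tokens when prompted
huggingface-cli login
```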
  4. Downloading and Serving the Model
    • Serve the model locally with reasoning enabled (recommended for math and coding tasks); a sample command follows.
    • The model uses the DeepSeek R1 reasoning parser shipped with vLLM.
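Roughly the serve command described, assuming the vLLM 0.8.x flag names; vLLM downloads the weights from Hugging Face on the first run:

```bash
# Start an OpenAI-compatible server with reasoning output parsed by the DeepSeek R1 parser
vllm serve Qwen/Qwen3-14B \
  --enable-reasoning \
  --reasoning-parser deepseek_r1
```

By default this listens on port 8000.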
  5. Starting the Open WebUI
    • With Docker installed, clone the Open WebUI repository and run docker compose up (see the sketch below).
    • Access the model via a browser on port 8080.
    • Initial login requires creating an account with an email and password.
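A sketch of those steps; the URL is the official Open WebUI repository, and the host port can differ depending on the compose file's port mapping:

```bash
# Clone the Open WebUI repository and start it in the background
git clone https://github.com/open-webui/open-webui.git
cd open-webui
docker compose up -d

# The video accesses the UI at http://localhost:8080; check docker-compose.yaml
# for the actual host-port mapping on your machine
```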
  6. Configuring the Web UI
    • Disable the default OpenAI endpoint and point the UI at the local vLLM API instead.
    • Enter an API key for the vLLM connection if one is required.
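For reference, vLLM's server speaks the OpenAI API; a quick way to confirm the base URL that Open WebUI should point at (port 8000 by default) is to list the served models:

```bash
# Should return a JSON listing that includes Qwen/Qwen3-14B
curl http://localhost:8000/v1/models
```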
  7. Testing the Model’s Capabilities (a sample API call follows this list)
    • Logical Problem: Solved a train meeting problem step-by-step.
    • Light Switch Puzzle: Provided strategies for identifying lamp switches.
    • Debugging Code: Quickly resolved code errors.
    • Creative Writing: Generated a Sci-Fi story centered on Mars settlement.
    • Agentic Tasks: Assumed the role of a travel planner to simulate booking a flight and summarizing travel restrictions.
    • Multilingual Translation: Successfully translated “I love you” into 50 languages and provided additional context.
    • Real-World Social Scenario: Provided dating advice based on a juggling scenario.
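These prompts were run through the Web UI, but the same tests can be sent straight to the vLLM endpoint; a minimal sketch using the OpenAI-compatible chat API, with an illustrative version of the train problem:

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-14B",
    "messages": [
      {"role": "user", "content": "Two trains 120 km apart head toward each other at 40 km/h and 20 km/h. How long until they meet?"}
    ]
  }'
```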
  8. Conclusion
    • Reflects on the model’s performance across the tests; further content on upcoming models is planned.