Install Qwen3-14B with vLLM Locally
AI Summary
Installation of the Qwen3 Model Using vLLM
- Overview
- The video covers installing the 14-billion-parameter (14B) variant of Qwen3 using vLLM.
- Developed by Alibaba, known for reasoning, math, coding, and multilingual capabilities.
- Sponsored by Massed Compute (which provided the VM); the install runs on an NVIDIA H100 GPU.
- Setting Up the Environment
- Create a virtual environment with Conda.
- Install vLLM version >= 0.8.4 (required for the model); see the sketch below.
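A minimal sketch of the environment setup, assuming Conda is installed; the environment name `qwen3-vllm` and the Python version are illustrative:

```bash
# Create and activate an isolated Conda environment (name is illustrative).
conda create -n qwen3-vllm python=3.11 -y
conda activate qwen3-vllm

# Install vLLM; the video calls for version 0.8.4 or newer.
pip install "vllm>=0.8.4"
```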
- Logging into Hugging Face
- Log in with your free Hugging Face read token; see the sketch below.
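A sketch of the login step, assuming the read token is exported as the `HF_TOKEN` environment variable:

```bash
# Authenticate with Hugging Face using a free read token.
huggingface-cli login --token "$HF_TOKEN"
```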
- Downloading and Serving the Model
- Command to serve the model locally with reasoning enabled (recommended for math and coding tasks).
- The model uses the DeepSeek R1 reasoning parser; a sketch of the full command follows below.
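A plausible serve command based on the steps above; the model ID `Qwen/Qwen3-14B` matches the Hugging Face hub, but the reasoning flags have changed across vLLM releases, so check `vllm serve --help` for your version:

```bash
# Download (on first run) and serve the model with reasoning output enabled,
# parsed with vLLM's DeepSeek R1 reasoning parser.
vllm serve Qwen/Qwen3-14B \
  --enable-reasoning \
  --reasoning-parser deepseek_r1
```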
- Starting the Open WebUI
- With Docker installed, clone the Open WebUI repository and run `docker-compose up`; see the sketch after this section.
- Access the model via the browser on port 8080.
- Setup requires an email and password for initial login.
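A sketch of the Open WebUI setup, assuming Docker and Docker Compose are installed and the repository's bundled compose file is used as-is:

```bash
# Clone the Open WebUI repository and bring it up in the background.
git clone https://github.com/open-webui/open-webui.git
cd open-webui
docker-compose up -d

# Then open http://localhost:8080 and create the initial admin account.
```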
- Configuring the Web UI
- Disable the default OpenAI endpoint so the UI uses the local vLLM API.
- Enter the vLLM API key if one was set; a quick endpoint check is sketched below.
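To confirm the UI is pointed at the right place, the local server can be tested directly; vLLM exposes an OpenAI-compatible API on port 8000 by default, and an API key is only needed if the server was started with one:

```bash
# Quick sanity check against the local vLLM server (default port 8000).
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen/Qwen3-14B",
        "messages": [{"role": "user", "content": "Say hello."}]
      }'
```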
- Testing Model’s Capabilities
- Logical Problem: Solved a train meeting problem step-by-step.
- Light Switch Puzzle: Provided strategies for identifying lamp switches.
- Debugging Code: Quickly resolved code errors.
- Creative Writing: Generated a Sci-Fi story centered on Mars settlement.
- Agentic Tasks: Assumed the role of a travel planner to simulate booking a flight and summarizing travel restrictions.
- Multilingual Translation: Successfully translated “I love you” into 50 languages and provided additional context.
- Real-World Social Scenario: Provided dating advice based on a juggling scenario.
- Conclusion
- The video reflects on the model's performance and abilities; watch for further content on upcoming models.