Install Qwen3-14B with vLLM Locally



AI Summary

Installation of the Qwen3 Model Using vLLM

  1. Overview
    • Video focuses on installing the 14-billion-parameter variant of Qwen3 using vLLM.
    • Developed by Alibaba, Qwen3 is known for its reasoning, math, coding, and multilingual capabilities.
    • Sponsored by Mast Compute, which provided the VM; runs on an NVIDIA H100 GPU.
  2. Setting Up the Environment
    • Create a virtual environment with Conda.
    • Install vLLM version >= 0.8.4, required for Qwen3 support (setup commands sketched below).
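A minimal sketch of this setup, assuming a working Conda install; the environment name and Python version are illustrative choices, not from the video:

```bash
# Create and activate an isolated environment (name and Python version are assumptions)
conda create -n qwen3 python=3.11 -y
conda activate qwen3

# Qwen3 support requires vLLM 0.8.4 or newer
pip install "vllm>=0.8.4"
```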
  3. Logging into Hugging Face
    • Log in with your free Hugging Face read token (see the command below).
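One way to do this from the shell, using the huggingface-cli that ships with vLLM's dependencies:

```bash
# Paste a read-scoped token from https://huggingface.co/settings/tokens when prompted
huggingface-cli login
```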
  4. Downloading and Serving the Model
    • Serve the model locally with reasoning enabled (recommended for math and coding tasks); a sample command follows.
    • The model uses the DeepSeek R1 reasoning parser shipped with vLLM.
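Roughly the serve command described, assuming the vLLM 0.8.x flag names; vLLM downloads the weights from Hugging Face on the first run:

```bash
# Start an OpenAI-compatible server with reasoning output parsed by the DeepSeek R1 parser
vllm serve Qwen/Qwen3-14B \
  --enable-reasoning \
  --reasoning-parser deepseek_r1
```

By default this listens on port 8000.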
  5. Starting the Open WebUI
    • With Docker installed, clone the Open WebUI repository and run docker compose up (see the sketch below).
    • Access the model via a browser on port 8080.
    • Initial login requires creating an account with an email and password.
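A sketch of those steps; the URL is the official Open WebUI repository, and the host port can differ depending on the compose file's port mapping:

```bash
# Clone the Open WebUI repository and start it in the background
git clone https://github.com/open-webui/open-webui.git
cd open-webui
docker compose up -d

# The video accesses the UI at http://localhost:8080; check docker-compose.yaml
# for the actual host-port mapping on your machine
```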
  6. Configuring the Web UI
    • Disable the default OpenAI endpoint and point the UI at the local vLLM API instead.
    • Enter an API key for the vLLM connection if one is required.
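For reference, vLLM's server speaks the OpenAI API; a quick way to confirm the base URL that Open WebUI should point at (port 8000 by default) is to list the served models:

```bash
# Should return a JSON listing that includes Qwen/Qwen3-14B
curl http://localhost:8000/v1/models
```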
  7. Testing the Model’s Capabilities (a sample API call follows this list)
    • Logical Problem: Solved a train meeting problem step-by-step.
    • Light Switch Puzzle: Provided strategies for identifying lamp switches.
    • Debugging Code: Quickly resolved code errors.
    • Creative Writing: Generated a Sci-Fi story centered on Mars settlement.
    • Agentic Tasks: Assumed the role of a travel planner to simulate booking a flight and summarizing travel restrictions.
    • Multilingual Translation: Successfully translated “I love you” into 50 languages and provided additional context.
    • Real-World Social Scenario: Provided dating advice based on a juggling scenario.
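These prompts were run through the Web UI, but the same tests can be sent straight to the vLLM endpoint; a minimal sketch using the OpenAI-compatible chat API, with an illustrative version of the train problem:

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-14B",
    "messages": [
      {"role": "user", "content": "Two trains 120 km apart head toward each other at 40 km/h and 20 km/h. How long until they meet?"}
    ]
  }'
```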
  8. Conclusion
    • Reflects on the model’s performance across the tests; further content on upcoming models is planned.