Qwen
Open-source LLM series trained on 36 trillion tokens across 119 languages with reasoning capabilities, MoE architecture, and multimodal support, competing with GPT-4o and Claude
See https://qwenlm.github.io and https://github.com/QwenLM/Qwen3
Features
Model Family (2025):
- Dense models: 0.6B, 1.7B, 4B, 8B, 14B, 32B parameters
- MoE models: 30B-A3B (3B activated), 235B-A22B (22B activated)
- Qwen2.5-Max: Large-scale MoE trained on 20+ trillion tokens
- Qwen3-Max: Latest flagship model (September 2025), outperforming Claude 4 Opus and DeepSeek V3.1
- QwQ-32B-Preview: Reasoning-focused model similar to OpenAI’s o1
Architecture & Context:
- Mixture-of-Experts (MoE) architecture for efficient scaling
- Up to 128K token context window (most models)
- Qwen3: Extended to 256K tokens natively, expandable to 1M tokens
- Trained on 36 trillion tokens in 119 languages and dialects
- Apache 2.0 license for commercial use
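The efficiency claim behind the MoE naming can be seen with simple arithmetic: the model names encode total vs. activated parameters, and per-token compute scales roughly with the activated count. A rough sketch (the parameter counts come from the model names above; the FLOP proportionality is a simplification):

```python
# Fraction of parameters active per token in Qwen3's MoE variants,
# read directly off the model names (total-B / activated-B).
# Per-token compute scales roughly with activated parameters, so an
# MoE model offers large-model capacity at a fraction of the cost.
moe_models = {
    "Qwen3-30B-A3B": (30e9, 3e9),
    "Qwen3-235B-A22B": (235e9, 22e9),
}

for name, (total, active) in moe_models.items():
    print(f"{name}: {active / total:.1%} of parameters active per token")
```

So the 30B-A3B variant activates only about 10% of its weights on any given token, which is where the inference-cost savings come from.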
Reasoning Capabilities:
- Dual-mode operation: Thinking mode (with reasoning traces) and Instruct mode (direct responses)
- Reasoning can be toggled per request via the chat template (the enable_thinking flag)
- QwQ-32B outperforms OpenAI’s o1 on some benchmarks
- State-of-the-art results among open-weight thinking models
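In practice the mode switch is the `enable_thinking` argument to `tokenizer.apply_chat_template`. The hand-built sketch below only approximates the resulting ChatML-style prompt (so it runs without downloading the tokenizer); the exact template text is an assumption:

```python
# Illustrative sketch of how Qwen3's chat template differs between
# thinking and instruct behaviour. The real switch is
# tokenizer.apply_chat_template(..., enable_thinking=True/False);
# this hand-built approximation is for illustration only.
def build_prompt(user_msg: str, enable_thinking: bool) -> str:
    prompt = f"<|im_start|>user\n{user_msg}<|im_end|>\n<|im_start|>assistant\n"
    if not enable_thinking:
        # With thinking disabled, the template pre-fills an empty
        # reasoning block so the model answers directly.
        prompt += "<think>\n\n</think>\n\n"
    return prompt

thinking = build_prompt("What is 17 * 24?", enable_thinking=True)
direct = build_prompt("What is 17 * 24?", enable_thinking=False)
```

The same loaded model serves both modes; only the prompt prefix changes, which is what makes the dual-mode design cheap to operate.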
Multimodal Features:
- Qwen2.5-VL: Parse files, understand videos, count objects in images
- PC and phone control capabilities (similar to OpenAI’s Operator)
- Analyze charts/graphics, extract data from invoices and forms
- Multi-hour video comprehension
- Qwen2.5-Omni: Text, images, videos, audio input; text and audio output
- Qwen-Image-Edit-2511: Advanced image editing with improved consistency
Agent & Tool Integration:
- Superior agent capabilities with precise tool integration
- Model Context Protocol (MCP) support
- Real-time voice chatting (similar to GPT-4o)
- Enhanced long-context understanding across modes
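Tool integration follows the familiar loop: the application registers tool schemas, the model emits a structured tool call, and the application executes it and feeds the result back. A minimal dispatch sketch, where `get_weather` and the hard-coded model output are hypothetical stand-ins for a real schema and a real generate() call:

```python
import json

# Illustrative tool-call dispatch loop. Qwen3's chat template accepts
# OpenAI-style tool schemas; the model emits a tool call that the
# application must execute. get_weather and model_tool_call below are
# hypothetical stand-ins for a real tool and real model output.
def get_weather(city: str) -> str:
    return f"22C and sunny in {city}"  # stub implementation

TOOLS = {"get_weather": get_weather}

# What a parsed model tool call might look like:
model_tool_call = {"name": "get_weather",
                   "arguments": json.dumps({"city": "Hangzhou"})}

fn = TOOLS[model_tool_call["name"]]
result = fn(**json.loads(model_tool_call["arguments"]))
print(result)  # fed back to the model as a tool-role message
```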
Superpowers
Qwen stands out as the premier open-source multilingual reasoning model with flexible deployment options, making it ideal for:
- Developers building AI agents needing open-source models with advanced reasoning and tool integration
- Multilingual applications requiring support for 119 languages with consistent quality
- Enterprises seeking model sovereignty with Apache 2.0 licensing for full control and customization
- Research teams requiring transparency and fine-tuning capabilities on domain-specific data
- Cost-conscious deployments leveraging MoE architecture for efficient inference
Real-world applications:
- Reasoning-intensive tasks (math, science, logic puzzles)
- Multilingual content generation and translation
- Agent-based automation with tool calling
- Document analysis and data extraction (invoices, forms, charts)
- Video understanding and analysis (multi-hour comprehension)
- Real-time voice chat applications
Key advantages:
- Competitive with GPT-4o and Claude 3.5 Sonnet across benchmarks
- Arena-Hard: 89.4 (beats DeepSeek V3 85.5, Claude 3.5 Sonnet 85.2)
- LiveBench: 62.2 (leads GPT-4o and Claude 3.5)
- Open weights enable self-hosting and fine-tuning
- MoE architecture reduces computational costs
- 256K-1M token context for long-document processing
Pricing
Open Source:
- Free under Apache 2.0 license for all model sizes
- Self-hosted deployment at infrastructure cost only
- Unlimited inference without per-query charges
Alibaba Cloud Model Studio:
- Pay-per-use API access
- Pricing varies by model size and deployment region
- Available through Alibaba Cloud services
Deployment Options:
- Download from Hugging Face, ModelScope, or GitHub
- Self-host on own infrastructure
- Cloud deployment via Alibaba Cloud
Benchmark Performance
vs GPT-4o and Claude 3.5 Sonnet (Qwen2.5-Max):
- Arena-Hard: 89.4 (leads both competitors)
- MMLU-Pro: 76.1 (competitive with GPT-4o 77.0, Claude 78.0)
- GPQA-Diamond: 60.1 (behind Claude 65.0)
- LiveBench: 62.2 (leads DeepSeek 60.5, Claude 60.3)
- LiveCodeBench: 38.7 (competitive with Claude 38.9)
General Improvements (Qwen3-Instruct-2507):
- Significant improvements in instruction following, logical reasoning, text comprehension
- Enhanced mathematics, science, and coding capabilities
- Improved tool usage and agent performance
Getting Started
Quick Setup (Transformers):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-30B-A3B-Instruct-2507"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

Deployment Frameworks:
- Production: SGLang, vLLM, TensorRT-LLM
- Local: llama.cpp, Ollama
- Specialized: OpenVINO, MLX, ExecuTorch
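The production frameworks above expose OpenAI-compatible endpoints. A minimal vLLM serving sketch (the model name is as published on Hugging Face; port 8000 is vLLM's default, and flags for your hardware will vary):

```shell
# Serve a Qwen3 MoE model with vLLM (OpenAI-compatible API on :8000).
vllm serve Qwen/Qwen3-30B-A3B-Instruct-2507

# Query it from another shell:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen3-30B-A3B-Instruct-2507",
       "messages": [{"role": "user", "content": "Hello"}]}'
```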
Available on:
- Hugging Face: Qwen/Qwen3-*
- ModelScope (Alibaba's platform)
- Alibaba Cloud Model Studio
- GitHub: https://github.com/QwenLM/Qwen3
Model Variants
Qwen3 (April 2025):
- Dense and MoE variants across multiple sizes
- 128K context, 119 languages
- Apache 2.0 license
Qwen2.5-Max (January 2025):
- Large-scale MoE, 20T+ token training
- Beats GPT-4o and DeepSeek-V3 on key benchmarks
Qwen3-Max (September 2025):
- Latest flagship, outperforms Claude 4 Opus
- State-of-the-art non-reasoning model
QwQ-32B-Preview:
- Reasoning specialist (like OpenAI o1)
- 32K context, Apache 2.0
- Outperforms o1 on some benchmarks
Qwen2.5-VL:
- Vision-language model with PC/phone control
- Multi-hour video comprehension
Qwen2.5-Omni:
- Multimodal I/O (text, image, video, audio)
- Real-time voice chat capabilities
Sources
- Qwen - Wikipedia
- GitHub - QwenLM/Qwen3
- Qwen2.5-Max: Exploring the Intelligence of Large-scale MoE Model
- Alibaba unveils Qwen3, a family of ‘hybrid’ AI reasoning models | TechCrunch
- Qwen 2.5-Max: Features, DeepSeek V3 Comparison & More | DataCamp
- Alibaba’s Qwen team releases AI models that can control PCs and phones | TechCrunch