Qwen

by Alibaba Cloud

Open-source LLM family trained on 36 trillion tokens across 119 languages, offering reasoning capabilities, MoE architecture, and multimodal support; competitive with GPT-4o and Claude on public benchmarks

See https://qwenlm.github.io and https://github.com/QwenLM/Qwen3

Features

Model Family (2025):

  • Dense models: 0.6B, 1.7B, 4B, 8B, 14B, 32B parameters
  • MoE models: 30B-A3B (3B activated), 235B-A22B (22B activated)
  • Qwen2.5-Max: Large-scale MoE trained on 20+ trillion tokens
  • Qwen3-Max: Latest flagship model (September 2025), outperforming Claude Opus 4 and DeepSeek-V3.1 on key benchmarks
  • QwQ-32B-Preview: Reasoning-focused model similar to OpenAI’s o1

Architecture & Context:

  • Mixture-of-Experts (MoE) architecture for efficient scaling (see the cost sketch after this list)
  • Up to 128K token context window (most models)
  • Qwen3 (2507 updates): 256K tokens natively, expandable to 1M tokens
  • Trained on 36 trillion tokens in 119 languages and dialects
  • Apache 2.0 license for commercial use
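
The "A" figures in the MoE model names encode activated parameters: only a small routed fraction of the total weights runs per token. A back-of-the-envelope sketch using the 30B-A3B numbers above:

total_params = 30e9   # Qwen3-30B-A3B: total parameters
active_params = 3e9   # parameters the router activates per token
ratio = active_params / total_params
print(f"Per-token compute is roughly {ratio:.0%} of an equally sized dense model")
# -> Per-token compute is roughly 10% of an equally sized dense model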

Reasoning Capabilities:

  • Dual-mode operation: thinking mode (with reasoning traces) and instruct mode (direct responses)
  • Reasoning can be switched on or off per request via the chat template's enable_thinking flag (see the sketch after this list)
  • QwQ-32B outperforms OpenAI’s o1 on some benchmarks
  • State-of-the-art results among open-weight thinking models
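
Switching modes is done per request through the chat template; a minimal sketch (the 4B checkpoint is just a convenient example):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B")  # any Qwen3 chat model
messages = [{"role": "user", "content": "What is 17 * 24?"}]
# Thinking mode: the model reasons inside <think>...</think> before answering
prompt_thinking = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True)
# Instruct mode: reasoning traces are suppressed for a direct response
prompt_direct = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False)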

Multimodal Features:

  • Qwen2.5-VL: Parse files, understand videos, count objects in images
  • PC and phone control capabilities (similar to OpenAI’s Operator)
  • Analyze charts and graphics, extract data from invoices and forms (see the sketch after this list)
  • Multi-hour video comprehension
  • Qwen2.5-Omni: Text, images, videos, audio input; text and audio output
  • Qwen-Image-Edit-2511: Advanced image editing with improved consistency
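
A minimal invoice-extraction sketch following the Qwen2.5-VL quickstart pattern; the image URL is a placeholder, and the qwen-vl-utils helper package plus a recent transformers release are assumed:

from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model_id = "Qwen/Qwen2.5-VL-7B-Instruct"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

messages = [{"role": "user", "content": [
    {"type": "image", "image": "https://example.com/invoice.png"},  # placeholder URL
    {"type": "text", "text": "Extract the invoice number and total amount."},
]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt
print(processor.batch_decode(output[:, inputs.input_ids.shape[1]:],
                             skip_special_tokens=True)[0])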

Agent & Tool Integration:

  • Strong agent capabilities with structured tool/function calling (see the sketch after this list)
  • Model Context Protocol (MCP) support
  • Real-time voice chatting (similar to GPT-4o)
  • Enhanced long-context understanding across modes
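
A minimal tool-calling sketch assuming an OpenAI-compatible endpoint (for example one served locally with vLLM, listed under Deployment Frameworks below); the URL, port, and get_weather schema are illustrative:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Return the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
response = client.chat.completions.create(
    model="Qwen/Qwen3-30B-A3B-Instruct-2507",
    messages=[{"role": "user", "content": "What's the weather in Hangzhou?"}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)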

Superpowers

Qwen stands out as the premier open-source multilingual reasoning model with flexible deployment options, making it ideal for:

  • Developers building AI agents needing open-source models with advanced reasoning and tool integration
  • Multilingual applications requiring support for 119 languages with consistent quality
  • Enterprises seeking model sovereignty with Apache 2.0 licensing for full control and customization
  • Research teams requiring transparency and fine-tuning capabilities on domain-specific data
  • Cost-conscious deployments leveraging MoE architecture for efficient inference

Real-world applications:

  • Reasoning-intensive tasks (math, science, logic puzzles)
  • Multilingual content generation and translation
  • Agent-based automation with tool calling
  • Document analysis and data extraction (invoices, forms, charts)
  • Video understanding and analysis (multi-hour comprehension)
  • Real-time voice chat applications

Key advantages:

  • Competitive with GPT-4o and Claude 3.5 Sonnet across benchmarks
  • Leads the compared frontier models on Arena-Hard and LiveBench (full numbers under Benchmark Performance below)
  • Open weights enable self-hosting and fine-tuning
  • MoE architecture reduces computational costs
  • 256K-1M token context for long-document processing

Pricing

Open Source:

  • Free under Apache 2.0 license for all open-weight model sizes
  • Self-hosted deployment at infrastructure cost only
  • Unlimited inference without per-query charges

Alibaba Cloud Model Studio:

  • Pay-per-use API access
  • Pricing varies by model size and deployment region
  • Available through Alibaba Cloud services

Deployment Options:

  • Download from Hugging Face, ModelScope, or GitHub
  • Self-host on own infrastructure
  • Cloud deployment via Alibaba Cloud

Benchmark Performance

vs GPT-4o and Claude 3.5 Sonnet (Qwen2.5-Max):

  • Arena-Hard: 89.4 (leads both; Claude 3.5 Sonnet 85.2, DeepSeek-V3 85.5)
  • MMLU-Pro: 76.1 (competitive with GPT-4o 77.0, Claude 78.0)
  • GPQA-Diamond: 60.1 (behind Claude 65.0)
  • LiveBench: 62.2 (leads DeepSeek 60.5, Claude 60.3)
  • LiveCodeBench: 38.7 (competitive with Claude 38.9)

General Improvements (Qwen3-Instruct-2507):

  • Significant improvements in instruction following, logical reasoning, and text comprehension
  • Enhanced mathematics, science, and coding capabilities
  • Improved tool usage and agent performance

Getting Started

Quick Setup (Transformers):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-30B-A3B-Instruct-2507"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# torch_dtype="auto" keeps the checkpoint's native precision; device_map="auto" spreads layers across available devices
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
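
To generate with the loaded model, a minimal continuation of the snippet above:

messages = [{"role": "user", "content": "Give me a short introduction to large language models."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))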

Deployment Frameworks:

  • Production: SGLang, vLLM, TensorRT-LLM (see the vLLM sketch after this list)
  • Local: llama.cpp, Ollama
  • Specialized: OpenVINO, MLX, ExecuTorch
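
For production-style serving, a minimal offline-inference sketch with vLLM's Python API (assumes a recent vLLM release; sampling settings are illustrative):

from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-30B-A3B-Instruct-2507")  # pulls weights from Hugging Face
params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.chat([{"role": "user", "content": "Summarize the Qwen3 family in two sentences."}], params)
print(outputs[0].outputs[0].text)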

Model Variants

Qwen3 (April 2025):

  • Dense and MoE variants across multiple sizes
  • 128K context, 119 languages
  • Apache 2.0 license

Qwen2.5-Max (January 2025):

  • Large-scale MoE, 20T+ token training
  • Beats GPT-4o and DeepSeek-V3 on key benchmarks

Qwen3-Max (September 2025):

  • Latest flagship, outperforms Claude Opus 4
  • State-of-the-art non-reasoning model

QwQ-32B-Preview:

  • Reasoning specialist (like OpenAI o1)
  • 32K context, Apache 2.0
  • Outperforms o1 on some benchmarks

Qwen2.5-VL:

  • Vision-language model with PC/phone control
  • Multi-hour video comprehension

Qwen2.5-Omni:

  • Multimodal I/O (text, image, video, audio)
  • Real-time voice chat capabilities
