Qwen 3 Is Here… But Where’s DeepSeek-R2?!
AI Summary
Overview of Qwen 3
- Release Context: Recently launched by Alibaba, amid anticipation of DeepSeek-R2.
- Model Name: Qwen3-235B-A22B, indicating:
  - 235 billion total parameters (large, but not the largest)
  - 22 billion active parameters (the subset actually activated per token).
Key Features
- Model Architecture: Mixture-of-experts (MoE) design for efficiency, routing each token to only a few specialized experts (see the routing sketch after this list).
- Performance Benchmarks:
  - 95% on Arena-Hard, outperforming DeepSeek-R1, o1, and o3-mini.
  - 85% and 81% on the AIME 24 and AIME 25 math benchmarks, just behind Gemini 2.5 Pro.
  - Outperformed Gemini 2.5 Pro on coding benchmarks such as Codeforces.
- Additional Versions: Includes a dense Qwen3-32B model and other variants ranging from 0.6 billion to 30 billion parameters.
- Open-Source Availability: Openly released weights broaden access for researchers and startups, shaping AI innovation globally.
- Hybrid Thinking: A single model toggles between a step-by-step reasoning mode and a faster non-reasoning mode, boosting performance on math and coding (see the thinking-mode example below).
- Expanded Dataset: Pre-trained on about 36 trillion tokens (double the 18 trillion used for Qwen 2.5), covering 119 languages.
- Agentic Capabilities: Optimized for coding agents, with improved support for MCP (Model Context Protocol), the tool-integration standard introduced by Anthropic (a minimal server sketch follows below).
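
To make the mixture-of-experts point concrete, here is a minimal sketch of top-k expert routing in PyTorch. It is illustrative only, not Qwen 3's actual implementation; the expert count, top-k value, and layer sizes are arbitrary assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Minimal top-k mixture-of-experts layer (illustrative sketch only)."""
    def __init__(self, dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (tokens, dim)
        gates = F.softmax(self.router(x), dim=-1)          # routing probabilities
        weights, idx = gates.topk(self.top_k, dim=-1)      # keep only the top-k experts
        weights = weights / weights.sum(-1, keepdim=True)  # renormalize kept weights
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(len(self.experts)):             # dispatch tokens to chosen experts
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * self.experts[e](x[mask])
        return out
```

With 8 experts and top_k=2, only a quarter of the expert parameters run for any given token; the same principle is how a 235B-parameter model can activate only about 22B per token.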
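The hybrid thinking mode is exposed through the chat template. The sketch below follows the usage pattern from Qwen's release materials for the 3 series; the `enable_thinking` flag is specific to Qwen 3's template, and the 0.6B checkpoint is chosen only to keep the example light.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Smallest open-weight Qwen3 variant; swap in a larger checkpoint as needed.
model_id = "Qwen/Qwen3-0.6B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "What is 17 * 24?"}]

# enable_thinking toggles the hybrid mode: True emits a <think> reasoning trace
# before the final answer; False skips straight to the reply.
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)

inputs = tokenizer(text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```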
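On the MCP point, a server is just a process that exposes tools a model can call. Below is a minimal sketch using the `FastMCP` helper from the reference Python SDK (`pip install mcp`); the server name and the `add` tool are hypothetical examples, and an MCP-aware agent built on Qwen 3 could discover and invoke such tools.

```python
# A tiny MCP server exposing one tool, via the reference Python SDK.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-tools")  # hypothetical server name

@mcp.tool()
def add(a: float, b: float) -> float:
    """Add two numbers and return the sum."""
    return a + b

if __name__ == "__main__":
    # Serves over stdio by default, so an MCP-capable client or agent
    # can launch this script and call the `add` tool.
    mcp.run()
```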
Conclusion
- Qwen 3 is a formidable addition to the AI landscape, competing closely with models like Gemini 2.5 Pro, and DeepSeek's upcoming releases promise further developments.