Qwen 3 Is Here… But Where’s DeepSeek-R2?!



AI Summary

Overview of Qwen 3

  • Release Context: Recently launched by Alibaba, and widely anticipated alongside the still-unreleased DeepSeek-R2.
  • Model Name: Qwen3-235B-A22B, indicating:
    • 235 billion total parameters (large, but not the largest)
    • 22 billion active parameters (the subset activated per token; see the sketch below this list).
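To make the total-vs-active distinction concrete, here is a minimal mixture-of-experts routing layer in PyTorch. This is an illustrative toy, not Qwen 3's actual architecture; the layer name, expert count, and dimensions are all assumptions.

```python
# Toy mixture-of-experts (MoE) layer: a router picks the top-k experts per
# token, so only a fraction of the total parameters run on any given input.
# All sizes here are illustrative, not Qwen3-235B-A22B's real configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)   # scores each expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                             # x: (tokens, d_model)
        scores, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(scores, dim=-1)           # mix the chosen experts
        out = torch.zeros_like(x)
        # Only the selected experts execute: this is why a 235B-parameter
        # MoE model can have only ~22B "active" parameters per token.
        for k in range(self.top_k):
            for e in idx[:, k].unique():
                mask = idx[:, k] == e
                out[mask] += weights[mask, k:k + 1] * self.experts[int(e)](x[mask])
        return out

x = torch.randn(10, 64)            # 10 tokens of width 64
print(ToyMoELayer()(x).shape)      # torch.Size([10, 64])
```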

Key Features

  • Model Architecture: Mixture-of-experts (MoE) model for efficiency, activating only a small set of specialized experts for each task.
  • Performance Benchmarks:
    • 95% on Arena Hard, outperforming DeepSeek-R1 and OpenAI's o1 and o3-mini.
    • 85% and 81% on the AIME 2024 and AIME 2025 math benchmarks, just behind Gemini 2.5 Pro.
    • Outperformed Gemini 2.5 Pro on coding benchmarks (Codeforces).
  • Additional Versions: Includes a dense Qwen3-32B model and other variants ranging from 0.6 billion to 30 billion parameters.
  • Open Source Availability: Open weights broaden access for researchers and startups, with implications for global AI innovation.
  • Hybrid Thinking: A single model supports both reasoning and non-reasoning modes, boosting performance on math and coding (see the usage sketch after this list).
  • Expanded Dataset: Pre-trained on roughly 36 trillion tokens (double the 18 trillion used for Qwen 2.5), covering 119 languages.
  • Agentic Capabilities: Optimized for agentic coding workflows, with improved support for Anthropic's MCP (Model Context Protocol); a sketch of an MCP-style tool call appears after the thinking-mode example below.
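The hybrid thinking mode is exposed as a chat-template flag in the Hugging Face transformers integration. The sketch below follows the usage shown in the Qwen3 model cards; the `enable_thinking` flag and the `Qwen/Qwen3-0.6B` checkpoint name should be verified against the current documentation.

```python
# Toggling Qwen3's hybrid thinking mode via Hugging Face transformers.
# Based on the Qwen3 model-card usage; flag names may change over time.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-0.6B"      # smallest open-weight variant
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "What is 17 * 24?"}]

# enable_thinking=True applies the reasoning chat template, so the model
# emits a chain-of-thought block before its final answer; False skips it.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:]))
```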
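As for MCP: it is Anthropic's open JSON-RPC 2.0 protocol that standardizes how a model-facing client asks an external server to run a named tool. A minimal sketch of such a request follows; the `get_weather` tool and its arguments are hypothetical.

```python
# Minimal sketch of an MCP (Model Context Protocol) tool-call request.
# "tools/call" is the standard MCP method; the tool itself is made up.
import json

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_weather",                 # hypothetical server tool
        "arguments": {"city": "Beijing"},
    },
}
print(json.dumps(request, indent=2))
```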

Conclusion

  • Qwen 3 is a formidable addition to the AI landscape, competing closely with existing models like Gemini 2.5 Pro, and DeepSeek's upcoming R2 release promises further developments.