Deepseek R2 & V4 Deepseek’s NEW MODEL is 98% CHEAPER & 2X BETTER!
Summary of Video on DeepSeek R2 Rumors
- Introduction to rumors about DeepSeek R2 and Huawei’s role.
- Details of DeepSeek R2:
  - Estimated to be a 1.2-trillion-parameter model with a hybrid Mixture-of-Experts (MoE) 3.0 architecture, double the size of R1.
  - Approximately 78 billion active parameters per token, trained on 5.2 petabytes of data (the sketch after this list illustrates the total-versus-active distinction).
  - Utilizes a self-developed distributed training framework achieving 82% cluster utilization on Huawei's Ascend 910B chips.
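The total-versus-active gap is the defining trait of a mixture-of-experts design: a router sends each token to only k of N expert networks, so most of the model's weights sit idle on any given forward pass. Below is a minimal top-k MoE sketch; the dimensions, expert count, and routing width are illustrative assumptions, since R2's actual configuration is unconfirmed rumor.

```python
# Minimal top-k mixture-of-experts (MoE) layer in PyTorch. It shows why a
# 1.2T-parameter MoE model can activate only ~78B parameters per token:
# a router picks k of N expert MLPs for each token, so the remaining
# experts' weights are idle for that forward pass. All sizes here are toy
# values for illustration, not R2's (unconfirmed) configuration.

import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    def __init__(self, d_model: int, n_experts: int, k: int):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # scores every expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        weights, idx = self.router(x).topk(self.k, dim=-1)  # k experts per token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e  # tokens routed to expert e in this slot
                out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

moe = TopKMoE(d_model=64, n_experts=16, k=2)
total = sum(p.numel() for p in moe.parameters())
# Per token, only k expert MLPs run (plus the always-on router):
active = moe.k * sum(p.numel() for p in moe.experts[0].parameters()) \
         + sum(p.numel() for p in moe.router.parameters())
print(f"total={total:,}  active per token={active:,}  ({active / total:.0%})")
```

At the rumored scale, 78 billion active out of 1.2 trillion total means roughly 6.5% of the weights participate in each token's forward pass.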
- Huawei Ascend Chips:
  - DeepSeek has transitioned to Ascend chips due to export restrictions on Nvidia GPUs.
  - Ascend chips are noted for exceeding performance expectations, especially with manual optimizations.
  - Cost efficiency: Ascend chips reportedly reduce costs by 98% compared to Nvidia's H800 (see the arithmetic sketch after this list).
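For context, a claim like "98% cheaper" is just a relative cost reduction; the sketch below shows the arithmetic with hypothetical per-token prices, since the video gives only the percentage, not absolute figures.

```python
# How a "98% cheaper" headline is computed: a relative cost reduction
# between two serving/training stacks. The dollar figures below are
# hypothetical placeholders; the video cites only the percentage, not
# actual H800 or Ascend prices.

def cost_reduction(baseline_cost: float, new_cost: float) -> float:
    """Fractional reduction of new_cost relative to baseline_cost."""
    return 1.0 - new_cost / baseline_cost

h800_cost_per_mtok = 1.00    # hypothetical $ per million tokens on H800
ascend_cost_per_mtok = 0.02  # hypothetical $ per million tokens on Ascend

print(f"{cost_reduction(h800_cost_per_mtok, ascend_cost_per_mtok):.0%} cheaper")
# -> 98% cheaper
```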
- Training Framework:
  - The framework operates on a Huawei cluster at FP16 precision (a generic mixed-precision training loop is sketched below).
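Since the self-developed framework is unreleased, the following is only a generic sketch of FP16 mixed-precision training in PyTorch: the forward pass runs in half precision while a gradient scaler protects against FP16 underflow. It stands in for the technique, not for DeepSeek's actual stack.

```python
# Generic FP16 mixed-precision training step in PyTorch, shown only to
# illustrate the technique the rumor describes. DeepSeek's self-developed
# distributed framework is not public; this is standard torch.amp usage,
# not their code, and it targets a single CUDA device rather than an
# Ascend cluster.

import torch
import torch.nn as nn

model = nn.Linear(1024, 1024).cuda()  # stand-in for a real transformer
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()  # loss scaling avoids FP16 gradient underflow

def train_step(x: torch.Tensor, target: torch.Tensor) -> float:
    optimizer.zero_grad(set_to_none=True)
    # Forward pass runs in FP16 where it is numerically safe; sensitive ops
    # (e.g. reductions) are kept in FP32 by autocast.
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = nn.functional.mse_loss(model(x), target)
    scaler.scale(loss).backward()  # backprop through the scaled loss
    scaler.step(optimizer)         # unscales gradients, then applies the update
    scaler.update()                # adapts the loss scale for the next step
    return loss.item()

x = torch.randn(32, 1024, device="cuda")
y = torch.randn(32, 1024, device="cuda")
print(train_step(x, y))
```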
- Future Release Insights:
  - Anticipation of a V3.5 model, followed by R1.5 and eventually R2.
  - Discussion of how the newer Ascend 910C chips compare to Nvidia's H100.
- Broader Context:
  - DeepSeek's reliance on Huawei underscores the need to diversify hardware vendors in AI development.
  - Mention of DeepSeek's open-source stance and the ongoing evolution of its optimized training frameworks.
  - Closing thoughts on excitement for upcoming DeepSeek releases and community engagement with these developments.