DeepSeek R2 & V4: DeepSeek's NEW MODEL is 98% CHEAPER & 2X BETTER!



AI Summary

Summary of Video on DeepSeek R2 Rumors

  • Introduction to rumors about DeepSeek R2 and Huawei’s role.
  • Details of DeepSeek R2:
    • Rumored to be a 1.2 trillion parameter model with a hybrid Mixture-of-Experts (MoE) 3.0 architecture, roughly double the size of R1.
    • Approximately 78 billion active parameters per token, reportedly trained on 5.2 petabytes of data (a back-of-the-envelope MoE parameter sketch follows this summary).
    • Utilizes a self-developed distributed training framework that reportedly achieves 82% cluster utilization on Huawei's Ascend 910B chips (a rough throughput estimate follows this summary).
  • Huawei Ascend Chips:
    • DeepSeek has reportedly transitioned to Ascend chips due to export restrictions on Nvidia GPUs.
    • Ascend chips are noted for exceeding performance expectations, especially with manual optimizations.
    • Cost efficiency: Ascend-based training is claimed to cut costs by 98% compared to Nvidia's H800.
  • Training Framework:
    • Operates on a Huawei cluster at FP16 precision (a generic FP16 training-loop sketch follows this summary).
  • Future Release Insights:
    • Anticipation of a V3.5 model, followed by R1.5 and eventually R2.
    • Discussions on the performance of newer Ascend 910C chips compared to Nvidia’s H100.
  • Broader Context:
    • DeepSeek's shift to Huawei hardware highlights the broader push to diversify hardware vendors in AI development.
    • Mention of DeepSeek's open-source stance and its continued work on optimized training frameworks.
  • Closing thoughts on the excitement around upcoming DeepSeek releases and an invitation for community discussion of these developments.
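
To put the rumored parameter figures in perspective, here is a minimal back-of-the-envelope sketch of MoE parameter math in Python. The 1.2T total and 78B active counts come from the rumors above; the expert count, top-k routing width, and shared-weight fraction are hypothetical values chosen only to land near those numbers, not confirmed R2 specifications.

```python
# Back-of-the-envelope MoE parameter math using the rumored R2 figures.
# The expert count, top-k width, and shared fraction below are hypothetical
# values chosen to illustrate the arithmetic, not confirmed specifications.

TOTAL_PARAMS = 1.2e12   # rumored total parameters
ACTIVE_PARAMS = 78e9    # rumored active parameters per token

# In a mixture-of-experts model, each token only passes through the shared
# weights plus the top-k experts the router selects, so the active count is
# a small fraction of the total.
print(f"Active fraction: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")  # ~6.5%

def active_params(total, shared_fraction, n_experts, top_k):
    """Estimate active parameters per token for a simple MoE layout.

    shared_fraction: portion of weights (attention, embeddings, any
    always-on shared experts) that every token uses; the remainder is
    split evenly across n_experts, of which top_k fire per token.
    """
    shared = total * shared_fraction
    expert_pool = total * (1.0 - shared_fraction)
    return shared + expert_pool * (top_k / n_experts)

# A hypothetical configuration that lands near the rumored 78B figure.
est = active_params(TOTAL_PARAMS, shared_fraction=0.035, n_experts=160, top_k=5)
print(f"Estimated active parameters: {est / 1e9:.0f}B")  # ~78B
```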
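The rumored 82% cluster utilization can likewise be turned into a rough effective-throughput estimate. The peak FP16 rate below is a commonly cited figure for the Ascend 910B, and the cluster size is purely hypothetical; treat both as assumptions for illustration.

```python
# Converting the rumored 82% cluster utilization into effective throughput.
# The peak FP16 rate is an often-quoted Ascend 910B figure and the cluster
# size is hypothetical; both are assumptions, not numbers from the video.

PEAK_FP16_TFLOPS = 376   # often-cited Ascend 910B peak at FP16 (assumed)
UTILIZATION = 0.82       # rumored cluster utilization
N_CHIPS = 8192           # hypothetical cluster size

effective_per_chip = PEAK_FP16_TFLOPS * UTILIZATION
cluster_pflops = effective_per_chip * N_CHIPS / 1000

print(f"Per chip: {effective_per_chip:.0f} effective TFLOPS")
print(f"Cluster:  {cluster_pflops:,.0f} effective PFLOPS over {N_CHIPS} chips")
```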
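Finally, for readers unfamiliar with FP16 training, here is a generic mixed-precision training loop in PyTorch. It only illustrates the general technique the summary mentions; DeepSeek's self-developed framework is not public, and nothing here reflects its actual implementation. Requires a CUDA device.

```python
# A generic FP16 mixed-precision training loop in PyTorch, shown only to
# illustrate the technique; this is NOT DeepSeek's self-developed framework.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()  # loss scaling guards against FP16 underflow

for step in range(100):
    x = torch.randn(32, 1024, device="cuda")
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast(dtype=torch.float16):  # run matmuls in FP16
        loss = model(x).pow(2).mean()  # dummy objective for illustration
    scaler.scale(loss).backward()   # backprop on the scaled loss
    scaler.step(optimizer)          # unscale grads, skip step on inf/nan
    scaler.update()                 # adapt the loss scale for the next step
```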