Deepseek R2 & V4 Deepseek’s NEW MODEL is 98% CHEAPER & 2X BETTER!
Summary of Video on DeepSeek R2 Rumors
- Introduction to rumors about DeepSeek R2 and Huawei’s role.
- Details of DeepSeek R2:
  - Estimated to be a 1.2-trillion-parameter model with a hybrid Mixture-of-Experts (MoE) 3.0 architecture, double the size of R1.
  - Approximately 78 billion active parameters per token, trained on 5.2 petabytes of data (the sketch after this list illustrates the total-versus-active distinction).
  - Utilizes a self-developed distributed training framework achieving 82% cluster utilization on Huawei's Ascend 910B chips.
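The total-versus-active gap is the defining trait of a mixture-of-experts design: a router sends each token to only k of N expert networks, so most of the model's weights sit idle on any given forward pass. Below is a minimal top-k MoE sketch; the dimensions, expert count, and routing width are illustrative assumptions, since R2's actual configuration is unconfirmed rumor.

```python
# Minimal top-k mixture-of-experts (MoE) layer in PyTorch. It shows why a
# 1.2T-parameter MoE model can activate only ~78B parameters per token:
# a router picks k of N expert MLPs for each token, so the remaining
# experts' weights are idle for that forward pass. All sizes here are toy
# values for illustration, not R2's (unconfirmed) configuration.

import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    def __init__(self, d_model: int, n_experts: int, k: int):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # scores every expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        weights, idx = self.router(x).topk(self.k, dim=-1)  # k experts per token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e  # tokens routed to expert e in this slot
                out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

moe = TopKMoE(d_model=64, n_experts=16, k=2)
total = sum(p.numel() for p in moe.parameters())
# Per token, only k expert MLPs run (plus the always-on router):
active = moe.k * sum(p.numel() for p in moe.experts[0].parameters()) \
         + sum(p.numel() for p in moe.router.parameters())
print(f"total={total:,}  active per token={active:,}  ({active / total:.0%})")
```

At the rumored scale, 78 billion active out of 1.2 trillion total means roughly 6.5% of the weights participate in each token's forward pass.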
- Huawei Ascend Chips:
  - DeepSeek has transitioned to Ascend chips due to export restrictions on Nvidia GPUs.
  - Ascend chips are noted for exceeding performance expectations, especially with manual optimizations.
  - Cost efficiency: Ascend chips reportedly reduce costs by 98% compared to Nvidia's H800 (see the arithmetic sketch after this list).
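For context, a claim like "98% cheaper" is just a relative cost reduction; the sketch below shows the arithmetic with hypothetical per-token prices, since the video gives only the percentage, not absolute figures.

```python
# How a "98% cheaper" headline is computed: a relative cost reduction
# between two serving/training stacks. The dollar figures below are
# hypothetical placeholders; the video cites only the percentage, not
# actual H800 or Ascend prices.

def cost_reduction(baseline_cost: float, new_cost: float) -> float:
    """Fractional reduction of new_cost relative to baseline_cost."""
    return 1.0 - new_cost / baseline_cost

h800_cost_per_mtok = 1.00    # hypothetical $ per million tokens on H800
ascend_cost_per_mtok = 0.02  # hypothetical $ per million tokens on Ascend

print(f"{cost_reduction(h800_cost_per_mtok, ascend_cost_per_mtok):.0%} cheaper")
# -> 98% cheaper
```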
- Training Framework:
  - The framework operates on a Huawei cluster at FP16 precision (a generic mixed-precision training loop is sketched below).
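Since the self-developed framework is unreleased, the following is only a generic sketch of FP16 mixed-precision training in PyTorch: the forward pass runs in half precision while a gradient scaler protects against FP16 underflow. It stands in for the technique, not for DeepSeek's actual stack.

```python
# Generic FP16 mixed-precision training step in PyTorch, shown only to
# illustrate the technique the rumor describes. DeepSeek's self-developed
# distributed framework is not public; this is standard torch.amp usage,
# not their code, and it targets a single CUDA device rather than an
# Ascend cluster.

import torch
import torch.nn as nn

model = nn.Linear(1024, 1024).cuda()  # stand-in for a real transformer
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()  # loss scaling avoids FP16 gradient underflow

def train_step(x: torch.Tensor, target: torch.Tensor) -> float:
    optimizer.zero_grad(set_to_none=True)
    # Forward pass runs in FP16 where it is numerically safe; sensitive ops
    # (e.g. reductions) are kept in FP32 by autocast.
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = nn.functional.mse_loss(model(x), target)
    scaler.scale(loss).backward()  # backprop through the scaled loss
    scaler.step(optimizer)         # unscales gradients, then applies the update
    scaler.update()                # adapts the loss scale for the next step
    return loss.item()

x = torch.randn(32, 1024, device="cuda")
y = torch.randn(32, 1024, device="cuda")
print(train_step(x, y))
```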
- Future Release Insights:
  - Anticipation of a V3.5 model, followed by R1.5 and eventually R2.
  - Discussion of how the newer Ascend 910C chips compare to Nvidia's H100.
- Broader Context:
  - DeepSeek's reliance on Huawei underscores the need to diversify hardware vendors in AI development.
  - Mention of DeepSeek's open-source stance and the ongoing evolution of its optimized training frameworks.
  - Closing thoughts on excitement for upcoming DeepSeek releases and community engagement with these developments.