China’s HUGE AI Chip Breakthrough: NVIDIA is out
AI Summary
This video analyzes Huawei’s breakthrough AI GPU and data center solution that challenges NVIDIA’s dominance in China. The main focus is on Huawei’s Ascend 910C GPU and the CloudMatrix 384 system.
Key Technical Developments
Huawei’s Ascend 910C GPU:
- Features double die design similar to NVIDIA’s Blackwell architecture
- Delivers 800 tFLOPS of compute at 16-bit precision
- About 4x the compute of NVIDIA’s H20 (the most advanced chip NVIDIA can sell in China)
- Still about one third the compute of NVIDIA’s GB200
- Reportedly manufactured at 7nm by TSMC
- Higher memory bandwidth and 2x better performance per watt
CloudMatrix 384 System:
- Built with 384 Ascend 910C GPUs (roughly 5x as many GPUs as NVIDIA’s NVL72)
- Nearly doubles the performance of NVIDIA’s competing system
- Consumes 600 kW vs. 145 kW for NVIDIA’s NVL72 (roughly 4x the power)
- Features fully optical interconnects between all GPUs
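The scale figures above can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted in this summary. The aggregate compute figure is a peak value that assumes perfect scaling across all 384 GPUs, which is an assumption of this example and not a claim from the video. The function name `cloudmatrix_scale` is made up for illustration.

```python
# Figures quoted in the summary; nothing here is measured.
ASCEND_910C_TFLOPS = 800   # per-GPU compute at 16-bit precision
CLOUDMATRIX_GPUS = 384
NVL72_GPUS = 72
CLOUDMATRIX_KW = 600
NVL72_KW = 145


def cloudmatrix_scale():
    """Return the headline ratios implied by the quoted figures."""
    # Peak aggregate only: assumes ideal scaling across every GPU (assumption).
    aggregate_pflops = CLOUDMATRIX_GPUS * ASCEND_910C_TFLOPS / 1000
    return {
        "aggregate_pflops": aggregate_pflops,
        "gpu_count_ratio": CLOUDMATRIX_GPUS / NVL72_GPUS,
        "power_ratio": CLOUDMATRIX_KW / NVL72_KW,
    }


if __name__ == "__main__":
    for name, value in cloudmatrix_scale().items():
        print(f"{name}: {value:.2f}")
```

On these inputs the GPU count ratio comes out a little above 5 and the power ratio a little above 4, which is consistent with the rounded figures in the list.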
System Architecture Analysis
NVIDIA’s Approach (NVL72):
- 72 GPUs with 36 NVLink switches
- Uses mostly copper connections (1,500 copper cables)
- 6x cheaper and more power efficient
- Delivers 180 PFLOPS of FP16 compute
Huawei’s Strategy:
- Fully optical connections using thousands of optical transceivers
- Direct GPU-to-GPU optical communication
- Higher bandwidth but significantly more power consumption
- More complex and failure-prone system
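To make the power trade-off concrete, the following sketch estimates peak compute per kilowatt for both systems. It uses the 180 PFLOPS and 145 kW quoted for the NVL72, and 384 GPUs at 800 TFLOPS with 600 kW for the CloudMatrix 384. Treating the CloudMatrix total as GPU count times per-GPU peak is an assumption of this example, and the helper names are hypothetical.

```python
# System-level figures taken from the summary; CloudMatrix total is derived.
NVL72 = {"pflops": 180.0, "kw": 145.0, "gpus": 72}
CLOUDMATRIX = {"pflops": 384 * 0.8, "kw": 600.0, "gpus": 384}  # assumes ideal scaling


def pflops_per_kw(system):
    """Peak FP16 compute delivered per kilowatt of system power."""
    return system["pflops"] / system["kw"]


def per_gpu_pflops(system):
    """Peak FP16 compute per GPU implied by the system total."""
    return system["pflops"] / system["gpus"]


def efficiency_gap():
    """How many times more peak compute per kW the NVL72 delivers on paper."""
    return pflops_per_kw(NVL72) / pflops_per_kw(CLOUDMATRIX)


if __name__ == "__main__":
    print(f"NVL72 per GPU:      {per_gpu_pflops(NVL72):.2f} PFLOPS")
    print(f"CloudMatrix per GPU: {per_gpu_pflops(CLOUDMATRIX):.2f} PFLOPS")
    print(f"NVL72 efficiency advantage: {efficiency_gap():.2f}x")
```

Under these assumptions the NVL72 delivers somewhat more than twice the peak compute per kilowatt, and its implied per-GPU figure is about three times the 910C's, in line with the comparisons above. Real workloads will not reach peak on either system.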
Software and Ecosystem
- Runs on Huawei’s proprietary CANN software stack (similar to NVIDIA’s CUDA)
- Built for Huawei’s Da Vinci architecture NPUs
- Handles compilers, graph optimization, and workload distribution
- Critical for managing the complex optical system
Manufacturing and Geopolitical Context
- Manufacturing remains China’s biggest challenge
- Heavy reliance on US/European tools and technologies
- US export controls are driving Chinese domestic innovation
- Future generations (Ascend 910D and 920) are in production
Energy and Infrastructure Considerations
- Power consumption is less critical in China due to cheaper energy costs
- 50% of China’s energy still comes from coal/oil
- Water consumption concerns: a 100 MW data center uses about 2 million liters of water per day
- China is experimenting with underwater data centers for cooling
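The water figure can be turned into a per-kilowatt-hour rate for rough comparisons. This is a minimal sketch that assumes the 100 MW facility runs at full load around the clock and that water use scales linearly with electrical load. Both are assumptions of this example, not statements from the video, and the function names are hypothetical.

```python
# Figure quoted in the summary: a 100 MW data center uses 2M liters/day.
LITERS_PER_DAY_AT_REFERENCE = 2_000_000
REFERENCE_MW = 100


def liters_per_kwh():
    """Water used per kWh, assuming constant full load (assumption)."""
    kwh_per_day = REFERENCE_MW * 1000 * 24
    return LITERS_PER_DAY_AT_REFERENCE / kwh_per_day


def daily_water_liters(load_kw):
    """Daily water use for a given load, assuming linear scaling (assumption)."""
    return load_kw * 24 * liters_per_kwh()


if __name__ == "__main__":
    print(f"{liters_per_kwh():.2f} liters per kWh")
    # A single 600 kW CloudMatrix 384, using the power figure from the summary.
    print(f"{daily_water_liters(600):.0f} liters per day at 600 kW")
```

The per-system number is only an order-of-magnitude guide, since actual water use depends on the cooling design and the local climate.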
Market Impact
- Represents a “DeepSeek moment” for AI data centers
- Good enough to replace NVIDIA’s H20 in Chinese market
- Demonstrates that system-level innovation can compensate for silicon limitations
- Shows China’s strength in software and networking solutions
The video concludes that while Huawei is behind in pure silicon performance, its system-level architecture and software stack provide a viable alternative for the Chinese market. This highlights a shift from chip-focused to infrastructure-focused competition in AI computing.