How to Build Your Own AI Data Center in 2025 — Paul Gilbert, Arista Networks
AI Summary
Title: AI Network Infrastructure Overview
Speaker: Paul Gilbert, Tech Lead at Arista Networks
Key Points:
- Introduction to AI Models and Infrastructure:
  - Focus on the infrastructure behind training models and serving inference.
  - Job completion time is the key measure of how well the infrastructure supports training and inference.
- GPU Infrastructure:
  - Setup of separate backend (GPU-to-GPU) and frontend networks for AI workloads.
  - Typical configurations cited: 248 GPUs for training, while inference can run on as few as four H100s.
- Networking Challenges in AI:
  - GPU backend networks are kept isolated; the GPUs are too costly and power-hungry to risk interference from other traffic.
  - Fast leaf and spine switches are used, with no connections to external networks, so nothing can compromise performance.
  - A 1:1 (non-oversubscribed) bandwidth ratio is needed to absorb the bursty traffic GPUs generate (see the sketch below).
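As a rough illustration of the 1:1 ratio mentioned above, the following sketch checks whether a leaf switch is oversubscribed; the port counts and speeds are assumptions for illustration, not figures from the talk.

```python
# Hypothetical leaf-switch port plan; the numbers are illustrative, not from the talk.
GPU_PORTS = 32         # downlinks to GPUs, 400G each
UPLINK_PORTS = 32      # uplinks to the spine layer, 400G each
PORT_SPEED_GBPS = 400

downlink_bw = GPU_PORTS * PORT_SPEED_GBPS
uplink_bw = UPLINK_PORTS * PORT_SPEED_GBPS

# A ratio of 1.0 means every GPU on the leaf can burst toward the spines at
# line rate simultaneously without contention inside the leaf.
ratio = downlink_bw / uplink_bw
print(f"downlink {downlink_bw} Gbps, uplink {uplink_bw} Gbps, subscription {ratio:.1f}:1")
```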
- Importance of Design and Scale:
  - Early design choices determine both performance and how far the network can grow as AI workloads expand.
  - Comparison of scale-up (larger, denser GPU nodes) vs. scale-out (more nodes connected across the fabric); a rough sizing sketch follows below.
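The scale-out side can be sized with simple port arithmetic. The sketch below assumes a two-tier leaf/spine fabric built from 64-port switches at a 1:1 ratio; the radix is an assumption, not a number from the talk.

```python
# Rough sizing of a two-tier, non-oversubscribed leaf/spine fabric.
RADIX = 64  # assumed ports per switch, all at the same speed

# With a 1:1 ratio, half of each leaf's ports face GPUs and half face spines.
gpus_per_leaf = RADIX // 2
uplinks_per_leaf = RADIX // 2

# Each spine port terminates one leaf uplink, so a spine supports RADIX leaves,
# and each leaf spreads its uplinks across uplinks_per_leaf spines.
max_leaves = RADIX
spines_needed = uplinks_per_leaf
max_gpus = gpus_per_leaf * max_leaves

print(f"{gpus_per_leaf} GPUs per leaf, {spines_needed} spines, "
      f"up to {max_leaves} leaves and {max_gpus} GPUs")
```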
- Traffic Management in AI Networks:
  - Traffic is predominantly east-west (GPU-to-GPU communication), with north-south traffic for data retrieval.
  - RDMA for direct memory-to-memory transfers, plus careful error and congestion handling, keep the network efficient.
  - Key software layers include CUDA and NCCL (the NVIDIA collective communications library, pronounced "nickel"), whose collective operations shape network traffic; see the sketch below.
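To make the CUDA/NCCL point concrete, here is a minimal sketch of the kind of collective NCCL runs across GPUs during training. It uses PyTorch's distributed API with the NCCL backend as a stand-in; PyTorch and the file name are assumptions, not something named in the talk.

```python
# allreduce_demo.py -- hypothetical file name; launch with:
#   torchrun --nproc_per_node=<num_gpus> allreduce_demo.py
import os
import torch
import torch.distributed as dist

def main():
    # NCCL is the backend that actually moves data between the GPUs.
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Each GPU contributes a gradient-sized tensor; all_reduce sums them in place.
    # On a multi-node cluster this is the east-west traffic the backend fabric carries.
    grad = torch.ones(1024 * 1024, device="cuda") * (rank + 1)
    dist.all_reduce(grad, op=dist.ReduceOp.SUM)

    if rank == 0:
        print(f"after all_reduce, first element = {grad[0].item()}")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```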
- Power Requirements:
  - AI racks require significantly more power than traditional server racks; a single 8-GPU system draws roughly 10.2 kW (a rough breakdown is sketched below).
  - Enterprises must adapt to higher power consumption and cooling requirements, including water-cooled racks.
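As a back-of-the-envelope check on the ~10.2 kW figure, the sketch below breaks it into assumed per-component draws; only the total echoes the talk, while the individual wattages and the rack density are illustrative.

```python
# Assumed per-component wattages; only the ~10.2 kW server total echoes the talk.
GPU_WATTS = 700            # assumed per-GPU draw under training load
GPUS_PER_SERVER = 8
OVERHEAD_WATTS = 4600      # assumed CPUs, NICs, fans, NVMe, power-conversion losses

server_kw = (GPU_WATTS * GPUS_PER_SERVER + OVERHEAD_WATTS) / 1000
print(f"one 8-GPU server: ~{server_kw:.1f} kW")    # ~10.2 kW

SERVERS_PER_RACK = 4       # assumed rack density
rack_kw = server_kw * SERVERS_PER_RACK
print(f"one rack of {SERVERS_PER_RACK} servers: ~{rack_kw:.1f} kW")
```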
- Future Trends:
  - Expected advancements in Ethernet technology to improve congestion control and packet handling.
  - Continuous growth of data consumption and network demands in AI.
Conclusion: The AI infrastructure landscape is evolving, with a focus on network designs that keep GPUs highly utilized while managing the growing complexity of power, cooling, and traffic control.