The Arrival of AI Networking at Petascale

The AI industry has taken us by storm, bringing supercomputers, algorithms, data processing and training methods into the mainstream. The rapid ramp of large language models, combined with OpenAI’s ChatGPT, has captured the interest and imagination of people worldwide. Generative AI applications promise benefits to just about every industry. New types of AI applications are expected to improve productivity on a wide range of tasks, be it marketing image creation for ads, video games or customer support. These generative large language models, with over 100 billion parameters, are advancing the power of AI applications and deployments. Furthermore, Moore’s law is pushing the silicon geometries of TPU/GPU processors, which now connect at 100, 400 and 800 gigabits per second of network throughput, with parallel processing and bandwidth capacity to match.

Data and Compute Intensive AI Workloads

Not only are AI/ML applications a huge driver of compute today, but the silicon industry is also keeping up with the demand by churning out scalable processors. These could be CPUs, GPUs, or TPUs optimized for workloads with parallel cores, or specialized processors optimized for tensor and matrix computations, with memory and IO interfaces to match. A common characteristic of these workloads is that they are both data-intensive and compute-intensive. A typical AI workload involves a large sparse matrix computation, so large that the parameters of the matrix are distributed across hundreds or thousands of processors. Each processor performs intense computations for a period of time, then shares its “parameters” with the other processors involved in the computation. Once the data from all peers is received, it can be reduced (or merged) with the local data, and another round of processing begins. This compute-exchange-reduce cycle repeats throughout the job, and the volume of data exchanged grows rapidly as models and processor counts scale. A slowdown caused by a suboptimal network can critically impact application performance, creating inefficient wait states that idle away 30% or more of processor performance and waste expensive GPU capacity. A modern, scalable AI network is imperative.
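To make the compute-exchange-reduce cycle concrete, here is a minimal Python sketch. It is an illustration under simplifying assumptions, not a description of how any particular collective library works: the exchange is modeled as a plain element-wise sum across workers, compute and exchange are assumed not to overlap, and the function names all_reduce_sum and idle_fraction are hypothetical.

```python
from typing import List


def all_reduce_sum(local_vectors: List[List[float]]) -> List[List[float]]:
    """Return one copy per worker of the element-wise sum of all workers' vectors."""
    length = len(local_vectors[0])
    reduced = [sum(vec[i] for vec in local_vectors) for i in range(length)]
    # Every worker ends the exchange holding the same reduced result.
    return [list(reduced) for _ in local_vectors]


def idle_fraction(compute_s: float, exchange_s: float) -> float:
    """Share of one step spent waiting on the network (assumes no overlap)."""
    return exchange_s / (compute_s + exchange_s)


# Three workers, each holding a local partial result after its compute phase.
workers = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
result = all_reduce_sum(workers)
print(result[0])                 # [9.0, 12.0]

# A step with 3 s of compute and 1 s of exchange idles the processor 25% of the time.
print(idle_fraction(3.0, 1.0))   # 0.25
```

Because no worker can start the next round until the exchange completes, any extra time the network adds to the exchange phase shows up directly as idle processor time.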

Ethernet for AI Networking Scale

Avoiding these idle states at massive processor density requires a specialized AI network that delivers large, synchronized bursts of data at wire rate, at speeds of 400/800G. One must rethink the network to scale to hundreds or thousands of racks of AI servers. High-performance, repetitive transfer of data into and out of each server is critical to these applications. In the past, this sort of performance existed only in the domain of specialized HPC networks such as InfiniBand. Today the combination of RDMA Ethernet NICs and RoCE (RDMA over Converged Ethernet) allows Ethernet and IP to be used as the transport fabric with minimal overhead. The advantages of Ethernet for AI networking are clear: the economics of standards, a massive installed base, industry-wide interoperability and merchant silicon support, as shown in the AI network design guidelines below.

[Figure: AI network design guidelines]

Arista 7800 AI Spine

The performance that AI workloads demand is best delivered by the Arista 7800, our premier AI spine. It offers an unmatched combination of high-bandwidth, lossless, high-radix fabric, interconnecting hundreds to thousands of GPUs at speeds of 400/800G. Arista’s AI spine addresses key requirements, including:

  • Hotspot-free load balancing to handle any size of elephant flow
  • Congestion-free priority flow control (PFC) from sender to receiver
  • Field-hardened congestion control and notification, with ECN implementations proven at scale for RDMA systems
  • Exceptional buffering flexibility
  • High radix to support large AI fabrics (up to 576 400G ports, with 800G ahead)
  • Advanced quality of service and monitoring capabilities for flow control, load-balancing, latency, and memory usage
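
Why hotspot-free load balancing matters for elephant flows can be seen in a small sketch. It assumes a static hash (CRC32 of a made-up flow name) as a stand-in for conventional hash-based placement, and a greedy least-loaded placement as a stand-in for load-aware balancing. Neither is a description of how the Arista 7800 or EOS actually places flows, and all names here are hypothetical.

```python
import zlib
from typing import List, Tuple

Flow = Tuple[str, int]  # (hypothetical flow name, offered load in Gbps)


def static_hash_placement(flows: List[Flow], num_links: int) -> List[int]:
    """Place each flow by hashing its name, ignoring how busy each link is."""
    load = [0] * num_links
    for name, gbps in flows:
        load[zlib.crc32(name.encode()) % num_links] += gbps
    return load


def least_loaded_placement(flows: List[Flow], num_links: int) -> List[int]:
    """Place the largest flows first, each on the currently least-loaded link."""
    load = [0] * num_links
    for _, gbps in sorted(flows, key=lambda f: f[1], reverse=True):
        load[load.index(min(load))] += gbps
    return load


# Eight equal 400G elephant flows spread over four uplinks.
flows = [(f"gpu{i}->gpu{i + 8}", 400) for i in range(8)]
hashed = static_hash_placement(flows, num_links=4)
balanced = least_loaded_placement(flows, num_links=4)
print("static hash  :", hashed, "max", max(hashed))
print("least loaded :", balanced, "max", max(balanced))
```

With eight equal flows on four links, the least-loaded placement ends at 800G per link. The static hash can only match that if it happens to avoid collisions; any collision leaves one link above 800G while another sits underused, which is the hotspot the first bullet refers to.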

Arista AI spines bring a balanced combination of low power, predictable performance/latency and reliability characteristics for the most demanding AI workloads.

Arista EOS for AI Networking

The Arista 7800 AI spine is built on Arista EOS, our flagship software stack, which is critical to handling enormous workloads. It gives our most demanding customers the network assurance their mission-critical AI workloads require. By building AI-specific capabilities into the programmable EOS, we can construct a reliable AI network with automation, visibility, resilience and dynamic controls. Examples of customizable dimensions include:

  • Dynamic load balancing designed to handle different GPU topologies and traffic conditions
  • AI Analyzer to monitor traffic counters in microsecond-level time windows and handle microbursts caused by the synchronized nature of AI/ML traffic flows
  • End-to-end congestion control for RDMA traffic with PFC/ECN
  • Differentiated Quality of Service (QoS) to prioritize control traffic and separate it from RDMA traffic in a different queue
  • Advanced application and network monitoring capabilities such as watchdog, counters and latency analyzers
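
The value of monitoring counters in microsecond-level windows can be sketched as follows. This is a simplified illustration, not the AI Analyzer's implementation or interface: it assumes per-window byte counts for one link are already available as a list, and the function names, the 100-microsecond window and the 90% threshold are all hypothetical choices.

```python
from typing import List


def window_capacity_bytes(link_gbps: int, window_us: int) -> int:
    """Bytes a link can carry in one window (1 Gbps = 1000 bits per microsecond)."""
    return link_gbps * 1000 * window_us // 8


def find_microbursts(bytes_per_window: List[int], link_gbps: int,
                     window_us: int, threshold: float = 0.9) -> List[int]:
    """Return indices of windows whose utilization reaches the threshold."""
    capacity = window_capacity_bytes(link_gbps, window_us)
    return [i for i, count in enumerate(bytes_per_window)
            if count >= threshold * capacity]


# Byte counts for four consecutive 100-microsecond windows on a 400G link.
samples = [1_000_000, 4_800_000, 5_000_000, 500_000]
bursts = find_microbursts(samples, link_gbps=400, window_us=100)
average = sum(samples) / (len(samples) * window_capacity_bytes(400, 100))

print(bursts)    # [1, 2]
print(average)   # well below the 0.9 threshold
```

Averaged over all four windows the link looks only moderately loaded, yet two of the windows are at or near line rate. Coarse counters would miss exactly the synchronized bursts that AI/ML traffic produces.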

AI Networking at an Inflection Point

It is an exciting time at Arista as we look forward to helping our customers with their AI networking strategies. We deliver high-scale bandwidth capacity with predictable workload performance for cloud networking. With Arista AI platforms, we continue to deliver the best combination of Ethernet versatility and IP protocol capabilities at petascale, with an unmatched, congestion-free, lossless fabric for our customers’ AI strategies. The exponential growth of AI workloads and of distributed AI processing traffic is placing explosive demands on the network. Welcome to the new wave of petascale AI networking!
