
As AI continues its rapid evolution, the scale and complexity of the infrastructure required to support it are also growing at an unprecedented rate. A new type of data center is being developed to meet this demand.
Dubbed AI factories, these are large-scale infrastructures with massive storage, networking, and computing investments purpose-built to serve the high-volume, high-performance requirements of AI training and inference.
These factories are much like large-scale manufacturing plants, except that inside these industrial buildings, AI models and services are created, refined, and deployed. Just as a manufacturing plant uses robotics to produce finished goods, AI factories produce industry-changing innovations like the large language models (LLMs) and AI systems powering autonomous vehicles.
To deliver on that promise, AI factories must house immense data volumes, meet low-latency requirements, and operate with strong energy efficiency, and these demands expose the limits of traditional computing architectures. While CPUs and GPUs have historically served as the twin pillars of computing, the rise in infrastructure complexity and workload diversity has created the need for a third computing component: the data processing unit (DPU).
What Is a DPU?
A DPU is a programmable processor designed specifically to handle data movement and processing at a network’s line rate using hardware optimized for the purpose.
Historically, DPUs have been used for data-centric operations such as network traffic acceleration and data encryption. More modern versions have expanded into storage optimization and security enforcement, opening a new model for data center compute: traditional CPUs handle application logic, GPUs perform the core computations of model training and inference, and DPUs take on system-level chores, including security and delivery functions across the network and application stack.
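To make that division of labor concrete, here is a minimal, purely illustrative Python sketch of how tasks might be mapped across the three pillars; the task names and categories are assumptions for illustration, not part of any real DPU software stack.

```python
# Conceptual sketch only: a toy routing table for the three-pillar compute
# model described above. Task names and categories are illustrative
# assumptions, not an API of any real DPU software stack.

PILLAR_BY_TASK = {
    # Application logic stays on general-purpose CPUs.
    "orchestration": "CPU",
    "request_handling": "CPU",
    # Core model math runs on GPUs.
    "training_step": "GPU",
    "inference_batch": "GPU",
    # System-level data services are offloaded to the DPU.
    "packet_processing": "DPU",
    "encryption": "DPU",
    "storage_io": "DPU",
    "telemetry": "DPU",
}

def place(task: str) -> str:
    """Return which compute pillar a task is assigned to in this toy model."""
    return PILLAR_BY_TASK.get(task, "CPU")  # default to the CPU

if __name__ == "__main__":
    for task in ("training_step", "encryption", "request_handling"):
        print(f"{task} -> {place(task)}")
```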
DPUs typically reside on the network interface cards (NICs) that connect servers to a network. By embedding processing capability directly in the data connectivity path, a DPU effectively upgrades the NIC from a purely networking-centric device into a powerful compute engine that sits inline with the data path.
Because of this inline location, the DPU can not only process data flowing from external clients to the server but also orchestrate networking tasks between servers working together within an AI factory.
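As a small, hedged example of what sitting on the NIC means in practice, the following Python sketch lists the network interfaces a Linux host exposes through sysfs along with their reported link speeds; whether a given NIC is actually a DPU-based SmartNIC is vendor-specific and not visible at this level.

```python
# A minimal sketch of inspecting a host's NICs via Linux sysfs.
# This only lists interfaces and their reported link speeds; it cannot
# tell whether an interface is backed by a DPU.

from pathlib import Path

NET = Path("/sys/class/net")
if not NET.exists():
    raise SystemExit("This sketch requires Linux sysfs (/sys/class/net).")

for iface in sorted(NET.iterdir()):
    speed_file = iface / "speed"            # link speed in Mb/s, if reported
    try:
        speed_mbps = int(speed_file.read_text().strip())
    except (OSError, ValueError):
        speed_mbps = None                   # loopback/down interfaces may not report it
    physical = (iface / "device").exists()  # physical NICs expose a device link
    print(f"{iface.name}: physical={physical}, speed={speed_mbps} Mb/s")
```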
The Advantage of DPUs for AI Factories
For AI factories to operate at scale and meet the demands of advanced workloads, DPUs provide several critical benefits, including:
Offloading Workloads for Faster Performance
One of the DPU’s primary functions is to offload infrastructure-related tasks from CPUs and GPUs, including network packet processing, storage management, and security operations such as data encryption and traffic monitoring. By relieving the other compute elements of these critical yet mundane system-level tasks, DPUs free up CPU and GPU cycles for the application-level needs of high-value workloads like AI training and inference, improving the overall throughput of AI factories while reducing bottlenecks.
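The sketch below gives a rough sense of why offloading matters: it measures how much host CPU time a purely data-centric task (compression via Python's standard-library zlib) consumes. It is not a DPU benchmark, only an illustration of the class of work that offload removes from CPUs and GPUs.

```python
# Rough, illustrative measurement of the CPU time a data-centric task
# (standard-library zlib compression) consumes on the host.

import os
import time
import zlib

payload = os.urandom(64 * 1024 * 1024)  # 64 MiB of random test data

start = time.perf_counter()
compressed = zlib.compress(payload, level=1)
elapsed = time.perf_counter() - start

throughput_gbps = (len(payload) * 8) / elapsed / 1e9
print(f"CPU-side compression: {elapsed:.2f} s, ~{throughput_gbps:.2f} Gb/s on one core")
# Compare that single-core figure with the 100-400 Gb/s line rates of modern
# data center networks to see why this class of work is pushed to dedicated hardware.
```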
Enabling Power Efficiency
As the environmental impact of large-scale computing becomes a growing concern, energy efficiency is a key focus for AI factories. Because DPUs are designed with specialized hardware for data-centric tasks such as compression and encryption, they perform the same work as CPUs while consuming significantly less power. This improvement in efficiency not only reduces operational costs but also helps organizations achieve sustainability goals.
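The following back-of-envelope calculation illustrates the shape of the savings; every number in it is an assumption for illustration, since real figures depend on the specific CPUs, DPUs, and workloads involved.

```python
# Back-of-envelope energy comparison with clearly assumed numbers;
# actual figures vary widely by hardware and workload.

WORK_GBPS = 400            # sustained packet/crypto work to handle (assumed)
CPU_GBPS_PER_CORE = 5      # assumed per-core software throughput
CPU_WATTS_PER_CORE = 10    # assumed per-core power draw
DPU_WATTS = 75             # assumed DPU power budget for the same work

cpu_cores_needed = WORK_GBPS / CPU_GBPS_PER_CORE
cpu_watts = cpu_cores_needed * CPU_WATTS_PER_CORE

print(f"Software path: ~{cpu_cores_needed:.0f} cores, ~{cpu_watts:.0f} W")
print(f"Offloaded path: ~{DPU_WATTS} W")
print(f"Illustrative savings: ~{cpu_watts - DPU_WATTS:.0f} W per server")
```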
Reducing Latency for Real-Time AI Inference
Because DPUs sit inline in the network path, they can move data at the network’s line rate, ensuring that AI applications can access data without the delays caused by additional packet copies and network hops, which improves response times and user experience.
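A quick bit of wire-time arithmetic, using assumed link speeds and an assumed payload size, shows why avoiding extra copies and hops matters at AI-factory data volumes.

```python
# Simple wire-time arithmetic (assumed link speeds; ignores protocol overhead)
# showing the cost of each additional copy of a large payload.

def transfer_seconds(bytes_moved: int, link_gbps: float) -> float:
    """Time to serialize a payload onto a link of the given speed."""
    return (bytes_moved * 8) / (link_gbps * 1e9)

payload = 1 * 1024**3  # a 1 GiB batch of tensors or cached data (assumed)

for gbps in (100, 200, 400):
    t = transfer_seconds(payload, gbps)
    print(f"{gbps} Gb/s link: {t * 1000:.1f} ms per copy of the payload")
# Every extra packet copy or network hop repeats part of this cost, which is
# what inline, line-rate processing on the DPU is meant to avoid.
```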
Adaptable, Simplified Scalability
The development and deployment of distributed AI models are driving the need for larger, more interconnected infrastructure. Because DPUs are provisioned hand-in-hand with the network, organizations can add them as the data throughput requirements of AI factories grow. The DPU’s programmability also allows it to adapt to increasingly complex networking and data processing needs, delivering both scalability and adaptability.
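As a rough illustration of how this scaling might be planned, the sketch below estimates how many DPU-equipped nodes a target aggregate throughput implies; all inputs are assumptions for illustration.

```python
# Toy capacity-planning sketch: nodes needed for a target aggregate throughput.
# All inputs are illustrative assumptions.

TARGET_AGG_TBPS = 10        # desired aggregate cluster throughput in Tb/s (assumed)
DPU_LINE_RATE_GBPS = 400    # per-node DPU/NIC line rate in Gb/s (assumed)
UTILIZATION = 0.7           # planned headroom factor (assumed)

nodes = (TARGET_AGG_TBPS * 1000) / (DPU_LINE_RATE_GBPS * UTILIZATION)
print(f"~{nodes:.0f} DPU-equipped nodes to sustain {TARGET_AGG_TBPS} Tb/s")
```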
How Are DPUs Used in AI Workflows?
In AI factories, DPUs are most commonly used when high-speed, parallel data processing is required, especially when that processing must scale seamlessly and integrate with other computing components. Key use cases include:
- Optimizing AI Model Pipelines: DPUs help optimize large-scale training, fine-tuning, and inference workflows by ensuring massive amounts of data move efficiently between GPUs, storage systems, and CPUs.
- Supporting Data Storage Clusters: Because AI pipelines rely on high-performance storage systems, DPUs facilitate intelligent data routing to optimize storage access and retrieval.
- Powering the Edge: As edge computing grows in importance, DPUs bring low-latency, power-efficient data processing to the applications that depend on it, such as autonomous vehicles and remote sensing.
- Real-Time Networking in 5G and Beyond: DPUs are being adopted in high-performance environments like 5G radio access networks, where low power consumption and immediate data throughput are essential.
The Future: A Third Pillar of Computing
As AI moves to impact every industry, the computing landscape must continue to adapt. DPUs are increasingly being used as the third pillar of computing, alongside CPUs and GPUs. Their ability to manage massive data flows, reduce power inefficiencies, and enable scalability will be essential to meeting AI factories’ growing demands. In the race to build more powerful and efficient large-scale AI infrastructure, organizations that adopt DPUs will be better positioned to accelerate innovation, lower operational costs, and unlock the full potential of their compute resources.