Akamai Technologies used to be known as the Content Delivery Network (CDN) company; some called it the “backbone of the internet” with its media streaming capabilities and its web stability technologies.
Today we know Akamai Technologies for delivery and stability, but on a somewhat different basis; the company is now recognized for its distributed cloud, cybersecurity and content delivery prowess as it makes use of its massive edge network to provide low-latency AI inference, zero-trust security and high-performance media delivery.
Akamai Clouds, Defined
The Akamai Connected Cloud employs thousands of Nvidia Blackwell GPUs for distributed AI inference at the edge. What kind of edge? We’re talking about AI inference nodes, edge cache servers, plain compute instances and so-called Gecko nodes, i.e., compact, localized servers designed to run virtual machines and WebAssembly (Wasm) for efficiency and security.
Looking deeper here, Akamai Inference Cloud is a global-scale implementation of Nvidia AI Grid, a distributed network of interconnected AI infrastructure designed to transform isolated datacenters into a unified, grid-aware intelligence platform. Akamai Inference Cloud works by intelligently routing AI workloads across its edge, regional and core footprint to balance latency, cost and performance. In other words, it’s AI in the right place at the right time, at the right volume, and at the right cost.
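To make the idea of balancing latency, cost and performance concrete, here is a minimal Python sketch of tier selection. It is not Akamai's orchestrator: the tier names mirror the edge, regional and core footprint described above, but every number, the `Tier` class and the `route` function are hypothetical and exist only for illustration. The assumed policy is simply "choose the cheapest tier that can host the model within the latency budget".

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Tier:
    name: str
    latency_ms: float          # assumed typical round-trip latency to the user
    cost_per_1k_tokens: float  # assumed price in arbitrary units
    max_model_params_b: float  # largest model (billions of parameters) the tier can host


# Hypothetical figures chosen only to make the example concrete.
TIERS = (
    Tier("edge", latency_ms=15, cost_per_1k_tokens=0.40, max_model_params_b=8),
    Tier("regional", latency_ms=45, cost_per_1k_tokens=0.25, max_model_params_b=70),
    Tier("core", latency_ms=120, cost_per_1k_tokens=0.10, max_model_params_b=400),
)


def route(model_params_b: float, latency_budget_ms: float,
          tiers: Sequence[Tier] = TIERS) -> Tier:
    """Return the cheapest tier that fits the model and meets the latency budget."""
    feasible = [
        t for t in tiers
        if t.max_model_params_b >= model_params_b
        and t.latency_ms <= latency_budget_ms
    ]
    if not feasible:
        raise ValueError("no tier satisfies the request constraints")
    return min(feasible, key=lambda t: t.cost_per_1k_tokens)


if __name__ == "__main__":
    # A small model with a tight latency budget stays at the edge.
    print(route(model_params_b=7, latency_budget_ms=20).name)
    # The same model with a relaxed budget falls back to the cheapest tier.
    print(route(model_params_b=7, latency_budget_ms=200).name)
```

A production scheduler would also weigh live load, model placement and data locality; the sketch shows only the basic trade-off.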
This contextualization is necessary if we are to understand why the company thinks it has passed a milestone in the evolution of AI by unveiling, this week, the first global-scale implementation of the Nvidia AI Grid reference design. This reference design is a “blueprint for standardized interconnected AI infrastructure” from Nvidia that integrates its Blackwell chips, liquid cooling and networking.
Liquid-Cooled Chips?
Liquid-cooled chips, really? Yes. Rather like the water in a common household radiator system, non-potable reclaimed water absorbs heat from the GPUs, travels to a heat exchanger, is cooled back down (by outside air or a separate water loop) and returns to the GPUs.
By integrating Nvidia AI infrastructure into Akamai’s infrastructure and using intelligent workload orchestration across its network, Akamai intends to move the industry beyond isolated AI factories toward a unified, distributed grid for AI inference.
The move marks a step in the evolution of Akamai Inference Cloud, introduced late last year. As the first to operationalize the AI Grid in this way, Akamai is rolling out thousands of snappily named Nvidia RTX PRO 6000 Blackwell Server Edition GPUs, providing a platform that lets enterprises run agentic and physical AI (a term often used to refer to humanoid robots and autonomous vehicles) with the responsiveness of local compute and the scale of the global web.
“AI factories have been purpose-built for training and frontier model workloads – and centralized infrastructure will continue to deliver the best tokenomics for those use cases,” said Adam Karon, chief operating officer and general manager, cloud technology group, Akamai. “But real-time video, physical AI and highly concurrent personalized experiences demand inference at the point of contact, not a round trip to a centralized cluster. Our AI Grid intelligent orchestration gives AI factories a way to scale inference outward – leveraging the same distributed architecture that revolutionized content delivery to route AI workloads across 4,400 locations, at the right cost, at the right time.”
What is Tokenomics?
NOTE: We can define tokenomics as the economic model and framework used to manage the supply, demand and cost of the tokens (the units of data) that AI models process. It tracks inference efficiency, often in terms of tokens per watt (or its reciprocal, watts per token), and employs techniques such as prompt caching, which stores frequently used AI instructions, to reduce token costs and keep AI services sustainably profitable.
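As a rough illustration of the arithmetic behind that definition, the Python sketch below computes tokens per watt and the cost of a single request with and without prompt caching. The function names, the prices and the assumption that cached input tokens are billed at a discounted rate are all hypothetical; real pricing varies by provider.

```python
def tokens_per_watt(tokens_per_second: float, power_watts: float) -> float:
    """Efficiency as throughput divided by power draw.

    Because a watt is a joule per second, this is the same as tokens per joule.
    """
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return tokens_per_second / power_watts


def request_cost(input_tokens: int, cached_tokens: int, output_tokens: int,
                 price_in: float, price_cached: float, price_out: float) -> float:
    """Cost of one request; prices are per million tokens (hypothetical units).

    cached_tokens is the part of the input served from a prompt cache and
    billed at the (assumed) discounted price_cached rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    total = uncached * price_in + cached_tokens * price_cached + output_tokens * price_out
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical server: 6,000 tokens per second at a 600 W power draw.
    print(tokens_per_watt(6000, 600))
    # Same request without and with 1,500 of its 2,000 input tokens cached.
    print(request_cost(2000, 0, 500, price_in=2.0, price_cached=0.5, price_out=8.0))
    print(request_cost(2000, 1500, 500, price_in=2.0, price_cached=0.5, price_out=8.0))
```

With these made-up prices, caching three quarters of the prompt cuts the request cost by a little over a quarter, because output tokens dominate the bill.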
At the heart of the AI Grid is an intelligent orchestrator that acts as a real-time broker for AI requests. Applying Akamai’s expertise in application performance optimization to AI, this workload-aware control plane optimizes “tokenomics” by improving cost per token, time-to-first-token and throughput.
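Two of the metrics named here, time-to-first-token and throughput, can be derived from the timestamps at which a streamed response delivers its tokens. The following is a minimal Python sketch under that assumption; `stream_metrics` and its inputs are hypothetical and not part of any Akamai or Nvidia API.

```python
from typing import Dict, Sequence


def stream_metrics(request_time: float, token_times: Sequence[float]) -> Dict[str, float]:
    """Compute latency metrics for one streamed response.

    request_time: when the request was sent, in seconds.
    token_times: arrival time of each output token, in seconds, ascending.
    """
    if not token_times:
        raise ValueError("token_times must contain at least one timestamp")
    ttft = token_times[0] - request_time      # time-to-first-token
    total = token_times[-1] - request_time    # end-to-end generation time
    throughput = len(token_times) / total if total > 0 else float("inf")
    return {
        "ttft_s": ttft,
        "total_s": total,
        "tokens_per_s": throughput,
    }


if __name__ == "__main__":
    # Four tokens arriving a quarter of a second apart after the request.
    print(stream_metrics(0.0, [0.25, 0.5, 0.75, 1.0]))
```

Cost per token then follows by dividing the cost of the compute time consumed by the number of tokens produced.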
A major differentiator for Akamai is the ability for customers to access fine-tuned or sparsified models through its enormous global edge footprint, which offers a massive cost and performance advantage for the long tail of AI workloads.
For example, enterprises can reduce inference costs by matching workloads to the right compute tier automatically. The orchestrator applies techniques like semantic caching and intelligent routing to direct requests to right-sized resources, reserving premium GPU cycles for the workloads that demand them. Underpinning this is Akamai Cloud, built on open-source infrastructure with generous egress allowances to support data-intensive AI operations at scale.
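Semantic caching can be sketched in a few lines of Python: a response is reused when a new prompt is close enough in meaning to one already answered, so no GPU cycles are spent on it. The example below is an illustration only. It substitutes a toy bag-of-words embedding for the learned embedding model a real system would use, and the `SemanticCache` class and its 0.8 similarity threshold are hypothetical.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding; a stand-in for a learned embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold
        self.entries: List[Tuple[Counter, str]] = []  # (embedding, response)

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((embed(prompt), response))

    def get(self, prompt: str) -> Optional[str]:
        """Return the most similar cached response, or None below the threshold."""
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for emb, response in self.entries:
            score = cosine(query, emb)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


if __name__ == "__main__":
    cache = SemanticCache()
    cache.put("what is the capital of france", "Paris")
    # A near-duplicate prompt is served from the cache; an unrelated one is a miss.
    print(cache.get("what is the capital of france please"))
    print(cache.get("how tall is mount everest"))
```

On a miss, the request would be forwarded to a model on whichever tier the router selects, and the new answer stored for later reuse.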
Gaming studios can deliver AI-driven non-player character (NPC) interactions in milliseconds, maintaining player immersion. Financial institutions can execute personalized fraud detection and marketing recommendations in the moment between login and first screen. Broadcasters can transcode and dub content in real time for global audiences.
Globally Distributed Edge Network
These outcomes are powered by Akamai’s globally distributed edge network, which spans more than 4,400 locations and combines integrated caching, serverless edge compute and high-performance connectivity to process requests at the point of user contact, bypassing the round-trip lag of origin-dependent clouds.
Large language models, continuous post-training and multi-modal inference workloads require sustained, high-density compute that only dedicated infrastructure can deliver. Akamai’s multi-thousand-GPU clusters, powered by Nvidia RTX PRO 6000 Blackwell Server Edition GPUs, provide the concentrated horsepower for the heaviest AI workloads, complementing the distributed edge with centralized scale.
The company says it can manage complex SLAs across edge and core locations:
- The Edge (4,400+ locations): Delivers rapid response times for physical AI and autonomous agents. It will leverage semantic caching and serverless capabilities like Akamai Functions (WebAssembly-based compute) and EdgeWorkers to deliver model affinity and stable performance at the point of user contact.
- Akamai Cloud IaaS and dedicated GPU clusters: Core public cloud infrastructure enables portability and cost savings for massive-scale workloads, while pods powered by Nvidia RTX PRO 6000 Blackwell GPUs enable heavy-duty post-training and multi-modal inference.
“New AI-native applications demand predictable latency and better cost efficiency at planetary scale,” said Chris Penrose, global VP of business development for telco at Nvidia. “By operationalizing the NVIDIA AI Grid, Akamai is building the connective tissue for generative, agentic and physical AI, moving intelligence directly to the data to unlock the next wave of real-time applications.”
The Next Wave of Real-Time AI
Akamai is already seeing strong early adoption of Akamai Inference Cloud across compute-intensive, latency-sensitive industries. The first wave of AI infrastructure was defined by massive GPU clusters in a handful of centralized locations, optimized for training. But as inference becomes the dominant workload and businesses across every industry focus on building AI agents, that centralized model faces the same scaling constraints that earlier generations of internet infrastructure encountered with media delivery, online gaming, financial transactions and complex microservices applications.
Akamai is solving each of those challenges through the same core approach, i.e., distributed networking, intelligent orchestration and purpose-built systems that bring content and context together as close as possible to the digital touchpoint.

