
Akamai today added the ability to run artificial intelligence (AI) inference workloads across its distributed cloud computing network.
The Akamai Cloud Inference service will make it possible to deploy AI inference engines closer to the sources of data that drive them, using a content delivery network (CDN), says Ari Weil, vice president of product marketing for Akamai. “We built cloud services into our CDN,” he adds.
Akamai also provides access to application development tools from NVIDIA, including Triton, TAO Toolkit, TensorRT and NVFlare, to optimize the performance of AI inference models.
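To illustrate what invoking one of those engines might look like in practice, the sketch below sends a request to a Triton Inference Server using NVIDIA's tritonclient library. The server URL, model name and tensor names are illustrative placeholders, not values tied to Akamai's service.

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a Triton Inference Server endpoint (placeholder URL).
client = httpclient.InferenceServerClient(url="localhost:8000")

# Build a single input tensor; shape, datatype and tensor names must
# match the deployed model's configuration (hypothetical values here).
data = np.random.rand(1, 3, 224, 224).astype(np.float32)
inp = httpclient.InferInput("input__0", data.shape, "FP32")
inp.set_data_from_numpy(data)
out = httpclient.InferRequestedOutput("output__0")

# Run inference and read the result back as a NumPy array.
result = client.infer(model_name="example_model", inputs=[inp], outputs=[out])
print(result.as_numpy("output__0").shape)
```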
The company has also partnered with VAST Data to give latency-sensitive AI models access to real-time data, along with support for vector database offerings from Aiven and Milvus to enable retrieval-augmented generation (RAG).
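For the retrieval step of RAG, a minimal sketch using the pymilvus client might look like the following; the Milvus endpoint, collection name and toy eight-dimensional embeddings are assumptions made for illustration only.

```python
from pymilvus import MilvusClient

# Connect to a Milvus instance (placeholder URI) and create a small collection.
client = MilvusClient(uri="http://localhost:19530")
client.create_collection(collection_name="docs", dimension=8)

# Insert documents with pre-computed embeddings (toy vectors for illustration).
docs = [
    {"id": 1, "vector": [0.1] * 8, "text": "Edge PoPs reduce round-trip time."},
    {"id": 2, "vector": [0.2] * 8, "text": "Vector search retrieves grounding context."},
]
client.insert(collection_name="docs", data=docs)

# Retrieve the documents closest to a query embedding; the returned text
# would be appended to the prompt handed to the inference engine.
hits = client.search(
    collection_name="docs",
    data=[[0.15] * 8],
    limit=2,
    output_fields=["text"],
)
for hit in hits[0]:
    print(hit["entity"]["text"])
```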
Additionally, Akamai provides access to a distribution of Kubernetes optimized for the network edge, gained through its acquisition of Linode, along with support for open source extensions such as Kubeflow, KServe and SpinKube that enable IT teams to build and deploy AI models. The latter project was developed by Fermyon Technologies, which is now making its WebAssembly serverless computing framework available as an Akamai service.
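For teams serving models on Kubernetes, KServe exposes an InferenceService resource that can be created declaratively or through its Python SDK. The sketch below uses the SDK with a hypothetical model location and namespace; it is a generic KServe example, not something specific to Akamai's Linode-based distribution.

```python
from kubernetes import client as k8s_client
from kserve import (
    KServeClient,
    V1beta1InferenceService,
    V1beta1InferenceServiceSpec,
    V1beta1PredictorSpec,
    V1beta1SKLearnSpec,
    constants,
)

# Define an InferenceService pointing at a model artifact in object storage
# (the bucket path, name and namespace are illustrative placeholders).
isvc = V1beta1InferenceService(
    api_version=constants.KSERVE_GROUP + "/v1beta1",
    kind="InferenceService",
    metadata=k8s_client.V1ObjectMeta(name="example-model", namespace="default"),
    spec=V1beta1InferenceServiceSpec(
        predictor=V1beta1PredictorSpec(
            sklearn=V1beta1SKLearnSpec(storage_uri="gs://example-bucket/model")
        )
    ),
)

# Submit the resource to the cluster; KServe then provisions the serving
# endpoint behind it.
KServeClient().create(isvc)
```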
As IT organizations begin to deploy AI applications, many of them are starting to encounter latency issues that require them to, for example, deploy AI inference engines as close to the network edge as possible. The Akamai Cloud Inference service makes it possible to deploy those inference engines for both predictive and generative AI models in a way that improves throughput by as much as a factor of three while reducing latency by as much as 2.5x, said Weil.
While data science teams are still largely responsible for training AI models, internal IT operations teams are increasingly being tasked with managing the inference engines needed to run those models in production environments. Most of those inference engines will be deployed on graphics processing units (GPUs), but Akamai provides access to a range of processor classes that, in some instances, could dramatically reduce the total cost of running an AI inference model.
As AI applications inevitably become more distributed, Akamai is making a case for a set of integrated compute, networking and storage services that deliver more than one petabyte per second of throughput for data-intensive workloads using infrastructure resources spanning more than 4,100 points of presence (PoPs) connected via more than 1,200 networks.
It’s not clear yet where most AI inference engines will be deployed, but the closer they are to the data they need to access, the better the overall application experience becomes. In many cases, it’s simply not going to be feasible to continuously transfer data across a wide area network (WAN) to an AI inference model. So the only real alternative is to deploy the AI inference engine as close as possible to where data is being created, analyzed and stored.
The challenge then becomes finding a way to manage all those distributed AI models, each of which will eventually need to be replaced by a new iteration trained on additional data collected since the previous one was deployed. Naturally, the more integrated compute, storage and networking services are, the easier that task becomes.