
Amazon Web Services (AWS) this week previewed a cloud service based on its Trainium2 processors that is optimized for running the inference engines that drive artificial intelligence (AI) applications.

Announced at the AWS re:Invent 2024 conference, the latency-optimized inference option for the Amazon Bedrock service provides an alternative to graphics processing units (GPUs) or traditional CPUs for running inference engines.

Launched last year, Trainium2 processors were originally designed to accelerate the training of AI models via the EC2 UltraClusters service, which provides access to as many as 100,000 processors.

Peter DeSantis, senior vice president for AWS Utility Computing, told conference attendees that this latest addition to the AWS portfolio is an example of a tightly coupled systolic array: a system designed from the ground up to process data in parallel using Trainium2 processors that are purpose-built for deep learning algorithms.
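For readers unfamiliar with the term, a systolic array computes by streaming data through a grid of multiply-accumulate cells that all work in lockstep. The toy simulation below illustrates the idea for a matrix multiply; it is a conceptual sketch only, not a model of Trainium2's actual microarchitecture.

```python
# Conceptual sketch: a systolic array computes A @ B by streaming operands
# through a grid of multiply-accumulate cells. At each time step t, cell
# (i, j) consumes A[i, t] and B[t, j], so every cell in the grid performs
# one multiply-accumulate in parallel. This is an illustration of the idea,
# not a description of Trainium2 hardware.
import numpy as np

def systolic_matmul(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Compute A @ B with an output-stationary systolic schedule."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m))
    for t in range(k):
        # One "beat" of the array: all n * m cells accumulate simultaneously.
        C += np.outer(A[:, t], B[t, :])
    return C

A = np.random.rand(4, 3)
B = np.random.rand(3, 5)
assert np.allclose(systolic_matmul(A, B), A @ B)
```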

AWS claims the latency-optimized inference option for the Amazon Bedrock service provides access to servers that run AI inference engines such as Llama 3.1 405B and 70B faster than any other cloud service. The approach also promises to be more cost-effective than GPUs and other classes of processors that were not originally designed for AI models. That will prove especially critical for workflows that span multiple AI agents, notes DeSantis. “There is a desire for really fast agentic workflows,” he says.
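For developers, the option surfaces as a request-time setting rather than a new API. Below is a minimal sketch using the AWS SDK for Python, assuming the feature is requested through the Converse API's performanceConfig parameter; the model ID and region shown are illustrative assumptions, so check the Bedrock documentation for the models and regions actually supported.

```python
# Hedged sketch: requesting latency-optimized inference from Amazon Bedrock
# via boto3's Converse API. The model ID and region below are placeholders,
# not confirmed values from this article.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-2")

response = client.converse(
    modelId="us.meta.llama3-1-70b-instruct-v1:0",  # assumed inference profile ID
    messages=[
        {"role": "user", "content": [{"text": "Summarize this week's AWS news."}]}
    ],
    # Request the latency-optimized option instead of the standard tier.
    performanceConfig={"latency": "optimized"},
)

print(response["output"]["message"]["content"][0]["text"])
```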

It’s not clear how many inference engines are being deployed in the cloud, but IT organizations generally prefer to run them where most of the data being processed and analyzed resides. The amount of data stored in cloud computing environments has increased considerably in recent years, but much of the data that inference engines need to process still resides in the on-premises IT environments where it was originally created.

Of course, there continues to be a general shortage of GPUs. As a result, many IT teams have been using traditional CPUs to run inference engines, but there is usually a performance tradeoff compared to running those models on GPUs that process data in parallel. AWS is now making a case for Trainium2 processors to process data in parallel at a lower total cost.

Trainium2 processors are one of several classes of processors that AWS is now designing and building to optimize the performance of a range of applications. General-purpose processors are simply not capable of running AI applications at the scale cloud applications now require, says DeSantis.

Each IT organization will, of course, need to decide to what degree it wants to move data into the AWS cloud to take advantage of Trainium2 processors. The one thing that is certain is that as more AI applications are built, internal IT teams are exercising more control over the IT platforms needed to run the inference engines that enable AI models to be deployed in production environments.

One way or another, however, AWS is betting that it’s only a matter of time before more organizations realize that processors designed for AI workloads are always going to outperform any general-purpose class of processor.
