Red Hat today expanded its alliance with Amazon Web Services (AWS) to add support for developing and deploying artificial intelligence (AI) applications and agents on processors built by the cloud service provider, beginning in 2026.
Under the expanded alliance, announced at the AWS re:Invent 2025 conference, the Red Hat AI Inference Server is now certified to run on AWS AI processors such as AWS Inferentia2 and AWS Trainium3.
Red Hat and AWS are also collaborating to optimize the open-source vLLM inference server, originally developed in the Sky Computing Lab at the University of California, Berkeley, for AWS processors.
In addition, Red Hat is working with AWS to provide an AWS Neuron operator for deploying AI workloads on Red Hat OpenShift, Red Hat OpenShift AI and Red Hat OpenShift Service on AWS.
Following an initial developer preview, AWS AI processors will become an option on these platforms both for training AI models and for running the inference engines needed to deploy AI applications.
The overall goal is to provide organizations with lower-cost alternatives to graphics processing units (GPUs), says Tushar Katarki, senior director of product for generative foundation model platforms at Red Hat.
Eventually, Red Hat aims to provide a level of abstraction that enables application developers to simply describe their intent, with the required infrastructure and models then selected, configured and deployed automatically in a way that keeps costs to a minimum, he adds. In effect, Red Hat will be providing a model-as-a-service platform that enables organizations to mix and match AI models and processors as needed, says Katarki. "The focus will shift to cost, complexity and control," he says.
That capability is also becoming more critical in regions such as Europe, where sovereign cloud requirements are gaining prominence, notes Katarki.
Red Hat has been at the forefront of efforts to reduce the complexity of building and deploying AI applications at a time when IT teams are starting to assume more responsibility for optimally deploying AI inference engines on Kubernetes-based platforms. While much of the training of AI models is still done on high-performance computing (HPC) platforms, the deployment of AI inference engines is becoming more distributed. That distribution will ultimately prove crucial to improving the performance of AI applications as they are increasingly deployed at the network edge.
In the meantime, each organization will need to determine what level of AI expertise it requires as building and deploying AI agents becomes simpler. The number of organizations that might need to build or distill an AI model, for example, is going to be small compared with the number that are building and deploying AI agents. The one thing that is certain, however, is that as alternative approaches to building AI agents become available, the total cost of AI will continue to decline to the point where deploying billions of them becomes economically feasible.

