Amazon Web Services (AWS) today unveiled a portfolio of Nova artificial intelligence (AI) models that it plans to make available alongside an expanding portfolio of AI models hosted on the Amazon Bedrock service.
At the same time, AWS also revealed it plans to make a three-nanometer Trainium3 processor available late next year that promises to improve the rate at which AI models can be trained by a factor of four.
Announced at the AWS re:Invent 2024 conference, the six Nova foundation models address everything from text and speech to multimodal applications that combine various types of input.
Amazon CEO Andy Jassy told conference attendees that in the near future it will also become possible to mix and match different inputs and outputs, for example by using a speech AI model to interact with an AI model designed for video applications.
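For organizations that want to evaluate the models, Nova is consumed the same way as other models hosted on Amazon Bedrock. The sketch below assumes the boto3 SDK and an illustrative Nova model identifier and region; it shows how a simple text prompt might be sent through the Bedrock Converse API.

```python
import boto3

# Bedrock runtime client; the region and model ID below are illustrative assumptions.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="amazon.nova-lite-v1:0",  # assumed Nova model identifier
    messages=[
        {
            "role": "user",
            "content": [{"text": "Summarize the key points of our Q3 sales report in three bullets."}],
        }
    ],
    inferenceConfig={"maxTokens": 256, "temperature": 0.5},
)

# The Converse API returns the generated message under output -> message -> content.
print(response["output"]["message"]["content"][0]["text"])
```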
AWS is also previewing an Amazon Bedrock Model Distillation service that promises to make it simpler to train smaller models using foundation models such as Nova.
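Distillation itself is a well-established technique in which a large teacher model's outputs supervise the training of a much smaller student. The Bedrock service manages that workflow end to end; the minimal PyTorch sketch below uses toy stand-in models rather than the Nova or Bedrock APIs and is meant only to illustrate the underlying idea of fitting a student to a teacher's softened output distribution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins for a large "teacher" and a small "student"; purely illustrative.
teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10)).eval()
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0  # softens the teacher's distribution so the student learns more than the top label

for step in range(100):
    x = torch.randn(64, 32)  # unlabeled examples; the teacher provides the supervision
    with torch.no_grad():
        teacher_logits = teacher(x)
    student_logits = student(x)

    # KL divergence between the softened teacher and student distributions.
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```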
The overall goal is not necessarily to replace AI models from partners such as Anthropic as much as it is to expand the number of options available to organizations, says Jassy. “We’re going to provide the best combination of all of these,” he says.
In the short term, AWS will continue to provide access to a mix of processor classes, including graphics processing units (GPUs), for training AI models. Next year, however, it plans to add Trainium3, a set of processors that will be 30% more efficient than the Trainium2 processors that are now generally available and that AWS claims already provide 30% to 40% better price performance than the GPU-based instances AWS currently offers.
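Price performance in this context is essentially training throughput per dollar, so a 30% to 40% advantage can mean either more throughput for the same spend or the same throughput at lower cost. The toy calculation below uses entirely made-up prices and throughput figures, not AWS numbers, simply to show how such a comparison is framed.

```python
# Illustrative numbers only; not actual AWS pricing or benchmark figures.
gpu_cost_per_hour, gpu_samples_per_sec = 40.0, 10_000
trn2_cost_per_hour, trn2_samples_per_sec = 35.0, 11_800

gpu_price_perf = gpu_samples_per_sec / gpu_cost_per_hour    # samples/sec per dollar-hour
trn2_price_perf = trn2_samples_per_sec / trn2_cost_per_hour

improvement = trn2_price_perf / gpu_price_perf - 1
print(f"Relative price-performance advantage: {improvement:.0%}")  # ~35% with these toy numbers
```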
At the same time, AWS also unfurled an Amazon EC2 Trn2 UltraServers service, which links 64 Trainium2 chips over a NeuronLink interconnect to provide up to 83.2 peak petaflops of compute that can be used to train next-generation LLMs spanning more than a trillion parameters. AWS also provides access to 16 Trainium2 chips via Amazon EC2 Trn2 instances, which deliver 20.8 peak petaflops of compute for training and deploying LLMs with billions of parameters.
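Both configurations work out to the same per-chip figure, which is a useful sanity check on how the UltraServer scales; the quick arithmetic below uses only the numbers quoted above.

```python
# Peak compute figures quoted for the two Trn2 configurations.
ultraserver_petaflops, ultraserver_chips = 83.2, 64
instance_petaflops, instance_chips = 20.8, 16

print(ultraserver_petaflops / ultraserver_chips)  # 1.3 peak petaflops per Trainium2 chip
print(instance_petaflops / instance_chips)        # 1.3 peak petaflops per Trainium2 chip
```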
Trainium processors are specifically optimized for the deep learning algorithms that drive LLMs. It’s not clear to what degree Trainium2 processors might reduce the current level of dependency on GPUs for training LLMs, but regardless of approach, AWS is betting that generative AI inference engines will be widely employed across nearly every application.
To help build those applications, AWS is also now making available a Neuron software development kit (SDK) that includes a compiler, runtime libraries, and tools to help developers optimize their models to run with optimal performance on Trainium chips. Neuron is natively integrated with popular frameworks such as JAX and PyTorch, enabling application developers to use their existing code and workflows with fewer code changes. Neuron also supports more than 100,000 models on the Hugging Face model hub and, via a Neuron Kernel Interface (NKI), provides bare-metal access to Trainium chips so developers can write high-performance compute kernels. Neuron is also designed to make it easy to use frameworks such as JAX to train and deploy models on Trainium2 while minimizing code changes and tie-in to vendor-specific solutions.
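As a concrete illustration of the "fewer code changes" claim, the sketch below shows the typical PyTorch-on-Trainium pattern described in the Neuron documentation, in which the Neuron device is exposed as an XLA device. It assumes a Trn1 or Trn2 instance with the torch-neuronx package installed; the model and data here are toy placeholders.

```python
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm  # provided by the torch-neuronx / torch-xla stack

device = xm.xla_device()  # resolves to a NeuronCore on a Trainium instance

# Toy model and data; an existing PyTorch training loop changes very little.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

for step in range(10):
    x = torch.randn(64, 784, device=device)
    y = torch.randint(0, 10, (64,), device=device)
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    xm.optimizer_step(optimizer)  # steps the optimizer and triggers XLA graph execution
```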
It may be a while yet before AWS is able to supplant more established AI rivals, but the one thing that is certain is that the cloud service provider is committed to making the level of investment that will clearly be required.