NVIDIA this week launched a microservice that streamlines the process of integrating data with a large language model (LLM).

Announced at the AWS re:Invent 2023 conference, NVIDIA NeMo Retriever provides a toolkit for consistently implementing retrieval-augmented generation (RAG) processes that extend a large language model with external data. It was discussed alongside the NeMo framework, used to build generative AI applications, and BioNeMo, a generative AI platform for drug discovery.
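The RAG pattern that NeMo Retriever standardizes can be sketched in a few lines. The toy term-overlap scoring below stands in for the learned embeddings and vector index a real retriever would use, and the corpus and prompt format are invented purely for illustration.

```python
import re

# Words too common to signal relevance in this toy scorer.
STOPWORDS = {"what", "is", "the", "a", "an", "of", "to", "and"}

def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens, minus stopwords."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k documents sharing the most terms with the query.
    A production retriever would rank by embedding similarity instead."""
    overlap = lambda doc: len(tokens(query) & tokens(doc))
    return sorted(corpus, key=overlap, reverse=True)[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    """Prepend the retrieved context to the question sent to the LLM."""
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}"

# Invented example corpus of enterprise documents.
corpus = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The support desk is open weekdays from 9am to 5pm.",
    "Shipping to Europe typically takes seven business days.",
]
prompt = build_prompt("What is the refund policy?", corpus)
```

The augmented prompt is then passed to the LLM, which answers from the retrieved enterprise context rather than only from its training data.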

At the same time, NVIDIA announced it is making NVIDIA DGX Cloud, a service for training AI models, available on the AWS cloud.

In addition, NVIDIA is working with AWS to provide access to an AI supercomputer based on the NVIDIA Grace Hopper Superchip, and the two companies are launching Project Ceiba, an effort to develop the fastest supercomputer based on graphics processing units (GPUs), which NVIDIA will use to train its own AI models. AWS, meanwhile, has committed to using the NVIDIA NeMo frameworks and GPUs to train its Titan LLMs.

Finally, NVIDIA is making additional instances based on its latest generation of GPUs available on the AWS cloud, which the company says can also be used to build robotics applications twice as fast.

In general, NVIDIA is committed to working with AWS to streamline the building and deployment of AI applications, says Ian Buck, vice president and general manager for accelerated computing at NVIDIA.

NVIDIA NeMo Retriever, for example, is a container-based microservice that IT teams can securely reuse to expose enterprise data to an LLM, notes Buck.

The goal is to make it simpler for data science, data engineering and DevOps teams to collaboratively extend LLMs to build and deploy generative AI applications, he adds. “You can insert it into a workflow,” says Buck.

It’s still early days for building and deploying these applications, but it’s already apparent that the best practices used to build applications today will need to be modified to enable development teams to infuse AI models into them. In some cases, that may be as simple as adding application programming interfaces (APIs) to make a call to a model, but in many other instances AI models will be embedded within the application itself.
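The simpler of the two integration paths, calling a model through an API, might look like the following sketch. The endpoint URL and JSON payload shape here are assumptions for illustration, not any specific vendor's interface.

```python
import json
import urllib.request

# Hypothetical model-serving endpoint; substitute your own serving stack's URL.
ENDPOINT = "http://localhost:8000/v1/completions"

def build_request(prompt: str, max_tokens: int = 256) -> urllib.request.Request:
    """Build the HTTP POST request for a completion call.
    The payload shape is illustrative and varies by serving framework."""
    payload = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode()
    return urllib.request.Request(
        ENDPOINT, data=payload, headers={"Content-Type": "application/json"}
    )

def call_llm(prompt: str) -> str:
    """Send the request and return the generated text from the response."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.load(resp)["text"]
```

Keeping the model behind a single function like `call_llm` means the application code does not change when the model, or the service hosting it, is swapped out later.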

Most organizations today build AI models using machine learning operations (MLOps) tools and frameworks that data science teams manage before handing off an AI model to a DevOps team to be deployed alongside other software artifacts. The challenge now will be operationalizing that process at scale as potentially hundreds of LLMs are added to an application environment.

Each organization will need to determine how many LLMs it makes economic sense to invoke, given the costs involved, but in time just about every application is going to have some AI capability. The issue now is prioritizing those efforts based on the limited resources at hand and the immediate value that might be derived for the business.

In the meantime, however, IT leaders would be well-advised to start reviewing their software development lifecycle (SDLC) processes today on the assumption that those processes will need to evolve to accommodate AI models tomorrow.