AI News

Solo.io today added a gateway that centralizes access to large language models (LLMs) invoked via multiple application programming interfaces (APIs).

The Gloo AI Gateway from Solo.io builds on the company’s existing API gateway, which is based on the open source Envoy proxy. Solo.io has now extended it with capabilities such as the ability to create API keys that map to one or more LLM provider secrets and to ensure those keys are stored securely.

The overall goal is to make it simpler for developers to access LLMs via APIs in a way that enables organizations to enforce policies, says Keith Babo, head of product for Solo.io.

For example, rather than writing the same boilerplate code each time an LLM is invoked, developers can simply route queries through the Gloo AI Gateway, as the sketch below illustrates.
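To make that concrete, here is a minimal sketch of what that looks like from the developer’s side, assuming the gateway exposes an OpenAI-compatible endpoint; the gateway URL, the key and the model name are hypothetical stand-ins, not documented Gloo AI Gateway values.

```python
# A minimal sketch: the application points at the gateway rather than at
# any one provider. The gateway URL and API key below are hypothetical.
from openai import OpenAI

client = OpenAI(
    base_url="https://ai-gateway.example.com/v1",  # hypothetical gateway address
    api_key="team-a-gateway-key",  # gateway-issued key, not a provider secret
)

# The gateway holds the real provider credentials and applies policy,
# so the calling code carries no per-provider boilerplate.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # the gateway can route this to a backing provider
    messages=[{"role": "user", "content": "Summarize our refund policy."}],
)
print(response.choices[0].message.content)
```

Because the provider secret never leaves the gateway, rotating a key or swapping the backing model becomes a gateway-side change rather than an application redeploy.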

That centralized approach not only makes it possible to apply governance controls; it also gives an IT team a means to audit how prompts are being created and used, to ensure, for example, that no one is deliberately trying to misuse an LLM, and to apply prompt guards that prevent sensitive data from being revealed.

In addition, IT teams gain visibility into how LLMs are being consumed, making it easier to control costs, which are typically based on the number of tokens used to input data and generate outputs, notes Babo. “It tracks which clients are invoking which LLMs,” he says.

That’s critical because as organizations look to operationalize LLMs, they will quickly discover that the cost of the tokens used to meter inputs and outputs can add up, notes Babo.
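To illustrate the kind of accounting involved, here is a minimal sketch of per-request cost estimation using the tiktoken tokenizer; the per-1,000-token rates are illustrative placeholders, not real provider pricing.

```python
# A minimal sketch of token-based cost tracking; the rates below are
# illustrative placeholders, not actual provider pricing.
import tiktoken

INPUT_RATE = 0.0005   # assumed $ per 1,000 input tokens (placeholder)
OUTPUT_RATE = 0.0015  # assumed $ per 1,000 output tokens (placeholder)

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by many OpenAI models

def estimate_cost(prompt: str, completion: str) -> float:
    """Estimate the dollar cost of one request from its token counts."""
    input_tokens = len(enc.encode(prompt))
    output_tokens = len(enc.encode(completion))
    return (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1000

print(estimate_cost("Summarize our refund policy.",
                    "Refunds are issued within 30 days."))
```

A gateway sitting on the request path can perform this bookkeeping for every client automatically, which is what makes per-team chargeback and budget alerts feasible.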

Finally, Gloo AI Gateway also provides a centralized method for managing retrieval-augmented generation (RAG) workflows, through which data science teams customize LLMs by exposing them to external data, he adds. Providing that capability makes it simpler for IT teams to apply RAG techniques that minimize hallucinations by, for example, ensuring that local data stored in a vector database is searched before falling back on the data an LLM was trained on at some earlier date, which is less likely to be current or accurate, notes Babo.
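A minimal sketch of that retrieve-then-prompt pattern follows; the in-memory document list and keyword-overlap scoring are toy stand-ins for the real embedding model and vector database Babo describes.

```python
# A minimal sketch of the RAG pattern: search local data first, then feed
# the best matches to the model as context. Keyword overlap stands in for
# a real embedding model plus a vector database.
documents = [
    "Refunds are issued within 30 days of purchase.",
    "Support hours are 9am-5pm Eastern, Monday through Friday.",
    "Enterprise plans include a dedicated account manager.",
]

def score(query: str, doc: str) -> int:
    """Toy relevance score: count of shared lowercase words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most relevant to the query."""
    return sorted(documents, key=lambda d: score(query, d), reverse=True)[:k]

query = "How long do refunds take?"
context = "\n".join(retrieve(query))

# The augmented prompt grounds the model in current local data before it
# falls back on whatever it learned during training.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)  # this prompt would then be sent on through the gateway
```

Centralizing this step in the gateway means the retrieval logic, like the provider secrets, lives in one governed place rather than being reimplemented in every application.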

It’s only a matter of time before most organizations are regularly invoking not just multiple LLMs, but also smaller language models trained on data from a specific domain such as finance or health care. A gateway presents the opportunity to streamline all the API calls being made to these models in a way that can be more easily managed and governed by a centralized IT team.

It’s not clear yet to what degree LLMs will be managed by data science and data engineering teams versus an existing IT operations team that is already applying DevOps best practices to programmatically build and deploy applications using APIs. The one thing that is certain, however, is that the rate at which API calls are made to AI models is only going to increase exponentially from here on out.
