Bloomberg and Tetrate have made available a stable version of an application programming interface (API) gateway specifically designed to make it simpler to invoke multiple artificial intelligence (AI) models.

Tetrate CEO Varun Talwar said version 0.1 of the open source Envoy AI Gateway project will democratize access to AI models at a time when organizations are increasingly invoking multiple models to build a range of applications.

Developed under the auspices of the Cloud Native Computing Foundation (CNCF), the gateway builds on the open source Envoy Gateway project for Kubernetes clusters, which in turn is built on the open source Envoy proxy, also overseen by the CNCF.

The overall goal is to provide developers of AI applications with a unified API through which they can invoke services from, for example, OpenAI or Amazon Web Services (AWS), noted Talwar.
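To make that concrete, here is a minimal sketch of what a unified API can look like from the application side, assuming the gateway exposes an OpenAI-compatible endpoint. The gateway URL, credential, and model names below are illustrative assumptions, not project defaults.

```python
# Minimal sketch: call two different providers through one
# OpenAI-compatible gateway endpoint.
from openai import OpenAI

# Point the standard OpenAI client at the gateway instead of a vendor API.
client = OpenAI(
    base_url="http://envoy-ai-gateway.example.com/v1",  # hypothetical gateway address
    api_key="GATEWAY_ISSUED_TOKEN",  # credential issued by the gateway, not the vendor
)

# The same call shape works regardless of which backend serves the model;
# the gateway routes the request based on the model name.
for model in ("gpt-4o-mini", "us.anthropic.claude-3-5-sonnet"):  # assumed routable names
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Summarize Envoy in one sentence."}],
    )
    print(model, "->", reply.choices[0].message.content)
```

The point of the design is that application code only ever speaks one dialect; which vendor ultimately serves the request becomes a routing decision made at the gateway.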

Additionally, IT organizations can take advantage of rate limiting based on the number of LLM tokens consumed, rather than on raw request counts, to contain costs.
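The sketch below illustrates the general idea of token-based rate limiting, with budgets measured in tokens consumed per time window rather than in requests. It is a conceptual illustration, not the gateway's actual configuration format, and the window and budget numbers are assumptions.

```python
import time
from collections import defaultdict

# Conceptual sketch: budgets are measured in LLM tokens consumed per
# time window, since two requests can differ in cost by orders of magnitude.

WINDOW_SECONDS = 3600
TOKEN_BUDGET = 100_000  # assumed per-team hourly budget

usage = defaultdict(lambda: {"window_start": 0.0, "tokens": 0})

def allow_request(team: str, estimated_tokens: int) -> bool:
    """Admit the request only if the team's token budget allows it."""
    now = time.time()
    record = usage[team]
    if now - record["window_start"] >= WINDOW_SECONDS:
        record["window_start"] = now  # start a fresh window
        record["tokens"] = 0
    if record["tokens"] + estimated_tokens > TOKEN_BUDGET:
        return False  # over budget: reject or queue the request
    record["tokens"] += estimated_tokens
    return True
```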

Finally, the Envoy AI Gateway also provides single sign-on (SSO) capabilities to improve cybersecurity.

Going forward, engineers working on the project plan to add support for the Google Gemini 2.0 service along with prompt templates, fallback capabilities to ensure availability, and semantic caching to further reduce costs when invoking remote AI services.
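Semantic caching is worth unpacking briefly: requests whose prompts are sufficiently similar in meaning can reuse a cached response instead of triggering a fresh, billable model call. The sketch below shows the general technique, with a toy hashed bag-of-words function standing in for a real embedding model; the similarity threshold is an assumption.

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.95  # assumed cutoff for "same question"
cache: list[tuple[np.ndarray, str]] = []  # (prompt embedding, cached response)

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy stand-in for a sentence-embedding model: hashed bag of words."""
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def cached_completion(prompt: str, call_model) -> str:
    """Return a cached response for semantically similar prompts, else call the model."""
    vec = embed(prompt)
    for cached_vec, response in cache:
        if float(np.dot(vec, cached_vec)) >= SIMILARITY_THRESHOLD:
            return response  # cache hit: no remote call, no cost
    response = call_model(prompt)  # cache miss: pay for one model call
    cache.append((vec, response))
    return response
```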

The Envoy AI Gateway project is the brainchild of Dan Sun, engineering team lead for Bloomberg’s Cloud Native Compute Services. He is also a co-founder and maintainer of the KServe project, an AI platform built on Envoy and Kubernetes. Bloomberg is using Envoy AI Gateway to centrally manage access to AI services in a way that, in addition to setting limits and quotas, provides developers with a consistent experience.

It’s not clear to what degree organizations are standardizing on one set of AI services versus another, but given the pace of innovation, the need to be able to switch AI services when needed is clear.

Additionally, the cost of AI tokens can escalate quickly, creating a need for governance policies that keep AI spending from spiraling out of control.

Ultimately, organizations will find themselves using a raft of AI models of varying sizes to drive these applications. Not every AI application, for example, needs to directly access a massive large language model (LLM). In fact, for many use cases a smaller model trained on a narrower, domain-specific set of data can prove just as accurate, if not more so.

At the same time, organizations are beginning to discover that AI models, as they are exposed to additional data, tend to drift beyond the scope of their original tasks. IT teams can then leverage APIs to swap out AI models in a way that doesn’t require them to rewrite an entire application.
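As a rough illustration, once every call is funneled through the gateway's unified API, replacing a drifting model can be reduced to a configuration change rather than a code change; the environment variable below is hypothetical, and `client` is the gateway-pointed client from the earlier sketch.

```python
import os

# Operations staff can repoint this setting without touching application code.
MODEL_NAME = os.environ.get("CHAT_MODEL", "gpt-4o-mini")  # hypothetical setting

def ask(client, prompt: str) -> str:
    """Application code never hardcodes the provider; only MODEL_NAME changes."""
    reply = client.chat.completions.create(
        model=MODEL_NAME,
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content
```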

Of course, it’s still early days so far as the operationalization of AI models is concerned, but one thing is certain: Defining best practices for managing them is still very much a work in progress.
