There is no shortage of large language models (LLMs) for building generative artificial intelligence (AI) applications these days, so the challenge has quickly become determining which ones to use to automate specific tasks. Predibase today is making available a collection of more than 25 open source LLMs that have been fine-tuned for different use cases.

In addition, each LLM has been optimized to outperform GPT-4, the LLM at the heart of the latest iterations of OpenAI's ChatGPT service.

Predibase provides a platform that makes it possible to run multiple LLMs on the same graphics processing unit (GPU) to reduce generative AI infrastructure costs. Using a framework dubbed LoRAX, Predibase can run hundreds of fine-tuned models on a single GPU by caching adapter weights and routing requests to them in real time. The company has also developed a Serverless Fine-tuned Endpoints capability to streamline the code used to launch a prompt.
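The general idea behind serving many fine-tuned models on one GPU can be sketched in a few lines: all requests share one base model, and only the small fine-tuned adapter weights are swapped in and out of memory, with an LRU cache deciding which adapters stay resident. The sketch below is purely conceptual and not LoRAX's actual code; the class and function names are hypothetical.

```python
from collections import OrderedDict


class AdapterCache:
    """Toy LRU cache standing in for fine-tuned adapter weights kept
    resident in GPU memory. (Conceptual sketch only, not LoRAX's code.)"""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._cache: "OrderedDict[str, str]" = OrderedDict()

    def get(self, adapter_id: str) -> str:
        if adapter_id in self._cache:
            self._cache.move_to_end(adapter_id)  # mark as recently used
            return self._cache[adapter_id]
        # Cache miss: "load" the adapter (in reality, copy weights to the GPU).
        if len(self._cache) >= self.capacity:
            self._cache.popitem(last=False)  # evict the least recently used
        self._cache[adapter_id] = f"weights:{adapter_id}"
        return self._cache[adapter_id]


def route(request: dict, cache: AdapterCache) -> str:
    """Route a request to its adapter; every adapter shares one base model."""
    adapter = cache.get(request["adapter_id"])
    return f"generated with {adapter} for prompt {request['prompt']!r}"


cache = AdapterCache(capacity=2)
print(route({"adapter_id": "sentiment", "prompt": "Great product!"}, cache))
print(route({"adapter_id": "summarize", "prompt": "Long report..."}, cache))
```

Because each adapter is a small fraction of the base model's size, dozens or hundreds can fit in the memory of a single GPU that could otherwise hold only one fully fine-tuned model.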

Now the company is adding LoRA Land, a repository that houses the open source LLMs it has fine-tuned to, for example, analyze sentiment or provide summarizations.

Given the demand for GPUs, they are, naturally, expensive to buy and difficult to find; some classes of GPUs can take nearly a year to acquire. Making matters worse, without LoRAX each fine-tuned LLM requires its own dedicated set of GPU resources. Predibase breaks that innovation bottleneck by making it possible to both fine-tune and deploy AI models on shared GPUs, says Predibase CEO Dev Rishi.

It also eliminates the need for a cold GPU to spin up before each prompt, which makes it faster to test and iterate on models, he noted.

In effect, Predibase provides a means to share GPUs across models, much as hypervisors have long enabled multiple applications to run on the same server. Each IT team will need to decide for itself how many LLMs to run on a GPU, but as more LLMs are developed to automate specific tasks, those models are shrinking in size. IT teams will then need to orchestrate those LLMs to apply generative AI across end-to-end business processes.

While it’s clear that data science teams are applying machine learning operations (MLOps) best practices to build LLMs, management of the deployment process is still evolving. “A lot of teams are still trying to figure that out,” says Rishi. In many instances, however, the deployment of AI models is increasingly being handled by DevOps teams. In effect, an AI model that needs access to an inference engine running on a GPU or another type of processor architecture is just another software artifact those teams manage. The issue now is making sure enough infrastructure resources are available to run those AI models optimally.

One way or another, it’s now only a matter of time before AI models are pervasively deployed across entire enterprise IT environments. The only thing that remains to be seen is exactly how that goal will be achieved.