In 2023, large language models (LLMs) were the talk of the town. We saw a massive uptake in AI development, and the market became saturated with generative AI. It’s not just ChatGPT anymore — nearly all major technology companies seized the AI moment, releasing their own public language models and chatbots, from OpenAI’s GPTs to Google’s LaMDA, Meta’s LLaMA and other incarnations.
In addition to public LLMs, many SaaS and off-the-shelf LLMs have been designed for specific use cases or particular domains. To put things in perspective, Hugging Face now catalogs over 300,000 models. The AI community has also spawned a wide array of open-source models that can, with additional training, be fine-tuned on an organization's internal data.
Put simply, organizations are now spoiled for choice when it comes to integrating generative AI into their applications and development workflows. At the same time, companies are starting to notice the rising fees associated with AI, which can demand high processing power and incur costly per-request charges.
As such, we will probably see a greater focus on operationalizing LLMs in the year ahead. This will likely involve a more strategic approach to generative AI using leaner fit-for-purpose LLMs that are smartly routed given the task at hand. LLM security will also emerge as more of a pressing topic, given the potential privacy risks.
Spoiled for Choice With LLMs
The software development world is saturated with AI frameworks, tools and APIs, making it difficult to choose which LLM to use. Organizations must now test various models if they truly want to see which performs best for the task at hand. They must also weigh the trade-offs between commercial and open-source LLMs, as well as between generic and fit-for-purpose models.
“No single LLM is perfect for every use case, and bigger isn’t always better,” says Tal Lev-Ami, co-founder and CTO, Cloudinary. “In fact, while a given LLM might be considered state of the art on a given public benchmark, it might fail on the specific tasks most relevant to you.”
According to Steve Wilcockson, Product Marketing, KX, we are spoiled for choice, with the situation evolving daily. He emphasizes Mistral.AI’s models and Google’s Gemini multimodal model as gaining mindshare, following other releases like Amazon Q and OpenAI custom agents. “Navigating all of this is tough for organizations adopting tech amidst the GenAI tornado, particularly those for whom the pace of change is simply bewildering,” he adds.
This surge may warrant having domain expert leaders in the executive branch to provide guidance and evaluate various LLMs against certain benchmarks, explains Wilcockson. Choosing the right fit will also likely hinge on other factors, such as your cloud service provider (CSP) and vendor, and the regulatory considerations of your location and industry. He adds that how well a model can integrate with your infrastructure and differentiating datasets will be an important factor.
Although public LLMs are impressive for generic tasks, like writing email summaries or composing essays on widely documented topics, they require a lot of manual intervention when tackling more specific areas of expertise, says Eric Avigdor, VP of Product Management, Votiro. Therefore, we are seeing healthy growth in fit-for-purpose LLMs. He points to BloombergGPT, trained on financial data, and StarCoder and DeepSeek Coder, trained specifically on code, as examples of fit-for-purpose LLMs making very rapid progress.
“There are notable advantages to relying on smaller, niche models for certain tasks,” says Lev-Ami. Although he foresees the potential for future consolidation in the AI market, granularity seems more in vogue currently. “The appetite for one-size-fits-all enterprise technology has largely faded, and I expect that we’ll see a clear divide between general-purpose and purpose-built LLM models for some time,” says Lev-Ami.
LLMs aren’t a silver bullet, and traditional AI techniques are still superior for specific tasks, adds Fabio Soares, engineering manager of Data Platforms at Andela. “[LLMs] are optimized to persuade you, not to give precise answers unless the right dataset was used to train it,” he says. “The decision between SLM and LLMs is a function of how practical it is to adapt pre-existing LLM services and how feasible it is to train SLM models to compete with them and still bring high-precision results.”
Consolidation around ChatGPT took years, but the recent Cambrian explosion of LLMs beyond ChatGPT took only months. “Over the last few months, the recent flurry of innovation around agents, multimodality, and Mixture of Experts models also consolidates an ecosystem towards dedicated use cases while building on top of an explosive plethora of LLM choices,” says Wilcockson. He also highlights the value of using newer retrieval augmented generation (RAG) frameworks to contextualize an enterprise’s data and work in conjunction with LLMs.
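The RAG pattern Wilcockson describes can be sketched roughly as follows. This is a minimal illustration, not any particular framework's API: a toy word-overlap scorer stands in for a real embedding model and vector database, but the overall flow — retrieve relevant internal documents, then prepend them to the prompt as context — is the same.

```python
def score(query: str, doc: str) -> int:
    """Toy relevance score: count words shared between query and document.
    A production system would use embedding similarity instead."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, docs: list, k: int = 2) -> list:
    """Return the k documents that best match the query."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str, docs: list, k: int = 2) -> str:
    """Augment the user's question with retrieved enterprise context
    before sending it to an LLM."""
    context = "\n".join(retrieve(query, docs, k))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

The point of the pattern is that the LLM itself stays generic; the enterprise's differentiating data enters only at query time, through the retrieval step.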
Considering the Repercussions of Rising LLM Use
Although there is much excitement around incorporating new generative AI technologies, there are some drawbacks, notably regarding energy, cost and security. As such, organizations will likely begin to audit their AI footprint and fine-tune or reduce usage to keep things leaner and avoid rising processing energy and costs.
“The repercussions are absolutely mind-blowing,” says Wilcockson, noting how the generative AI wave is likely to exacerbate the paradox of uncontrolled cloud spending. Given the high energy and water consumption for LLM interactions, he also foresees adverse environmental outcomes.
In response, the role of AI leaders must encompass both discovering the proper architecture and preventing cost escalation. This could mean reusing existing hardware or running workloads on GPU-free instances. "With improved memory efficiency, faster search, and an all-around more efficient management ecosystem, significant cost and carbon savings can be made," says Wilcockson.
But don’t expect drastic streamlining to happen overnight. “I believe we are still very far from efficient (and safe) use of AI,” says Avigdor. Like our love affair with the cloud (hint: it’s complicated), the ‘honeymoon with AI’ will likely last for a while before we harness it efficiently and safely. “My expectation is that initial massive adoption of AI will actually increase FinOps challenges, and we can expect technologies that tackle that problem to flourish,” he adds.
There are also security repercussions of using the wrong LLM, such as hallucination, copyright infringement, and IP leakage. “We’ve been working with customers who are extremely concerned about developers using the wrong LLM for the job, resulting in data leakage to public AIs,” says Michael Bargury, Co-Founder and CTO of Zenity. As such, he’s found enterprises being very cautious about what types of LLMs they use and what business data they are allowed access to.
Potential Solution: Smart LLM Routing
“Finding the right LLM or SLM will be key to get to a cost effective approach to using language models,” adds Soares. Yet, drowning in a sea of competing language models, organizations might find it challenging to discover the appropriate model for the task at hand.
"There is a lot of value in making specialized models," says Shriyash Upadhyay, founder at Martian. But at the same time, it requires much research to understand the differences between the models, and individual development teams might be using multiple models simultaneously. "Deciding what LLM to use is a remarkably hard problem," he says.
One interesting solution is the concept of LLM routing. This involves an interpretability layer that accepts a prompt from a user and routes it to the model that best balances domain accuracy, performance, and cost.
For example, Martian is an interesting startup building what Upadhyay calls the "Google for LLMs." Using their API, developers can send a prompt along with metadata related to business constraints, like how much they are willing to pay or their latency requirements. The API then automatically selects the best-fit LLM and processes the request.
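The routing idea can be illustrated with a short sketch. Everything here is hypothetical — the model names, prices, and latencies are made up for illustration and do not reflect Martian's API or any real vendor's pricing — but it shows the core logic: pick the cheapest model that matches the task's domain and satisfies the caller's cost and latency constraints.

```python
from dataclasses import dataclass

# Hypothetical model catalog; names and numbers are illustrative only.
@dataclass
class ModelSpec:
    name: str
    domains: set               # tasks the model is tuned for
    cost_per_1k_tokens: float  # dollars
    p95_latency_ms: int

CATALOG = [
    ModelSpec("general-large", {"general"}, 0.030, 900),
    ModelSpec("code-small", {"code"}, 0.002, 250),
    ModelSpec("finance-medium", {"finance"}, 0.010, 400),
]

def route(task_domain: str, max_cost: float, max_latency_ms: int) -> ModelSpec:
    """Pick the cheapest model that handles the domain within the caller's
    cost and latency budget; general-purpose models are always eligible."""
    candidates = [
        m for m in CATALOG
        if (task_domain in m.domains or "general" in m.domains)
        and m.cost_per_1k_tokens <= max_cost
        and m.p95_latency_ms <= max_latency_ms
    ]
    if not candidates:
        raise ValueError("no model satisfies the given constraints")
    # Prefer domain specialists over generalists, then lowest cost.
    candidates.sort(key=lambda m: (task_domain not in m.domains,
                                   m.cost_per_1k_tokens))
    return candidates[0]
```

In this sketch a coding task with a tight budget would land on the small code model, while a generic task would fall through to the general-purpose one; a real router would also factor in measured quality per task, which is the hard research problem Upadhyay alludes to.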
Cherry-picking from giant libraries of open-source models could add to supply chain risks, notes Steve Wilson, CPO at Exabeam and Lead for the OWASP Top 10 for Large Language Model AI Security. "But I think that with the right choices, routing requests to the 'right' LLM might have dramatic time to market and cost implications," he adds.
In the future, organizations will probably utilize different LLMs matching different experiences, predicts Steve Sloan, CEO of Contentful. Furthermore, he underlines the need for an API-first approach to AI integration. Not only do we require robust APIs to access AI models, but we’ll also need API accessibility to access content systems where internal data lives.
Expect More LLMOps in 2024
Generative AI is accelerating software development and enabling powerful, innovative end-user experiences. However, as the pendulum swings back toward pragmatism, organizations will need to continually refine their AI portfolios to keep them from getting out of control. Especially within a tighter tech economy, IT leaders have to ensure the business outcomes justify the investment in new AI. As such, smart operationalization of LLMs will probably become a strong focus in the year ahead.
"In 2024, digital startups and enterprises alike will move beyond the art of the possible and start building practical roadmaps for operationalizing large language model operations (LLMOps) across the enterprise," says Kevin Cochrane, CMO of Vultr. He anticipates much of this will be led by CISOs and CIOs, who will reshape the infrastructure stack and dictate how to govern compliance and proper use of data and AI/ML. "Enterprises will take a new approach to sourcing, securing, transferring, and ensuring governance and compliance of the large-scale datasets needed to power future AI/ML applications," he says.
"I believe the journey here will be very similar to the migration from on-premise hardware to cloud spend," says Soares. "It is important to keep an eye on [AI] and optimize cost for the value the enterprise is getting from the deployment."