From RAG to Riches: Why Retrieval-Augmented Generation is Key

While language models have been around for decades, the release of ChatGPT and other generative AI applications put large language models (LLMs) in the limelight, and rightfully so. LLMs have become the backbone of some of the most innovative applications for language translation, sentiment analysis, chatbots, virtual assistants, copilots, content and code generation, and more.

Their ability to generate detailed, creative responses to queries in natural language and code has sparked a wave of excitement that made ChatGPT the fastest-adopted consumer application in history, according to UBS research. But not in enterprises. Why? Generative AI models have significant limitations due to hallucinations and limited context. They struggle with fact-based question-answering use cases, and they can only address public knowledge; they know nothing of company-specific information. For generative AI applications to be viable for enterprise use, they must deliver information that is accurate and traceable in a secure and scalable manner.

2023 was a year of experimenting with different approaches to help GenAI in these areas. Four techniques seemed plausible, and soon two approaches emerged with promise. One is fine-tuning: showing an LLM new material to tailor it to a specific domain or specialized area. While fine-tuning has its uses (such as improving the model’s language in a particular domain), it became clear that the best solution for domain-specific knowledge is to feed the model that knowledge using search. Pairing enterprise search with LLMs in this way is an approach called Retrieval-Augmented Generation (RAG). The search system does the retrieval (R), the search results augment (A) the prompt and the GenAI generates (G) the response. In 2024, RAG will become one of the most used techniques in the domain of large language models.

Let’s dig a little deeper into RAG, why it’s so important and what to keep an eye on for the next wave of innovation.

What is RAG and Why all the Attention?

Despite the funny name, RAG simply means doing a search before engaging the LLM:

Use search to find (retrieve) the best information available (R)
Ask the GenAI to answer your question using that information (A)
The GenAI generates an answer (G)
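
As a rough illustration, the three steps above can be sketched in a few lines of Python. Everything here is an assumption made for the example: the toy corpus, the `Passage` type, the word-overlap scoring and the `generate` placeholder are all hypothetical, and a real system would use an enterprise search engine for retrieval and an actual LLM call for generation.

```python
import re
from dataclasses import dataclass


@dataclass
class Passage:
    source: str  # hypothetical identifier for where the text came from
    text: str


# Toy corpus standing in for enterprise content (invented for illustration).
CORPUS = [
    Passage("hr/vacation-policy.docx", "Employees accrue 20 vacation days per year."),
    Passage("it/vpn-guide.pdf", "Connect to the VPN before opening internal tools."),
    Passage("hr/expenses.docx", "Submit expense reports within 30 days of purchase."),
]


def words(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, corpus: list[Passage], top_k: int = 2) -> list[Passage]:
    """R: rank passages by how many words they share with the question (toy scoring)."""
    q = words(question)
    scored = [(len(q & words(p.text)), p) for p in corpus]
    scored = [item for item in scored if item[0] > 0]  # drop passages with no overlap
    scored.sort(key=lambda item: item[0], reverse=True)
    return [p for _, p in scored[:top_k]]


def augment(question: str, passages: list[Passage]) -> str:
    """A: build a prompt that asks the model to answer from the retrieved passages."""
    context = "\n".join(
        f"[{i}] ({p.source}) {p.text}" for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the numbered passages below.\n"
        f"{context}\n"
        f"Question: {question}"
    )


def generate(prompt: str) -> str:
    """G: placeholder for a real LLM call; it only reports the prompt size."""
    return f"(model response to a {len(prompt)}-character prompt)"


def answer(question: str, corpus: list[Passage] = CORPUS) -> str:
    """Run the three steps in order: retrieve, augment, generate."""
    return generate(augment(question, retrieve(question, corpus)))
```

Calling `answer("How many vacation days do employees get?")` retrieves the vacation policy first, places it in the prompt as passage [1], and hands that prompt to the placeholder model.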

Understanding how LLMs work and their limitations puts this in perspective.

LLMs are trained on terabytes of data, creating a model of the patterns of language. This means LLMs know which words go together, and how they go together, making them great at handling language. They work great for reformulating text (e.g., translation, summarizing), for non-factual creative tasks (e.g., poems) and for well-established common knowledge (e.g., a high school history essay). But they don’t have any true knowledge about the real world. That is why they hallucinate: a hallucination is a situation where the words fit acceptable language patterns, but are factually incorrect.

That means that LLMs are not suited for situations where accuracy and credibility are needed, and they are almost useless for topics outside of the public domain. LLMs are trained only on public content, and know nothing about internal corporate information. In short, LLMs suffer from four common problems:

“Hallucinations” – Confidently presenting fabricated misinformation
Opaqueness – No ability to give or cite sources
Obsolescence – Lack of timeliness and no knowledge of events since the model was initially trained
Ignorance – No knowledge of internal corporate information

RAG solves these problems by first retrieving factual information from within your business, and then asking the LLM to answer using that information (rather than the material it was trained on). The answer is based on facts pulled from your corporate content, which grounds it in accurate information and presents it in the context of your business. Better still, with RAG, the LLM can point directly to the source material that it was given, providing traceability to the original source. This lets you verify any answer rather than take it on faith.
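
One way to picture that traceability, as a minimal sketch: assume the prompt numbered each retrieved source and the model was asked to cite them with markers such as [1] or [2]. The `cited_sources` helper and the marker convention are assumptions made for this example, not a feature of any particular product.

```python
import re


def cited_sources(answer: str, sources: list[str]) -> list[str]:
    """Map [n] markers in an answer back to the sources that were supplied.

    Sources are returned in order of first citation. Markers that point
    outside the supplied list are ignored, since the model may invent them.
    """
    cited: list[str] = []
    for number in re.findall(r"\[(\d+)\]", answer):
        index = int(number) - 1  # markers are 1-based, list indexes are 0-based
        if 0 <= index < len(sources) and sources[index] not in cited:
            cited.append(sources[index])
    return cited
```

An answer that cites nothing, or only cites markers that were never supplied, returns an empty list, which is a useful signal that the response should not be trusted as grounded.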

Search + GenAI = Better Together

Search and GenAI are amazingly complementary. Search provides the facts, traceability and up-to-date information that GenAI can’t, so that it can be used in the enterprise. GenAI reads and interprets the results from a search, and writes a plain-language answer in real time, which eliminates the need for a human to click through and read one search result after another.

It would be simple if we only needed to search documents or very structured data within our company. But knowledge is everywhere: in documents, emails, presentations, SharePoint, Slack discussions and SOPs, spread across internal and external sources. It’s very hard to find all of the relevant content across all of these sources and unify it. It’s also essential to maintain focus; the more information sent to the LLM, the weaker the answer. LLMs are capable of “reading” lots of content, but the more they read, the more diluted their responses become, the more likely they are to get distracted, and the more likely they are to leave out important facts.

RAG solves both of these problems. First, it looks across all your enterprise systems and knowledge stores, all of the digital conversations between your employees, all your manuals and procedures and all your specialized applications, and gets the most relevant information wherever it lives. Second, it doesn’t overwhelm the LLM with a dozen full-length documents; the search focuses on the parts of those documents that matter. This ensures both a comprehensive and a targeted answer – with full traceability to the sources used for that answer.
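
The second point, sending only the relevant parts of documents, can be sketched as follows. This is an illustration under stated assumptions: the fixed-size word windows, the overlap between windows and the word-overlap scoring are simple stand-ins chosen for the example, and the function names are hypothetical. Production search platforms use far more sophisticated passage extraction and ranking.

```python
import re


def split_into_chunks(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def tokens(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_chunks(question: str, documents: dict[str, str], k: int = 3,
               size: int = 40, overlap: int = 10) -> list[tuple[str, str]]:
    """Return up to k (document name, chunk) pairs that share the most words with the question."""
    q = tokens(question)
    scored = []
    for name, text in documents.items():
        for chunk in split_into_chunks(text, size, overlap):
            score = len(q & tokens(chunk))
            if score > 0:
                scored.append((score, name, chunk))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(name, chunk) for _, name, chunk in scored[:k]]
```

Because each chunk keeps its document name, the prompt stays short while every passage can still be traced back to where it came from.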

The Importance of Security

LLMs have no concept of security; all of the information they were trained on is incorporated into the model, and accessible to anyone using the LLM. RAG is secure, and gives you complete control over the two kinds of security concerns: internal and external.

A robust enterprise search system ensures internal security by automatically honoring all of your existing permissions. No one sees anything they’re not supposed to see, because the results match the permissions of the employee. Those results are then sent to the GenAI and its response is based only on those results.
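
A minimal sketch of that idea, assuming each search hit carries the set of groups allowed to read it: results are filtered against the user's groups before anything is placed in the prompt. The `SearchHit` type and the group-based model are hypothetical simplifications; real source systems have richer permission schemes, and a search platform typically enforces them at query time rather than after the fact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchHit:
    doc_id: str
    text: str
    allowed_groups: frozenset[str]  # groups permitted to read this document


def visible_hits(hits: list[SearchHit], user_groups: set[str]) -> list[SearchHit]:
    """Keep only the hits that at least one of the user's groups may read."""
    return [h for h in hits if not h.allowed_groups.isdisjoint(user_groups)]
```

Only the output of `visible_hits` would be passed on to the GenAI, so its response cannot draw on content the employee is not permitted to see.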

For external security, there are two options. The most common is to send the search results to a commercial LLM (like OpenAI’s GPT-4, Anthropic’s Claude or Google’s Gemini). These commercial models will not train on that data, so it’s safe from other companies that are using those same models. But if you have extra-sensitive content that can’t be given to these providers, you can host an open-source LLM (like Mistral’s Mixtral or Meta’s Llama 2) yourself.

Caution: Inadequate Search = Disappointing RAG

With all of the excitement for LLMs, it’s easy to miss how important the R is in RAG. The LLM bases its response on the information that it’s given, so the traditional “garbage in, garbage out” mantra applies. It’s crucial to have search that is relevant and comprehensive so that the LLM is given the best information to work with. Unless you only have a few documents from a single source, don’t assume that searching with an open-source vector database is sufficient. It might work for a small-scale PoC but may fail dramatically in production. At a minimum, you need a multi-source, hybrid search platform (with vector and keyword retrieval) that honors source system security.
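
To make "hybrid" concrete, here is a sketch of one common way to merge a keyword ranking and a vector ranking: reciprocal rank fusion (RRF). This technique is chosen for the example and is not something this article prescribes; search platforms may combine results quite differently. The document IDs are invented, and the constant 60 is a conventional default rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) for every list it appears in, so
    documents ranked well by more than one retriever rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from two retrievers for the same query.
keyword_results = ["a", "b", "c"]
vector_results = ["b", "d", "a"]
fused = reciprocal_rank_fusion([keyword_results, vector_results])
```

Documents "a" and "b" appear in both lists and therefore outrank "c" and "d", each of which was found by only one retriever.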

The advent of LLMs, and with it generative AI, has ushered in a new era of technological innovation, but generative AI has several shortcomings that prevent its use in most enterprise applications. In 2023, pairing search with GenAI in a technique called RAG emerged as the solution to these challenges, mitigating weaknesses and opening up a broad range of opportunities to use generative AI in fact-based scenarios within businesses.

The promise of generative AI to revolutionize enterprise applications through RAG is immense, giving employees a superhuman assistant so they can leverage all corporate knowledge simply by having a conversation. Enterprises that swiftly adopt and deploy robust RAG-powered assistants will have an advantage over companies that don’t, harnessing the potential of GenAI to drive innovation, enhance productivity and stay competitive in the evolving digital economy. The best RAG solutions require not just a capable GenAI but also a robust search capability, so picking the right search platform is key.

About the author:

Jeff Evernham is vice president of product strategy at enterprise search provider Sinequa. His 30-year career spans data analytics consulting, professional services, sales, and engineering roles at multiple software and management consulting firms. He holds a Master of Engineering degree from MIT.

https://www.linkedin.com/in/jeffevernham/
