
Qdrant today added a BM42 search algorithm to its vector database that promises to surface more accurate results for retrieval-augmented generation (RAG) workloads while also running faster and reducing processing costs.

The BM42 search algorithm replaces BM25, a widely used keyword-ranking algorithm developed decades ago, for launching hybrid searches across vector databases and large language models (LLMs), says Qdrant CTO Andrey Vasnetsov.

The overall goal is to provide a more efficient vector search alternative optimized for RAG, rather than relying on a keyword algorithm invented long before modern generative artificial intelligence (AI) models were created, he added.
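For context, BM25 ranks documents purely on how often query terms appear in them and how rare those terms are across the corpus. The minimal Python sketch below illustrates that classic formula (it is an illustration only, not Qdrant's implementation) and shows the kind of keyword-only scoring BM42 is intended to move beyond:

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, corpus, k1=1.5, b=0.75):
    """Classic BM25: weight each query term by its rarity across the corpus
    and by how often it appears in this document (illustrative sketch)."""
    doc_len = len(doc_terms)
    avg_len = sum(len(d) for d in corpus) / len(corpus)
    tf = Counter(doc_terms)
    n_docs = len(corpus)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)                 # documents containing the term
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)     # rarity weight
        freq = tf[term]
        score += idf * (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * doc_len / avg_len))
    return score

corpus = [["vector", "database", "search"], ["hybrid", "search", "rag"]]
print(bm25_score(["hybrid", "search"], corpus[1], corpus))
```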

Additionally, BM42 provides a new way of classifying search results that integrates sparse and dense vectors to pinpoint relevant information within a document. Sparse vectors handle exact term matching, while dense vectors capture semantic relevance and deeper meaning. Collectively, these capabilities reduce the amount of data that needs to be exposed to an LLM. That approach not only produces more accurate results; it also reduces the number of input and output tokens an organization pays for each time it prompts an LLM service.
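As a conceptual sketch of how that hybrid approach can work (the data structures and scoring below are simplified assumptions, not Qdrant's actual BM42 code), the following Python example fuses a sparse term-weight score with a dense embedding score and forwards only the top-ranked passages to the LLM:

```python
from math import sqrt

def cosine(a, b):
    """Dense signal: semantic closeness of two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / ((sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))) or 1.0)

def sparse_score(query_terms, doc_weights):
    """Sparse signal: sum the weights of query terms the document actually contains."""
    return sum(doc_weights.get(t, 0.0) for t in query_terms)

def hybrid_top_k(query_terms, query_vec, docs, k=3, rrf_k=60):
    """Rank documents with both signals, then merge the two rankings with
    reciprocal rank fusion so only the top-k passages reach the LLM."""
    sparse_rank = sorted(docs, key=lambda d: sparse_score(query_terms, d["terms"]), reverse=True)
    dense_rank = sorted(docs, key=lambda d: cosine(query_vec, d["embedding"]), reverse=True)
    fused = {}
    for rank_list in (sparse_rank, dense_rank):
        for rank, doc in enumerate(rank_list):
            fused[doc["id"]] = fused.get(doc["id"], 0.0) + 1.0 / (rrf_k + rank + 1)
    best_ids = sorted(fused, key=fused.get, reverse=True)[:k]
    return [next(d for d in docs if d["id"] == i) for i in best_ids]

docs = [
    {"id": 1, "terms": {"bm42": 2.1, "hybrid": 1.4}, "embedding": [0.1, 0.9]},
    {"id": 2, "terms": {"invoice": 1.8}, "embedding": [0.8, 0.2]},
]
print(hybrid_top_k(["hybrid", "search"], [0.2, 0.8], docs, k=1))
```

Because only the fused top-k passages travel to the model, the prompt stays small, which is where the token savings come from.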

That capability will also make it easier for data science teams to move from a proof-of-concept (PoC) into a production environment as the amount of data being processed continues to scale, notes Vasnetsov.

There is an ongoing debate over the degree to which organizations should rely on a purpose-built vector database to implement RAG versus extending a legacy database by adding support for vectors. The issue with the latter approach is that as AI models continue to evolve, legacy databases that were never designed for them will need to be updated regularly, says Vasnetsov. “Databases are not easy to update,” he adds.

In general, responsibility for vector databases is increasingly shifting to data engineers who are part of a machine learning operations (MLOps) team. The challenge those teams will encounter as they look to operationalize AI at scale will only increase in complexity as a wide range of LLMs are customized using data that today is often strewn across the enterprise. As the number of RAG use cases expands, the need to process data efficiently will only become more pressing, noted Vasnetsov.

It’s not clear to what degree organizations have mastered RAG as a methodology for customizing LLMs, but it’s apparent some type of vector database capability will prove essential. The issue now is determining how best to achieve that goal in a way that generates reliable outputs without breaking the IT budget. Many organizations are already limiting the number of AI projects they are willing to launch because of budget concerns. Optimizing the amount of data that needs to be exposed to an LLM at any given time should help organizations rein in AI spending.
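As a back-of-the-envelope illustration (the price and volume figures below are assumptions, not figures from Qdrant), trimming the retrieved context directly trims the input-token bill:

```python
# Hypothetical illustration: monthly cost impact of sending fewer context tokens per prompt.
PRICE_PER_1K_INPUT_TOKENS = 0.01   # assumed price; actual rates vary by LLM provider
PROMPTS_PER_MONTH = 100_000        # assumed prompt volume

def monthly_context_cost(tokens_per_prompt):
    return tokens_per_prompt / 1000 * PRICE_PER_1K_INPUT_TOKENS * PROMPTS_PER_MONTH

# Tighter, more relevant retrieval shrinks the context each prompt carries.
print(monthly_context_cost(8000))  # broad retrieval:   $8,000/month
print(monthly_context_cost(2000))  # precise retrieval: $2,000/month
```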

In the meantime, there is now a greater appreciation for data management than ever. The challenge is converting all that interest into a set of well-defined best practices that makes it possible to consistently operationalize LLMs at scale.
