
The OpenSearch Software Foundation this week released an update that, in addition to improving vector search performance, adds support for the Model Context Protocol (MCP) and an application programming interface (API) for artificial intelligence (AI) agents.
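For teams that want to experiment with the new MCP support, the minimal Python sketch below shows roughly how it might be switched on against a local cluster. It assumes the feature is exposed through the ml-commons plug-in as a dynamic cluster setting (plugins.ml_commons.mcp_server_enabled); the host, credentials and setting name should be verified against the 3.1 documentation.

```python
import requests

# Hypothetical local cluster endpoint; adjust host, port and
# authentication for your own deployment.
BASE_URL = "http://localhost:9200"

# Assumption: the MCP server is part of the ml-commons plug-in and is
# toggled via a dynamic cluster setting, applied here persistently.
resp = requests.put(
    f"{BASE_URL}/_cluster/settings",
    json={"persistent": {"plugins.ml_commons.mcp_server_enabled": True}},
)
resp.raise_for_status()
print(resp.json())
```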
Announced at the Open Source Summit North America event, OpenSearch 3.1 also adds the ability to accelerate performance on graphics processing units (GPUs) using a revamped index, along with an integration of the ML Commons benchmarking toolset for AI models with the OpenSearch metrics framework.
There is also a suite of tools for improving the quality of search results, a redesigned workflow detail page that simplifies configuration, and a new workflow template for semantic search that uses sparse encoders to simplify deploying text-based search and ranking results by semantic similarity.
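As an illustration of what the sparse-encoder template ultimately enables, the following Python sketch issues a neural sparse query against a cluster. The index name, field name and model ID are placeholders, and it assumes a sparse encoding model has already been deployed and used to index the documents.

```python
import requests

# Hypothetical endpoint; adjust for your deployment.
BASE_URL = "http://localhost:9200"

# Assumption: "my-index" holds documents whose sparse-encoded terms were
# written to the field "embedding" at ingest time, and the model ID below
# refers to a deployed sparse encoder.
query = {
    "query": {
        "neural_sparse": {
            "embedding": {
                "query_text": "wireless noise-cancelling headphones",
                "model_id": "<sparse-encoder-model-id>",
            }
        }
    }
}

resp = requests.post(f"{BASE_URL}/my-index/_search", json=query)
resp.raise_for_status()

# Results come back ranked by semantic similarity to the query text.
for hit in resp.json()["hits"]["hits"]:
    print(hit["_score"], hit["_source"])
```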
Security has also been improved via an update to a plug-in that the OpenSearch Software Foundation provides. In addition to making the user object immutable, the update significantly reduces the performance overhead the plug-in previously created. Experimental functionality designed to improve the overall security posture of OpenSearch deployments is now available as well.
Finally, support for on-disk compression and an integration with the open source OpenTelemetry agent software for monitoring IT environments have also been added.
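Assuming the on-disk compression referenced here is the k-NN vector on_disk mode, a sketch of creating an index that uses it might look as follows. The index name, field name, dimension and compression level are all illustrative, not prescribed by the release.

```python
import requests

# Hypothetical endpoint; adjust for your deployment.
BASE_URL = "http://localhost:9200"

# Assumption: "on_disk" mode stores quantized vectors on disk to cut
# memory use; compression_level trades recall for footprint.
index_body = {
    "settings": {"index.knn": True},
    "mappings": {
        "properties": {
            "embedding": {
                "type": "knn_vector",
                "dimension": 768,
                "mode": "on_disk",
                "compression_level": "32x",
            }
        }
    },
}

resp = requests.put(f"{BASE_URL}/vectors", json=index_body)
resp.raise_for_status()
```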
First launched in 2021 as a fork of the Elasticsearch and Kibana projects, OpenSearch was donated last year by Amazon Web Services (AWS) to The Linux Foundation, which then set up the OpenSearch Software Foundation. Available under an Apache 2.0 license, OpenSearch now includes a suite of tools to ingest, search, visualize and analyze data, comprising a data store and search engine, a set of dashboards and a server-side data collector, dubbed Data Prepper. There are now hundreds of contributors to projects spanning more than 90 repositories.
Mukul Karnik, general manager for OpenSearch at AWS, noted that while there are many use cases for OpenSearch, usage of the platform has expanded largely because data science teams use these tools to collect massive amounts of data from across the web, which is then used to train large language models (LLMs).
It’s not clear how many organizations have either deployed a search engine or are using a service to collect data to train LLMs. What is certain is that as more data is collected, the need to index it becomes more pressing. In fact, search engines now play a crucial data management role in the AI era.
There are, of course, multiple open source and proprietary search engine options, and, depending on the use case, different data science teams within the same organization might have opted for different ones. The challenge is that before the rise of AI, few organizations had any experience deploying and maintaining a search engine, which means many are now relying on some type of managed service.
Regardless of how a search engine is employed, it’s only one piece of a larger data engineering puzzle that teams building AI applications need to master.