MongoDB today added a slew of capabilities to its managed Atlas service for its open source document database to support a range of new types of workloads, including large language models (LLMs) used to enable generative artificial intelligence (AI) capabilities.
At the same time, the company revealed at its MongoDB.local NYC developer conference that it is partnering with Google to enable developers to build generative AI applications based on LLMs accessed via Google Cloud, and that it has launched an AI Innovators Program to provide credits to developers building AI applications.
Sahir Azam, chief product officer for MongoDB, said the vector search capability that has been added to the MongoDB database is at the core of any effort to build generative AI applications using LLMs, which rely on vectors that data scientists create. Rather than using a dedicated database to store and search those vectors, MongoDB is making a case for building LLM-based applications on a single polymorphic database, the same platform organizations already use to manage a range of other types of data.
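For readers unfamiliar with the term, the idea behind vector search can be sketched in a few lines of Python. This is a conceptual illustration only, not the Atlas Vector Search API: the documents, the three-dimensional embeddings and the function names are all hypothetical, and the sketch compares the query against every document, whereas production systems typically rely on approximate nearest-neighbor indexes.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def vector_search(query, documents, k=2):
    """Return the titles of the k documents whose embeddings best match the query."""
    scored = [
        (cosine_similarity(query, doc["embedding"]), doc["title"])
        for doc in documents
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:k]]


# Hypothetical documents with made-up embeddings; real embeddings come from
# a model and usually have hundreds or thousands of dimensions.
DOCUMENTS = [
    {"title": "refund policy", "embedding": [0.9, 0.1, 0.0]},
    {"title": "shipping times", "embedding": [0.1, 0.9, 0.1]},
    {"title": "return an item", "embedding": [0.8, 0.2, 0.1]},
]

if __name__ == "__main__":
    print(vector_search([1.0, 0.0, 0.0], DOCUMENTS, k=2))
```

The point of the sketch is that documents are ranked by how close their vectors sit to the query vector rather than by matching keywords, which is why the embeddings and the operational data they describe benefit from living side by side.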
In addition to supporting vector search, MongoDB today also announced support for stream processing and time-series data as part of an effort to make MongoDB a polymorphic database platform. At the same time, MongoDB added MongoDB Atlas Search Nodes to provide dedicated resources for search workloads at enterprise scale, and extended MongoDB Atlas Data Federation, which is used for querying data and isolating workloads, to the Microsoft Azure cloud in addition to its existing support for the Amazon Web Services (AWS) cloud.
MongoDB also made generally available the PyMongoArrow library to simplify data analysis using the Python programming language, along with MongoDB Relational Migrator, a migration tool for converting relational databases into the MongoDB database format.
Finally, MongoDB extended the infrastructure-as-code (IaC) tools it provides for provisioning databases to support additional programming languages, made it possible to build server-side applications with Kotlin, a programming language created by JetBrains, and streamlined the capabilities of the MongoDB Atlas Kubernetes Operator.
While nearly every organization is at the very least exploring AI, the models created are only as reliable as the data sources used to train them. The issue with a general-purpose AI platform such as ChatGPT is that it was trained using data collected from public sources that often conflict with one another or are outright wrong. Organizations looking to apply generative AI to their own business processes need to be certain the data used to create those models is accurate. Otherwise, the results generated are not going to consistently stand up to the scrutiny applied by the end users of those applications.
The challenge then becomes how best to aggregate all the data organizations have collected to train AI models. Today most of that data is stored in disparate databases that are managed in isolation. MongoDB is making a case for unifying the management of multiple types of data in a single database, which reduces the work data engineering teams must do to bring data together in a form that data science teams can use to build a reliable AI model.
MongoDB, of course, is not the only provider of a database platform making a similar case for unifying the management of data. The issue organizations face now is which one to standardize on as part of an effort to reduce the total cost of building AI applications.