Artificial intelligence (AI) requires a strong data management foundation that many organizations unfortunately lack. Cohesity, via a partnership with Amazon Web Services (AWS), today announced early access to its existing data management platform, now integrated with the managed Amazon Bedrock service for invoking generative AI foundation models.
The Cohesity Turing platform provides a framework for securely managing data for AI models, built on the Cohesity Data Cloud platform already used by many enterprise IT organizations to manage data. Previously, Cohesity integrated that framework with Google's Vertex AI service.
That approach streamlines the retrieval-augmented generation (RAG) processes that many organizations are using today to extend the capabilities of large language models (LLMs) by exposing them to additional data. The Cohesity Turing platform enables IT teams to achieve that goal without having to expose sensitive data to an LLM that might then incorporate that data into results generated for anyone who has access.
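To make the pattern concrete, here is a minimal RAG sketch in Python, assuming Amazon Bedrock access via boto3. The document corpus, keyword-based retrieval and model ID are illustrative assumptions, not Cohesity's implementation; the point is that sensitive data is supplied only as inference-time context rather than being trained into the model.

```python
# A minimal RAG sketch (not Cohesity's implementation): retrieve relevant
# snippets from governed enterprise data, then pass them to a Bedrock-hosted
# foundation model at inference time.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Stand-in for data surfaced by a data management platform; in practice this
# would be a vector or search index over the organization's own documents.
DOCUMENTS = [
    "The incident response runbook requires notifying legal within 24 hours.",
    "Backup retention for financial records is seven years per policy FIN-12.",
    "Engineering laptops must run disk encryption and endpoint monitoring.",
]

def retrieve(question: str, top_k: int = 2) -> list[str]:
    # Naive keyword-overlap retrieval, purely for illustration.
    q_terms = set(question.lower().split())
    scored = sorted(DOCUMENTS, key=lambda d: -len(q_terms & set(d.lower().split())))
    return scored[:top_k]

def answer(question: str) -> str:
    context = "\n".join(retrieve(question))
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    # The sensitive data is passed only as inference-time context; the model
    # itself is never fine-tuned on it.
    response = bedrock.converse(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # assumed model ID
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 512, "temperature": 0.2},
    )
    return response["output"]["message"]["content"][0]["text"]

print(answer("How long must financial records be retained?"))
```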
That approach also enables IT teams to leverage a Cohesity Data Cloud platform that is often used to protect data, notes Greg Statton, a member of the office of the CTO at Cohesity. IT teams will be able to use multiple versions of data sets that have been backed up over time to provide LLMs with more context, says Statton. “They can get more value from that existing data,” he says.
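A hypothetical sketch of that idea, assuming point-in-time copies of a record can be recovered from successive backups; the snapshot contents below are invented purely for illustration:

```python
# Hypothetical sketch: several point-in-time versions of the same record,
# as might be recovered from successive backups, are stitched into one
# prompt so the model can reason about how the data changed over time.
from datetime import date

snapshots = {
    date(2023, 10, 1): "Contract value: $1.2M, status: draft",
    date(2023, 11, 1): "Contract value: $1.4M, status: under review",
    date(2023, 12, 1): "Contract value: $1.4M, status: signed",
}

context = "\n".join(
    f"[{d.isoformat()}] {text}" for d, text in sorted(snapshots.items())
)
prompt = (
    "Using the dated versions of this record below, summarize how it "
    f"changed over time.\n\n{context}"
)
# `prompt` would then be sent to the foundation model exactly as in the
# previous sketch.
print(prompt)
```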
There are, of course, multiple RAG methodologies that organizations can employ, but at this point it’s apparent that most will use this type of approach to extend an existing LLM rather than customize or build their own. In fact, one of the benefits of RAG is that a talented DevOps or data engineering professional can manage the process, rather than relying solely on data scientists to manage every aspect of AI.
It’s not clear to what degree organizations are embracing RAG, but as they move to operationalize AI it’s only a matter of time before data engineering best practices become better defined. Cohesity, in effect, is baking a set of RAG best practices into its platform to reduce the overall effort required to extend an AI model, in a way that enables organizations to accelerate adoption, notes Statton.
Ultimately, data operations (DataOps) will need to converge with machine learning operations (MLOps) and developer operations (DevOps) to operationalize AI in a way that safely enables organizations to build, deploy and update AI models. As more AI regulations are passed, organizations will also need to be able to document those workflows to pass audits that, given current concerns over AI transparency, are at this point all but inevitable.
In the meantime, IT teams will need to decide to what degree they may want to build their own data management platforms that they then need to maintain, versus leveraging one provided by a vendor. Regardless of the approach, the pressure to better organize data in a way that enables AI models to be built faster is mounting.