Generative AI

Artificial intelligence (AI) is going to be needed to manage the data being used to train AI models. As IT organizations continue to experiment with AI models, they are discovering that they need to more proactively manage the data used to train them. AI models are only as reliable as that data. The issue is that an enterprise IT organization might have a petabyte or more of data, much of it conflicting. In the absence of any capability to effectively manage that data, all the effort put into building an AI model that might hallucinate could wind up being more trouble than it’s worth.

In fact, many organizations may need to spend the bulk of 2024 getting their data management house in order before they can customize or build AI models. The fastest way to achieve that goal is to apply AI to managing all the data needed to train AI models.

For example, Informatica has expanded its alliance with Amazon Web Services (AWS) to provide integrations with Amazon Bedrock, a managed service for accessing the large language models (LLMs) at the core of generative AI applications. The overall goal is to create an ecosystem for training and extending AI models from different providers using enterprise data, says Rik Tamm-Daniels, group vice president of strategic ecosystems and technology for Informatica. That ecosystem revolves around the Informatica data management platform, which itself uses an AI model dubbed CLAIRE to manage data more efficiently.

Achieving that goal requires providers of LLMs to integrate their offerings more tightly so that enterprise IT organizations can use, for example, retrieval-augmented generation (RAG) to let an LLM consume data created after the model was trained. “It’s about working together,” says Tamm-Daniels.
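For readers unfamiliar with the pattern, retrieval-augmented generation can be sketched in a few lines. The example below is a minimal illustration and not Informatica's or AWS's implementation: the function names (`tokenize`, `score`, `retrieve`, `build_prompt`) and the sample documents are hypothetical, and a simple word-overlap score stands in for the vector search a production system would more likely use. It shows only the core idea: fetch relevant enterprise records at query time and place them in the prompt, so the model can draw on data it never saw during training.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, document):
    """Count the tokens shared by the query and the document.

    Assumption: word overlap is a stand-in for a real relevance measure
    such as embedding similarity.
    """
    q = Counter(tokenize(query))
    d = Counter(tokenize(document))
    return sum(min(q[t], d[t]) for t in q)


def retrieve(query, documents, top_k=2):
    """Return up to top_k documents with a nonzero score, best first."""
    ranked = sorted(documents, key=lambda doc: score(query, doc), reverse=True)
    return [doc for doc in ranked[:top_k] if score(query, doc) > 0]


def build_prompt(query, documents, top_k=2):
    """Assemble a prompt that supplies retrieved records as context.

    The resulting string would be sent to whichever LLM the organization
    uses; that call is deliberately left out of this sketch.
    """
    context = retrieve(query, documents, top_k)
    context_block = "\n".join(f"- {doc}" for doc in context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {query}"
    )


# Hypothetical enterprise records, invented purely for illustration.
documents = [
    "The Q3 2024 sales forecast was revised to 12 million dollars.",
    "The cafeteria menu changes every Monday.",
    "Employee onboarding takes two weeks.",
]

prompt = build_prompt("What is the Q3 2024 sales forecast?", documents, top_k=1)
```

Because the context is assembled when the question is asked, the quality of the answer depends directly on how current and consistent the underlying records are, which is the data management problem described here.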

Each enterprise IT organization will need to navigate the nuances of those integration efforts, but it’s now only a matter of time before a wide range of data management platforms make it possible to streamline the training of AI models for specific use cases in a way that doesn’t result in proprietary data inadvertently becoming publicly accessible. The challenge, as always, is making sure the controls and policies are in place to prevent such mistakes from happening.

It’s not clear how long it will take enterprise IT organizations to operationalize AI models at scale, but given all the interest in generative AI, it’s now more a question of when than if. The pace at which any organization achieves that goal is likely to be dictated by how well its data is currently being managed.

In the meantime, the rise of generative AI does present IT leaders with an opportunity to address what are often long-standing data management challenges. Business leaders may not like to hear that investments in data management need to be made to realize their AI dreams, but in many ways an issue that has long been ignored by many organizations is finally about to get its full due.