Commvault today made available a preview of Data Rooms, a secure environment that enables data science teams to more easily access backup data that organizations have already stored.

Additionally, Commvault has made available a private preview of a conversational interface to its data protection platform. The interface is enabled by an instance of a Model Context Protocol (MCP) server that Commvault has added to the platform to make data more accessible to artificial intelligence (AI) applications and agents. Expected to be generally available in the spring of 2026, that same MCP server also enables Commvault to centrally apply governance policies that limit access to that data.

Both of these initiatives are part of a broader company effort to make the massive volumes of already-classified backup data relevant to data science teams, says Tim Zonca, vice president of portfolio marketing for Commvault. Rather than having to manually classify all that data themselves, data science teams can leverage the classifications that Commvault automatically applies as it backs up data, he adds. “The data classifiers are already built in,” says Zonca.

Most data science teams have been, to varying degrees, relying on extract, transform and load (ETL) tools to load massive amounts of data into AI models. However, much of that same data is likely to have already been loaded into a data protection platform. Commvault is now making a case for using those platforms to streamline data engineering workflows, via a capability it plans to make generally available this winter.

Data science teams can, for example, locate and prepare data directly from backup repositories residing in the cloud or an on-premises IT environment. Data sets can then be safely shared and exported using role-based access controls (RBAC), with classification, sensitivity tagging, and audit trails automatically applied.
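
To make that workflow concrete, here is a minimal Python sketch of a role-gated export that carries classification and sensitivity tags with each record and writes an audit entry for every request. It is an illustration only and does not use or represent Commvault's API: the `Record` and `DataRoom` classes, the role names, and the sensitivity levels are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical sensitivity levels, ordered from least to most restricted.
SENSITIVITY_RANK = {"public": 0, "internal": 1, "confidential": 2}

# Hypothetical mapping of each role to the highest sensitivity it may export.
ROLE_CLEARANCE = {"analyst": "internal", "data_scientist": "confidential"}


@dataclass
class Record:
    name: str
    classification: str  # tag assumed to be applied at backup time, e.g. "pii"
    sensitivity: str     # one of the keys in SENSITIVITY_RANK


@dataclass
class DataRoom:
    records: list
    audit_log: list = field(default_factory=list)

    def export(self, user: str, role: str) -> list:
        """Return the records this role may export and log the request."""
        # Unknown roles fall back to the least-privileged clearance.
        clearance = SENSITIVITY_RANK[ROLE_CLEARANCE.get(role, "public")]
        allowed = [r for r in self.records
                   if SENSITIVITY_RANK[r.sensitivity] <= clearance]
        denied = [r for r in self.records
                  if SENSITIVITY_RANK[r.sensitivity] > clearance]
        # Every request is recorded, including what was withheld.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "role": role,
            "exported": [r.name for r in allowed],
            "denied": [r.name for r in denied],
        })
        return allowed


room = DataRoom(records=[
    Record("web_logs", "logs", "public"),
    Record("sales_2024", "financial", "internal"),
    Record("customer_emails", "pii", "confidential"),
])
exported = room.export(user="dana", role="analyst")
print([r.name for r in exported])
print(room.audit_log[-1]["denied"])
```

The point of the sketch is that the tags travel with the data, so the access decision and the audit record can both be derived from metadata that already exists rather than from a separate manual classification pass.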

It’s not clear how closely data science teams are working with IT professionals who manage backup and recovery processes, but the less time they spend prepping data, the more time there should be to train AI models. One of the primary reasons building AI models is challenging is the substantial amount of time spent manually aggregating and classifying data.

Of course, not every data set that might be needed is going to reside in a data protection platform, but it should be relatively simple for data science teams to inventory that data as part of any effort to streamline data engineering workflows.

Each organization will need to determine for itself just how feasible it might be to leverage existing data protection platforms to build AI models faster. The one certain thing is that many of the data engineering tasks that are today performed by data science teams could be assigned to IT teams that have a long history of working with ETL tools to move and classify data.

In fact, it’s arguable that data science teams, rather than manually moving data, should at this point already be relying on internal IT teams to manage data on their behalf. The only reason they might not, after all, is that nobody may have ever thought to ask.