NetApp Makes Tracing Data Lineage of AI Models Simpler

As more artificial intelligence (AI) models are deployed in production environments, there is a growing need to track the provenance of these models back to the data used to train them. To address that issue, NetApp has added to its portfolio of data management offerings the ability to trace multiple versions of AI models back to the data that was used to train them.

Announced at the NetApp INSIGHT 2023 conference, this capability will provide greater transparency into how AI models are built across a hybrid cloud computing environment, says Ronen Schwartz, senior vice president and general manager for cloud storage at NetApp.

At the same time, NetApp is moving to integrate NetApp Volumes storage management software running on Google Cloud with Google's Vertex AI platform. It is also adding an AFF C-Series platform to its existing ONTAP AI converged infrastructure lineup, which is based on NVIDIA DGX systems that have been integrated with flash memory storage via an NVMe backplane.

Generative AI requires organizations to address a variety of data management challenges because the models are trained using unstructured data, notes Schwartz. Organizations are also loading massive amounts of unstructured data into some type of vector database to customize large language models (LLMs), he adds. “Organizations want to be able to augment the LLM using their own data,” says Schwartz.

Much of that data today resides in on-premises environments that IT teams need to find ways to securely expose to LLMs that might reside either in the same on-premises IT environment or in the cloud, notes Schwartz. In fact, a recent global survey of 1,000 C-level, technology and data executives commissioned by NetApp finds that 98% of respondents report that three-quarters of their workloads still run in on-premises IT environments.

The survey also finds nearly three-quarters (72%) of respondents work for organizations that are already using some type of generative AI platform or service, with 74% leveraging public cloud AI and analytics services. Nearly two-thirds of respondents (63%) also noted AI budgets are the result of additional funding rather than reallocated budgets, and 65% expect to engage new vendors as AI usage expands.

Ultimately, the building and deployment of AI models will require access to massive amounts of data spanning hybrid cloud computing environments. It’s not clear to what degree the rise of AI will force organizations to revisit their data management strategies, but as AI models become more pervasively deployed, data engineering, machine learning operations (MLOps), DevOps and security operations (SecOps) will increasingly converge. It’s not clear how long it might take to achieve that goal, but the NetApp survey identifies data security (57%), data integration (50%) and talent scarcity (45%) as persistent barriers.

At this point, there’s little doubt that AI will transform the way IT is managed. The only thing that remains to be seen is the degree of friction that will occur as IT organizations move to eliminate silos that, in many cases, have persisted for longer than many of the members of those organizations have been alive.
