
A global survey of 280 enterprise leaders published this week suggests that while 91% are comfortable using sensitive data to train artificial intelligence (AI) models, more than three-quarters (78%) are also concerned that the data might be stolen or that the AI model itself might be breached.
Conducted by Perforce Software, the survey finds that 60% of respondents admit their organization has experienced a data breach or theft in non-production environments used for software development, AI and analytics applications. Despite the rising number of incidents, a full 84% allow data compliance exceptions in those environments.
On the plus side, 86% of respondents plan to invest in AI data privacy technologies over the next one to two years, the survey finds.
The challenge is that many of the existing guardrails are not very robust, says Steve Karam, a principal product manager for Perforce. As a result, sensitive data shared with an AI model is likely to be used to train the next iteration of that model, and there is a good chance that data will one day surface in an AI output in a way that not only causes reputational damage but also violates various compliance mandates.
Ideally, data governance policies need to be baked into the machine learning operations (MLOps) workflows used to build and retrain AI models so that sensitive data is never used for training, says Karam. “There needs to be data anonymity for training,” he says.
In reality, however, it’s still too easy for personally identifiable information (PII) to be used when training AI models, notes Karam.
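To make that idea concrete, the sketch below shows what a minimal anonymization gate ahead of a training job might look like. It is an illustration only, not Perforce tooling: the function names and regex patterns are assumptions for the example, and production-grade anonymization relies on far more robust techniques than simple pattern matching.

```python
import re

# Illustrative patterns for a few common PII formats (assumed for this sketch).
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def mask_pii(text: str) -> str:
    """Replace recognizable PII with placeholder tokens."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

def prepare_training_records(records: list[str]) -> list[str]:
    """Governance gate: every record is masked before it reaches training."""
    return [mask_pii(record) for record in records]

if __name__ == "__main__":
    sample = ["Contact jane.doe@example.com or 555-123-4567 about ticket 42."]
    print(prepare_training_records(sample))
    # ['Contact [EMAIL] or [PHONE] about ticket 42.']
```

The point of a gate like this is placement, not sophistication: it runs inside the MLOps workflow itself, so no record reaches a training job without passing through the governance step first.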
More challenging still, AI models that contain sensitive data become a honeypot for malicious actors, who are growing more adept at using prompt injection techniques to expose that data, adds Karam.
Of course, it can be difficult to convince senior business leaders to invest in the tools needed to put the appropriate guardrails in place when there have been only a limited number of instances of sensitive data being exposed by an AI model. Nevertheless, it’s a question of when, not if, these types of data breaches become more common as the number of AI models deployed in production environments continues to expand.
In the meantime, AI is shining a light on just how poorly sensitive data is managed today. Many end users are already exposing sensitive data to large language models (LLMs) in ways that IT and security teams can’t easily discover. Worse yet, many of them never review the licensing agreement to opt out of provisions that allow the provider of an AI service to use that data for training purposes.
Ultimately, each organization will need to revisit its data management and governance policies in the AI era, because the one thing that is certain is that any data that can be accessed by an AI model or agent eventually will be, often in ways no one is likely to appreciate.