Israeli startup PVML this week emerged from stealth mode with $8 million in seed money and a platform aimed at solving a problem that is keeping many organizations from more fully adopting AI: the security and compliance risks associated with using corporate data to train and use AI models.

The two-year-old company’s platform combines a mathematical framework called differential privacy with retrieval-augmented generation (RAG), a method being adopted across the AI field to improve the accuracy and reliability of large language models (LLMs) by supplying them with outside data, which in this context means an organization’s own data.

Generative AI tools like OpenAI’s ChatGPT or Google’s Gemini are useful business tools, but they become more valuable if the general datasets they’re trained on can be combined with corporate data. However, the worry is that this data – which can be highly sensitive and proprietary – can leak or be accessed by cybercriminals when used by LLMs, creating significant security risks and compliance issues.

For example, bad actors could use such data to glean more information about individuals or organizations. Differential privacy is designed to ensure that an individual’s privacy isn’t compromised when a large dataset is analyzed.

“Differential privacy in relation to data analysis can be informally defined using a before-and-after approach,” according to the IEEE. “That is, the analyst should not know more about any individual after analyzing data. Further, any adversary should not have too different a view of any individual after having access to a database.”

Data Access an Ongoing Issue

Rina Galperin, PVML’s co-founder and CTO, wrote in a blog post that she ran into this problem two years ago when developing a product for virtual classrooms and was asked about finding the average grade of students in a classroom.

“That’s when I first discovered the complexities of analyzing production data,” wrote Galperin, a former Microsoft engineer who co-founded PVML with her husband, Schachar Schnapp, the CEO, who has a Ph.D. in differential privacy. “An average exam grade could potentially be traced back to a specific student given additional information or queries. Statistics can either tell a broad story, or a very precise and targeted story, depending on the data, the questions and intent. Instead of extracting a simple average, I was only allowed to calculate approximate and very high-level figures.”
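The risk Galperin describes can be made concrete with a toy "differencing attack": two perfectly innocent-looking averages, combined, pin down one student's exact grade. The names and numbers below are invented for illustration.

```python
# Toy illustration (invented data): how two "harmless" averages can
# leak an individual's exact grade via a differencing attack.
grades = {"alice": 91, "bob": 78, "carol": 85, "dan": 62}

def average(names):
    return sum(grades[n] for n in names) / len(names)

everyone = list(grades)
avg_all = average(everyone)                                   # class average
avg_without_dan = average([n for n in everyone if n != "dan"])  # same query, minus one student

# Knowing both averages and the class size, Dan's grade falls out exactly:
dans_grade = avg_all * len(everyone) - avg_without_dan * (len(everyone) - 1)
print(dans_grade)  # 62.0
```

This is why simply restricting analysts to aggregates, as Galperin was, is not enough: the aggregates themselves can be differenced against each other.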

Schnapp said in a statement that the “pain point we originally wanted to address was streamlining access to data. We were motivated by our own experience, seeing how cumbersome accessing data can be even in the most sophisticated enterprises.”

PVML’s platform is designed to offer organizations a single place to manage, connect, access and monitor multiple data sources, to ensure a collaborative and secure workspace.

Galperin wrote that the company’s solution is “a new generation of data access platforms that does not require a binary approach of either ‘see raw data’ or ‘see no data.’”

She pointed to a study by the AI Infrastructure Alliance, a consortium of AI startups, which found that 51% of enterprises have either limited or no AI adoption, with 56% pointing to security and compliance as a key barrier. “Corporations have built up a storehouse of valuable assets and they want to protect them,” the group wrote.

Democratizing Differential Privacy

A key part of PVML’s strategy is to make differential privacy data protection more widely available. The mathematical framework – pioneered at Microsoft Research and since deployed at scale by companies like Google and Apple – works by injecting random “noise” into data or query results, so that those analyzing the data – or bad actors looking in – can’t pull out enough information to identify individuals.

There is a balance, though: the noise can’t be so heavy that it hinders accurate analysis of the data, but it must be strong enough to protect the privacy of the individuals in it.
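That tradeoff is usually tuned with a parameter called epsilon: smaller epsilon means more noise and stronger privacy, larger epsilon means less noise and better accuracy. A minimal sketch of the standard Laplace mechanism for a differentially private average (a generic textbook construction, not PVML's implementation) looks like this:

```python
import random

def laplace(scale):
    # A Laplace(0, scale) sample is the difference of two independent
    # exponential samples with mean `scale`.
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def dp_average(values, lower, upper, epsilon):
    """Epsilon-differentially-private mean via the Laplace mechanism.

    Values are clipped to [lower, upper]; the sensitivity of the mean of
    n bounded values is (upper - lower) / n, so the noise scale shrinks
    as the dataset grows (more people, less noise needed per answer).
    """
    n = len(values)
    clipped = [min(max(v, lower), upper) for v in values]
    true_mean = sum(clipped) / n
    sensitivity = (upper - lower) / n
    return true_mean + laplace(sensitivity / epsilon)

grades = [91, 78, 85, 62]
print(dp_average(grades, 0, 100, epsilon=1.0))  # varies per run
```

Note that for a class of only four students the noise is substantial; over a large production table the same epsilon yields answers very close to the truth, which is the balance the text describes.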

“We help organizations get visibility on everything in one place, without moving data,” Galperin said. “PVML secures and controls permissions regardless of how the data is accessed – via SQL, BI, or API. But we thought, why stop there? We went one step further. PVML unlocks access to complicated data for non-technical users, offering a natural language interface to analyze data with AI.”

Building Out the Platform

The $8 million in seed funding was led by NFX and included FJ Labs and Gefen Capital. Gigi Levy-Weiss, co-founder and general partner of NFX, said in a statement that PVML’s platform will help more organizations embrace AI.

“In the stock market today, 70% of transactions are made by AI,” Levy-Weiss said. “That’s a taste of things to come, and organizations who adopt AI today will be a step ahead tomorrow. But companies are afraid to connect their data to AI, because they fear the exposure – and for good reasons.”

Organizations can use PVML’s cloud-based data access platform to connect to any database via a simple connection string, giving administrators a single point for monitoring and controlling data access, with visibility into queries, permissions, user requests and tasks. Galperin said it tracks excessive, unused and conflicting permissions to bolster security.

“We built an SQL compiler able to translate database queries to Differential Privacy queries on the fly and without the user needing to alter the query syntax,” she wrote. “This guarantees privacy and permissions enforcement for any way the users choose to interact with data, whether it be SQL, AI or through an external BI tool. PVML is in charge of live output-level data protection.”
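PVML has not published its compiler's internals, but the idea Galperin describes – intercepting an ordinary aggregate query and answering it with a differentially private result, without the user changing their SQL – can be sketched in a few lines. Everything below (the regex, the table format, the function names) is a hypothetical toy, handling only a bare `SELECT AVG(col) FROM table` query:

```python
import random
import re

def laplace(scale):
    # Laplace(0, scale) as the difference of two exponentials.
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def rewrite_and_run(sql, table, epsilon=1.0, bounds=(0, 100)):
    """Hypothetical sketch of on-the-fly query rewriting: intercept
    'SELECT AVG(col) FROM t' and return a noised answer instead of the
    raw aggregate. Real systems handle full SQL; this handles one shape.
    """
    m = re.fullmatch(r"SELECT AVG\((\w+)\) FROM \w+", sql, re.IGNORECASE)
    if not m:
        raise ValueError("only simple AVG queries handled in this sketch")
    col = m.group(1)
    # Clip values to the declared bounds, then apply the Laplace mechanism.
    values = [min(max(row[col], bounds[0]), bounds[1]) for row in table]
    true_avg = sum(values) / len(values)
    sensitivity = (bounds[1] - bounds[0]) / len(values)
    return true_avg + laplace(sensitivity / epsilon)

students = [{"grade": g} for g in (91, 78, 85, 62)]
print(rewrite_and_run("SELECT AVG(grade) FROM students", students))
```

The point of doing this at the query layer, as the quote suggests, is that the same protection applies no matter which client issued the query – a SQL console, a BI tool, or an AI assistant generating SQL from natural language.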