One issue that doesn’t get enough attention is that many of the tools used to write the code behind artificial intelligence (AI) models can be rife with vulnerabilities that cybercriminals already know how to exploit.
In fact, the recent disclosure of the PoisonGPT attack technique shows how vulnerable the large language models (LLMs) used to build generative AI applications can be. Fortunately, that issue is being addressed with the creation of AntidoteGPT, a Python library developed by the Sigstore code signing project that extends the safetensors format from Hugging Face, using images provided by Chainguard.
Now Chainguard is making available a collection of hardened container images for building AI models, spanning everything from development images and workflow management tools to the Milvus and Weaviate vector databases used for production storage. The goal is to provide a set of curated images that Chainguard will both regularly update and patch should any new vulnerability be discovered, says Chainguard CEO Dan Lorenc.
Other images include Python, Conda, OpenAI and Jupyter notebook images for developing models and working with the OpenAI application programming interface (API), as well as Kubeflow images for deploying production machine learning pipelines to Kubernetes-based platforms. The images are all based on Wolfi, a Linux distribution that hardens images by default.
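To make the workflow concrete, here is a minimal sketch of how a team might build on one of these hardened Python images. It assumes Chainguard's public registry naming (cgr.dev/chainguard/python) and a `latest-dev` build-stage tag; the file names (`requirements.txt`, `train.py`) are hypothetical placeholders.

```dockerfile
# Sketch only: image names/tags assume Chainguard's public registry layout.
# Build stage: the -dev variant includes a shell and pip for installing deps.
FROM cgr.dev/chainguard/python:latest-dev AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --user -r requirements.txt

# Runtime stage: the minimal image ships no shell or package manager,
# shrinking the attack surface of the deployed model-training container.
FROM cgr.dev/chainguard/python:latest
WORKDIR /app
COPY --from=builder /home/nonroot/.local /home/nonroot/.local
COPY train.py .
ENTRYPOINT ["python", "train.py"]
```

The two-stage pattern matters here: build tooling such as pip never reaches the production image, which is part of what "hardened by default" buys a data science team.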
The overall goal is to prevent any security issues from arising before an AI model is deployed in a production environment. That’s crucial because few of the data scientists using images to build models are familiar with application security issues. Most data science teams are not using automated testing, code review or version control tools to ensure AI models are secure, notes Lorenc. “They are skipping steps,” he says.
Unfortunately, cybercriminals are now targeting software supply chains in the hopes of injecting malware into software components that might find their way into any number of downstream applications, including AI models. Theoretically, malicious code could be embedded into an AI model to make it deliberately hallucinate, says Lorenc.
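One basic defense against tampered model artifacts is to verify a cryptographic digest before loading anything. The sketch below is an illustration, not Chainguard's or Sigstore's implementation; the function names and the idea of a hand-pinned digest are assumptions for the example.

```python
import hashlib


def sha256_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large model artifacts never
    need to fit in memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_artifact(path: str, pinned_digest: str) -> bool:
    """Return True only if the artifact on disk matches the digest
    recorded when the model weights were originally published."""
    return sha256_digest(path) == pinned_digest
```

In practice, teams would rely on signed attestations (the approach Sigstore takes) rather than manually pinned hashes, but the principle is the same: refuse to load weights whose provenance cannot be checked.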
The more data science teams rush to build AI models without using hardened, tested images, the more likely it becomes that a perfect AI security storm will occur, he adds.
Remediating vulnerabilities in an AI model after it has been deployed in a production environment is a time-consuming proposition. Using hardened images may add some cost to AI model development, but that cost is trivial compared to the cost of remediating a model already in production. No matter how hardened the images used are, there is no guarantee a patch will never be needed, but the number of instances where one is required should be substantially reduced.
The security of AI models, alas, probably won’t get the attention it deserves until there is a major breach. However, an ounce of prevention today could avert a breach with a catastrophic impact on an AI model driving workflows at levels of scale that were once unimaginable. The concern is that narrowing the blast radius of a breach involving an AI model might be extremely difficult, so the potential risk to the business is likely to be much higher.
The Chainguard Labs team recently published lessons learned from securing container operating systems and how these principles apply to AI/ML infrastructure.