Microsoft is releasing a red teaming tool to give security teams and machine learning engineers the automation capabilities they need to proactively search for security risks in their generative AI systems.
The tech giant’s automation framework, PyRIT (Python Risk Identification Toolkit for generative AI), has been tested and used for almost two years by Microsoft’s AI Red Team, probing generative AI systems for myriad risks and adding features as the need arose.
Now the company, which has been building out its AI red teaming capabilities since 2019, is making PyRIT available in GitHub, giving others in the industry access to the framework to augment the manual red teaming by automating the myriad tedious tasks involved in the practice.
This comes amid rapid acceleration of both the innovation and adoption of generative AI software in the wake of OpenAI’s launch of its ChatGPT chatbot in late November 2022. In addition, President Biden’s Executive Order in October 2023 for ensuring secure AI development and use points to red teaming as an important tool.
Microsoft and other tech companies for several years have developed AI red team capabilities, including with the vendor’s release in 2021 of Counterfit, a red team tool for traditional machine learning systems. Google last year introduced its AI Red Team, just weeks after unveiling its Secure AI Framework (SAIF).
Different Tact Needed for Generative AI
However, Microsoft found that but generative AI presents particular challenges that aren’t found classical AI systems or traditional software that call for more scalability and flexibility in how such probing for risks is done, according to Ram Shanker Siva Kumar, Microsoft’s AI Red Team lead.
“We found that for generative AI applications, Counterfit did not meet our needs, as the underlying principles and the threat surface had changed,” Kumar wrote in a blog post. “Because of this, we re-imagined how to help security professionals to red team AI systems in the generative AI paradigm and our new toolkit was born.”
In particular, red teaming generative AI software involves identifying both security risk and responsible AI risks. Probing tradition software or classical AI systems primarily focuses on detecting security failures.
“Responsible AI risks, like security risks, can vary widely, ranging from generating content that includes fairness issues to producing ungrounded or inaccurate content,” he wrote. “AI red teaming needs to explore the potential risk space of security and responsible AI failures simultaneously.”
In addition, generative AI red teaming is more probabilistic. Running the same attack path multiple times on traditional software likely will deliver similar results. However, with generative AI systems, the same input can return different outputs, he wrote. That could be due to the app-specific logic, small variations in the input – which tends to be language – can deliver different outputs, and the orchestrator controlling the output can interact with different extensibility or plugins.
“Unlike traditional software systems with well-defined APIs and parameters that can be examined using tools during red teaming, we learned that generative AI systems require a strategy that considers the probabilistic nature of their underlying elements,” he wrote.
There also is a broad range of architectures for generative AI systems, all of which need to be addressed.
“These three differences make a triple threat for manual red team probing,” Kumar wrote. “To surface just one type of risk (say, generating violent content) in one modality of the application (say, a chat interface on browser), red teams need to try different strategies multiple times to gather evidence of potential failures. Doing this manually for all types of harms, across all modalities across different strategies, can be exceedingly tedious and slow.”
PyRIT Augments Manual Red Teams
Though PyRIT is highly automated, security teams remain in control of the strategy and execution.
“PyRIT provides the automation code to take the initial dataset of harmful prompts provided by the security professional, then uses the LLM [large-language model] endpoint to generate more harmful prompts,” he wrote. “However, PyRIT is more than a prompt generation tool; it changes its tactics based on the response from the generative AI system and generates the next input to the generative AI system. This automation continues until the security professional’s intended goal is achieved.”
Balazs Greksza, threat response lead at cybersecurity company Ontinue, told Techstrong.ai “red teaming GenAI systems is a novel area requiring both traditional application security testing and red teaming of the system components and genAI-specific testing to protect the intellectual property, user base and integrity of the systems, for example training data contamination.”
Greksza added that all this “requires multifaceted test cases using techniques like prompt injection attacks, trying to take control over the infrastructure, initiate model thefts, test for sensitive information disclosure, and resilience against denial-of-service [attacks].”
Onus Is on the Tech Industry
Patrick Harr, CEO of cybersecurity firm SlashNext, said the combination of the rapid development of AI and limited regulation, it’s up to the tech industry to develop the tools and processes to protect AI systems.
“With their new framework for red teaming generative AI systems, it’s great to see Microsoft taking the lead,” Harr said. “Using AI-based security tools to fight AI cyberattacks is an effective way to stop these threats.”
Projects like PyRIT that offer tools that security pros can use is key to the collaboration that is needed in cybersecurity, according to Nicole Carignan, vice president of strategic cyber AI at security company Darktrace, adding that such automated tools help red teams understand vulnerable points within an AI system.
“These are often the connection points between data and ML models, including access points, APIs, and interfaces,” Carignan said.” It will be important for this to be continuously expanded on as threat actors develop new techniques, tactics, and procedures (TTPs) and will be crucial to test other ML model types in addition to generative AI.”
Red teaming is a good start, but other considerations need to be included when securing AI systems, she said, including data storage security, data privacy enforcement controls, data and model access controls, AI interaction with security policies, and technology to detect and respond to policy violations.