IBM‘s Watson was perhaps one of the first times that AI went mainstream. Its release proved AI’s ability to boost productivity, but its aptitude as a security savior is still questionable over a decade later, since Watson’s launch.
Generative AI, AI-assisted coding, large language models (LLMs) and more are here to stay, thanks to their impressive speed, accuracy and convenience. Most organizations use SCA, SAST and DAST as critical AppSec testing methods. AI and ML automate manual security testing and auditing, helping reduce pressures in light of the global skills shortage. Intelligent tools can scan code rapidly, provide remediation and detect threats.
While these processes ensure the security of applications, AI also opens up a new can of worms. Research by Invicti found that ChatGPT creates open source projects to imitate non-existent libraries, meaning that developers could import malicious code into their applications without knowing it. Only 40% of businesses understand their third-party risks entirely, and adding AI to the mix further muddies the waters where there should be transparency. That’s where SCA steps in.
Suddenly, Everyone’s a Coding Expert
Most recently, ChatGPT recreated Watson’s cultural success and took the public by storm. Its ability to generate and auto-complete code has made everyone a coding expert. While this is good from an accessibility and productivity point of view, it risks creating a culture of blind AI adoption without considering the security implications. Technically, you can paste your code into ChatGPT and say, “Tell me what’s wrong with it.” But would you dare?
ML models like this are only as good as the data they’re trained on, which could include security vulnerabilities and poor coding practices. As long as you can teach AI, like ChatGPT, to generate prompts, it will also recreate malicious content and be vulnerable to embedded code in the prompts.
Yet, AI will become an invaluable cybersecurity defense tool if we train models on a specialized and well-curated dataset and teach AI to treat data securely. Attacks are becoming increasingly sophisticated, from realistic spear phishing emails generated by natural language processing to malware that can mutate in real-time. Attacks like these might avoid detection (if they don’t already), encouraging hackers to continue using automation to trick users and systems. Is it true that only automation can keep up with automation?
Is AI Moderation a Job for SCA?
A recent report by Synopsys found that 96% of modern codebases contain open source software, and the adoption of software composition analysis (SCA) tools coincides with its meteoric rise. SCA automates vulnerability detection in software or across the software supply chain, providing visibility into all components, license requirements and dependencies. But whether AI is open source or not is yet to be determined. So, is it really SCA’s concern?
It all comes down to the evergreen conversation surrounding responsible AI usage. If models are open source (like the first iteration of Google’s BERT), there is potential for misuse and malicious training data. If they’re completely closed (e.g., Google’s original LaMDA), it is impossible to evaluate the training data.
Either way, developers are rushing to use AI to capture the productivity and speed benefits, and companies like Meta and Microsoft – that recently bought OpenAI – are hurrying to get new AI innovations out of the door. Smaller models are often built using foundational data from these head honcho companies, creating a power dynamic that, interestingly, seems to have no hierarchy at all; no one is stepping up to take responsibility for deployment, risk mitigation, standardization and ethics.
Does that mean individual companies must deploy SCA tools and other application security best practices to protect themselves, since the model creators won’t do it for you?
AI Will Trigger an SCA Evolution
It’s not a case of ‘if’ AI will innovate SCA; it’s a case of ‘how’. The world is still coming to terms with AI’s exploitable behaviors, and SCA can be used to mitigate AI-related security risks.
SCA flags code that is someone else’s intellectual property, which is a minefield. Ownership becomes a sticking point with some AI models – who owns the training data? Who owns AI-generated code? How does this affect license compliance? Besides, generative AI is fed proprietary data through prompts, and there’s no way of knowing where this data is headed next or if the prompts have good intentions.
AI-generated code is an entirely new attack vector that has a domino effect on a series of vulnerabilities, such as cross-site scripting (XSS) attacks. It’s easy for bad actors to inject malicious queries, revealing sensitive information and data and making the AI act in dangerous ways. From a social engineering perspective, it’s almost impossible for everyday people to distinguish real communications from fakes, as anyone can make realistic and misleading content.
Third, how does AI-generated code affect software bill of materials (SBOM) creation? Existing SCA tools identify and manage software licenses, providing an SBOM the ability to demonstrate secure development practices and processes. Moving forward, SCA tools will likely need to adapt to new license compliance requirements, in line with increasing AI practices.