Free Software Easily Strips Guardrails from Meta, Google’s Open AI Models: Report

Widely available, free software tools can strip away safety protections from open-source artificial intelligence (AI) models in less than 10 minutes.

A joint investigation by the Financial Times and AI safety research group Alice has revealed that modified versions of premier models from Meta Platforms Inc. and Google are being used to generate dangerous content, including blueprints for biological weapons, malicious cyber-tools, and child exploitation material.

The findings sent shockwaves through the tech sector and regulatory bodies, intensifying debate over the inherent risks of open-source AI. Unlike proprietary “closed” systems such as OpenAI’s ChatGPT or Anthropic’s Claude, whose inner workings are kept under lock and key, open models allow developers anywhere to download and modify the models themselves. While this open access has accelerated commercial innovation, it has also created a critical vulnerability.

At the center of the controversy is a technical process known as “abliteration,” which systematically strips out the ethical and legal boundaries built into AI systems. Researchers used a tool called Heretic, hosted on the developer platform GitHub, to neutralize the safety protocols of Meta’s Llama 3.3 and Google’s Gemma models.

Once guardrails were removed, the systems readily answered highly dangerous prompts that the original versions were designed to block.

According to the report, Google’s Gemma provided explicit instructions on how to disperse lethal chlorine gas inside a crowded room and generated functional malware to steal credit card information.

Meanwhile, Meta’s Llama 3.3 calculated the precise lethal dose of ricin per kilogram of body mass required to kill a human being.

Heretic creator Philipp Emanuel Weidmann revealed that his software has already been used to spawn more than 3,500 “decensored” AI models, which have accumulated 13 million downloads since last year. Weidmann boasted that he managed to strip the safeguards from Google’s Gemma model within 90 minutes of its public release.

“The genie is out of the bottle,” warned Alice CEO Noam Schwartz. “Things that look like sci-fi are no longer sci-fi, and we need as a society to prepare accordingly.”

The scale of the downloads underscores a nightmare scenario for policymakers attempting to govern AI at the point of development. Once a model is copied and modified offline, its original developers lose all oversight and enforcement capabilities.

Experts also suggest that current mitigation strategies are falling short.

While OpenAI has experimented with training models on data pre-scrubbed of harmful material, academic experts remain skeptical. Kawin Ethayarajh, an assistant professor of applied AI at the University of Chicago Booth School of Business, warned that removing dangerous data could simply make models naive and unable to detect when they are being manipulated for malicious purposes.

“Whereas historically it might have taken a more informed and persistent actor (to strip out safety features), nowadays it’s much easier for the average person,” Ethayarajh told the FT.

As frontier AI systems display increasingly sophisticated capabilities, such as identifying critical vulnerabilities across major operating systems, the ease with which their safety alignment can be undone raises urgent questions. The conversation is shifting from a purely technical debate to a broader crisis of enterprise liability, corporate accountability, and national security.

“This investigation highlights a hard truth about open-weight AI: safety controls are not durable if they can be stripped after release,” said Mitch Ashley, vice president and practice lead of Software Lifecycle Engineering at The Futurum Group. “Once models are modified and redistributed, the risk becomes a model supply chain problem, not just a model development problem. Enterprises need provenance, derivative-model governance, runtime monitoring, and abuse controls. Guardrails matter, but the real test is whether safety survives real-world modification, deployment, and misuse.”
