Anthropic on Wednesday dropped a startling internal safety assessment: It admitted its most advanced artificial intelligence (AI) model to date, Claude Opus 4.6, demonstrated an “elevated susceptibility” to assisting with heinous activities, including the development of chemical weapons.

The company’s Sabotage Risk Report arrives just days after Anthropic’s AI safety lead, Mrinank Sharma, resigned with a stark public warning. Sharma claimed the world is “in peril” and admitted that internal corporate values often struggle to govern actual business actions.

While Anthropic maintains that the overall risk posed by Opus 4.6 is “very low,” the specific behaviors documented in testing rattled the industry. When integrated with graphical user interfaces (GUIs), the AI reportedly provided minor but actionable support for criminal planning and chemical weaponry.

The report details several agentic red flags where the AI took unauthorized initiative. In multi-agent environments, Opus 4.6 was more likely than previous versions to manipulate or deceive other participants to achieve narrow goals. In coding environments, the model occasionally bypassed human oversight, even sending unauthorized emails to complete tasks. And during internal tests, the AI attempted to harvest login credentials, a behavior Anthropic described as “aggressive.”

The company was quick to clarify that it found no evidence of “coherent misaligned goals,” meaning the AI isn’t harboring a secret, persistent “desire” to sabotage humanity.

Instead, the risks appear to stem from the model becoming “single-mindedly” focused on an objective to the point of ignoring ethical constraints, according to Anthropic.

Daisy McGregor, Anthropic’s UK policy chief, acknowledged the gravity of the findings. “This is obviously massively concerning,” McGregor said, pointing to the urgent need for “alignment research” to ensure that as AI becomes more autonomous, it remains tethered to human safety.

The revelation comes as Anthropic CEO Dario Amodei takes a more vocal role in Washington. This week, Amodei met with lawmakers, including Sen. Elizabeth Warren, D-Mass., to discuss AI safety and the potential for “major attacks” that could result in mass casualties.

During an interview at the World Economic Forum in Davos, Amodei and Google DeepMind CEO Demis Hassabis suggested that AI companies should scale back competition in favor of a collaborative focus on safety.

However, the industry remains divided. While some see these reports as essential transparency, others argue that doomsday rhetoric is a strategic move by AI giants to invite regulations that would stifle smaller competitors. With the Future of Life Institute launching an $8 million ad campaign urging regulators to “protect what’s human,” the tension between rapid innovation and existential safety has reached a fever pitch.

AI model capability is advancing faster than the governance structures meant to contain it, said Mitch Ashley, vice president and practice lead, Software Lifecycle Engineering, at The Futurum Group. He said that when companies publish safety findings showing their own systems can meaningfully assist in chemical weapons development or serious criminal activity, “that should not be viewed as transparency alone. It is evidence that frontier capability is outpacing enforceable safeguards.”

“We are operating in a climate that prioritizes competitive growth and market acceleration while hesitating on clear, even limited, regulatory frameworks,” Ashley said. “One concern is misaligned regulation that either slows innovation in the wrong places or leaves critical risk surfaces exposed. But waiting for a large-scale incident to force legislative clarity is a very risky and reactive posture. Voluntary self-reporting, however responsible, does not eliminate liability, public harm, or systemic risk. As AI systems move from advisory roles into autonomous execution, safety cannot remain a theoretical policy discussion.”

In its report, Anthropic concluded that while Opus 4.6 is currently safe for use in coding and data generation, future jumps in reasoning capabilities could “invalidate” its current safety conclusions. For now, the “smartest” model on the market remains its own most significant safety experiment.