OpenClaw’s security issues are well-documented. Over 40,000 exposed instances are leaking API keys. More than 800 malicious skills on ClawHub. A one-click remote code execution exploit. An agent that published a hit piece on a developer who rejected its pull request.
Security researcher Niels Provos watched the chaos and asked: What would a personal AI agent look like if security were taken seriously from the start?
His answer, launched February 26, is IronCurtain — an open-source AI assistant that sandboxes agent code, compiles plain-English policy into enforceable rules, and keeps credentials out of the agent’s reach. It’s a research prototype, not a consumer product. But the architecture addresses problems that every AI agent framework currently gets wrong.
The Chokepoint
Most agent frameworks give the AI broad system access and hope nothing goes wrong. IronCurtain takes the opposite approach. Every action the agent takes funnels through a single trusted process that acts as an MCP proxy. That proxy holds the policy engine and makes one of three decisions: allow, deny, or escalate to the human.
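The control flow of such a chokepoint can be sketched in a few lines of TypeScript. This is an illustration only: the names `ToolCall`, `mediate`, `decide`, `askHuman`, and `execute` are hypothetical and not taken from IronCurtain's code, and the sketch is synchronous for brevity where a real proxy would be asynchronous.

```typescript
// Hypothetical types for illustration; not IronCurtain's actual API.
type Decision = "allow" | "deny" | "escalate";

interface ToolCall {
  tool: string;                 // tool name, e.g. "git_push"
  args: Record<string, string>; // tool arguments
}

interface Outcome {
  executed: boolean;
  decision: Decision;
}

// The single chokepoint: every tool call passes through this function.
function mediate(
  call: ToolCall,
  decide: (call: ToolCall) => Decision,  // the policy engine
  askHuman: (call: ToolCall) => boolean, // escalation prompt, true = approved
  execute: (call: ToolCall) => void,     // forwards the call to the real tool
): Outcome {
  const decision = decide(call);
  // A call runs only if policy allows it outright, or the human approves
  // an escalated call. Denied calls never reach the tool.
  const approved =
    decision === "allow" || (decision === "escalate" && askHuman(call));
  if (approved) execute(call);
  return { executed: approved, decision };
}
```

The point of the shape is that `execute` is reachable from exactly one place, so there is no path to a tool that skips the policy decision.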
The architecture supports two sandbox modes. Code Mode runs the LLM’s TypeScript in a V8 isolate with no filesystem, network, or environment access. Docker Mode puts a full autonomous agent, such as Claude Code, inside a container with no network access. The agent gets its own shell and filesystem but has exactly two ways out: a Unix socket to the MCP proxy and another to a TLS-terminating proxy for LLM API requests.
Credential separation falls out naturally. In Docker Mode, the container receives a fake API key that passes format validation but does nothing. The proxy intercepts outbound requests, replaces the fake key with the real one, and forwards them upstream. The real key never enters the container.
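The key swap itself is a small header rewrite. The sketch below assumes the key travels in request headers. `PLACEHOLDER_KEY` and `swapCredential` are hypothetical names chosen for illustration, not IronCurtain identifiers.

```typescript
// Hypothetical names throughout; a sketch, not IronCurtain's code.
// The fake key handed to the container (illustrative value).
const PLACEHOLDER_KEY = "sk-placeholder-0000";

// Runs inside the trusted proxy, outside the container.
// Returns a new header map and leaves the input untouched.
function swapCredential(
  headers: Record<string, string>,
  realKey: string,
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    // Replace the placeholder wherever it appears, e.g. "Bearer <key>".
    out[name] = value.split(PLACEHOLDER_KEY).join(realKey);
  }
  return out;
}
```

Because the substitution happens after the request leaves the container, code running inside it can only ever read the placeholder.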
Policy in Plain English
Writing a security policy is hard. Policy languages are difficult to learn, edge cases multiply, and most people give up and open everything. That’s how most frameworks end up with all-or-nothing permissions.
IronCurtain borrows an idea from a Microsoft Research paper on privacy compliance at Bing. You write a “constitution” for your agent in plain English — no DSL, no YAML, no regex. Something like: “The agent may read and write files in the project directory. It must ask me before pushing to any remote. Never delete permanently.”
IronCurtain compiles that into deterministic rules enforced on every MCP call. Unlike guardrails that rely on the LLM itself, the policy engine operates entirely outside the model. Security requires determinism, which the model itself cannot provide.
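To make "deterministic rules" concrete, here is one way the example constitution above might look once compiled. The rule shape, the tool names, the project path, and the choice to escalate uncovered calls are all assumptions for illustration. The article does not describe IronCurtain's actual compiled format.

```typescript
// Hypothetical rule shape; the real compiled format may differ.
type Decision = "allow" | "deny" | "escalate";

interface Rule {
  tools: string[];     // tool names the rule covers
  pathPrefix?: string; // if set, the rule only matches paths under this directory
  decision: Decision;
  reason: string;      // the constitution sentence behind the rule
}

const PROJECT_DIR = "/home/me/project/"; // assumed location

// Rules are checked in order; the first match wins.
const rules: Rule[] = [
  { tools: ["delete_file"], decision: "deny",
    reason: "Never delete permanently." },
  { tools: ["git_push"], decision: "escalate",
    reason: "It must ask me before pushing to any remote." },
  { tools: ["read_file", "write_file"], pathPrefix: PROJECT_DIR, decision: "allow",
    reason: "The agent may read and write files in the project directory." },
];

// Pure function of its inputs: the same call always gets the same answer.
// Assumes `path` is already normalized (no ".." segments).
function evaluate(tool: string, path?: string): { decision: Decision; reason: string } {
  for (const rule of rules) {
    const toolMatches = rule.tools.includes(tool);
    const pathMatches =
      rule.pathPrefix === undefined ||
      (path !== undefined && path.startsWith(rule.pathPrefix));
    if (toolMatches && pathMatches) {
      return { decision: rule.decision, reason: rule.reason };
    }
  }
  // Assumption: anything the rules do not cover goes to the human.
  return { decision: "escalate", reason: "No rule covers this call." };
}
```

Under these assumptions, `evaluate("write_file", "/etc/passwd")` falls through every rule and escalates, while `evaluate("delete_file", ...)` is denied no matter what the model asks for.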
An optional auto-approver handles cases where the user’s explicit instructions already cover the action. If you say “push my changes to origin,” it approves the git push without interrupting you. Vague instructions always escalate to a human. Every decision is logged.
According to Mitch Ashley, VP and practice lead for software lifecycle engineering at The Futurum Group, “IronCurtain demonstrates a divergence in agent security architecture. Most frameworks rely on the model to self-govern, placing the enforcement burden on AI and then on the user. IronCurtain moves enforcement outside the model to a single deterministic point, separating reactive permission fatigue from proactive containment. The credential separation and plain-English policy model addresses the governance gap driving today’s breaches. Deterministic enforcement outside the model is verifiable by auditors, and the bar agent deployment necessitates transparency and trust.”
Why This Matters Now
OpenClaw’s security crisis has forced a reckoning about how AI agents handle permissions. Cybersecurity researcher Dino Dai Zovi, who tested early IronCurtain versions, identified the core problem: most agents put all the burden on the user to approve each action. Users tune out, click “yes” to everything, and eventually skip permissions entirely.
IronCurtain flips that model. Capabilities like deleting files can be placed entirely outside the agent’s reach — it can’t do it, regardless of what it’s told or what prompt injection it encounters.
The project currently supports 14 filesystem tools, 28 git tools, web fetching, web search, and Signal integration for end-to-end encrypted task management. Claude Code integration is working, but still has rough edges.
The Honest Caveat
Provos is transparent about limits. The project’s homepage includes a disclosure that essentially says: if someone claims something is secure, you should mistrust it.
Prompt injection remains unsolved. LLMs drift from instructions over multi-turn conversations even without adversarial input. IronCurtain can’t prevent either. What it can do is contain the blast radius. When the policy engine denies an action, it returns the constitutional reason — a corrective signal that re-anchors the model toward the original intent rather than just blocking the request.
The name comes from the theater, where an iron curtain is a fireproof barrier between the stage and audience. The agent performs on stage. Your files, credentials, and systems are in the audience.
What to Watch
IronCurtain won’t replace OpenClaw’s 200,000-star ecosystem overnight. But the architecture — a single enforcement chokepoint, deterministic policy from natural language, credential separation by design, sandbox isolation as default — addresses the exact gaps making headlines.
The question is whether security-first design can deliver high enough utility to compete. Provos is betting that security and usability reinforce each other. If the constitution model works at scale, it could become the standard pattern for constraining AI agents in production — not through permission fatigue, but through policy that’s easy to write, deterministic to enforce, and transparent to audit.
Install with `npx @provos/ironcurtain`. Code at github.com/provos/ironcurtain.

