Walmart is well past agentic experimentation and has found a formula for safely maintaining the fleets of AI agents it is currently rolling out.
The company’s blueprint is simple: “Stable interfaces, deterministic workflows, and pre-execution checks,” said Jake Mannix, Walmart U.S. Tech technical fellow, in an e-mail interview with Techstrong.AI.
In short, the probabilistic behavior of agents requires new forms of IT management.
The retail giant has dived headlong into using agents. The company has dedicated a team of tool developers to opening up data access and creating services for agents. But a much wider group of users within the company – analysts, merchandisers, financial operations staff – can build agents to support their own duties.
Agents bring users an unprecedented degree of autonomy in getting things done, but they also introduce a fair amount of risk, in that they aren’t always predictable, especially en masse.
“At that point, scale isn’t just about the number of use cases. It’s about what happens when those agents and tools start interacting,” Mannix said.
Agents Running Amok
Walmart has found that agents can, in some circumstances, create unintended side effects across the entire global system. Cascading failures, in which small misunderstandings are amplified across iterations, can occur frequently and lead to mistakes with large blast radii.
“At scale, everything becomes multi-agent by default. Agents interact whether intended or not, and local decisions accumulate into system-wide effects,” Mannix said.
A single-agent system is not a meaningful unit of measurement for the company. Yet most of the quality, security, and compliance evaluations the industry has created are designed for single agents, not multi-agent interactions.
When interactions are unstructured, determinism is lost. Operational debt accumulates. System evaluation becomes impossible.
Traditional monitoring assumes you can reconstruct an incident after the fact. This isn’t the case with multi-agent systems, especially where agents are granted a fair amount of autonomy. “Manual review and shared context don’t scale in systems that are probabilistic, distributed, and constantly changing,” Mannix said.
Walmart needed agent actions to be verified, which in turn calls for strong protocols. “Without strong protocols, it becomes harder to trace, harder to govern, and harder to trust,” Mannix said.
Declared Intent and Runtime Enforcement
Planners concluded that agent management would require articulating the intent of the agent, and separating it from the actual execution code.
What was needed was a registry of governance rules that could be reasoned against. Governance has to be built into the system itself. Every interaction must be machine-readable so it can be enforced.
Governance defines what an agent is allowed to do and what it can promise end-users. These rules are spelled out on agent cards, which can be thought of as contracts with the agent, providing schemas, data classifications, provenance, and capabilities. What sensitive information can an agent access? What access does an agent have to external systems?
Agent cards define inputs and outputs, data constraints, approvals, and identity access, though they don’t tie these requirements to a specific implementation.
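To make the idea concrete, here is a minimal sketch of what an agent card could look like as a data structure. Walmart has not published its format here, so every name in it (the `AgentCard` class, its fields, the `markdown-planner` agent, and the action names) is a hypothetical illustration of the categories described above, not the company's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCard:
    """Hypothetical agent card: declared intent, kept separate from execution code."""
    name: str
    owner: str                       # identity the agent acts under
    input_schema: dict               # field name -> type name
    output_schema: dict
    data_classifications: frozenset  # data sensitivity levels the agent may touch
    capabilities: frozenset          # actions the agent is allowed to take
    requires_approval: frozenset = frozenset()  # actions needing sign-off

    def allows(self, action: str) -> bool:
        return action in self.capabilities

    def needs_approval(self, action: str) -> bool:
        return action in self.requires_approval

    def may_read(self, classification: str) -> bool:
        return classification in self.data_classifications


# Illustrative card; all values are made up for this example.
card = AgentCard(
    name="markdown-planner",
    owner="merchandising-analytics",
    input_schema={"sku": "string", "store_id": "string"},
    output_schema={"suggested_price": "number"},
    data_classifications=frozenset({"public", "internal"}),
    capabilities=frozenset({"read_catalog", "propose_price_change"}),
    requires_approval=frozenset({"propose_price_change"}),
)
```

Because the card only declares what is permitted, the same card could govern an agent regardless of which model or framework implements it.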
At Walmart, agents still rely on old-school deterministic applications such as Business Process Automation (BPA) software. Such apps “provide determinism in an increasingly hard-to-predict system,” Mannix said. So agents will need instructions and guidelines to work with these apps as well.
At build time, agents are checked with linters to ensure they meet required conditions, and that they won’t execute potentially unsafe actions.
“That shifts observability from something reactive to something designed into the system from the start,” Mannix said.
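A build-time check of this kind might look something like the sketch below. It is an assumption-laden illustration rather than Walmart's linter: the required field names, the `UNSAFE_ACTIONS` list, and the `lint_agent_card` function are all hypothetical.

```python
# Hypothetical rules; a real registry would supply these.
REQUIRED_FIELDS = (
    "name", "owner", "input_schema", "output_schema",
    "data_classifications", "capabilities",
)
UNSAFE_ACTIONS = {"delete_records", "issue_refund"}


def lint_agent_card(card: dict) -> list[str]:
    """Return a list of problems; an empty list means the card passes."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in card:
            problems.append(f"missing required field: {field}")
    approvals = set(card.get("requires_approval", []))
    for action in card.get("capabilities", []):
        # Unsafe actions are only acceptable when gated by an approval.
        if action in UNSAFE_ACTIONS and action not in approvals:
            problems.append(f"unsafe action without approval: {action}")
    return problems


good_card = {
    "name": "refund-helper",
    "owner": "financial-operations",
    "input_schema": {"order_id": "string"},
    "output_schema": {"status": "string"},
    "data_classifications": ["internal"],
    "capabilities": ["read_orders", "issue_refund"],
    "requires_approval": ["issue_refund"],
}
bad_card = {"name": "refund-helper", "capabilities": ["issue_refund"]}

print(lint_agent_card(good_card))
for problem in lint_agent_card(bad_card):
    print(problem)
```

Run in a build pipeline, a non-empty result would block the agent from being published, which is what moves the check ahead of execution.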
The orchestration layer, which is basically a data plane, ensures consistent behavior of the agent. “At that point, agent cards stop being documentation and start acting as contracts, and registries become sources of truth,” Mannix said.
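One way to picture a card acting as a contract at runtime is a pre-execution check that consults the registry before any action runs. The sketch below is hypothetical throughout: `REGISTRY`, `PolicyViolation`, `pre_execution_check`, and `execute` are invented names, and a production orchestration layer would be far more involved.

```python
class PolicyViolation(Exception):
    """Raised when a requested action falls outside the agent's declared card."""


# Hypothetical registry acting as the source of truth for agent cards.
REGISTRY = {
    "price-check-agent": {
        "capabilities": {"read_catalog"},
        "data_classifications": {"public", "internal"},
    },
}


def pre_execution_check(agent_name: str, action: str, data_class: str) -> None:
    card = REGISTRY.get(agent_name)
    if card is None:
        raise PolicyViolation(f"{agent_name} is not registered")
    if action not in card["capabilities"]:
        raise PolicyViolation(f"{agent_name} may not perform {action}")
    if data_class not in card["data_classifications"]:
        raise PolicyViolation(f"{agent_name} may not access {data_class} data")


def execute(agent_name: str, action: str, data_class: str, handler):
    # The check runs before the handler, so a violation never executes.
    pre_execution_check(agent_name, action, data_class)
    return handler()


result = execute("price-check-agent", "read_catalog", "internal", lambda: "ok")
```

The point of the design is that the decision is mechanical: the orchestrator does not interpret the agent's reasoning, it only compares the request against what the registry says the agent declared.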
Mannix did not divulge how many agents Walmart currently has in production, but you can read more details about the company’s approach in a blog post he published this week.
“At scale, mechanical enforcement is the only way to balance autonomy with safety without centralizing control,” he wrote.

