Software engineering has for the past two decades been defined by explicit instructions. Whether writing code or writing tests, the paradigm was to tell the machine exactly what to do, step by step. That model worked because the bottleneck in software delivery was human execution speed, and explicit instructions gave machines the precision they needed to assist without overstepping. But agentic AI has changed the nature of the bottleneck entirely, and the instruction-based architecture built around that legacy model is now the very thing slowing teams down.

The shift reverberating across enterprise engineering organizations is not merely a tooling upgrade. It is a fundamental reorientation of how software systems are told what to do, moving from detailed step-by-step instructions toward high-level intent combined with engineered context that makes that intent executable. While most organizations have the data they need to make this shift, they lack the infrastructure to make that data usable by an agent operating autonomously.

“People spent years optimizing for scripts,” says Mayank Bhola, Co-Founder and Head of Product at TestMu AI, an AI-native software testing platform formerly known as LambdaTest. “They built massive libraries of rigid instructions, like click this button, wait five seconds, check this div. But that era is ending.” Bhola argues that the rise of agentic AI is forcing a fundamental shift from instruction-based architectures to intent-based ones, where you tell the AI not how to do a task but why, and supply the necessary guardrails through context rather than through scripts. But while organizations have plenty of data, most lack the context engineering required to make that intent executable and secure.

Intent Without Context Is Dangerous

The promise of enterprise AI is an agent that can reason autonomously. You ask it to deploy a microservice or test a checkout flow, and it figures out the steps. That promise is real, but it depends entirely on the quality of the context the agent is operating within, and most organizations have not yet built the infrastructure to provide that context reliably.

“The biggest mistake leaders make is assuming the model knows what you know,” Bhola explains. “An agent might know how to write Java, but it doesn’t know your architecture’s unspoken rules. It doesn’t know that your checkout service has a dependency on a legacy inventory system that fails if you hit it too hard.” This is the context gap, and when an agent tries to execute intent without sufficient context, it either hallucinates or executes destructive actions that a human reviewer would have caught immediately. “We see this in automated testing constantly,” Bhola notes. “If you tell an AI to test the login, but you don’t provide the context of which environment or what test data to use, it will either fail or start creating junk users in your production database. Intent without context is dangerous.”
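The login example can be made concrete with a small guard that refuses to dispatch an intent until the environment and test data are stated explicitly. This is a minimal sketch, not TestMu AI's implementation; the `Intent` class, its field names, and the `dispatch` function are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Intent:
    """A high-level goal plus the context needed to execute it safely (hypothetical shape)."""
    goal: str
    environment: Optional[str] = None  # e.g. "staging"
    test_data: Optional[str] = None    # e.g. the name of a seeded fixture set


def missing_context(intent: Intent) -> list[str]:
    """List the required context fields the intent does not provide."""
    missing = []
    if not intent.environment:
        missing.append("environment")
    if not intent.test_data:
        missing.append("test_data")
    return missing


def dispatch(intent: Intent) -> str:
    """Hand the intent to an agent only when no required context is missing."""
    gaps = missing_context(intent)
    if gaps:
        raise ValueError(f"refusing to run '{intent.goal}': missing {', '.join(gaps)}")
    # A real system would invoke the agent here; this sketch only reports the plan.
    return f"running '{intent.goal}' on {intent.environment} with {intent.test_data}"
```

Under these assumptions, `dispatch(Intent("test the login"))` raises an error instead of letting the agent guess a target, while the same goal with `environment="staging"` and a named fixture set goes through.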

The context gap is not just a performance problem. It is a trust problem, and until organizations build the infrastructure to close it, every new agent they deploy will require more human supervision rather than less, which defeats the purpose of deploying agents in the first place.

Ahmed Zaidi, CEO of Accelirate, a company that builds automated software and security testing solutions, approaches the failure mode from the security side and identifies a more consequential version of it. Model drift, prompt injection, and poisoned training data are finding their way into the software supply chain, and development teams integrating AI agents into their workflows frequently do not account for the possibility that those agents could be compromised at the model level before executing a single test case. “With AI, attackers have gained so much power that they cannot be easily detected with the naked eye,” Zaidi explains. “We are no longer testing code, we are testing behavior, which cannot be validated with a static checklist.”

The context gap Bhola identifies and the security exposure Zaidi describes are two dimensions of the same underlying problem. An agent operating without a structured context layer is both less accurate and more vulnerable, because the same absence of boundaries that allows it to wander into the wrong database also allows an attacker to redirect its behavior without triggering any obvious alarm.

Context Is an Infrastructure Problem, Not a Prompt Problem

Treating context as a prompt engineering problem is why most organizations have not made meaningful progress on it. Prompt engineering is a developer-level concern. Context engineering is an infrastructure concern, and the teams that have made the shift are the ones that have started building, versioning, and securing their context the same way they build, version, and secure their code.

“We are moving from a world of network hops to context hops,” says Bhola, drawing a parallel to how TestMu AI optimized its HyperExecute grid. “In traditional infrastructure, latency killed speed. In AI infrastructure, noise kills intelligence.”

For enterprise leaders, Bhola outlines two architectural mandates that make context engineering operational rather than theoretical. The first is flattening the knowledge graph, because most enterprise knowledge is deeply nested in Confluence pages, PDF specifications, and ticket threads that an agent cannot navigate efficiently. “You need to flatten your data models,” he advises. “Make your business logic retrieval-ready, so when an agent asks how we handle refunds, it gets a single definitive truth, not a link to a wiki.” The second is building a context control plane that decides what context an agent receives, restricting access not just for security reasons but for accuracy reasons. “Does the CI/CD agent need access to the billing database? Probably not. By restricting context, you don’t just improve security. You improve accuracy by reducing the noise.”
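One way to picture the control-plane idea is a deny-by-default allowlist that sits between the knowledge store and the agent. The sketch below is illustrative only: the role names, source names, and the `select_context` function are assumptions, not part of any product described here, and the item contents are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextItem:
    source: str   # flattened, retrieval-ready source name (hypothetical)
    content: str  # the single definitive answer stored for that source


# Hypothetical policy: which context sources each agent role may receive.
ALLOWED_SOURCES = {
    "ci_cd_agent": {"build_config", "deploy_runbook"},
    "support_agent": {"refund_policy"},
}


def select_context(agent_role: str, items: list[ContextItem]) -> list[ContextItem]:
    """Return only the items allowlisted for the role; unknown roles get nothing."""
    allowed = ALLOWED_SOURCES.get(agent_role, set())
    return [item for item in items if item.source in allowed]


# Placeholder items standing in for flattened enterprise knowledge.
ITEMS = [
    ContextItem("build_config", "placeholder build settings"),
    ContextItem("billing_db", "placeholder billing records"),
    ContextItem("refund_policy", "placeholder refund policy text"),
]
```

With this policy, `select_context("ci_cd_agent", ITEMS)` returns only the build configuration, so the billing data never reaches the CI/CD agent's prompt. The same filter serves both of the goals Bhola names: less exposure and less noise.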

Flattening the knowledge graph and building a context control plane are not one-time implementation decisions. They are ongoing disciplines that require ownership, governance, and the same kind of maintenance investment that organizations apply to their API infrastructure and their access control policies.

Zaidi frames the same principle in operational terms that security teams will recognize immediately. “In security, logs without normalization are useless noise,” he explains. “In AI agent testing, context without structure is the same problem. Flattened test data, precise retrieval, and cached reasoning paths are not optimizations. They are controls.” His recommended approach treats context management as a security information and event management (SIEM) system for testing LLMs, where agents retain what worked, discard what did not, and never reprocess the same mistake twice. The practical starting point is conducting a context inventory to identify what data agents can currently access. The next step is establishing tiered context budgets, in which critical processes receive only the minimal context needed to validate specific boundaries, so that agents cannot wander into production databases or legacy code repositories that have nothing to do with the task at hand.
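The inventory and budget steps can be sketched in a few lines. Everything named here is hypothetical: the agent name, the source names, the tier labels, and the budget sizes are placeholders, and counting sources is only one possible way to express a budget (a token limit would be another).

```python
# Hypothetical inventory: every context source each agent can currently reach.
CONTEXT_INVENTORY = {
    "checkout_test_agent": {
        "checkout_api_spec",
        "staging_test_data",
        "production_db",
        "legacy_repo",
    },
}

# Hypothetical tiers: the more critical the process, the fewer sources it may load.
TIER_BUDGETS = {"critical": 2, "standard": 5}


def excess_access(agent: str, needed: set[str]) -> list[str]:
    """Sources the agent can reach today but the task does not need."""
    return sorted(CONTEXT_INVENTORY.get(agent, set()) - needed)


def within_budget(tier: str, requested: list[str]) -> list[str]:
    """Keep only as many sources as the tier allows.

    `requested` is assumed to be ordered from most to least relevant.
    """
    return requested[: TIER_BUDGETS[tier]]
```

Running `excess_access` against the task's real needs surfaces the production database and the legacy repository as access to revoke, and `within_budget("critical", ...)` caps what a critical process may load even when more is requested.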

Security Has to Be Built Into the Context Layer

Bhola frequently cites recent global IT outages as a warning sign for the AI era. “The world broke because of a single update,” he reflects. “Now imagine agents pushing updates autonomously at 100 times that velocity.” The warning is not hypothetical. In 2024, a single faulty software update from a cybersecurity vendor caused an estimated 8.5 million Windows devices to crash simultaneously, grounding flights, taking hospital systems offline, and disrupting financial institutions across multiple continents. That incident was caused by a single human-initiated update that bypassed adequate validation. Agentic systems operating autonomously introduce the same failure mode at a scale that human oversight processes were never designed to monitor in real time. In an intent-based system, security cannot be a final checklist added after the architecture is already built. It must be embedded in the context itself, enforced at the layer where the agent receives its instructions rather than at the layer where its outputs are reviewed.

“We are building intelligent layers that sit between the agent and the execution,” Bhola explains. “Before an agent’s code is committed, or before a test is run, it must pass through a context filter that checks for security compliance. It’s not just about whether the code syntax is correct. It’s about whether this aligns with our security intent.” Building that filter before the agents go to production rather than retrofitting it after the first incident is the organizational decision that separates teams that trust their agentic systems from teams that are still manually supervising every autonomous action.
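A context filter of the kind Bhola describes could look roughly like the gate below, which checks a proposed action against declared security intent before anything is committed or run. This is a sketch under stated assumptions, not TestMu AI's implementation: the `ProposedAction` shape, the approved environments, and the forbidden markers are all hypothetical examples of rules a team might encode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedAction:
    kind: str         # e.g. "commit" or "run_test"
    environment: str  # where the action would take effect
    payload: str      # code or test script the agent produced


# Hypothetical security intent, expressed as data rather than as a review checklist.
APPROVED_ENVIRONMENTS = {"staging", "ci"}
FORBIDDEN_MARKERS = ("BEGIN PRIVATE KEY", "DROP TABLE")


def check_action(action: ProposedAction) -> list[str]:
    """Return the list of policy violations; an empty list means the action may proceed."""
    violations = []
    if action.environment not in APPROVED_ENVIRONMENTS:
        violations.append(f"environment '{action.environment}' is not approved for agents")
    for marker in FORBIDDEN_MARKERS:
        if marker in action.payload:
            violations.append(f"payload contains forbidden marker '{marker}'")
    return violations
```

The gate runs before execution, not after review: an action aimed at an unapproved environment is rejected even if its code is syntactically perfect, which is the distinction between checking syntax and checking security intent.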

Zaidi extends this argument to proactive threat detection, a capability most organizations have not yet built. Development teams that are still reacting to incidents after the fact are operating with a security model designed for a world where attacks moved at human speed. “It serves companies well to have AI systems in place that tell them an attack is coming,” he explains, because AI-driven analysis can detect insecure behavior during the early testing stage of the pipeline rather than after it has reached production. The speed of automated test generation creates an illusion of comprehensive coverage while the foundational security decisions in the test architecture remain unexamined. Organizations that have invested in proactive prevention architecture have significantly cut incident response times because the AI provides threat signals rather than incident reports.

Proactive threat detection and reactive incident response are not alternatives. They are sequential layers, and the teams operating at scale have both, with proactive detection reducing the volume and severity of incidents that the reactive layer has to handle.

Building the Foundation for High Reliability

Bhola views context engineering as an unblocking mechanism for organizations that have stalled on AI adoption because they do not yet trust their agents enough to let them operate with genuine autonomy. “My job is to unblock teams,” he says. “Right now, the lack of trusted context is the biggest blocker to AI adoption. Teams are afraid to let agents run free because they don’t trust them.” By treating context as a rigorous engineering discipline, structuring it, versioning it, and securing it, enterprises can move from micromanaging agents with scripts to empowering them with intent.

Zaidi frames the same opportunity in terms of what the architecture unlocks at scale. Organizations that build rigorous context validation, audit trails, and oversight mechanisms into their agentic systems from the beginning will move faster in the long run because they will not have to retrofit those controls after a security incident forces their hand. The teams investing in this architecture now are the ones that will operate at a speed and scale their competitors cannot match, not because they removed human judgment from the loop but because they built the infrastructure that makes human judgment trustworthy at machine speed.

“The future isn’t about writing better prompts,” Bhola concludes. “It’s about building a better foundation so the prompts actually work.”