Enterprise AI agents are getting smarter at building things. But teaching them how your specific organization works? That has remained a largely manual effort: loading documents, writing custom instructions, or standing up a data science team to run the training cycles yourself.

Microsoft is taking a different approach. The company has introduced closed-loop learning for agents connected to the Power Apps MCP server, starting with the platform’s data entry tool. The idea is simple: When a user corrects an agent’s output, that correction gets stored in memory. The next time a similar task comes up, the agent applies what it learned. Over time, those individual corrections roll up into broader patterns that improve accuracy across the entire organization.

No data pipelines to build. No manual configuration. The feedback loop runs automatically in production.

From Correction to Habit

Here’s a practical example. A finance team has an agent processing vendor invoices. The agent correctly reads “UK” from a supplier document, but the organization’s records require “United Kingdom.” A user fixes it in the Agent feed. After a handful of similar corrections, the agent stops making that mistake. It also generalizes — “USA” becomes “United States of America,” “DE” becomes “Germany.” The principle, not just the specific case, becomes the agent’s default behavior.

Microsoft describes this as two complementary layers working together.

The first is memory-based optimization. User corrections are stored as structured memories that the agent retrieves during inference. It’s immediate recall — the agent sees a new invoice, pulls the relevant memories, and applies the same processing steps it used before.
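A minimal sketch of that recall step, assuming corrections are kept as simple field-level records and matched by string similarity. The class and function names here (CorrectionMemory, MemoryStore, apply_memories) and the 0.8 threshold are hypothetical illustrations, not Microsoft's implementation, which the source does not describe at this level.

```python
from __future__ import annotations

from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class CorrectionMemory:
    field: str        # which extracted field the user corrected
    agent_value: str  # what the agent originally produced
    user_value: str   # what the user changed it to


class MemoryStore:
    """Hypothetical in-memory store of user corrections."""

    def __init__(self) -> None:
        self._memories: list[CorrectionMemory] = []

    def record(self, field: str, agent_value: str, user_value: str) -> None:
        self._memories.append(CorrectionMemory(field, agent_value, user_value))

    def retrieve(self, field: str, value: str,
                 threshold: float = 0.8) -> CorrectionMemory | None:
        """Return the most similar past correction for this field, if any."""
        best, best_score = None, threshold
        for memory in self._memories:
            if memory.field != field:
                continue
            score = SequenceMatcher(
                None, memory.agent_value.lower(), value.lower()
            ).ratio()
            if score >= best_score:
                best, best_score = memory, score
        return best


def apply_memories(store: MemoryStore, extracted: dict[str, str]) -> dict[str, str]:
    """Overwrite extracted values with the user's earlier correction when one matches."""
    result = dict(extracted)
    for field, value in extracted.items():
        memory = store.retrieve(field, value)
        if memory is not None:
            result[field] = memory.user_value
    return result


store = MemoryStore()
store.record("country", "UK", "United Kingdom")

seen_before = apply_memories(store, {"country": "uk", "supplier": "Acme Ltd"})
never_seen = apply_memories(store, {"country": "DE"})
```

In this sketch the "uk" value is rewritten to "United Kingdom" because a matching correction exists, while "DE" passes through unchanged: a pure lookup only covers cases it has already seen.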

The second layer is Genetic-Pareto optimization, an evolutionary prompt-optimization technique implemented via GEPA, an open-source project integrated with DSPy (Stanford’s framework for programmatic LLM optimization). Where memory-based optimization recalls specific corrections, Genetic-Pareto goes further — it distills those memories into generalized rules that get compiled directly into the agent’s instructions. The principle becomes the default, not just a pattern the agent has to look up.
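To make the "Pareto" half of the name concrete, here is a toy sketch of Pareto-front selection over candidate prompts, assuming each candidate has been scored on the same set of examples. It is not GEPA's code: the step that proposes new prompt variants requires a language model and is omitted, and the candidate names and scores are invented for illustration.

```python
def dominates(a: list[float], b: list[float]) -> bool:
    """True if a is at least as good as b on every example and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: dict[str, list[float]]) -> list[str]:
    """Keep the candidates that no other candidate dominates."""
    return [
        name
        for name, own in scores.items()
        if not any(
            dominates(other, own)
            for other_name, other in scores.items()
            if other_name != name
        )
    ]


# Hypothetical per-example scores (1.0 = field extracted correctly).
candidate_scores = {
    "baseline":             [1.0, 0.0, 0.0, 1.0],
    "expand_country_codes": [1.0, 1.0, 0.0, 1.0],
    "use_gross_total":      [1.0, 0.0, 1.0, 1.0],
    "verbose_instructions": [1.0, 0.0, 0.0, 0.0],
}

front = pareto_front(candidate_scores)
# front == ["expand_country_codes", "use_gross_total"]
```

Keeping every non-dominated candidate, rather than only the one with the best average, preserves prompts that have each learned a different rule so that later variants can combine them.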

Both layers are live today for the data entry tool.

Not the Same as AI Personalization

It’s worth being clear about what this isn’t. Most AI assistants have some form of memory that personalizes the experience for an individual user. Closed-loop learning is different — it improves task accuracy for everyone using the organization’s agent. The learning stays scoped to your business process and governed by your tenant.

The distinction matters for enterprise deployments. One team’s corrections benefit the whole organization, not just the person who made them.

“Microsoft’s closed-loop learning on the Power Apps MCP server is where the agent control plane competition is heading. The MCP server is becoming the layer where organizations capture institutional knowledge, validate prompt improvements, and govern what agents learn, all scoped to the tenant,” according to Mitch Ashley, VP and practice lead, software lifecycle engineering, The Futurum Group.

“For buyers, this shifts the evaluation criteria for any MCP-connected platform. Connection alone no longer differentiates. Procurement teams must assess where learning data lives, how candidate prompts get validated, and which vendor owns the governance layer that decides what becomes default agent behavior.”

Real-World Results

Microsoft tested closed-loop learning against a real enterprise dataset — invoices processed by the UK Electoral Commission, which handles thousands of invoices annually from suppliers across multiple countries. Each invoice requires structured extraction of supplier name, address, country, and total expenditure, with conventions that vary by supplier and source document.

Across 4,277 field instances, closed-loop learning reduced the share of fields users had to manually edit from 64% to 48% — that’s 1,045 fewer fields requiring a human correction. In 10 independent test runs, Genetic-Pareto optimization improved F1 extraction accuracy from 66.4% to 74.6%, an 8.2 percentage-point lift over the baseline.
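For readers unfamiliar with the metric, the sketch below shows one common way to compute a field-level F1 score and the percentage-point lift quoted above. The source does not say exactly how Microsoft scored fields, so the counting convention here (a wrong value counts as both a false positive and a false negative, a blank value only as a false negative) is an assumption, and the sample invoice is invented.

```python
def field_f1(predicted: list[dict[str, str]], gold: list[dict[str, str]]) -> float:
    """Field-level F1 over paired documents, using exact string match."""
    tp = fp = fn = 0
    for pred, truth in zip(predicted, gold):
        for field, true_value in truth.items():
            value = pred.get(field, "")
            if value == true_value:
                tp += 1
            else:
                if value:      # a wrong value was emitted
                    fp += 1
                fn += 1        # the correct value was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical single-invoice example: the country is abbreviated, so it counts as wrong.
gold = [{"supplier": "Acme Ltd", "country": "United Kingdom", "total": "120.00"}]
predicted = [{"supplier": "Acme Ltd", "country": "UK", "total": "120.00"}]
score = field_f1(predicted, gold)  # 2 of 3 fields correct

# Lift reported in the article, in percentage points and relative terms.
baseline_f1, optimized_f1 = 66.4, 74.6
lift_points = round(optimized_f1 - baseline_f1, 1)  # 8.2
relative_lift = (optimized_f1 - baseline_f1) / baseline_f1
```

Note the difference between the two lift figures: 8.2 is the absolute gap in percentage points, while the relative improvement over the 66.4% baseline is a little over 12%.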

Country accuracy alone jumped from 11% to 78% after Genetic-Pareto learned to expand abbreviated country names — a clean example of what happens when a specific correction maps to a generalized principle.

The results also showed gains in address formatting (correctly splitting town and postcode fields and handling regional variations), total expenditure extraction (using gross invoice totals rather than partial amounts), and supplier name accuracy (pulling the legal entity name rather than brand tags or billing sections).

How the Loop Actually Closes

What makes this different from typical prompt optimization is that the entire cycle runs automatically within your tenant. The system generates a candidate prompt, runs a shadow experiment — the current prompt handles the user-facing result while the same input is scored in parallel against the candidate — and uses statistical validation to decide whether the candidate is measurably better. When it clears the threshold, it automatically becomes the new baseline.
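The shadow-experiment decision can be sketched as a paired comparison: both prompts are scored on the same inputs, and the candidate is promoted only if its wins are unlikely to be chance. The source does not name the statistical test Microsoft uses. The one-sided sign test, the 0.05 threshold, and the function names below are assumptions chosen for simplicity.

```python
from math import comb


def sign_test_p_value(wins: int, losses: int) -> float:
    """One-sided exact sign test: P(X >= wins) for X ~ Binomial(wins + losses, 0.5)."""
    n = wins + losses
    if n == 0:
        return 1.0  # no informative pairs, so no evidence either way
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2**n


def should_promote(current: list[float], candidate: list[float],
                   alpha: float = 0.05) -> bool:
    """Promote the candidate prompt only if it beats the current one significantly."""
    wins = sum(c > b for b, c in zip(current, candidate))
    losses = sum(c < b for b, c in zip(current, candidate))
    return sign_test_p_value(wins, losses) < alpha  # ties are ignored


# Hypothetical per-input scores from a shadow run (1 = correct, 0 = user had to edit).
current_scores   = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
candidate_scores = [1, 1, 1, 1, 1, 1, 0, 1, 1, 1]

promote = should_promote(current_scores, candidate_scores)
# 6 wins, 0 losses -> p = 1/64, below 0.05, so promote is True
```

Because the user only ever sees the current prompt's output during the experiment, a candidate that fails the test is discarded without anyone having been exposed to it.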

That’s the loop. The agent acts. A user corrects. The system learns. The next action improves. It’s the kind of cycle a data science team would normally run manually, and it’s now built into the platform.

What Comes Next

Microsoft says closed-loop learning will extend to more agentic workflows on the Power Apps MCP server in the coming weeks. For now, organizations can get started by adding the data entry tool to an agent on the Power Apps MCP server.

The gap between a capable agent and a useful one has always been institutional knowledge — the specific conventions, formats, and preferences that vary from one organization to the next. Closed-loop learning is Microsoft’s answer to closing that gap over time, without requiring significant upfront effort to teach the agent everything it needs to know.