Microsoft Corp. on Monday launched a multi-model Critique feature for its Copilot Researcher agent, allowing OpenAI’s GPT and Anthropic’s Claude models to work simultaneously to verify data and reduce artificial intelligence (AI) hallucinations.

Under the new workflow, GPT drafts initial responses while Claude reviews the output for accuracy and citation quality, a collaborative approach Microsoft claims improved its research benchmark scores by 13.8% over standalone rival tools from Google and Perplexity.

The tech giant also expanded access to Copilot Cowork, an agentic AI tool designed to delegate complex, multi-step tasks across the Microsoft 365 ecosystem without constant human prompting.

Currently available through Microsoft’s Frontier early-access program, the tool acts as an orchestrator for long-running workflows such as budget reviews and calendar management as Microsoft races to boost adoption among its 450 million commercial users amid intensifying competition in the enterprise AI market.

The centerpiece of the update is a new feature called Critique within the Copilot Researcher agent. In an unusual blending of competing technologies, Microsoft will now utilize both OpenAI’s GPT and Anthropic’s Claude models for a single query. In the initial rollout, GPT drafts a response while Claude acts as a reviewer, checking the output for accuracy, completeness, and citation quality before it reaches the user.

Microsoft executives say this “model council” approach is designed to tackle the industry-wide challenge of AI hallucinations.

“Having various models from different vendors in Copilot is attractive, but we’re taking this to the next level where customers get the benefits of the models working together,” said Nicole Herskowitz, corporate vice president of Microsoft 365 and Copilot.

According to Jared Spataro, Microsoft’s chief marketing officer for AI at Work, this collaborative layering has already yielded results. The company reported a 13.8% improvement on the DRACO benchmark — a key industry measure for deep research quality — outperforming standalone tools from Google, Perplexity, and even the individual models from OpenAI and Anthropic.

Beyond research, Microsoft is pivoting toward agentic AI with the expanded rollout of Copilot Cowork. Unlike previous iterations of Copilot that focused on generative tasks like summarizing emails, Cowork is designed for delegation. It can manage “long-running, multi-step tasks” such as conducting a monthly budget review across Excel, Teams, and SharePoint without requiring a human to prompt every individual step.

“It’s about taking real action,” said Barton Warner, senior vice president at Capital Group Companies Inc., an early adopter. “It’s connecting steps, coordinating tasks, and following through across everyday workflows.”

The system operates under the Work IQ framework, which grounds the AI in an organization’s specific data while maintaining security protocols. While the agent operates autonomously to create and execute plans, Microsoft emphasized that “human-in-the-loop” oversight remains central, allowing users to monitor progress and steer the agent if it veers off track.

The aggressive rollout comes as Microsoft faces mounting pressure to convert its massive user base into paying AI subscribers. While the company reported 15 million paid Copilot seats in January, that represents only about 3.3% of its 450 million commercial Microsoft 365 users.

By integrating Anthropic’s Claude — specifically building on the viral Claude Cowork technology — Microsoft is signaling a more pragmatic, platform-agnostic strategy to fend off competition from Google’s Gemini and OpenAI’s own enterprise offerings. The new features are currently available to members of Microsoft’s Frontier program, which serves as an early-access testing ground for enterprise customers.