Signaling a significant strategic turn, Microsoft Corp. on Wednesday unveiled three foundational artificial intelligence (AI) models built entirely in-house, marking the company’s first definitive step toward independence from its primary partner, OpenAI.

The new suite, comprising MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2, targets the high-value commercial sectors of speech-to-text, voice synthesis, and image generation, respectively. The release is the first output of Microsoft’s superintelligence team, formed six months ago under the leadership of Mustafa Suleyman to pursue what he calls “AI self-sufficiency.”

The flagship release, MAI-Transcribe-1, reportedly sets a new industry standard. According to Microsoft, the model achieved a 3.8% Word Error Rate (WER) on the FLEURS benchmark, surpassing OpenAI’s Whisper-large-v3 and Google’s Gemini 3.1 Flash across 25 languages.
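For readers unfamiliar with the metric, WER is conventionally the number of word substitutions, deletions, and insertions needed to turn a system's transcript into the reference transcript, divided by the number of words in the reference, so lower is better. The following is a minimal sketch of that standard definition, not Microsoft's evaluation code; the function name and sample sentences are illustrative assumptions, and benchmark evaluations usually normalize text (casing, punctuation) first, which this sketch omits.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name); no text normalization is applied.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein dynamic program).
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion: reference word missing
                curr[j - 1] + 1,                  # insertion: extra hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref_words)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Under this definition, a 3.8% WER corresponds to roughly four word errors per 100 reference words.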

Suleyman emphasized that the gains were not just about accuracy but also about economic efficiency. “We’re able to deliver the model with half the GPUs of the state-of-the-art competition,” Suleyman told news site VentureBeat, noting that the audio and image teams each number fewer than 10 people. This lean approach challenges the industry’s “brute force” scaling norms and aims to improve Microsoft’s margins as infrastructure costs skyrocket, he said.

The launch follows a critical, previously undisclosed shift in Microsoft’s legal relationship with OpenAI. Until October 2025, Microsoft was contractually barred from independently pursuing artificial general intelligence (AGI). Following a renegotiation sparked by OpenAI’s move to seek compute outside of Microsoft’s Azure, Microsoft is now free to build its own frontier models while retaining licensing rights to OpenAI’s technology through 2032.

While Suleyman insisted to VentureBeat that the OpenAI partnership remains “phenomenal,” the subtext is clear: Microsoft is hedging its bets. By developing its own clean lineage of models with strictly licensed data, the company is positioning itself to offer enterprise clients lower legal risk and lower costs.

The announcement comes at a volatile time for the $3 trillion giant. Microsoft recently closed its worst fiscal quarter since 2008, with investors demanding proof that massive AI spending will yield a return on investment (ROI).

To answer this pressure, Microsoft is pricing its new models aggressively: MAI-Voice-1 costs $22 per 1 million characters, and MAI-Image-2 costs $5 per 1 million input tokens.
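To put those list prices in concrete terms, the arithmetic can be sketched as below. This is a rough illustration that assumes simple linear metering with no minimums, tiers, or discounts, none of which the announcement details; the constant and function names are hypothetical, and only the two per-million rates come from the figures above.

```python
from decimal import Decimal

# Rates quoted in the article (USD per 1 million metered units).
VOICE_USD_PER_MILLION_CHARS = Decimal("22")
IMAGE_USD_PER_MILLION_INPUT_TOKENS = Decimal("5")


def metered_cost(units: int, usd_per_million: Decimal) -> Decimal:
    """Cost in USD under an assumed purely linear per-unit price."""
    if units < 0:
        raise ValueError("units must be non-negative")
    return Decimal(units) * usd_per_million / Decimal(1_000_000)


if __name__ == "__main__":
    # Hypothetical workloads chosen only for illustration.
    print(metered_cost(250_000, VOICE_USD_PER_MILLION_CHARS))           # 250,000 characters of speech
    print(metered_cost(2_000_000, IMAGE_USD_PER_MILLION_INPUT_TOKENS))  # 2 million input tokens
```

On these assumptions, synthesizing 250,000 characters of speech would cost $5.50, and 2 million image-model input tokens would cost $10.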

“We’re pricing them to be the cheapest of any of the hyperscalers,” Suleyman said, specifically citing Amazon.com Inc. and Google as targets.

The current models are specialized for audio and images, but Microsoft intends to compete across all modalities, including large language models (LLMs). With CEO Satya Nadella’s backing and a multi-year roadmap for GPU clusters, the goal is to ensure Microsoft is never beholden to a single provider.

Whether this humanist AI approach and lean-team philosophy can match the reasoning capabilities of GPT-5 or Gemini remains to be seen, but Wednesday’s launch confirms that the era of Microsoft acting solely as a “distribution layer” for AI is over.