OpenAI Reclaims Benchmark Lead with GPT-5.4 Release

OpenAI entered a new phase in the artificial intelligence (AI) arms race with the launch of GPT-5.4, a versatile jack-of-all-trades model designed to consolidate the company’s lead in reasoning, coding, and autonomous computer navigation.

Arriving on the heels of GPT-5.3 Instant, the new model effectively succeeds the larger GPT-5.2. It is now available across ChatGPT, OpenAI’s API, and the Codex programming environment. A premium Pro tier has also been introduced, promising “maximum performance” for high-stakes enterprise workloads, according to OpenAI.

The standout feature of GPT-5.4 is its dominance in knowledge work. On the GDPval benchmark, which evaluates AI performance across 44 professions including accounting, sales, and engineering, GPT-5.4 outperformed or matched human professionals in 83% of tasks. This represents a significant leap from GPT-5.2’s 71% success rate, suggesting the model is rapidly closing the gap in complex, industry-specific reasoning.

While competitors like Anthropic introduced computer use capabilities as early as late 2024, GPT-5.4 marks OpenAI’s first general model to natively navigate operating systems.

By interpreting screenshots and issuing mouse and keyboard commands, the model achieved a 75% success rate on the OSWorld-Verified benchmark. That score not only eclipses the 47.3% posted by earlier versions but also surpasses the human baseline of 72.4% and edges out Anthropic’s Claude 4.6.

For developers, the update introduces tool search within the API. Rather than loading every available tool definition into context at once, the model now retrieves specific tools only when it needs them. Early testing across Model Context Protocol (MCP) servers shows this just-in-time retrieval cuts token usage, and by extension costs, by 47% without sacrificing accuracy.
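OpenAI has not published the internals of tool search in this announcement, so the following is only a minimal Python sketch of the general just-in-time idea, not OpenAI's API. It assumes a hypothetical in-memory registry (`Tool`, `search_tools`, and `context_cost` are illustrative names) and a crude estimate of about four characters per token. It shows how sending only the tool definitions that match a request shrinks the context the model has to process.

```python
import json
from dataclasses import dataclass, field

# Hypothetical stopword list so common words do not count as matches.
STOPWORDS = {"a", "an", "the", "to", "by", "in", "from", "with", "and", "of"}


@dataclass
class Tool:
    """A hypothetical tool definition, loosely modeled on JSON-schema function specs."""
    name: str
    description: str
    parameters: dict = field(default_factory=dict)

    def as_prompt_text(self) -> str:
        # Serialize the definition as it might be placed in the model's context.
        return json.dumps({"name": self.name, "description": self.description,
                           "parameters": self.parameters})


def _words(text: str) -> set[str]:
    # Lowercase, split snake_case names, and drop stopwords.
    return set(text.lower().replace("_", " ").split()) - STOPWORDS


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token (not a real tokenizer).
    return max(1, len(text) // 4)


def search_tools(tools: list[Tool], query: str, limit: int = 3) -> list[Tool]:
    """Return up to `limit` tools ranked by word overlap with the query."""
    query_words = _words(query)
    scored = []
    for tool in tools:
        overlap = len(query_words & _words(f"{tool.name} {tool.description}"))
        if overlap > 0:
            scored.append((overlap, tool))
    # Stable sort: ties keep their registry order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in scored[:limit]]


def context_cost(tools: list[Tool]) -> int:
    # Estimated tokens spent on tool definitions placed in context.
    return sum(estimate_tokens(t.as_prompt_text()) for t in tools)


# Usage: only the relevant definitions are sent instead of the whole registry.
registry = [
    Tool("get_invoice", "fetch an invoice by id from the billing system"),
    Tool("send_email", "send an email message to a recipient"),
    Tool("create_calendar_event", "create a calendar event with a title and time"),
    Tool("query_crm", "look up a customer record in the crm"),
]
selected = search_tools(registry, "send an invoice email to the customer", limit=2)
print([t.name for t in selected])  # ['send_email', 'get_invoice']
print(context_cost(registry), "->", context_cost(selected))
```

A production system would more likely rank tools with embeddings or let the model issue its own search call; simple word overlap is used here only to keep the example self-contained.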

GPT-5.4 “excels at creating long-horizon deliverables such as slide decks, financial models, and legal analysis, delivering top performance while running faster and at a lower cost than competitive frontier models,” Mercor CEO Brendan Foody said in a statement.

OpenAI claims GPT-5.4 is its most “factually grounded” model to date. Internal metrics indicate that individual claims are 33% less likely to be incorrect than those from GPT-5.2. In addition, a new /fast mode in Codex reportedly boosts token generation speed by 50%.

Despite the impressive figures, industry analysts view GPT-5.4 as an evolution rather than a revolution. The AI landscape has settled into a predictable cycle of benchmark leapfrogging between Google, Anthropic, and OpenAI. While GPT-5.4’s native computer use and agentic workflows offer meaningful refinements, the release highlights a broader trend toward feature parity among the industry’s titans.

OpenAI is currently rolling out the model to Plus, Team, and Enterprise users. The previous GPT-5.2 Thinking model will remain as a legacy option until its scheduled retirement on June 5.
