Arm has unveiled a new platform, Lumex, that carries a clear message to phone and PC makers: the future of AI shouldn't live in the cloud. It should run locally: fast, private, and always on. To get there, Lumex doesn't offer a stand-alone neural unit. It doubles down on the CPU, extending it with new matrix instructions and a next-generation software stack designed to make on-device AI routine.

“AI is no longer a feature, it’s the foundation of next-generation mobile and consumer technology,” said Chris Bergey, SVP and GM of the Client Line of Business at Arm. “Users now expect real-time assistance, seamless communication, or personalized content that is instant, private, and available on device, without compromise.”

A Compute Subsystem

At its core, Lumex is a compute subsystem, not a single chip. It’s a blueprint targeting 3nm nodes. The headline feature is SME2 (Scalable Matrix Extension v2) for Armv9.3 CPUs, which Arm says delivers up to a 5x uplift on AI workloads. The company claims 4.7x lower latency on speech tasks and 2.8x faster audio generation. Arm has been on a run of yearly Instructions Per Cycle (IPC) gains. Lumex extends these gains and builds them into a platform strategy that spans phones, tablets, and Windows on Arm PCs.
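For developers who want to gate an SME2-specific code path explicitly rather than rely on a framework, the usual pattern is a runtime capability check. The sketch below is an illustration, not an Arm-provided sample; it assumes an arm64 Linux or Android target whose kernel headers are new enough to define the HWCAP2_SME2 bit.

```c
/* Minimal sketch: pick an AI code path based on whether the CPU reports SME2.
 * Assumes arm64 Linux/Android with headers that define HWCAP2_SME2; on other
 * targets (or older headers) the check is skipped. Not an official Arm sample. */
#include <stdio.h>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>   /* getauxval, AT_HWCAP2 */
#include <asm/hwcap.h>  /* HWCAP2_* feature bits */
#endif

int main(void) {
#if defined(HWCAP2_SME2)
    if (getauxval(AT_HWCAP2) & HWCAP2_SME2)
        printf("SME2 present: dispatch to matrix-accelerated kernels\n");
    else
        printf("SME2 absent: fall back to NEON/SVE code paths\n");
#else
    printf("Target or headers lack HWCAP2_SME2; runtime check unavailable\n");
#endif
    return 0;
}
```

In practice most apps will never write this check themselves; the frameworks discussed below do the dispatch internally.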

The C1 CPU cluster arrives in four tiers: C1-Ultra and C1-Premium as performance cores, with C1-Pro and C1-Nano covering efficiency. The split extends Arm’s long-standing big.LITTLE idea into a more granular approach, giving OEMs headroom at the top while conserving area and power elsewhere. Arm says C1-Premium can stand in for C1-Ultra in sub-flagship designs at roughly 35% less area, and that C1-Pro improves power efficiency over the prior generation at the same frequency. The ambition is large: by Arm’s projection, SME and SME2 could add more than 10 billion TOPS of aggregate compute across more than 3 billion devices by 2030.

While CPU is the biggest part of the story, the GPU also gets a refresh. The new Mali G1-Ultra promises 20% higher graphics performance, 2x ray-tracing throughput via a redesigned Ray Tracing Unit (RTUv2), and 20% faster AI inference. Arm touts a 40% frame rate lift in ray-traced scenes on heavy benchmarks. The GPU’s placement on its own power island cuts energy leakage at idle, which is useful for any device that juggles gaming visuals and on-device model inference.

The developer story is the other half of the pitch. Lumex includes KleidiAI, a library layer that wires SME2 support into major frameworks (PyTorch ExecuTorch, Google LiteRT, Alibaba MNN, and ONNX Runtime) so apps can benefit without code changes. That approach matches the reality of phone and PC software: developers want portability and fewer knobs to turn. Google’s apps, including YouTube, are slated to be SME2-ready as Lumex hardware ships, and Arm emphasizes that optimizations built for Android will carry over to Windows on Arm.
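To make the "no code changes" point concrete, here is a minimal sketch using the ONNX Runtime C API, one of the frameworks Arm names. It is an illustration under stated assumptions, not an Arm or KleidiAI sample: "model.onnx" is a hypothetical placeholder path, and the point is simply that the application-level calls look identical whether the runtime’s CPU kernels end up SME2-accelerated or plain NEON.

```c
/* Minimal sketch: app-level ONNX Runtime calls do not change when the
 * backend gains SME2-accelerated kernels. "model.onnx" is hypothetical. */
#include <stdio.h>
#include <onnxruntime_c_api.h>

int main(void) {
    const OrtApi* ort = OrtGetApiBase()->GetApi(ORT_API_VERSION);

    OrtEnv* env = NULL;
    OrtStatus* st = ort->CreateEnv(ORT_LOGGING_LEVEL_WARNING, "lumex-demo", &env);
    if (st) { fprintf(stderr, "env: %s\n", ort->GetErrorMessage(st)); return 1; }

    OrtSessionOptions* opts = NULL;
    ort->CreateSessionOptions(&opts);

    /* Kernel selection (SME2 vs. NEON) happens inside the runtime, so this
     * call is the same on every device. */
    OrtSession* session = NULL;
    st = ort->CreateSession(env, "model.onnx", opts, &session);
    if (st) { fprintf(stderr, "session: %s\n", ort->GetErrorMessage(st)); return 1; }

    printf("Session created; Run() calls are identical on SME2 and non-SME2 CPUs\n");

    ort->ReleaseSession(session);
    ort->ReleaseSessionOptions(opts);
    ort->ReleaseEnv(env);
    return 0;
}
```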

AI Inference on the Device

Lumex also carries a cost and efficiency argument. Moving more AI inference onto the device cuts cloud spending and sidesteps the latency and privacy trade-offs of round-tripping to a server. That’s why Arm is promoting examples like a virtual gaming coach that runs locally, or a payment provider shifting tasks on-device during peak hours. In demos, a yoga-tutor app saw 2.4x faster text-to-speech, and a partner reported a 40% cut in LLM response time.

Bottom line: Lumex is Arm’s bid to make AI the centerpiece of its product line, and to position its core product, the CPU, as a first-class engine for on-device AI.
