Synopsis: In this Techstrong.ai Leadership Insights interview, GMI Cloud CEO Alex Yeh explains how the rise of neocloud providers is reshaping access to the GPUs needed to train and run AI models. He discusses how alternative cloud infrastructure models can improve cost efficiency, optimize resource utilization, and help address the sustainability challenges of large-scale AI workloads.

AI may be moving fast, but the infrastructure underneath it is still a bottleneck. Alex Yeh discusses how “neocloud” GPU providers are emerging to fill the widening gap between soaring AI demand and constrained supply.

Yeh argues that simply offering GPU capacity is no longer enough. As enterprises and builders juggle multiple models and modalities, the real challenge becomes stitching together workflows, not picking a single model. GMI’s approach is to abstract that complexity behind an API layer and a workflow builder (GMI Studio) that lets teams string together models, tools and services into agentic applications without needing deep ML or infrastructure expertise.

The other major theme is scarcity. Yeh calls 2026 a “year of constraints,” with everything from GPUs and memory to data center capacity under pressure and prices rising. His advice is blunt: plan ahead. AI teams can’t wait until after a funding round to “find a couple thousand cards.” GMI, he says, is locking in memory pricing and data center capacity years out to ensure supply.

On compute strategy, Yeh expects NVIDIA to continue dominating training workloads, while inference opens the door to alternatives, though mostly for very large organizations that can afford the engineering overhead. For most startups, he recommends sticking with NVIDIA's CUDA ecosystem to avoid costly debugging and tooling gaps, and leaving hardware diversification decisions to infrastructure providers.

Yeh also outlines how enterprise buying is shifting. Data science teams may own training, but IT organizations increasingly drive inference and production decisions, where security, isolation, reliability and burst capacity matter. He warns that “just buying a few servers” often ends in crashes and operational pain once multiple internal users and workloads collide.

Looking ahead, Yeh predicts an explosion of inference-driven applications and a move from “middle-to-middle” AI tools to end-to-end agents that execute tasks across screens and systems, with humans increasingly acting as approvers rather than operators.