Synopsis: In this Techstrong.ai Leadership Insights video, Overclock Labs CEO Greg Osuri explains why the mismatch between demand for the latest graphics processing units (GPUs) and actual supply is creating an imbalance in the artificial intelligence (AI) ecosystem, one that might be better addressed by relying more on distributed computing.

Are we building enough AI compute—or the right kind?

Short answer: it depends on the week. Utilization swings are real—one month H100s run ~60%, the next they spike to ~98%. The bigger story isn’t a simple shortage or glut; it’s a misalignment problem. NVIDIA ramps one generation (H100/H200), then announces the next (B300), and demand whipsaws toward the new while supply is still geared to the old. That leaves cloud owners heavy on last-gen inventory while everyone chases the “year-one” chips. Prices tell the tale, too: training-grade GPU costs typically drop ~20% annually, with the steepest dip from year one to year two—great if you got in early, rough if you didn’t.
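To make the depreciation curve concrete, here's a small sketch of that pricing dynamic. The launch price and the exact drop rates are illustrative assumptions (a steeper ~30% first-year dip, then ~20% annually), not quoted market figures:

```python
# Illustrative GPU price-decay sketch. All numbers are assumptions
# for illustration, not actual market prices.

def depreciated_prices(launch_price: float, years: int,
                       first_year_drop: float = 0.30,
                       later_drop: float = 0.20) -> list[float]:
    """Return the price at launch and at the start of each later year."""
    prices = [launch_price]
    for year in range(1, years + 1):
        rate = first_year_drop if year == 1 else later_drop
        prices.append(prices[-1] * (1 - rate))
    return prices

prices = depreciated_prices(30_000, 4)
print([round(p) for p in prices])  # → [30000, 21000, 16800, 13440, 10752]
```

Under these assumed rates, a chip loses roughly two-thirds of its value within four years, which is why timing the generation transition matters so much to cloud owners.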

Homogeneity makes it worse. If you trained on H100s, you can’t casually mix in idle A100s without paying a performance penalty. At scale, that rigidity is expensive. Meanwhile, the real constraint isn’t just silicon—it’s power. The operative metric is energy-per-FLOP. Newer chips extract far more compute from each kilowatt-hour, which matters when training clusters push into hundreds of megawatts. Think 300 MW projects coming online, 600 MW by 2027–2028, and gigawatt-class needs by 2030. In the U.S., that points to nuclear for truly dense capacity—hardly a fast-track option. No surprise some sites supplement with LNG because the grid can’t deliver what’s needed where it’s needed.
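A back-of-the-envelope calculation shows why energy-per-FLOP, not raw FLOPs, is the operative metric. The chip specs below are rough public figures for A100- and H100-class accelerators, and the FLOP budget and utilization factor are assumptions for illustration:

```python
# Back-of-the-envelope sketch: energy cost of a fixed training FLOP budget
# on two accelerator generations. Specs and utilization are rough assumptions.

def training_energy_mwh(total_flops: float, flops_per_sec: float,
                        power_watts: float, utilization: float = 0.4) -> float:
    """Energy (MWh) to run `total_flops` on one accelerator, assuming
    sustained throughput = peak * utilization."""
    seconds = total_flops / (flops_per_sec * utilization)
    joules = seconds * power_watts
    return joules / 3.6e9  # joules -> MWh

budget = 1e23  # hypothetical training FLOP budget
old = training_energy_mwh(budget, flops_per_sec=312e12, power_watts=400)  # A100-class
new = training_energy_mwh(budget, flops_per_sec=989e12, power_watts=700)  # H100-class
print(f"A100-class: {old:,.0f} MWh, H100-class: {new:,.0f} MWh")
```

Even though the newer chip draws nearly twice the power, it finishes the same FLOP budget in well under half the time, so under these assumptions it uses roughly 45% less total energy. Multiply that gap across a cluster drawing hundreds of megawatts and the case for staying on the newest generation writes itself.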

Could “good enough” models run on older GPUs? Sometimes—but you’ll feel it in response times and user experience. Efficiency is improving (prompt-level energy has fallen by an order of magnitude in a few years), yet credible studies still peg AI’s future share of U.S. electricity in the low-teens—possibly higher under less conservative assumptions. And we’re early: medicine, billing, diagnostics—huge domains—haven’t even flipped the switch at scale.

There is hope: asynchronous and low-communication training, better fault tolerance, and heterogeneous GPU use could unlock distributed capacity—from gaming rigs to smaller sites—without being bottlenecked by the slowest node. The closing advice is refreshingly blunt: buy GPUs and buy solar. Decentralize compute and energy, reduce single-site risk, and get ahead of the power economics that will define AI’s next decade.
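One simple way heterogeneous training avoids being bottlenecked by the slowest node is to split each batch proportionally to measured node throughput. This is a minimal sketch of that idea; the node names and relative speeds are made up for illustration:

```python
# Minimal sketch: split a training batch across heterogeneous nodes in
# proportion to their measured throughput, so a slow node gets a small
# slice instead of stalling the fast ones. Node names/speeds are invented.

def split_batch(batch_size: int, throughputs: dict[str, float]) -> dict[str, int]:
    total = sum(throughputs.values())
    shares = {node: int(batch_size * t / total) for node, t in throughputs.items()}
    # Hand any leftover samples (from rounding down) to the fastest node.
    leftover = batch_size - sum(shares.values())
    fastest = max(throughputs, key=throughputs.get)
    shares[fastest] += leftover
    return shares

print(split_batch(1024, {"h100-rig": 10.0, "a100-rig": 5.0, "gaming-rig": 1.0}))
# → {'h100-rig': 640, 'a100-rig': 320, 'gaming-rig': 64}
```

With matching per-sample work, each node finishes its slice at roughly the same time, which is the basic intuition behind letting gaming rigs and smaller sites contribute useful capacity alongside data-center GPUs.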