Keeping the Token Flywheel Spinning with Upscale AI

Modern data centers are no longer just repositories for web pages and databases. They’ve evolved into token factories. This shift in purpose demands a radical departure from traditional networking logic, which is exactly why Upscale AI is gaining traction. During Networking Field Day 40, they laid out a vision that treats the network not as a general-purpose utility, but as a specialized manufacturing floor for intelligence.

The primary currency in this new economy is the AI token. The industry is moving away from metrics like dollars per bit and toward tokens per watt and tokens per second. In practice, this creates a token flywheel. Operators spend capital on power and compute to generate tokens, and those tokens produce the revenue that buys more power and compute. If the network is suboptimal, your expensive GPUs sit idle while they wait for data synchronization. That idleness kills your tokens-per-watt efficiency and grinds the flywheel to a halt.
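To put rough numbers on that efficiency argument, here is a minimal Python sketch of how GPU idle time drags down tokens per watt. The function name and every figure in it are hypothetical, and it assumes the cluster draws the same power whether GPUs are computing or waiting on the network, which is a simplification.

```python
def tokens_per_second_per_watt(peak_tokens_per_s: float,
                               power_watts: float,
                               busy_fraction: float) -> float:
    """Effective tokens per second per watt when GPUs are busy only part of the time.

    Assumption (not from the presentation): power draw stays at power_watts
    even while GPUs sit idle waiting for the network.
    """
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    if not 0.0 <= busy_fraction <= 1.0:
        raise ValueError("busy_fraction must be between 0 and 1")
    return peak_tokens_per_s * busy_fraction / power_watts


# Hypothetical cluster: 10,000 tokens/s at full utilization, 1,000 W draw.
fully_busy = tokens_per_second_per_watt(10_000, 1_000, 1.0)
network_bound = tokens_per_second_per_watt(10_000, 1_000, 0.75)
print(f"fully busy:    {fully_busy:.2f} tokens/s per watt")
print(f"25% idle time: {network_bound:.2f} tokens/s per watt")
```

Under these assumptions, efficiency falls in direct proportion to the time GPUs spend waiting, which is why the network shows up in the token economics at all.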

The Architecture of Synchronization

Traditional data center networks were designed for north-south traffic, where thousands of individual clients make independent requests. AI clusters flip this model on its head, generating massive bursts of east-west communication. In these environments, a small number of GPUs generate symmetric, heavy flows that must be perfectly synchronized. Standard switches, built to load balance millions of tiny web requests, frequently suffer from hash collisions when faced with these elephant flows.
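The hash collision problem is easy to see with a little arithmetic. The sketch below is illustrative only: it assumes a switch that picks an uplink by hashing flow fields and taking the result modulo the number of links, with CRC32 standing in for whatever hash real silicon uses. The flow tuples are made up, apart from UDP port 4791, which is the standard RoCEv2 destination port.

```python
import zlib
from collections import Counter
from math import perm


def collision_free_probability(num_flows: int, num_links: int) -> float:
    """Chance that uniformly hashed flows all land on distinct links."""
    if num_flows > num_links:
        return 0.0
    return perm(num_links, num_flows) / num_links ** num_flows


def ecmp_link(flow: tuple, num_links: int) -> int:
    """Pick a link by hashing the flow tuple (CRC32 is a stand-in hash)."""
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % num_links


def link_loads(flows: list, num_links: int) -> list:
    """Count how many flows the hash placed on each link."""
    counts = Counter(ecmp_link(flow, num_links) for flow in flows)
    return [counts.get(link, 0) for link in range(num_links)]


# Hypothetical: 8 elephant flows between GPU NICs, spread over 8 uplinks.
flows = [(f"10.0.0.{i}", f"10.0.1.{i}", "udp", 49152 + i, 4791) for i in range(8)]
print("flows per link:", link_loads(flows, 8))
print(f"P(no collision) = {collision_free_probability(8, 8):.4f}")
```

With eight flows on eight links, the odds that a uniform hash gives every flow its own link are 8!/8^8, or roughly 0.24 percent. Millions of tiny web flows average out. A handful of elephant flows almost never do.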

Taken together, these requirements necessitate a clean-sheet approach to silicon and software. AI traffic isn’t just data; it’s memory transactions. Technologies like RDMA allow for zero-copy data transfers that bypass the CPU and kernel entirely, reading directly from a remote GPU’s memory. That takes away the tolerance for packet loss and latency that we’ve lived with for decades. In a synchronized cluster, a single dropped packet can stall the entire computation, because every GPU waits for the slowest transfer to finish. You can’t rely on the reactive nature of TCP anymore. You need a lossless underlying fabric that uses proactive congestion mechanisms to stop drops before they even happen.
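A rough sketch of why one drop hurts so much: in a synchronized step, completion time is set by the slowest flow, not the average. The flow count, transfer time, and retransmission penalty below are hypothetical values chosen for illustration, not measurements from any fabric.

```python
def synchronized_step_time_us(flow_times_us: list) -> float:
    """A synchronized step ends only when its slowest flow has finished."""
    if not flow_times_us:
        raise ValueError("need at least one flow")
    return max(flow_times_us)


# Hypothetical step: 64 flows that each take 100 us on a clean fabric.
clean = [100.0] * 64

# Same step, but one flow loses a packet and pays an assumed 1,000 us
# recovery delay before it completes.
one_drop = [100.0] * 63 + [100.0 + 1_000.0]

clean_step = synchronized_step_time_us(clean)
dropped_step = synchronized_step_time_us(one_drop)
print(f"clean step:    {clean_step:.0f} us")
print(f"one drop step: {dropped_step:.0f} us")
print(f"slowdown:      {dropped_step / clean_step:.0f}x")
```

In this toy case, 63 of the 64 flows finished on time, yet every GPU waits for the one that did not.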

Defining the Domains: Scale-Up vs. Scale-Out

Upscale AI splits the networking problem into two distinct domains, scale-up and scale-out. Each has its own traffic characteristics and hardware requirements.

Skyhammer and the Scale-Up Domain

Scale-up networking ties GPUs together, usually within a single rack, to function as one massive, flat memory space. This domain is all about load and store operations. To handle this, Upscale AI built the Skyhammer architecture from the ground up. It’s a clean-sheet design that focuses on one mantra: performance.

Skyhammer delivers sub-microsecond round-trip latency, typically between 500 and 700 nanoseconds. It achieves this by stripping away the bloat of traditional enterprise features. By optimizing packet headers and minimizing the inter-packet gap, it reduces latency both on the wire and during ASIC processing. Unlike traditional switches that might allow for oversubscription, Skyhammer guarantees a 1:1 ratio. This ensures that the network never becomes the bottleneck for the memory-intensive tasks happening within the rack.
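For readers less familiar with the term, oversubscription is simply the ratio of bandwidth that can enter a switch tier to the bandwidth available to leave it. The sketch below uses the general definition with hypothetical port counts and speeds. It does not describe Skyhammer’s actual port layout.

```python
from fractions import Fraction


def oversubscription_ratio(downlinks_gbps: list, uplinks_gbps: list) -> Fraction:
    """Total downlink bandwidth divided by total uplink bandwidth.

    Port speeds are whole numbers of Gbps; 1 means non-blocking (1:1).
    """
    total_up = sum(uplinks_gbps)
    if total_up <= 0:
        raise ValueError("need at least one uplink with positive bandwidth")
    return Fraction(sum(downlinks_gbps), total_up)


# Hypothetical enterprise leaf: 48 x 100G server ports, 4 x 400G uplinks.
enterprise_leaf = oversubscription_ratio([100] * 48, [400] * 4)

# Hypothetical non-blocking design: as much bandwidth out as in.
non_blocking = oversubscription_ratio([800] * 32, [800] * 32)

print(f"enterprise leaf: {enterprise_leaf}:1")
print(f"non-blocking:    {non_blocking}:1")
```

A 3:1 design is a reasonable bet when most ports are quiet most of the time. It is a poor bet when every port is driving memory traffic at once.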

Spectrum-X and the Scale-Out Domain

The scale-out domain connects these high-performance racks across the broader data center. Here, the focus shifts to memory copy operations, moving data between distant memory spaces. Rather than reinventing the wheel, Upscale AI partnered with NVIDIA to use their Spectrum-X ASIC for this layer.

They build open, Ethernet-based systems around this silicon, running an AI-optimized operating system based on SONiC. This allows them to bring enterprise-grade operational resilience to the AI cluster. In practice, this means hitless upgrades, removable supervisors, and strict software signing. They’ve also added specialized circuitry for real-time telemetry, providing packet-level visibility at the microsecond level. This is a far cry from the 60-second SNMP polls we used to rely on.
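To see why the polling interval matters, consider what a long averaging window does to a short burst. This is a back-of-the-envelope sketch with hypothetical burst and window sizes. It models only the averaging effect, not how any particular telemetry system works.

```python
def windowed_average_utilization(burst_s: float,
                                 window_s: float,
                                 burst_utilization: float = 1.0,
                                 baseline_utilization: float = 0.0) -> float:
    """Average link utilization reported for a window containing one burst."""
    if not 0 < burst_s <= window_s:
        raise ValueError("burst must be positive and fit inside the window")
    quiet_s = window_s - burst_s
    busy = burst_s * burst_utilization + quiet_s * baseline_utilization
    return busy / window_s


# Hypothetical microburst: 500 us at line rate on an otherwise quiet link.
burst = 500e-6

snmp_view = windowed_average_utilization(burst, window_s=60.0)
millisecond_view = windowed_average_utilization(burst, window_s=1e-3)
burst_sized_view = windowed_average_utilization(burst, window_s=500e-6)

print(f"60 s poll:     {snmp_view:.6%}")
print(f"1 ms window:   {millisecond_view:.1%}")
print(f"500 us window: {burst_sized_view:.1%}")
```

The same burst that saturates the link disappears almost entirely once it is averaged over a minute, which is why a stalled collective can sit next to a utilization graph that looks idle.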

Future-Proofing the Fabric

The AI landscape is volatile. New protocols and specialized chips, from TPUs to LPUs, are entering the market at a breakneck pace. If you build a network that’s too rigidly tuned to today’s standard, you’re buying a legacy system. Skyhammer addresses this by incorporating flexible semantics. It can interpret and route a variety of emerging scale-up protocols, providing a level of investment protection that’s rare in high-performance hardware.

Ultimately, this supports a heterogeneous compute future. By leaning into open standards like the Open Compute Project and Ultra Ethernet, Upscale AI ensures the network remains a common substrate. It prevents vendor lock-in and allows operators to mix and match compute components as the technology evolves.

Bringing It All Together

The transition to AI-centric networking is a fundamental pivot, not a minor tuning exercise. Traditional switches are ill-equipped for the synchronized, zero-loss demands of memory-level transactions. Upscale AI’s Skyhammer represents a move toward purpose-built silicon for the scale-up domain, offering the sub-microsecond latency required to keep the token flywheel spinning. By combining this architecture with open standards and AI-optimized operations, Upscale AI is building a fabric that treats the network as a first-class citizen in the compute stack. Ultimately, the success of an AI data center is measured in tokens, and you can’t generate tokens if your GPUs are waiting on a congested, general-purpose network.

To learn more about Upscale AI and their Skyhammer architecture, make sure to check out their website at https://upscale.ai. To see the entire Upscale AI presentation from Networking Field Day, head over to the presentation appearance page.
