
NVIDIA Corp. has introduced the Rubin CPX, a specialized context-processing GPU designed for ultra-long-context applications exceeding 1 million tokens. The new architecture addresses growing demand for extended context processing in advanced artificial intelligence (AI) workloads.
At rack scale, in the Vera Rubin NVL144 CPX configuration, Rubin CPX delivers 8 exaflops of NVFP4 compute, and NVIDIA projects $5 billion in token revenue for every $100 million in capital investment. That 50x performance-to-investment ratio, the company argues, could transform AI infrastructure economics.
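For scale, that projection works out to a 50x revenue multiple on capital. The snippet below simply restates NVIDIA's claimed figures as arithmetic; the numbers are the company's projections, not independently verified results.

```python
# Restating NVIDIA's stated projection as arithmetic. The figures are
# the company's own claims, not independently verified results.
capex = 100_000_000                # $100 million capital investment
projected_revenue = 5_000_000_000  # $5 billion projected token revenue

multiple = projected_revenue / capex
print(f"Implied revenue-to-capex multiple: {multiple:.0f}x")  # 50x
```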
Several leading AI companies have already expressed support for the Rubin CPX platform, including development tools provider Cursor, inference platform Fireworks.ai, and coding assistant company Magic.
“The Vera Rubin platform will mark another leap in the frontier of AI computing — introducing both the next-generation Rubin GPU and a new category of processors called CPX,” NVIDIA CEO Jensen Huang said in a statement. “Just as RTX revolutionized graphics and physical AI, Rubin CPX is the first CUDA GPU purpose-built for massive-context AI, where models reason across millions of tokens of knowledge at once.”
Alongside Rubin CPX, NVIDIA revealed new giga-scale reference designs for AI factories, providing blueprints for organizations implementing large-scale AI infrastructure with optimized performance and efficiency.
NVIDIA also disclosed that its Blackwell Ultra architecture has achieved record performance on the latest reasoning-focused inference benchmarks. The company holds the top per-accelerator results in every MLPerf Inference category while setting new records on emerging evaluation frameworks.
The Blackwell Ultra platform is purpose-built for Mixture-of-Experts and reasoning workloads. Its architecture combines full-stack co-design with specialized components, including the NVIDIA High-Bandwidth Interface, NVFP4 Tensor Cores, NVLink72 connectivity, and a comprehensively optimized software stack.
The platform has demonstrated strong results across multiple benchmarks. Blackwell Ultra achieved top marks on the DeepSeek-R1 evaluation, while standard Blackwell systems set new interactive-serving records for Llama 3.1 405B. Disaggregated serving, which separates the compute-heavy prefill phase of inference from the bandwidth-heavy decode phase and runs them on different GPUs, yielded nearly 50% higher per-GPU performance.
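To make the disaggregation idea concrete, here is a minimal Python sketch of routing one request through separate prefill and decode GPU pools. The class, pool sizes, and scheduling logic are illustrative assumptions for exposition, not NVIDIA's serving stack.

```python
from dataclasses import dataclass, field

@dataclass
class GPUPool:
    """A named pool of GPUs with a simple FIFO request queue."""
    name: str
    gpus: int
    queue: list = field(default_factory=list)

    def submit(self, request_id: str, phase: str) -> None:
        self.queue.append(request_id)
        print(f"{self.name}: scheduled {phase} for {request_id}")

# Hypothetical sizing: prefill is compute-bound, decode is bandwidth-bound,
# so each phase gets hardware matched to its bottleneck.
prefill_pool = GPUPool("prefill-pool", gpus=4)
decode_pool = GPUPool("decode-pool", gpus=8)

def serve(request_id: str) -> None:
    # Phase 1: prefill ingests the full prompt and builds the KV cache.
    prefill_pool.submit(request_id, "prefill")
    # The KV cache is then handed off to the decode tier.
    # Phase 2: decode streams output tokens one at a time.
    decode_pool.submit(request_id, "decode")

serve("req-001")
```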
The financial stakes of inference optimization are substantial: by NVIDIA's estimate, a GB200 NVL72 system requiring a $3 million investment can generate $30 million in token-based revenue, a 10x return on investment enabled by superior inference performance.
Through NVFP4, its 4-bit floating-point format, NVIDIA delivers that computational throughput without compromising model accuracy, making these inference capabilities fundamental to the economic efficiency of AI factory operations.
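NVFP4 is a block-scaled format: small groups of values share a scale factor, preserving dynamic range even though each element uses only four bits. The toy sketch below illustrates that idea with nearest-value rounding onto the FP4 (E2M1) grid; the 16-element block size and the rounding scheme are simplifying assumptions, and real Tensor Core behavior differs.

```python
import numpy as np

# Toy block-scaled 4-bit quantization in the spirit of NVFP4: each block
# of values shares one scale factor, and elements snap to the FP4 (E2M1)
# grid. Block size and round-to-nearest are simplifying assumptions.

E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # FP4 magnitudes

def quantize_block(x: np.ndarray) -> np.ndarray:
    """Scale a block into FP4 range, then round to the nearest FP4 value."""
    scale = np.abs(x).max() / E2M1[-1]
    if scale == 0.0:
        return np.zeros_like(x)
    scaled = x / scale
    # Nearest representable E2M1 magnitude for each element.
    idx = np.abs(np.abs(scaled)[:, None] - E2M1[None, :]).argmin(axis=1)
    return np.sign(scaled) * E2M1[idx] * scale

rng = np.random.default_rng(0)
block = rng.standard_normal(16).astype(np.float32)  # one 16-element block
quantized = quantize_block(block)
print("max abs quantization error:", float(np.abs(block - quantized).max()))
```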