Oracle has unveiled a new category of supercomputers that it says will provide unprecedented compute power for advanced large-scale model training and inferencing in enterprises.
The cloud computing cluster, an addition to the Oracle Cloud Infrastructure (OCI) Supercluster running on private, public and sovereign clouds, was announced on September 11 at the Oracle CloudWorld conference, with the provider calling it the “largest AI supercomputer in the cloud” to date.
AI’s thirst for compute power has been a dominant discussion at the Cloud Field Day events hosted by Tech Field Day, an arm of The Futurum Group. As organizations take on more compute-intensive AI workloads, they need access to far greater compute speed and power to run these applications at scale and obtain better returns on their investments.
The OCI supercomputer, which packs a whopping 131,072 NVIDIA Blackwell GPUs at the top end, will give Oracle Cloud Infrastructure (OCI) customers zettascale computing power and performance.
“NVIDIA’s full-stack AI computing platform on Oracle’s broadly distributed cloud will deliver AI compute capabilities at an unprecedented scale to advance AI efforts globally and help organizations everywhere accelerate research, development and deployment,” said Ian Buck, VP of Accelerated Computing at NVIDIA, on the company website.
According to Oracle, the top model accelerated with the latest generation of NVIDIA GPUs provides a peak performance of 2.4 zettaFLOPS. Extraordinary volumes of complex calculations can be performed every second at that speed.
The supercomputer establishes Oracle as the first and only cloud provider to offer systems of zettascale performance. Oracle noted that the max model of OCI Supercluster surpasses all existing hyperscaler products, with more than six times as many GPUs. The closest contender, the Frontier supercomputer, an exascale machine that is touted as the number one supercomputer in the Top 500, packs only about 38,000 AMD GPUs delivering roughly 1.2 exaFLOPS. The new OCI Supercluster offers more than three times as many GPUs, says Oracle.
Besides the top-end Blackwell configuration, OCI Supercluster also supports other NVIDIA Tensor Core GPUs, including the H100, H200 and B200. The machine carrying H100 scales to over 16,000 accelerators, offering a max performance of 65 exaFLOPS. The H200 model one-ups that with four times as many accelerators and a performance of up to 260 exaFLOPS.
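As a rough sanity check, the quoted peak figures can be divided by the GPU counts to get an implied throughput per GPU. The short Python sketch below uses only the numbers quoted in this article. The H100 count is taken as the 16,000 lower bound, the H200 count is assumed to be four times that, and the names `clusters` and `per_gpu_petaflops` are purely illustrative. The article does not state the numeric precision behind these peak figures, so the results are comparable only with each other.

```python
# Peak figures as quoted in the article. The H100 and H200 GPU counts are
# approximations ("over 16,000" and "four times" that), not exact numbers.
EXA = 10**18
ZETTA = 10**21

clusters = {
    "H100": {"gpus": 16_000, "peak_flops": 65 * EXA},
    "H200": {"gpus": 4 * 16_000, "peak_flops": 260 * EXA},
    "Blackwell": {"gpus": 131_072, "peak_flops": 2.4 * ZETTA},
}


def per_gpu_petaflops(gpus, peak_flops):
    """Implied peak throughput per GPU, in petaFLOPS."""
    return peak_flops / gpus / 10**15


for name, c in clusters.items():
    print(f"{name}: ~{per_gpu_petaflops(**c):.1f} petaFLOPS per GPU")
```

Under these assumptions, the two Hopper-based configurations work out to about 4 petaFLOPS per GPU and the Blackwell configuration to about 18, which suggests the jump to zettascale comes from both a larger GPU count and a faster GPU generation.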
The clusters include OCI Compute Bare Metal with communication happening over ultra-low latency RoCEv2 or NVIDIA Quantum-2 InfiniBand-based networks, and storage systems supporting high-performance computing (HPC).
“With Oracle’s distributed cloud, customers have the flexibility to deploy cloud and AI services wherever they choose while preserving the highest levels of data and AI sovereignty,” commented Mahesh Thiagarajan of OCI.
Oracle is currently taking orders for clusters with H100 and H200 Tensor Core GPUs, with the Blackwell-based systems arriving in the first half of 2025.
As for who needs machines this enormous, and to what end, Thiagarajan wrote in a blog post that the intended users are AI companies with the most advanced compute requirements. In other words, AI organizations developing trillion-parameter models and doing real-time inferencing at scale can use the clusters' extreme horsepower to accelerate demanding AI workloads.