
A recent survey by Futurum Intelligence revealed that 80% of CIOs at major companies are either planning AI pilots or already implementing AI in some capacity. While the excitement for AI is palpable, enterprises face considerable challenges, including cost, scalability, performance, and, arguably most important of all, deployment tradeoffs: cloud versus on-premises, and the choice of AI accelerator hardware.
A key focus for enterprises is AI inferencing infrastructure, which involves deploying trained AI models to make predictions or decisions. This differs from AI training, which is a separate, more computationally intensive process.
Signal65: Independent Testing Unearths Critical Insights
To provide clarity in the rapidly evolving IT landscape, Signal65, the independent performance testing and benchmarking arm of the Futurum Group, conducts rigorous product testing and data validation.
Mitch Lewis, representing Signal65, presented the firm’s recent AI performance testing results, focusing specifically on Intel’s Gaudi 3 AI accelerators, at a recent Cloud Field Day event. For a comprehensive overview and detailed results, the presentation can be viewed in the video Intel Gaudi 3 AI Performance Testing with Signal65 on the Tech Field Day YouTube channel.
The Testing and Key Findings
Signal65 undertook two main testing projects, both focused on AI inferencing: one on-premises and the other on IBM Cloud. The tests measured throughput and involved varying input/output token shapes (e.g., 128 input/128 output tokens for short question-answering, up to 4096 input/2048 output for multi-turn chat or RAG applications) and batch sizes. Signal65 co-developed the testing suite with Kamiwaza, a company focused on enterprise AI platforms.
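Signal65’s actual harness, co-developed with Kamiwaza, is not public, so the sketch below is only a rough illustration of how a sweep over token shapes and batch sizes might be structured. The `run_inference` helper and the simulated timing inside it are hypothetical stand-ins for a real call to the system under test.

```python
import time

# (input tokens, output tokens) shapes like those described above:
# short Q&A at 128/128 up to multi-turn chat / RAG at 4096/2048.
SHAPES = [(128, 128), (2048, 2048), (4096, 2048)]
BATCH_SIZES = [1, 8, 32, 128]

def run_inference(batch: int, input_tokens: int, output_tokens: int) -> None:
    # Hypothetical stand-in: replace with a blocking call to the
    # inference server under test. The sleep merely simulates work.
    time.sleep(0.001 * batch)

for input_tokens, output_tokens in SHAPES:
    for batch in BATCH_SIZES:
        start = time.perf_counter()
        run_inference(batch, input_tokens, output_tokens)
        elapsed = time.perf_counter() - start
        # Throughput in generated tokens per second for this configuration.
        throughput = batch * output_tokens / elapsed
        print(f"{input_tokens}/{output_tokens} @ batch {batch}: {throughput:,.0f} tok/s")
```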
On-Premises Testing: Intel Gaudi 3 vs. NVIDIA H100
The initial tests evaluated the price and performance of Intel Gaudi 3 against NVIDIA H100 in an on-premises setting, utilizing Meta’s Llama 8B and 70B models.
For the smaller Llama 8B model, Gaudi 3 outperformed the H100 in three of the four tested configurations. With the larger 70B model, the results were closer: Gaudi 3 was slightly ahead in one configuration and slightly behind in the others, demonstrating, according to Mr. Lewis, a “fairly competitive” standing rather than a “huge drop off.”
Gaudi 3 truly shone when Signal65 considered price. While nailing down exact on-premises pricing can be complex due to discounts and configurations, publicly available data indicated a significant cost difference: an eight-GPU system built on H100s cost over $300,000, while a comparable Gaudi 3 system was over $150,000. Factoring in this price difference, Gaudi 3’s price-performance advantage over the NVIDIA H100 ranged from 10% to 150%. For the Llama 8B model, even in the configuration where NVIDIA had better raw performance, Gaudi 3 showed a “pretty big advantage” in tokens per dollar. Similarly, for the Llama 70B model, Gaudi 3 won in every scenario once price was factored in.
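The tokens-per-dollar math behind that advantage is straightforward. In the sketch below, the two system prices are the publicly sourced figures cited above, but the throughput numbers are invented purely for illustration, not Signal65’s measurements.

```python
# System prices are the publicly sourced figures cited above;
# the throughput values are hypothetical, for illustration only.
h100_price, gaudi3_price = 300_000, 150_000  # eight-accelerator system cost ($)
h100_tps, gaudi3_tps = 10_000, 9_000         # hypothetical tokens/second

h100_tpd = h100_tps / h100_price             # throughput per dollar of hardware
gaudi3_tpd = gaudi3_tps / gaudi3_price

print(f"Gaudi 3 tokens-per-dollar advantage: {(gaudi3_tpd / h100_tpd - 1):.0%}")
# Even at 90% of the H100's raw throughput, half the system price
# yields an 80% better tokens-per-dollar result.
```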
While Signal65 did not measure power draw directly, Mr. Lewis noted that the firm estimates Gaudi 3 to be 40% to 50% more power-efficient than the H100, which would further extend its price-performance advantage for on-premises customers.
IBM Cloud Testing: Gaudi 3 vs. NVIDIA H100 vs. NVIDIA H200
Following the on-premises tests, Signal65 explored Gaudi 3’s performance on IBM Cloud, which officially announced its support for Gaudi 3 in May. Testing in the cloud involved a broader set of models, including Granite, Mixtral, and the very large Llama 405B model, and compared Gaudi 3 against both the NVIDIA H100 and H200. Signal65 used vLLM (an open-source LLM inference server that speeds up generation through more efficient GPU memory management) as the only optimization over otherwise out-of-the-box configurations.
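As a point of reference, vLLM’s offline API is minimal; a sketch appears below. The model checkpoint named here is an assumption for illustration, since Signal65 identified the models only by family (Granite, Mixtral, Llama).

```python
from vllm import LLM, SamplingParams

# Assumed checkpoint for illustration; not necessarily the exact
# build Signal65 tested.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

sampling = SamplingParams(temperature=0.0, max_tokens=128)
outputs = llm.generate(["What is AI inferencing?"], sampling)
print(outputs[0].outputs[0].text)
```

vLLM’s PagedAttention design stores the KV cache in fixed-size blocks rather than one contiguous allocation per request, which is the source of its memory-utilization gains and the reason KV cache capacity becomes the limiting factor in the large-model results below.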
The Gaudi 3 performance results were:
- For Granite (a smaller model), Gaudi 3 performed well, especially as batch sizes increased.
- With Mixtral (a mid-sized use case), which could not even run on a single H100 due to memory constraints, Gaudi 3 was highly competitive with the H200.
- For the very large Llama 405B model with large input/output shapes, Mr. Lewis said the H100 “basically isn’t competitive at all” due to KV cache issues, while Gaudi 3 was “very very competitive” with the H200, taking the lead at the largest batch size.
A key finding was pricing: Gaudi 3 instances on IBM Cloud cost $60 per hour, while both H100 and H200 instances were $85 per hour, roughly 30% lower for Gaudi 3. Factoring in cloud pricing, Gaudi 3 showed a significant price-performance advantage: up to 335% better than the H100 and up to 92% better than the H200. According to Mr. Lewis, while Gaudi 3 might not win every single configuration, its cost advantage “evens out” the playing field and provides a compelling reason for adoption.
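The cloud tokens-per-dollar calculation works the same way, with hourly rates in the denominator. The rates below are those quoted above; the throughput figures are placeholders, not Signal65’s measured results.

```python
# Hourly rates are from the IBM Cloud testing above; throughputs are
# placeholders for illustration, not measured results.
rates = {"gaudi3": 60.0, "h100": 85.0, "h200": 85.0}   # $/hour
tps = {"gaudi3": 9_000, "h100": 3_000, "h200": 8_000}  # hypothetical tokens/s

for name in rates:
    tokens_per_dollar = tps[name] * 3600 / rates[name]
    print(f"{name}: {tokens_per_dollar:,.0f} tokens per dollar")
```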
Why Enterprises Should Explore Intel Gaudi 3 and IBM Cloud
Signal65’s findings present a compelling case for enterprises to broaden their AI hardware considerations beyond NVIDIA’s traditional dominance on the big three clouds. For enterprises navigating the complexities of AI adoption, Intel Gaudi 3, particularly when deployed on IBM Cloud, stands out as a powerful and cost-effective alternative, offering competitive performance with a clear financial advantage.