Solid-state drives (SSDs) have long been viewed as the holy grail of storage technology: they make computers run faster, cooler, and quieter. The drives have played a pivotal role in breaking through performance standstills and overcoming significant energy concerns in data center computing.
But cost has remained a thorny issue that operators bring up again and again. Gigabyte for gigabyte, SSDs sell at 3 to 4 times the price of HDDs, making the cost of acquisition significantly higher.
Now vendors have begun to wonder whether it is possible to tame the runaway TCO with clever implementation.
In a recent study, a team at Solidigm showed that certain SSDs can serve as the storage foundation for AI workloads in data centers, potentially storing large volumes of data at roughly 30% lower TCO than HDDs.
At AI Data Infrastructure Field Day, a Tech Field Day event, the team presented its latest total cost of ownership (TCO) analysis, and the findings were striking. They underscored SSDs' potential as a data center storage medium, and made a particular case for high-density QLC SSDs as the preferred solution for AI.
The analysis demonstrated that, under certain scenarios, high-density QLC SSDs can shrink cost pools and offer a significantly better price-to-performance ratio than traditional choices, said Solidigm.
“TCO has become more critical in the AI era,” noted Manzur Rahman, product marketing engineering manager. “It helps evaluate cost-effective, high-performance hardware like GPUs, storage, and AI chips, especially for AI processing.”
“Incorporating energy costs into TCO enables selecting energy-efficient solutions for the data centers, and at the same time guides decisions on scaling with AI workloads ensuring that resources are right-sized without being over or under-provisioned,” he said.
To understand the findings, it helps to break down the TCO cost model. TCO comprises two types of costs, direct and indirect. Cost drivers such as building, shipping, storage, compute, energy, cooling, and maintenance fall under one of the two categories.
These are further grouped into two costing systems: activity-based costing (ABC) and time-driven activity-based costing (TDABC).
The cost of acquisition, or CapEx, and the expenses incurred across the product's life cycle, or OpEx, make up the total cost of a technology. More specifically, one-time costs like the purchase of storage and compute systems are considered CapEx, whereas recurring expenses and overheads, such as labor and maintenance, are OpEx.
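The CapEx/OpEx split described above can be sketched as a toy cost model. Every figure below is a hypothetical placeholder for illustration, not a number from Solidigm's study.

```python
# Minimal sketch of the TCO structure: one-time CapEx plus recurring
# OpEx accumulated over the deployment's lifetime. All dollar amounts
# are hypothetical placeholders.

CAPEX = {            # one-time acquisition costs, in dollars
    "storage_drives": 50_000,
    "compute_servers": 120_000,
}

OPEX_MONTHLY = {     # recurring costs, in dollars per month
    "energy": 900,
    "cooling": 400,
    "labor_and_maintenance": 1_500,
}

def total_cost_of_ownership(months: int) -> float:
    """Total cost over a deployment lifetime of `months` months."""
    capex = sum(CAPEX.values())
    opex = sum(OPEX_MONTHLY.values()) * months
    return capex + opex

print(total_cost_of_ownership(60))  # 5-year horizon
```

The point of the split is that the purchase price is only the CapEx term; the OpEx term keeps growing with the deployment's lifetime, which is why replacement cycles and energy draw matter so much in the comparisons that follow.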
It's no surprise that calculating and normalizing the TCO of individual assets gets so complex. With this many variables to account for, plus workload mixes and changing scenarios to factor in, the math is not easy.
Solidigm's sensitivity study incorporated a combination of variables and optimizations that test the TCO and performance of the drives under various scenarios.
“When we are normalizing everything altogether, we normalize with respect to one cost unit which is dollar per TB effective per month per rack,” Rahman explained.
“The major purpose of normalization is when you are looking at the TB effective, the vendors and customers know that the upfront cost they are paying for the drives is not the end of the cost. We are incorporating everything – the operating cost, energy cost, etc. with respect to that TB effective so that they know how much it is actually costing them when they are maintaining their data center.”
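The normalization Rahman describes can be sketched as a small function: fold the upfront and operating costs into a single rate in dollars per effective TB per month. The inputs below (rack capacity, utilization, compression, cost figures) are illustrative assumptions, not measured values from the study.

```python
# Hedged sketch of normalizing total cost to dollars per effective TB
# per month, the unit Rahman cites. All input figures are hypothetical.

def effective_tb(raw_tb: float, utilization: float,
                 compression: float) -> float:
    """Usable capacity after capacity utilization and inline compression."""
    return raw_tb * utilization * compression

def tco_per_tb_eff_per_month(capex: float, opex_per_month: float,
                             months: int, raw_tb: float,
                             utilization: float, compression: float) -> float:
    """Lifetime cost divided by effective capacity-months."""
    total = capex + opex_per_month * months
    return total / (effective_tb(raw_tb, utilization, compression) * months)

# One rack of high-density SSDs over 5 years: hypothetical numbers.
rate = tco_per_tb_eff_per_month(
    capex=250_000, opex_per_month=1_200, months=60,
    raw_tb=2_000, utilization=0.95, compression=1.5)
print(f"${rate:.3f} per effective TB per month")
```

Expressed this way, the drive's sticker price and its running costs collapse into one monthly rate per usable terabyte, which is what makes racks with very different drive types comparable at all.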
The team began by analyzing the sensitivity of the standard TCO model to the drives. They loaded up two separate racks, one with HDDs and one with high-density SSDs. Keeping a set of variables (capacity utilization, compression, and replacement cycle) equal, they calculated the TCO for both.
To put the SSDs' potential to the test, their performance was set to 4 times that of the HDDs, while average sales price (ASP) and density were set to 4x and 5x, respectively.
They ended up with a 25% TCO improvement on the spot, said Rahman.
Further tuning the model, the team attempted to optimize the numbers, changing one variable at a time.
The results grew more surprising. When SSD capacity utilization was set to 95%, compared to HDDs' 30 to 70%, the TCO gains ranged from 25% to 70%, said Rahman. Similarly, with 5x inline data compression, the benefits jumped to as much as 84%.
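A one-variable-at-a-time sweep like the one described can be sketched as follows: hold the other inputs fixed, move a single variable, and compare the normalized cost per effective TB against an HDD baseline. The dollar figures, utilization bands, and compression ratios below are illustrative assumptions, not Solidigm's data, so the printed gains will not match the study's percentages.

```python
# Sketch of a single-variable TCO sensitivity sweep. All cost figures
# are hypothetical; only the structure of the sweep mirrors the study.

def tco_per_tb_eff(cost_per_raw_tb: float, utilization: float,
                   compression: float = 1.0) -> float:
    """Normalized cost per effective (usable) TB."""
    return cost_per_raw_tb / (utilization * compression)

# Sweep HDD capacity utilization across a 30-70% band while the SSD rack
# runs at 95% utilization, with and without inline compression on the
# SSD side.
for hdd_util in (0.30, 0.50, 0.70):
    hdd = tco_per_tb_eff(cost_per_raw_tb=15.0, utilization=hdd_util)
    for comp in (1.0, 1.5, 5.0):
        ssd = tco_per_tb_eff(cost_per_raw_tb=22.0, utilization=0.95,
                             compression=comp)
        gain = 1 - ssd / hdd
        print(f"HDD util {hdd_util:.0%}, SSD compression {comp}x: "
              f"TCO gain {gain:+.0%}")
```

The sweep makes the study's structure concrete: utilization and compression act directly on the denominator (effective capacity), which is why they move the normalized TCO so much harder than the OpEx-side variables mentioned below.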
“In summary, we found all the variables that impact the TCO heavily when using high-density SSDs. They impacted the CapEx, and as a result our TCO improved significantly compared to HDD.”
The study also found that OpEx factors like active and idle power, and duty cycle, have a low to medium impact on the TCO in dollars per effective TB.
When applied to low, base, and high-case scenarios, the test showed significant efficiency and TCO gains. In the low case, a 120 TB QLC array offered a 20% TCO improvement compared to a 24 TB HDD array.
The numbers keep climbing from there. In the base case, the improvement jumped to 30%, reaching 70% in the high-case scenario, Rahman said.
“Based on this analysis, we believe upcoming high-density QLC can deliver the standard storage SSD value.”
SSD sales have been brisk in data centers, and the overall market outlook is positive. Solidigm sees SSD adoption picking up in AI-specific use cases starting in 2025 as performance trends upward, replacement cycles grow longer, and TCO plummets, making high-density QLC a strong match despite the higher upfront costs.
Solidigm recently added two new SSDs to its D7 family, the D7-PS1010 and D7-PS1030, targeting AI storage use cases and mid-endurance workloads, respectively. Powered by SK Hynix's 176L 3D TLC NAND, the drives use a PCIe Gen 5 interface and deliver up to 3.1M random read IOPS. Compared to the previous-generation D7-P5x20, Solidigm says the new drives offer up to 70% better performance per watt and up to 46% faster speeds for AI training.
Check out more presentations from Solidigm’s appearance at the AI Data Infrastructure Field Day event to learn more.