The sheer volume of data in AI workloads, together with the need to manage and access it efficiently, demands robust and intelligent storage solutions. As AI models grow in size and complexity, every watt of power and every square inch of data center space become paramount, making storage choices fundamental to cost-efficiency and performance.

Navigating the AI Data Pipeline with Solidigm

At AI Infrastructure Field Day, Solidigm explained that AI work is not a monolithic task but a multi-stage pipeline, with each stage having distinct goals, workload characteristics, and hardware requirements. This pipeline often spans different organizations, with hyperscalers typically handling initial model creation and enterprises deploying and fine-tuning those models. Throughout this process, AI workflows continuously generate incremental data sets, underscoring the omnipresent need for storage.


Solidigm defines six main steps in the AI pipeline, each presenting unique storage challenges and opportunities:

  • Ingestion—requires vast amounts of raw data, often many petabytes. Network Attached Storage (NAS) solutions are crucial, providing the high capacity needed to house these massive datasets. The Solidigm D5-P5336 122TB U.2 drive dramatically reduces NAS footprint and power consumption, theoretically shrinking 50 petabytes of effective storage from nine racks consuming 54 kilowatts to just one rack consuming 1.7 kilowatts. 
  • Data Preparation—the cleaning, transformation, and preparation of raw data for training. This is primarily a CPU-intensive activity, in which the workflow reads data from the NAS across the network into the Direct Attached Storage (DAS) within GPU servers for processing. Solidigm SSDs come in multiple form factors—including the Enterprise and Datacenter SSD Form Factor (EDSFF) E1.S and E3.S, as well as traditional U.2 and M.2—for inclusion in these servers.
  • Training—typically random-read intensive, interspersed with sequential writes for checkpointing (saving the model’s state). These workloads demand high performance and low latency, making DAS products like Solidigm’s E1.S and E3.S form factors ideal due to their direct PCIe connections to GPUs. 
  • Fine-tuning—adapting pre-trained foundation models with proprietary data for specific enterprise solutions. Like training, this stage heavily leverages DAS within the GPU servers, requiring fast access to adapt models efficiently.
  • Inference—the deployment stage where models generate outputs and solve problems, often involving real-time data interactions. Inference is beginning to shift from DAS to NAS to support massive Retrieval Augmented Generation (RAG) databases, which are economically infeasible to manage solely in memory. Many AI models are so massive they exceed the memory capacity of a single GPU. Offloading model weights to Solidigm SSDs is a necessary technique, allowing complex models with billions of parameters to run on existing hardware. This significantly reduces the amount of high-cost DRAM required, leading to substantial savings.
  • Archiving—this final stage involves storing results and data for future retraining or audit purposes. It primarily involves sequential read/write operations and utilizes NAS solutions for cost-effective, high-capacity long-term storage.
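The weight-offloading idea described under Inference can be sketched with memory-mapped files: the operating system pages weight data in from flash on demand rather than holding the whole tensor in DRAM. The sketch below is illustrative only (file name, tensor shapes, and workflow are assumptions, not details from Solidigm's presentation), using NumPy's `memmap` as a stand-in for a production offloading layer.

```python
import numpy as np

# Hypothetical file holding one layer's weights on an SSD-backed
# filesystem; name and shapes are illustrative.
WEIGHT_FILE = "layer0_weights.bin"
ROWS, COLS = 1024, 512

# Write stand-in weights to disk once (in practice these would come
# from a trained checkpoint).
rng = np.random.default_rng(0)
weights = rng.standard_normal((ROWS, COLS)).astype(np.float32)
weights.tofile(WEIGHT_FILE)

# Memory-map the file: pages of weight data are fetched from the SSD
# on demand instead of loading the entire tensor into DRAM up front.
mapped = np.memmap(WEIGHT_FILE, dtype=np.float32, mode="r",
                   shape=(ROWS, COLS))

# An inference step touches only the weights it needs; the OS pulls
# the corresponding pages from flash transparently.
x = rng.standard_normal(ROWS).astype(np.float32)
y = x @ mapped
print(y.shape)  # (512,)
```

The trade-off is latency: each cold page access costs an SSD read, which is why the low-latency DAS drives discussed above matter for this technique.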

Impact on AI TCO and Sustainability

Solidigm’s high-density and high-performance SSDs deliver significant impacts on the Total Cost of Ownership (TCO) and sustainability of AI infrastructure.

High-density drives like the 122TB P5336 can lead to dramatic reductions in physical footprint, network connections, and power consumption. Solidigm claims a theoretical 90% reduction in rack space and network connections, freeing up space for servers and GPUs and reducing the number of network ports consumed for storage.

Perhaps most importantly, the increase in storage density brings a concomitant reduction in power consumption, also a theoretical 90%.
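The consolidation figures cited earlier (roughly 50 petabytes shrinking from nine racks at 54 kilowatts to one rack at 1.7 kilowatts) can be sanity-checked with simple arithmetic:

```python
# Back-of-the-envelope check of the consolidation figures cited above:
# 9 racks / 54 kW consolidated to 1 rack / 1.7 kW for ~50 PB effective.
racks_before, racks_after = 9, 1
power_before_kw, power_after_kw = 54.0, 1.7

rack_reduction = 1 - racks_after / racks_before
power_reduction = 1 - power_after_kw / power_before_kw

print(f"rack space reduction: {rack_reduction:.0%}")  # 89%
print(f"power reduction:      {power_reduction:.0%}")  # 97%
```

The stated kilowatt figures actually imply a power reduction somewhat steeper than the headline 90%, which presumably reflects rounding or differing assumptions in Solidigm's comparison.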

Solidigm engineered the 122TB drive with firmware that ensures 24/7 reliability for five years without wearing out under continuous write operations. This extended endurance is crucial for write-intensive stages like training, significantly reducing maintenance and replacement costs.

While larger SSDs can consume more power, Solidigm addresses "every watt matters" concerns with power management at the drive level. The company is also actively innovating in cooling, including single-sided cold-plate attachment for liquid cooling. This design allows fans to be removed from storage enclosures and enables hot-swap capability in fully liquid-cooled systems, enhancing density and energy efficiency in high-power AI environments.

Solidigm views storage as an active enabler of AI innovation, leading to greater model accuracy, expanding deployment possibilities, and delivering substantial TCO and sustainability benefits. Solidigm is addressing the complex storage requirements of each AI pipeline stage with intelligently designed, high-density and high-performance SSD solutions.
