Skymel today emerged from stealth to preview an adaptive inferencing platform for artificial intelligence (AI) workloads that enables inference engines running on a local machine to invoke additional graphics processing unit (GPU) resources in the cloud as needed.

The company claims this approach, dubbed NeuroSplit, has the potential to reduce GPU costs by as much as 60% at a time when the cost of those resources is limiting the number of AI projects an organization can launch.

Skymel CTO Sushant Tripathy said IT organizations will, in addition, be able to employ different classes of GPUs more flexibly. For example, an AI application that typically requires multiple Nvidia A100s, at an average cost of $2.74 per hour, could instead use NeuroSplit to invoke either a single A100 or multiple V100s at 83 cents per hour.
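As a rough illustration of how those quoted hourly rates compare, the back-of-the-envelope calculation below uses the per-GPU prices cited above; the GPU counts are hypothetical assumptions, since the article does not specify how many A100s or V100s a given workload would use:

```python
# Illustrative cost comparison using the hourly rates quoted in the article.
# GPU counts below are assumptions for the sake of the example, not figures
# from Skymel.
A100_RATE = 2.74   # USD per GPU-hour (quoted average)
V100_RATE = 0.83   # USD per GPU-hour (quoted average)

baseline_gpus = 4                    # assumed: "multiple" A100s
baseline_cost = baseline_gpus * A100_RATE

option_single_a100 = 1 * A100_RATE   # fall back to a single A100
option_v100s = 3 * V100_RATE         # assumed: three V100s instead

print(f"baseline (4x A100): ${baseline_cost:.2f}/hr")
print(f"single A100:        ${option_single_a100:.2f}/hr")
print(f"3x V100:            ${option_v100s:.2f}/hr")
```

Whether the cheaper configurations deliver acceptable performance for a given model would, of course, depend on the workload; the point is only that the per-hour rates differ enough for routing decisions to matter.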

Scheduled to become available in a private beta in the third quarter, NeuroSplit makes it possible both to improve performance using pipeline caching and to run multiple models per application simultaneously.

The overall goal is to enable organizations to deploy larger AI models without being constrained by the GPU resources available on a local machine, said Tripathy.

Alternatively, IT organizations could opt to run inference engines on other types of processors, but those alternatives don't provide the same ability to process data in parallel, he added. In effect, organizations wind up paying a performance penalty when they don't make use of GPUs, so a more flexible approach to maximizing utilization of GPU resources across a hybrid IT environment provides a better alternative, said Tripathy.

It’s not clear to what degree the current GPU shortage is slowing down the pace at which AI applications might be developed, but it’s apparent that organizations will need to prioritize which projects gain access to what today is a constrained resource. There may come a day when there are multiple providers of GPUs but, for now, supplies are constrained because the semiconductor companies that manufacture these processors are not able to keep pace with demand.

IT teams, as a result, are being asked either to lease GPU resources in the cloud or to acquire servers that include GPUs to train AI models. The inference engine that gets created, however, will typically need to be deployed as close as possible to the data that drives an AI application. Skymel is making a case for a hybrid approach that augments local GPU resources with cloud resources that organizations can use to scale multiple applications as needed.

Regardless of approach, IT teams are increasingly being tasked with managing the underlying infrastructure needed to train and deploy AI models. Data science teams will still build the AI model, but deployment of the inference engines is shifting to IT operations teams that are being asked to minimize the total cost of AI.

The challenge, as always, is finding a way to achieve that goal without adversely impacting the performance of AI applications that typically require access to compute and storage resources at unprecedented levels of scale.