
ClearML today made available a fractional graphics processing unit (GPU) technology that makes it possible for multiple workloads to share the same processor.

At the same time, the company is launching a Resource Allocation and Policy Management Center through which compute resources can be allocated based on policies defined by either IT or data science teams, along with a dashboard for monitoring model endpoints, data outflows and infrastructure consumption.
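ClearML has not published the policy schema itself, but conceptually a policy pairs a team or project with a guaranteed compute quota, an over-quota allowance and a priority. A minimal sketch in Python, with every field name here a hypothetical stand-in:

```python
# Hypothetical policy definitions -- the actual ClearML schema is not public.
from dataclasses import dataclass

@dataclass
class ResourcePolicy:
    team: str            # team or project the policy applies to
    gpu_quota: float     # guaranteed share of GPUs (fractional units)
    over_quota: float    # extra share usable when the cluster is idle
    priority: int        # lower number = scheduled first

policies = [
    ResourcePolicy(team="research", gpu_quota=4.0, over_quota=2.0, priority=1),
    ResourcePolicy(team="inference", gpu_quota=2.0, over_quota=0.0, priority=0),
]
```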

The ClearML approach to virtualization takes advantage of time-slicing technology that NVIDIA has made available within its GPUs. Using quotas defined by rules, along with pre-emption capabilities that dynamically control access to compute resources, organizations can increase utilization rates by as much as 32% to improve their return on investment.
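NVIDIA time-slicing itself is configured at the driver level, and ClearML's scheduler is proprietary; purely as an illustration of the quota-and-preemption logic described above, a toy scheduler might look like this (all names and the slot model are assumptions, not ClearML's implementation):

```python
# Toy quota-and-preemption scheduler -- a sketch of the concept only.
import heapq
from collections import Counter

TOTAL_SLOTS = 8  # hypothetical number of time-slice slots on the cluster

class Job:
    def __init__(self, name, team, priority):
        self.name, self.team, self.priority = name, team, priority
    def __lt__(self, other):          # lower number = higher priority
        return self.priority < other.priority

def schedule(pending, quotas, running):
    """Admit pending jobs in priority order. A team may run over its
    quota while slots are free; over-quota jobs are the first to be
    preempted when a higher-priority job is waiting."""
    usage = Counter(job.team for job in running)
    heapq.heapify(pending)
    while pending:
        job = heapq.heappop(pending)
        if len(running) < TOTAL_SLOTS:         # free slot: admit directly
            running.append(job)
            usage[job.team] += 1
            continue
        # Cluster full: preempt the least-urgent job whose team is over quota.
        victims = [r for r in running
                   if r.priority > job.priority
                   and usage[r.team] > quotas.get(r.team, 0)]
        if victims:
            victim = max(victims)              # largest priority number
            running.remove(victim)
            usage[victim.team] -= 1
            running.append(job)
            usage[job.team] += 1
    return running
```

The design choice this sketch illustrates is the one the article describes: teams borrow idle capacity opportunistically, and that borrowed (over-quota) capacity is reclaimed first when contention arises.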

IT teams can more easily identify which models are either underperforming or consuming too many compute resources via a single console. They can also opt to license ClearML Enterprise to take advantage of dynamic multi-instance GPU capabilities, as well as robust policy management that applies sophisticated logic to managing quotas and over-quotas, priorities and job queues. A ClearML Autoscaling capability ensures cloud compute is only used when needed and is automatically spun down after a pre-determined amount of idle time.
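ClearML's autoscaler is configured through its own tooling and its internals are not documented here; as a sketch of the idle-timeout pattern the company describes, assuming hypothetical `list_idle_workers` and `terminate_instance` callables standing in for cloud API calls:

```python
# Sketch of an idle-timeout autoscaler loop -- an assumed pattern,
# not ClearML's actual implementation.
import time

IDLE_TIMEOUT_SECONDS = 15 * 60   # spin down after 15 idle minutes (assumed)
POLL_INTERVAL_SECONDS = 60

def autoscale_loop(list_idle_workers, terminate_instance):
    idle_since = {}                      # worker id -> first time seen idle
    while True:
        now = time.time()
        idle = set(list_idle_workers())  # workers with no queued/running jobs
        # Forget workers that picked up work again.
        idle_since = {w: t for w, t in idle_since.items() if w in idle}
        for worker in idle:
            started = idle_since.setdefault(worker, now)
            if now - started >= IDLE_TIMEOUT_SECONDS:
                terminate_instance(worker)   # release the cloud instance
                del idle_since[worker]
        time.sleep(POLL_INTERVAL_SECONDS)
```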

GPUs have been in short supply for years now, but the rise of generative artificial intelligence (AI) has further exacerbated the issue. Generative AI models consume far more GPU resources than earlier generations of AI models. The challenge is that consumption of those resources tends to be spiky, so in the absence of any ability to share GPU resources, organizations find themselves overprovisioning infrastructure, says ClearML CEO Moses Guttman.

The fractional GPU technology developed by ClearML makes it simpler to toggle between smaller jobs and large, demanding jobs involving generative AI training, he adds. The ClearML approach to virtualizing GPUs, for example, enables multi-tenancy with partitions that offer secure and confidential computing, with hard memory limits to ensure predictable and stable performance. In addition, driver-level memory monitoring ensures jobs run without impacting each other. “We do it at the driver level,” says Guttman.
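ClearML enforces those limits at the driver level, as Guttman notes. For intuition only, PyTorch exposes a framework-level analogue of a hard per-process memory cap; this is not ClearML's mechanism, just a way to see the same idea in a few lines:

```python
# Framework-level analogue of a hard GPU memory limit -- NOT ClearML's
# driver-level mechanism, only an illustration of the concept.
import torch

if torch.cuda.is_available():
    # Cap this process at 25% of GPU 0's memory; allocations beyond the
    # cap raise an out-of-memory error instead of starving neighbors.
    torch.cuda.set_per_process_memory_fraction(0.25, device=0)
    x = torch.empty(1024, 1024, device="cuda:0")  # fine within the cap
```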

That set-it-and-forget-it capability also reduces the cost of labor that would otherwise be required to optimize consumption of GPU resources, he adds.

A recent survey of IT leaders from 1,000 organizations conducted by ClearML found nearly three-quarters (74%) are dissatisfied with their current job scheduling and orchestration tools, while only 19% have a scheduling tool that supports the ability to view and manage jobs within queues and effectively optimize GPU utilization. A total of 40% plan to use orchestration and scheduling technology to maximize their current investments in existing AI infrastructure.

In the meantime, more than half (52%) are actively looking for cost-effective alternatives to GPUs for inference in 2024, while 27% are looking for alternatives for training.

Alternatives to GPUs are, of course, not likely to arrive any time soon, so the priority for now is to maximize infrastructure resources that will remain in short supply for the foreseeable future.