A survey of 788 leaders responsible for managing IT infrastructure in the U.S., whose organizations either are deploying or plan to deploy artificial intelligence (AI) workloads in production environments, finds only about a third (34%) judge those workloads to be highly predictable.

Published this week by Virtana, a provider of an observability platform for IT environments, the survey also finds two-thirds (66%) are leading IT teams that are managing AI infrastructure without reliable performance baselines, even though 54% report their organization is running AI workloads at scale.

The challenge is that many of those IT teams are struggling to manage those workloads cost-efficiently, says Virtana CEO Paul Appleby.

For example, 69% report lacking the ability to automatically discover the root cause of an issue across domains, with 25% still relying on manual investigations across disconnected consoles as their first response. More than a third (38%) say there is a need for unified visibility across AI and infrastructure layers.

Additionally, tracking cost and efficiency metrics (57%) and graphics processing unit (GPU) utilization, which follows closely behind, are identified as the top two monitoring challenges. Other blind spots include data pipeline visibility (52%), storage and throughput (47%) and network bottleneck detection (44%).

AI workloads are also having a substantial impact on how IT budgets are allocated, with a full 80% noting that the cost of premium AI hardware is reshaping infrastructure decisions. As a result, well over half (56%) report IT teams are deferring legacy infrastructure modernization, followed closely by 54% that have deprioritized cost optimization initiatives. That is occurring even as 60% shift workloads across hybrid environments as part of an effort to accelerate platform consolidation (58%). Half (50%) have also deprioritized security and compliance reviews, the survey finds.

Ultimately, the goal is to build and maintain an AI factory, but most IT teams are a long way from being able to achieve that goal, says Appleby. Unfortunately, too many business leaders have yet to fully appreciate the IT issues that need to be addressed to build and deploy those AI factories, he adds.

The good news is that most of these issues can be addressed in time. “It’s not like it’s a problem without a solution,” says Appleby.

At the moment, however, too many business leaders are assuming their IT systems are more capable of running AI workloads than they actually are, he adds.

Each organization will need to determine which AI workloads to prioritize based on whatever level of IT infrastructure it can actually access. GPUs continue to be in short supply. The paradox is that many of the applications that could be deployed on those GPUs are not yet ready to be deployed in a production environment. In many cases, only one unpredictable application has been deployed, resulting in single-digit GPU utilization rates.

Eventually, there will come a day when organizations are able to operationalize AI more effectively. In the meantime, however, the pace of that progress, for all its potential, appears to be glacial despite the massive investments that many organizations have already made.