Crusoe Adds Managed Inference Service to Optimize AI Model Performance

Crusoe today added a managed service to its cloud portfolio to optimize deployment of inference models using memory cache technology it gained by acquiring Atero earlier this year.

In addition, Crusoe has added Crusoe Intelligence Foundry, a unified hub that makes it simpler to discover open source AI models and create application programming interface (API) keys to invoke them in minutes

The Crusoe Managed Inference is based on proprietary MemoryAlloy memory fabric, a cluster-wide KV cache for graphical processor units (GPUs) that reduces latency and overall scalability by fetching prefix caches from local and remote nodes instantly. That approach also serves to improve time-to-first-token (TTFT) speeds that are nearly 10x faster, says Erwan Menard, senior vice president for product management for Crusoe.

Crusoe is one of several so-called neocloud providers of IT infrastructure that have emerged as an alternative to hyperscalers to provide infrastructure service specifically for AI workloads. Open source models that are available on its platform include Llama 3.3 70B Instruct, Gemma 3 12B, Gpt-oss-120b, Qwen3 235B A22B Instruct 2507 and models developed by Decart, a provider of video and multimodal AI models.

That focus enables Cruso to provide AI teams with superior support as they look to deploy and optimize AI models, says Menard. “We do AI and AI only,” he says.

In the case of Crusoe, there is also a major emphasis on using renewable energy sources to enable organizations to deploy AI workloads at scale.

Like most providers of cloud services for deploying AI models, Crusoe is trying to serve the needs of two distinct constituencies. On the one hand, there are the data science and application development teams that build AI models. On the other are the IT teams that are now in many organizations assuming responsibility for deploying and optimizing AI inference models in production environments.

Each of those teams is provided with a unique set of APIs and consoles to make it simpler to manage tasks, says Menard. The overall goal is to provide a flexible set of IT infrastructure services in a way that makes costs more predictable, he adds.

In general, open source AI models are gaining more traction as organizations realize that they can distill smaller models from them that they can then optimize for specific domains, notes Menard. In time, most organizations will be running small language models in a way that reduces dependencies on invoking APIs to access large language models (LLMs) that require payment for using expensive tokens for every input and output, adds Menard.

Ultimately, many organizations will be relying on a mix of platforms and services to access GPU resources that, given demand, continue to be relatively scarce. The issue then becomes to what degree many of them are willing to rely on emerging cloud service providers such as Crusoe versus a hyperscaler that they may have already set up an enterprise agreement through which steep discounts are made available based on how much infrastructure they consume.

Crusoe Adds Managed Inference Service to Optimize AI Model Performance

SHARE THIS STORY

FOLLOW US

Crusoe Adds Managed Inference Service to Optimize AI Model Performance

TECHSTRONG AI PODCAST

SHARE THIS STORY

RELATED STORIES:

FOLLOW US

NEWSLETTER SIGN UP