The AI token is the new cell phone minute, Rafay Systems has found. And it will require its own billing infrastructure.
Extending its Kubernetes-based Platform-as-a-Service software, Rafay has launched a ‘Token Factory’ to provide AI service providers with an access control layer to bill for their services, handling metering, pricing, and quotas.
“Our job is to deliver said models in the easiest, most user-friendly way to consumers out there, and to track consumption,” said Haseeb Budhani, co-founder and CEO of workflow automation system provider Rafay Systems, in an interview.
The company is betting that the AI market will expand well beyond its present state, in which a few giant frontier labs such as OpenAI and Anthropic lead. According to analyst firm Research and Markets, the market for GPUs-as-a-Service will hit $7.36 billion in 2026 and grow to $26.43 billion by 2031.
The company sees the Token Factory as a delivery plane for AI models, to be used by continually running agents such as OpenClaw and NemoClaw. The provider exposes a set of AI services by way of API endpoints, which are metered by usage.
The Token Factory has been validated to work with OpenClaw and NVIDIA NemoClaw.
Expanding the AI Market
Rafay Systems offers an infrastructure management platform, one that packages Kubernetes deployments as a set of Platform-as-a-Service (PaaS) capabilities.
Rafay has enjoyed a surge in support for AI workloads, as more AI startups, and even IT incumbents, enter the market to serve various vertical and specialized domains. In his talks with telecommunications companies, Budhani found they are very interested in boosting their user revenue by adding token-based AI services to their portfolios.
GPU service provider Argentum AI recently based its operations on Rafay. Argentum, which manages thousands of GPUs, sells to the world’s largest AI operators, including hyperscalers, neoclouds and enterprise-scale resellers. Rafay will provide a single unified software orchestration layer that Argentum can use to provide customized compute environments to its customers.
“We are accelerating the time to market for all these AI use cases,” Budhani said. “We are all a little surprised, in a positive way, how many of these customers are out there.”
Billing Services
Rafay moved to add a Token Factory to its PaaS after a meeting with NVIDIA. Unlike Kubernetes itself, company execs learned, the AI ecosystem is built on a direct consumption model. AI service providers need billing infrastructure.
For users of such AI services, “if you make it easy for them to use it, they’ll use it. If you make it hard for them to use it, they’re not going to use your infrastructure,” Budhani said. Entities such as OrbAI and the open source Lago have already rushed to fill the void.
Rafay’s platform could already meter the hours customers used CPUs and virtual machines, so adding token billing was an easy jump for the company. “The token unit is a primitive that the platform understands,” Budhani said. “The token is the new cell phone minute.”
Rafay tracks token usage at the Kubernetes API or through the Envoy proxy software, depending on the setup. On the provider side, it can connect with NVIDIA’s Dynamo inference orchestration software, or the vLLM inference engine, depending on what model the customer wants to use.
Bills can be directed to individual users, to teams, or to the enterprise as a whole. The operator sets the token price, and Rafay compiles the bill for submission to the operator’s back-end billing engine, be it Monetize360, Amdocs or some other engine.
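To make the billing flow concrete, here is a minimal Python sketch of how metered token counts might be rolled up to the user, team, or enterprise level and priced at an operator-chosen rate. It is an illustration only, not Rafay's implementation: the `UsageRecord` type, the `bill_by` function, and the per-1,000-token price are all hypothetical names and assumptions introduced here.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class UsageRecord:
    """One metered request (hypothetical record shape)."""
    enterprise: str
    team: str
    user: str
    tokens: int


def bill_by(records, level, price_per_1k_tokens):
    """Sum tokens at the chosen level and price them.

    level is "user", "team", or "enterprise". For simplicity this
    sketch assumes names are unique across the whole data set.
    """
    if level not in ("user", "team", "enterprise"):
        raise ValueError(f"unknown billing level: {level}")
    totals = defaultdict(int)
    for record in records:
        totals[getattr(record, level)] += record.tokens
    # Decimal avoids floating-point rounding surprises in money math.
    return {
        name: Decimal(tokens) / 1000 * price_per_1k_tokens
        for name, tokens in totals.items()
    }


# Example: three requests from two teams inside one enterprise.
records = [
    UsageRecord("acme", "search", "alice", 1500),
    UsageRecord("acme", "search", "bob", 500),
    UsageRecord("acme", "support", "carol", 3000),
]
price = Decimal("0.02")  # operator-selected price per 1,000 tokens (assumed)

team_bill = bill_by(records, "team", price)
enterprise_bill = bill_by(records, "enterprise", price)
```

In a setup like the one described, the resulting totals would be the payload handed to the operator's back-end billing engine; the hand-off itself is omitted here because its format depends on the engine.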
Kubernetes Is Important but Also Invisible
“Kubernetes is a good platform to build such a token factory,” Budhani said. AI services are largely container-based, built on a cloud-native microservices architecture. So it was natural that “Kubernetes became the standard in this market,” he said.
At the same time, however, most of Rafay’s current customers use a “serverless pod” SKU (stock-keeping unit), meaning they are relying on Kubernetes from a cloud provider, such as AWS’ Elastic Kubernetes Service (EKS), and not running Kubernetes themselves at all.

