
Enterprises dabbling in artificial intelligence (AI) have run into a problem that has dogged IT for three decades: data gravity. Named after the Newtonian principle that the more mass an object has, the more strongly it pulls other objects toward it, data gravity is the tendency of large datasets to pull other data and applications toward them.
It is the reason why moving data between disparate platforms has become such a painful and expensive chore. Moving large chunks of data for processing is fraught with latency and throughput challenges and, according to many experts, counterproductive, given that the movement takes time and AI applications need access to data in near real time.
At the AI Field Day event last month, Kamiwaza explained how it is helping companies sidestep the acute problem of data gravity in AI with two of its core technologies: the Inference Mesh and the Locality Aware Data Engine.
“Our customers are typically in the petabyte, exabyte scale arena and this [technology] allows the inferencing to be taken to the next level,” said Luke Norris, CEO and founder of Kamiwaza. “You truly can deploy AI anywhere – on-prem, cloud, core, edge, everywhere.”
To process a single request, AI applications run scores of inference cycles on the backend. For a simple chatbot with retrieval-augmented generation (RAG), that is somewhere between 5 and 30 inference runs per request.
For agentic AI, however, which is fully autonomous and therefore more complex, the workload is staggeringly higher. “Real agents that run real autonomous actions doing real data processing can get into 10,000 plus inferences just to accomplish a simple request,” Norris explained at the presentation.
A simple agentic request, he said, can kick off massive amounts of compute and push masses of data through phases of ingestion, copying and reading. “This is why we come up with the concept of data has gravity.”
In an increasingly hybridized IT environment where data is distributed across public and private infrastructures, a large language model (LLM) has to go through multiple touchpoints to collate the data required to process a single request.
“It’s nearly impossible for the enterprise to stitch all that together or to move all of that data into one location where the inference engine can actually attach to it and run that.”
And this, according to Kamiwaza, is one of the biggest reasons why so many generative AI (GenAI) projects never move past the pilot stage. “Through our research that we ran for about a year and a half, we saw this as the number one impediment for the enterprise adoption in general, especially in agentic resources,” he highlighted.
Norris explained that the Kamiwaza Inference Mesh and Locality Aware Data Engine provide “the ability for you to move inference load based off of the underlying hardware and the models that the individual stacks are running on, and then add the third component which is the data.”
Here’s how it works in a nutshell. When an inference request is made, it is routed to the Kamiwaza stack in the cloud, which is delivered as a Docker container. This is where the RAG process happens.
“We add layers of metadata and put those into our own global catalog service.” This catalog tracks all the metadata across all of the locations.
“The global catalog service is then shared between two clusters. The individual clusters know not only what data they could actually process in the RAG, but they have the affinity of the location to the actual Docker container that did the processing.”
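Kamiwaza has not published the internals of its catalog service, but the pattern Norris describes, a shared metadata index that records which cluster has local affinity to which data, can be sketched roughly as follows. The class and field names below are hypothetical illustrations, not Kamiwaza’s API.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a global catalog: it stores only metadata about
# ingested data chunks plus the cluster that holds them, never the data itself.

@dataclass
class ChunkRecord:
    chunk_id: str                 # identifier of the embedded chunk
    cluster: str                  # Kamiwaza stack/cluster that ingested it
    source_uri: str               # where the raw data lives; it stays put
    tags: set = field(default_factory=set)

class GlobalCatalog:
    """Shared between clusters so each knows where relevant data resides."""

    def __init__(self):
        self._records: list[ChunkRecord] = []

    def register(self, record: ChunkRecord) -> None:
        self._records.append(record)

    def clusters_for_query(self, query_tags: set) -> dict[str, list[ChunkRecord]]:
        """Group matching chunks by the cluster with local affinity to them."""
        hits: dict[str, list[ChunkRecord]] = {}
        for rec in self._records:
            if query_tags & rec.tags:
                hits.setdefault(rec.cluster, []).append(rec)
        return hits

# Each cluster registers what it ingested; a router later asks the catalog
# which clusters can serve a query without moving any data.
catalog = GlobalCatalog()
catalog.register(ChunkRecord("c1", "onprem-dc1", "s3://finance/q3.parquet", {"finance", "q3"}))
catalog.register(ChunkRecord("c2", "cloud-east", "gs://sales/leads.csv", {"sales"}))
print(catalog.clusters_for_query({"finance"}))
```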
For an inference request that involves data living in multiple locations, companies would typically have to haul large datasets from one location to another so that processing could happen in a single place. This costs time and money.
Kamiwaza leverages the two technologies to cut down data movement. Data is processed locally, wherever it lives, and only the inference results are passed back to the originating location. This ensures that data is not moved “beyond the firewalls of the enterprise,” or at least that not “too much” of it is moved.
“In doing this we’ve effectively split the data gravity paradigm because we’ve processed all the data local enough and only sent back the actual result tokens,” Norris emphasized.
These tokens, he said, are only “very very finite amounts of data” that could be sent even over a dial-up line.
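Neither Norris nor Kamiwaza shared implementation code, but the end-to-end flow he outlines, run inference where the data lives and send only the result tokens back, can be sketched in a few lines. The function names and cluster identifiers below are hypothetical.

```python
# Hypothetical sketch of the "split data gravity" flow: inference runs on the
# cluster that holds the data, and only the generated result tokens travel
# back to the originating stack.

def run_local_inference(cluster: str, question: str) -> str:
    # Placeholder for a call to the inference engine co-located with the data
    # on that cluster; raw datasets never leave the cluster's firewall.
    return f"[{cluster}] partial answer for: {question!r}"

def answer(question: str, clusters_with_relevant_data: list) -> str:
    # Fan the request out to every cluster the global catalog flagged as
    # holding relevant data, then stitch together only the returned tokens.
    partials = [run_local_inference(c, question) for c in clusters_with_relevant_data]
    return "\n".join(partials)   # kilobytes of text, not terabytes of source data

print(answer("Summarise Q3 spend", ["onprem-dc1", "cloud-east"]))
```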