Mike Vizard: Hello and welcome to the latest edition of the Techstrong.ai video series. I’m your host, Mike Vizard. Today, we’re with Nima Negahban, who’s the CEO of Kinetica, and we’re talking about the impact generative AI will have on real-time analytics. Nima, welcome to the show.
Nima Negahban: Hey, how are you? Thanks for having me.
Mike Vizard: I think we were just getting our arms around the whole notion of real-time analytics when generative AI suddenly popped up, so it feels like one and one is going to equal something that is four or five-plus, but walk us through how you see all this playing out for us because I think we understand the idea that there’s going to be some sort of natural language interface, but how fast is fast?
Nima Negahban: I think generative AI really plays two parts in the real-time space. One is acting as a co-pilot for users, where it basically unlocks a whole new level of efficiency and insight for a much broader set of users. The second part is that it’s going to change the compute requirements of the analytic ecosystem in every enterprise, because today a lot of that query workload is pre-planned, data-engineered, pipelined, so teams can do a lot of things ahead of time to make sure the queries are performant. With generative AI in the mix, and this need to do language-to-query across a wide variety of applications and use cases, those queries are going to be complicated.
You’re not going to have total control of what they look like. They’re going to be generated by the LLM, so you have limited control there, and you need them to be performant for the use cases to work, because it’s either a user interacting with an interface or it might be two LLM agents talking to each other, and you need those queries to run quickly. So there’s going to be a much greater demand for compute-intensive ad hoc query capability, where you have the assurance that queries against your most important data sets run quickly, that the data is up-to-date, and that the results take into account the latest and greatest information.
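The language-to-query flow Nima describes can be sketched roughly like this. Everything here is illustrative, not Kinetica’s actual API: the schema, the prompt wording, and the stubbed model call are all assumptions standing in for a real LLM integration.

```python
# Hypothetical sketch of "language to query": the LLM sees the table
# schema plus the user's question and emits SQL. The model call is a
# stub; a real system would call a hosted or local LLM here.

def build_sql_prompt(schema: str, question: str) -> str:
    """Assemble a text-to-SQL prompt from schema context and the question."""
    return (
        "You are a SQL assistant. Given this schema:\n"
        f"{schema}\n"
        f"Write one SQL query answering: {question}\nSQL:"
    )

def generate_sql(prompt: str) -> str:
    # Stub standing in for the LLM call. The key point from the interview:
    # the generated query's shape is outside your control, so the engine
    # underneath must handle compute-intensive ad hoc queries quickly.
    return "SELECT COUNT(*) FROM trips WHERE pickup_ts > NOW() - INTERVAL '1' HOUR"

schema = "CREATE TABLE trips (trip_id INT, pickup_ts TIMESTAMP, fare DECIMAL)"
prompt = build_sql_prompt(schema, "How many trips in the last hour?")
query = generate_sql(prompt)
```

Because the SQL comes back from a model rather than a hand-tuned pipeline, there is no opportunity to pre-optimize it, which is exactly the shift in compute requirements being described.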
Mike Vizard: And there’s a certain amount of heavy lifting with that. It’s just not a simple matter of, let me go launch a query against ChatGPT. I think as I understand it, there’s a lot of connective tissue here, starting with, I don’t know, vector databases and all kinds of other stuff that needs to get in front of that, so I can show an LLM the data that I’m currently working with to get some sort of relevant answer. Who’s going to build all that? Is that a data science team? Is it the data engineering team? Is it the analytics folks, or is it just something I go out and buy?
Nima Negahban: In the beginning, there are going to be a lot of bespoke efforts, but it’s going to get to a place where different companies have solutions for allowing generative AI to leverage real-time data with all-in-one capabilities. That’s what we’re working on at Kinetica, just a shameless plug, but there are a lot of challenges. One is, I’ve got these high-volume data streams; how do I tokenize them, embed them, generate vectors for them, and store all that in a database? How do I make sure my LLM understands the eccentricities of my use case well enough to generate the right queries or the right analytics in response to natural language? How do I make sure this is all happening at a synchronized pace, so it’s able to take advantage of the most recent data?
That’s what is really being figured out right now. A lot of the generative work to date has been around static data sets. You’ve got a knowledge base of text documents, you do a one-time tokenization and embedding generation, you generate your vectors and store them, and then your large language model, pre-trained on some huge static dataset of English, can do a chat-style interaction. But for the enterprise, you’re going to see a bigger need for taking this capability, unlocking it for large structured data sets, and making it real-time. That’s what we’re working on at Kinetica. There are a number of challenges around that, but I think for the enterprise it’s going to be a really important piece of the generative AI story.
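The ingest-and-embed challenge above can be sketched with a toy in-memory vector store. The hash-based embedding is a stand-in for a real embedding model, and the class name and record format are made up for illustration; the point is the shape of the pipeline, where each streamed record is embedded on arrival so searches always see the latest data.

```python
# Toy sketch of embedding a data stream into a vector store.
# embed() is a deterministic hashed bag-of-words, NOT a trained model.
import hashlib
import math

def embed(text: str, dim: int = 16) -> list[float]:
    """Map text to a unit-length vector via hashed token counts."""
    vec = [0.0] * dim
    for tok in text.lower().split():
        bucket = int(hashlib.md5(tok.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

class StreamingVectorStore:
    """Append-only store: each record is embedded on ingest, so a query
    issued a moment later already reflects it."""
    def __init__(self) -> None:
        self.records: list[tuple[str, list[float]]] = []

    def ingest(self, text: str) -> None:
        self.records.append((text, embed(text)))

    def search(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        # Cosine similarity reduces to a dot product on unit vectors.
        scored = sorted(self.records,
                        key=lambda r: -sum(a * b for a, b in zip(q, r[1])))
        return [text for text, _ in scored[:k]]

store = StreamingVectorStore()
store.ingest("sensor 7 temperature spike at 14:02")
store.ingest("routine heartbeat from gateway 3")
hits = store.search("sensor 7 temperature spike at 14:02", k=1)
```

A production system would batch the embedding calls and use an approximate nearest-neighbor index rather than a full scan, but the synchronization problem Nima raises, keeping embeddings in step with the stream, is the same.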
Mike Vizard: It seems like the building of LLMs themselves is becoming more accessible.
Nima Negahban: Definitely.
Mike Vizard: One of the paradoxes of this whole thing is ChatGPT took a massive amount of public data and spent years building out this whole LLM, but it turns out if you narrowly focus the LLM on a subset of data, you might get more accurate answers and be prone to fewer hallucinations. So, do you think we’re going to see this massive proliferation of LLMs?
Nima Negahban: Yeah, I think it’s already happening, and there are a lot of LLM frameworks coming out; NVIDIA, for example, has put out the NeMo framework for leveraging their GPUs, along with the NeMo cloud platform that’s coming. I think it’s definitely going to explode. There are also new tuning methods: obviously there’s pre-training, which can take a ton of compute, data, and resources, and we all know about fine-tuning, but there are also new methods around P-tuning that can deliver some really impressive improvements in LLM performance for specific use cases. So there’s a lot of innovation on the training and tuning front. I absolutely agree that there’s going to be a huge proliferation of LLMs. We all know about Hugging Face and that ecosystem, and there’s a huge proliferation of focused generative approaches, whether it’s a full LLM or just an encoder or a decoder.
There are focused LLMs around healthcare and biology, and there are going to be a ton more. Part of what we’re working on is, how do you help enterprises that are trying to build focused LLMs against their own data sets, and how do you do that in a way that makes sense for the modern ecosystem, with its huge proliferation of training environments and software frameworks and places to run them, whether that’s AWS SageMaker, on-prem, or NVIDIA’s cloud? So there’s a lot of hashing out to be done, and a lot of opportunity to make that whole workflow and tool chain easier, because right now you’re almost overwhelmed with the options you have; there’s so much happening right now.
Mike Vizard: I think a lot of folks have it in their head that you ask something like ChatGPT a question and it comes back with an answer, but how is all this going to get embedded into workflows? Because a lot of this stuff, as I understand it, you’re going to create some sort of ongoing series of queries and the results of that might drive a process, and that may be moving faster than we humans can comprehend. So, where will the human interaction be in those workflows and how do we figure all that out?
Nima Negahban: So, the first iteration of LLM interaction and application was that chat interface, but I think the bigger, longer-term value is this LLM-agent or chain-of-thought use case, where you’ve got LLMs doing a multi-step workflow that may or may not involve users. The user may give it a task, and then that agent starts a multi-step workflow that may incorporate other agents or data processing systems. For the enterprise, the knowledge-base chatbot is the “Hello, World” use case of generative AI, the kind of simple co-pilot stuff a lot of enterprises are making sure they can do right now.
The next phase is, how do we do these advanced chain-of-thought workflows, where LLMs can independently carry out multi-step workflows, do some decisioning, and incorporate other knowledge bases and other LLMs to complete a more complicated task? I wouldn’t say the human is going to be out of the loop for complex tasks, but I think the focus is going to be on optimizing complex human workflows where the generative AI workflow can really help quite a bit.
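The agent loop being described can be sketched as a few lines of Python. The scripted decider below is a stand-in for the LLM’s decisioning, and the tool names and return values are invented for illustration; a real agent would prompt a model with the running history and parse its chosen action.

```python
# Toy sketch of a multi-step agent workflow: a decider (standing in for
# an LLM) picks the next action until it declares the task finished.

def run_agent(task, tools, decide, max_steps=5):
    """Loop: ask the decider for an action, run the tool, record the result."""
    history = [f"task: {task}"]
    for _ in range(max_steps):
        action, arg = decide(history)
        if action == "finish":
            return arg
        result = tools[action](arg)           # a tool could be another agent
        history.append(f"{action}: {result}")  # or a data processing system
    return None  # step budget exhausted without finishing

def scripted_decider(history):
    # Stand-in for LLM decisioning: first consult a tool, then finish
    # with whatever the tool reported.
    if len(history) == 1:
        return ("run_query", "SELECT COUNT(*) FROM orders")
    return ("finish", history[-1])

tools = {"run_query": lambda sql: "1204"}  # fake query engine
answer = run_agent("How many orders today?", tools, scripted_decider)
```

The `max_steps` budget is one simple form of the human-in-the-loop guardrail mentioned here: the agent cannot loop indefinitely without surfacing control back to its caller.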
Mike Vizard: How do we govern all that? As [inaudible 00:09:30] once said, “It’s one thing to be wrong, it’s another thing to be wrong at scale.” So, how do we kind of put some guardrails around this thing?
Nima Negahban: It kind of goes back to… The challenge is that there’s such a proliferation of tools and access to data that it’s difficult right now; it’s not like there’s one training framework everyone’s using, into which you could build a guardrail system that developers or data scientists could use. But there are generative AI frameworks that are incorporating that now, like NVIDIA’s…
I forget, I think it’s called Guardrails, but they’re incorporating that capability where you can have some fencing and some governance capability. It’s early days, and you’re going to have to put it in up and down the tool chain, where even your training framework is going to have to have some governance and orchestration, so you can have an audit log to understand how this LLM was trained, what its decisioning was, and what guardrails were put in place at that level versus at inference time. So there’s a lot to be figured out, and it’s going to take time. Even before generative AI and LLMs, people were grappling with this in the prior world of AI models and neural networks. With neural networks, it’s inherently tough to understand sometimes why they inferred what they inferred. So it was a challenge before; it’s just more of a challenge now, honestly.
Mike Vizard: Where is all this going to take place? And I ask the question because if I look back in time, we had a lot of batching and processing, we moved that up to the cloud. We might train the AI model there and create the inference engine out to the edge, and there’s a lot of lag time in that, which kind of precludes real-time. So, are we going to get to the point where the AI model is learning at the edge where the data is being created and consumed, and how far away is that?
Nima Negahban: That already happens a little bit today with a vector database that’s continuously being added to, where, as an interaction comes in, it can do a vector search against that database to pull things to hydrate the context with. So I think that’s already happening, but the amount of improvement you can get from just that approach is limited. There are probably going to be frameworks that start to help there, but you can’t really do fine-tuning and P-tuning at the edge because they are so compute-intensive. So at least right now, that’s the limit of real-time improvement: you can add to your vector store, and the agent can immediately take advantage of that. As far as fine-tuning and P-tuning, as the compute gets better and these model frameworks get more efficient, that can start to get pulled down to, I wouldn’t say real-time, but maybe a micro-batch style interaction, and that might be five years from now.
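“Hydrating the context” as described here amounts to splicing freshly retrieved records into the prompt before the model is called. A minimal sketch, with the prompt wording, the character budget, and the sample records all invented for illustration:

```python
# Sketch of context hydration: records pulled by a vector search against
# a continuously updated store are placed ahead of the question, so the
# model answers from the latest data rather than its static training set.

def hydrate_context(question: str, retrieved: list[str],
                    max_chars: int = 2000) -> str:
    """Build a prompt whose context section holds the freshest records."""
    context = "\n".join(retrieved)[:max_chars]  # naive truncation to fit window
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

prompt = hydrate_context(
    "Which sensors alerted in the last minute?",
    ["sensor 7 temperature spike at 14:02",
     "sensor 2 pressure drop at 14:02"],
)
```

Nothing about the model changes in this scheme, which is why it works in real time where fine-tuning and P-tuning do not: only the prompt is updated, and the heavy weights stay frozen.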
Mike Vizard: How democratized is all this going to get? Historically, we’ve had everything from back in the day SQL and spreadsheet jockeys to then we hired business analysts and data analysts to drive all this. At some point, does that layer still need to exist between the business user and the data or will we with natural language eliminate that requirement, and maybe we still need those folks to do other things, but I’ll just interact with the data directly?
Nima Negahban: I think there’s going to be a swath of users who are now immediately enabled to interact with their data directly using these co-pilot style approaches, which will take care of 90% of their need; for the most advanced 10%, they may need to talk to their data engineering team or their SQL analyst team. A lot of that also depends on the infrastructure they have beyond the LLM. The co-pilot will generate the query you need, but if you don’t have the right infrastructure to run that query, you’re still stuck. So I do think that in this co-pilot use case there is going to be a tremendous enabling of productivity, where people can interact much more directly with their data and get insight much more quickly without having to bring in teams of people, and that’s something we’re really focused on at Kinetica.
Mike Vizard: What’s your best advice to folks about how to get started? A lot of folks are looking at all of this and the pace of change, and they’re a little overwhelmed; they’ve kind of got that deer-in-the-headlights look about them. So how do they get down the path? Do they need to hire a bunch of data scientists and start experimenting, or is there some other way to think about this?
Nima Negahban: It kind of depends on what you want to do. If you want to explore the generative AI space in itself, there are tons of frameworks out there that you can start to learn that can help you build your own LLM pretty quickly, and much more simply than even two years ago. As far as leveraging prebuilt stuff, there are tons of platforms really pushing on that. Obviously at Kinetica we’re working on that in our cloud platform, but NVIDIA, Databricks, Azure, and AWS are all coming out with generative AI and LLM platforms that let you do more sophisticated workflows leveraging their generative AI services and infrastructure. So there’s just a lot happening right now, and there is a race to set up AI infrastructure, but there’s also a lot of, well, we haven’t really decided what it needs to look like or what it needs to be. It’s being figured out; we’re building the plan as we go.
Mike Vizard: Speaking of that though, we used to tell people to invest in data analytics to get a competitive edge, and I can’t help but wonder if we’re now on the other edge of that curve where frankly, if you’re not investing in this stuff now you might not be around to see how it all turns out because everybody’s making the same investments. And so, are we involved in an AI arms race now?
Nima Negahban: Definitely there’s an arms race mentality going on. Data and analytics are definitely still very important, and I see generative AI as this kind of great enabler of unlocking more value out of data and analytics. So without the data, without the analytics, generative AI still is not going to really be able to do much for an enterprise. You need to have the data infrastructure, the analytic infrastructure to really take an AI capability and really have it drive value for your enterprise.
So, they kind of live together hand-in-hand. At the end of the day, there’s definitely going to be a need to understand how this can help your enterprise, whether that’s on the understanding-your-data side, helping your business run more efficiently, or just helping your folks in the field be more productive. There are a lot of different dimensions to the opportunity. A lot of the focus right now, from what I see out there, has been on helping people in the field be more productive; this co-pilot use case has been the first focus area, at least from the enterprise perspective. But I think the next phase is more of, how do we leverage this to understand our business better and generate insight faster? That part is going to go hand-in-hand with the data and analytics infrastructure that exists in an enterprise.
Mike Vizard: All right, folks, you heard it here. We need to make better decisions faster. One without the other is kind of good, but without both, you’re probably not getting the full value out of that investment. Nima, thanks for being on the show.
Nima Negahban: Thank you.
Mike Vizard: Thank you all for watching the latest episode of the Techstrong.ai video series. You can find this one and others on our website. We invite you to check them all out. Until then, we’ll see you next time.