Synopsis: In this AI Leadership Insights video interview, Mike Vizard speaks with Sujay Rao, chief product officer of Sirion, about hallucinations in AI.

Mike Vizard: Hello and welcome to the latest edition of the Techstrong.ai video series. I’m your host, Mike Vizard. Today we’re with Sujay Rao, who is chief product officer for Sirion, and we’re talking about, well, hallucinations in the land of AI. It turns out there’s plenty of them and we’re not quite sure what to do about it. Hey, Sujay, welcome to the show.

Sujay Rao: Thank you for inviting me here, Mike. Glad to be here.

Mike Vizard: So when a machine is wrong, it’s called a hallucination, and when you and I are wrong, we’re just flat out lying. So the question is, how do we approach this in a way that can limit this activity that we’re seeing amongst all these different large language models out there? Is it avoidable, or is it just going to be ultimately the cost of doing AI business?

Sujay Rao: Great question, and a brand new term, right? AI hallucinations. Nobody had even dreamt of this term a year ago, max. It’s now suddenly become common for everybody to talk about AI hallucinations. But hallucination existed even prior to LLMs coming on board; it’s just that the large language models have amplified some challenges with AI because now it’s in the hands of every consumer. And what has happened is that when ChatGPT came out and everybody started using it and asking all kinds of interesting or quirky questions, a lot of the answers were right and a lot of the answers were actually wrong. So that’s when consumers started questioning, “Hey, how much of this is right? How much of this is wrong?”
Consumers, Mike, when they’re using it for questions like, “Give me a recipe for baking a cake,” it’s okay if the chatbot comes back with an answer that’s a little off from what you expected, but that is certainly not something that enterprises are going to be okay with. For enterprises, reliability and accuracy end up becoming super important.
So what has happened in the last nine months to a year is that enterprises have come in and started asking, “Well, if the answers that are coming back seem to be off the mark, how do we actually handle it?” It’s now become a big movement in itself: how do you take these large language models and make them hallucinate less? You’re going to start hearing things about AI lifecycle governance and data model governance. There are quite a few techniques that these LLM providers have started to come up with that can actually mitigate these hallucinations.

Mike Vizard: So are they always going to be there? Because some folks I talked to are saying, “Well, we’ll just build an LLM on a narrower base of data and we’ll make sure all that data is vetted so there won’t be these hallucinations.” So is part of the effort maybe just to build a better LLM that’s domain-specific? Or are we always going to be dealing with some level of governance requirement?

Sujay Rao: I think there’s going to be some level of governance requirement no matter what, but let’s first dive into exactly what you said. What do we mean when we say we’re going to look at a more curated dataset? Let’s take a step back and think about your favorite LLM. It doesn’t matter if it’s from Google or Meta or OpenAI. All of them have been trained on internet data, a massive corpus often referred to as “the pile,” and that dataset, no surprises there, has all kinds of data in it. It has fiction in it, it has sarcasm in it, it has humor in it, there’s bias in it. All kinds of data has actually made it into that pile.
So what enterprise-specific AI players have started to do is say, “Hang on a second. We don’t really need all of that data, with the bias that comes in and the risk that comes in, for my use case. Let us go and build out an LLM that is very focused on a dataset for my domain.” And there’s actually good public data out there in the domain, on the internet, if I can say that.
For example, I work for Sirion. We are a contract lifecycle management company, and we also use LLMs in our own product suite so users can come in and ask questions regarding a contract. Let’s just take that example for a second. There’s a lot of good publicly available contract data that could end up forming a good dataset. Let’s take another example. Say you’re a company in the financial services space; there’s a lot of good publicly available financial data, for example from the SEC, right? So if you were to train on that dataset, the chances of the AI model hallucinating go down. I’m not saying hallucination is going to be completely wiped out, but if you can look at LLMs that have been trained on a good dataset with the right kind of data model governance, I think you have a better shot at reducing your AI hallucinations.
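To make the idea of a curated, domain-specific dataset a little more concrete, here is a minimal sketch, in Python, of filtering a general document collection down to a contract-focused subset before it is used for training or indexing. The keyword list, file layout and threshold are illustrative assumptions, not Sirion’s actual pipeline.

```python
# Illustrative sketch only: filter a general corpus down to domain-relevant
# documents before using it to fine-tune or index a model. The keyword list,
# file layout and threshold are hypothetical, not any vendor's real pipeline.
import json
from pathlib import Path

DOMAIN_KEYWORDS = {"agreement", "indemnification", "termination",
                   "liability", "governing law", "warranty"}

def looks_like_contract(text: str, min_hits: int = 3) -> bool:
    """Crude relevance check: count domain keywords in the document."""
    lowered = text.lower()
    hits = sum(1 for kw in DOMAIN_KEYWORDS if kw in lowered)
    return hits >= min_hits

def curate(raw_dir: str, out_path: str) -> int:
    """Keep only documents that pass the domain filter; write them as JSONL."""
    kept = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for path in Path(raw_dir).glob("*.txt"):
            text = path.read_text(encoding="utf-8", errors="ignore")
            if looks_like_contract(text):
                out.write(json.dumps({"source": path.name, "text": text}) + "\n")
                kept += 1
    return kept

if __name__ == "__main__":
    print(curate("raw_docs", "contracts_corpus.jsonl"), "documents kept")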

Mike Vizard: It seems like, and maybe I’m hallucinating, but I feel like a lot of the general-purpose LLMs are becoming less accurate over time because more people are using them, the quality of the prompts starts to drop, and the models are learning from those prompts. So are we seeing something that feels like a law of diminishing returns?

Sujay Rao: Yeah, it’s interesting how you put it, because as more users start to give different kinds of prompts, there’s that cycle where your LLMs are starting to feed off of some of these prompts and starting to produce worse and worse answers over time. That’s where governance really comes into play. But I would tell you this: there are absolutely enterprise AI companies that have already understood that this is a problem, and there are ways to address it.
Look, let’s just take a step back. What are LLMs really good at? I would simplify it and say they’re great at sentence completion. That’s where it really started. At the end of the day, at its very core, these LLMs are based on neural networks and deep learning, and they have been trained to identify patterns such that, if I give the first three words in a sentence, they’ll fill in the blanks. They’re really good at that. So as long as you’re able to feed the right kind of datasets into these LLMs, I think we will have a much better outcome over the next few months to the next couple of years.
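As a toy illustration of the sentence-completion framing Rao uses, the sketch below trains a tiny bigram model on a made-up corpus and fills in the next words given a prompt. Real LLMs use deep neural networks over vastly larger data, but the predict-the-next-word loop has the same shape.

```python
# Toy illustration of next-word prediction: a bigram model trained on a tiny
# corpus. Real LLMs use deep neural networks over far more data, but the
# "given the words so far, predict the next one" loop is the same idea.
import random
from collections import defaultdict, Counter

corpus = (
    "the parties agree to the terms of the agreement "
    "the agreement may be terminated by either party "
    "either party may assign the agreement with consent"
).split()

# Count which word tends to follow each word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def complete(prompt: str, n_words: int = 6, seed: int = 0) -> str:
    """Extend the prompt one word at a time using the learned counts."""
    random.seed(seed)
    words = prompt.split()
    for _ in range(n_words):
        candidates = follows.get(words[-1])
        if not candidates:
            break
        nxt_words, counts = zip(*candidates.items())
        words.append(random.choices(nxt_words, weights=counts)[0])
    return " ".join(words)

print(complete("the agreement"))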

Mike Vizard: We talk about governance, but is security a subset of governance? Because we hear a lot of folks talking about the poisoning of data models and all kinds of things. So how do we make sure that what’s going into those models is something that we actually want in those models?

Sujay Rao: Such a great question, and it’s a real challenge for every [inaudible 00:08:12] out there. Let’s just say I was an enterprise. Would I be okay with my own data, my company data, being used to train a model that can be used by the public? I would tell you that the answer is no. Would I be okay with my own data being used by an LLM for my own use cases? A hundred percent. So therein lies the walling off of company private data. There are companies out there that are starting to take that to heart, and there are techniques out there, one of them being what is called retrieval augmented generation, RAG, where you can make sure that your company’s data sits in your own private data store, wherever that is, and is used to augment the answers being given by the LLM, as opposed to your data becoming visible to all, or, honestly, even updating the parameters of your base foundational models, which are your LLMs.
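A minimal sketch of the retrieval augmented generation pattern Rao describes, assuming a simple word-overlap retriever and a placeholder `call_llm` function standing in for whichever hosted or self-hosted model an enterprise actually uses. The point is that the private documents stay in the company’s own store and only the retrieved passages travel with the question.

```python
# Minimal RAG sketch: private documents stay in our own store; at question
# time we retrieve the most relevant passages and pass only those to the LLM
# as context. Word-overlap scoring stands in for a real embedding index, and
# call_llm() is a placeholder for whatever model endpoint the enterprise uses.

PRIVATE_DOCS = {
    "msa.txt": "The master services agreement renews annually unless "
               "either party gives 60 days written notice of termination.",
    "sow_7.txt": "Statement of work 7 caps total liability at the fees "
                 "paid in the preceding twelve months.",
}

def score(question: str, passage: str) -> int:
    """Very rough relevance score: number of shared lowercase words."""
    return len(set(question.lower().split()) & set(passage.lower().split()))

def retrieve(question: str, k: int = 1) -> list[str]:
    """Return the k passages from the private store most relevant to the question."""
    ranked = sorted(PRIVATE_DOCS.values(),
                    key=lambda passage: score(question, passage),
                    reverse=True)
    return ranked[:k]

def call_llm(prompt: str) -> str:
    """Placeholder for an actual LLM call (API or self-hosted model)."""
    raise NotImplementedError("wire up your model endpoint here")

def answer(question: str) -> str:
    """Augment the question with retrieved private context, then ask the model."""
    context = "\n".join(retrieve(question))
    prompt = (f"Answer using only the context below.\n"
              f"Context:\n{context}\n\nQuestion: {question}")
    return call_llm(prompt)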

Mike Vizard: How do we prevent our intellectual property from inadvertently being included in some sort of AI model out there? I know there are settings, but they’re not the default. Is there a set of best practices that organizations need to think through as well?

Sujay Rao: There are best practices. Some of them have been around for a long time, and some come down to newer best practices around data governance, so let’s talk about both. Let’s talk about the good cybersecurity, privacy and confidentiality practices that have existed amongst us for a long time. I would tell you this: as long as companies are being careful with how their data is being used to fine-tune a model or prompt-engineer a model, and that data doesn’t leave the boundaries of your security perimeter, you are already off to a great start. The second level of governance and security comes into play when we are thinking about what dataset is actually being used to train your models. As long as you are aware that your data is not being used to update the parameters of the base model, where everybody else also has access to that base foundational model, you’re pretty much covered in terms of making sure that your data is secure and private.

Mike Vizard: Will I ultimately need an AI model to keep track of my AI models to make sure I’m governing these things correctly? It kind of seems like it’s going to head that way eventually.

Sujay Rao: It’s actually already happening. You have LLMs that are now validating other LLMs. So, as crazy as it sounds, it’s already happening, and I would actually see that becoming more and more prevalent. Because look at the challenge that we have: for you to validate the answers that are coming out of an LLM, you have to look at both qualitative and quantitative techniques. The quantitative ones are the easy ones. Those are the ones that people in data science have been familiar with in the past. You are able to look at your recall scores, your precision scores. Those are things that are well-accepted and well-used in the industry.
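For the quantitative side, here is a small sketch of computing precision and recall for a hypothetical model that tags contract clauses, scored against human-labeled ground truth. The labels and examples are invented for illustration.

```python
# Quantitative evaluation sketch: precision and recall for a model that tags
# contract clauses, scored against human-labeled ground truth. The labels and
# examples are made up for illustration.

def precision_recall(predicted: set[str], actual: set[str]) -> tuple[float, float]:
    """Precision: share of predictions that are right. Recall: share of truth found."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall

ground_truth = {"termination", "liability_cap", "governing_law"}
model_output = {"termination", "liability_cap", "auto_renewal"}

p, r = precision_recall(model_output, ground_truth)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.67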
But RLHF, reinforcement learning from human feedback, is the new kid on the block. These LLMs are now so capable in so many use cases, in so many areas, that all of your large foundational models, all the big tech companies out there that are pushing out these models, have to be spending more time to make sure that there’s some kind of a human in the loop that can actually validate at least a sample set of these answers. And that’s not going to be easy.
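And for the human-in-the-loop piece, here is a sketch of the simplest possible review workflow: sample a fraction of the model’s answers, put them in front of a reviewer, and log the verdicts so they can inform later fixes or fine-tuning. The sampling rate and storage format are illustrative assumptions.

```python
# Human-in-the-loop sketch: sample a fraction of model answers for manual
# review and log the reviewer's verdict so it can drive later fixes or
# fine-tuning. The sampling rate and CSV format are illustrative.
import csv
import random

def sample_for_review(answers: list[dict], rate: float = 0.05,
                      seed: int = 42) -> list[dict]:
    """Pick a random subset of question/answer pairs for a human to check."""
    random.seed(seed)
    k = max(1, int(len(answers) * rate))
    return random.sample(answers, k)

def log_reviews(reviewed: list[dict], path: str = "review_log.csv") -> None:
    """Persist reviewed items so verdicts can be analyzed or reused later."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["question", "answer", "verdict"])
        writer.writeheader()
        writer.writerows(reviewed)

answers = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(200)]
batch = sample_for_review(answers)
for item in batch:
    item["verdict"] = "ok"  # in practice, a human reviewer fills this in
log_reviews(batch)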

Mike Vizard: I think in terms of technologies, generative AI went up the hype cycle faster than anything I’ve seen in a long time. But does that also mean it’s going to come down on the other side of the trough of disillusionment just as fast?

Sujay Rao: That’s such a great question. If you are asking whether there’s going to be a pop, a bursting of the hype cycle that we’ve seen in generative AI, I would tell you that there’s a lot of goodness that’s already come out of LLMs, and some of those strong use cases are here to stay. For example, give it a document and ask it to summarize that document, or ask a question and it comes back with answers. Those are things that now feel so easy for us that it’ll be forever hard for us to think about how we would live our lives without those kinds of capabilities. So I think that’ll stay.
That said, the industry is already starting to see the limits of how far you can stretch an LLM. Let’s get a little bit into the technical aspects: what’s the context size? How big a document, or how much text, can an LLM process? How quickly can it process it and still give you accurate answers? Those are hurdles the LLMs are not able to jump yet. So there’s a lot of pre-processing and post-processing already being done to help LLMs give you the right kind of answers with these large volumes of text. We are already starting to see some constraints and limits being put into place that I think will need further solutioning before we can really start to see the full benefits of LLMs. So my answer, in a nutshell, is that it’s 50/50. There are clearly areas where LLMs are here to stay, and there are clearly areas where a lot more work needs to be done.
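The pre-processing Rao alludes to often amounts to splitting a long document into overlapping chunks that each fit the model’s context window, handling the chunks separately, and combining the partial results afterwards. Below is a sketch of the splitting step, with a made-up token budget and a crude whitespace tokenizer standing in for a real one.

```python
# Context-window pre-processing sketch: split a long document into overlapping
# chunks that each fit a token budget, so each chunk can be summarized or
# queried separately and the partial results combined afterwards. The budget
# and whitespace "tokenizer" are stand-ins for real values and libraries.

def chunk(text: str, max_tokens: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into overlapping windows of at most max_tokens words."""
    tokens = text.split()  # crude: real systems use the model's own tokenizer
    chunks = []
    step = max_tokens - overlap
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break
    return chunks

long_contract = "clause " * 5000          # stand-in for a very long document
pieces = chunk(long_contract)
print(len(pieces), "chunks of up to 1000 tokens each")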

Mike Vizard: All right, folks. Well, you heard it here. When it comes to hallucination, the one thing you want to remember is, well, it’s one thing to be wrong. It’s quite another to be wrong at scale. Hey, Sujay, thanks for being on the show.

Sujay Rao: You got it. Thank you, Mike. Thanks for inviting me.

Mike Vizard: Thank you all for watching the latest episode of the Techstrong.ai video series. You can find this episode and others on our website. We invite you to check them all out. Until then, we’ll see you next time.