Mike Vizard: Hello, and welcome to the latest edition of the Techstrong.ai video series. I’m your host, Mike Vizard. Today, we’re with Christoph Börner, who is the senior director for digital at Cyara, and we’re talking about what are the misconceptions around generative AI, because well, it’s either the greatest thing since sliced bread or the most over-hyped technology that ever came down the pike. And it’s probably somewhere in the middle of that. Christoph, welcome to the show.
Christoph Börner: Hi, Mike. Thanks for having me. Hi to everyone out on the call.
Mike Vizard: I think everyone has, at the very least, played with some generative AI tool, whether it’s OpenAI or some other flavor that somebody else is offering somewhere. And I guess my question to you is, as a technology professional who reads and sees everything that’s going on and what people are talking about, what’s your general impression of where we are with generative AI? Is the reality going to match the hype, or are we a little bit on the extreme of this conversation at the moment?
Christoph Börner: Well, it definitely generated a huge hype that we feel not only in the chatbot domain, where we with Cyara are very present; we feel it everywhere, in the entire machine learning and AI universe and so on. Yeah. Look, at the conferences I went to in the last three or four months, it’s the number one topic everywhere. And really the question everywhere is, “Will we keep on building bots the traditional way like we did in the past? Or will generative AI take over everything?”
I’m personally, honestly, a huge fan of this technology. It can improve things a lot, but there are several risks that are introduced by large language models and generative AI simply due to the way they work. And we have all seen this stuff, these deepfakes, these videos of, I don’t know, Obama saying something that he would never say. This is simply there. And stuff like truthfulness and accuracy, we have all heard about these hallucinations. We saw what happened when Google introduced Bard, how quickly things can go wrong. And I think this is a huge issue for our society, because these generative AIs produce answers and responses in a charming and very convincing way, and a lot of people will simply take it for gospel. That’s a huge issue that we will have to work on in the future. And apart from this, everyone can imagine copyright is an issue, yeah. Large language models are usually trained on sources like the internet, Wikipedia.
I remember in February or something, at a conference, we discussed the Amazon Kindle store being flooded by books written by ChatGPT at the end of the day. And of course it can happen that you have texts there that are more or less completely copied somehow from the internet. So copyright, huge issue. One thing that I really fear is a huge increase in bias. Ask an AI, ask a large language model, “Name 10 great leaders in North America.”
It will be 10 men; there will be no women. And this is, again, simply due to the way these things work, how they’re trained: there are usually petabytes of training data behind them, and there’s simply more data on the internet about men than about women. These are things we really have to deal with. And then there’s misuse. Imagine a large language model not trained on the public internet, imagine one trained on the dark net. That could create huge issues for us. And then finally, of course, we also have to think about a certain risk of unemployment. Yeah, technology, the computer, also removed some jobs, but I think we all agree it also created new ones. The same will happen with large language models. But coming back to your main question, I’m a huge fan. I think it’s a big opportunity for all of us.
Mike Vizard: Will organizations have to build their own large language models using their own data to get the right experience? Because it seems like a lot of the general purpose platforms out there have some significant flaws in them that are really going to disrupt the customer experience if they actually wind up not being caught early enough in the process.
Christoph Börner: Yeah. Absolutely, absolutely. I talked to some engineers at Google last week, and they made a super nice comparison for using the generic version of a large language model. They compared it to training a dog, and it fits very well for me because I have a golden retriever puppy. It’s pretty easy to teach him things like sit and come and wait and so on, yeah. And this will be good for most of the dogs out there, but we all know that there are special dogs, working dogs, hunting dogs, guide dogs and so on. And those dogs need special training. And this is actually what I tell our clients when they come back with, “Hey, we have seen these generative AIs and large language models. Can we just throw away everything we did in the past to build our bots and just use GPT-4 or something in the background, and it will work seamlessly for everyone?”
No, it won’t. These pre-trained large language models are built to solve common language problems, yeah. They’re super good at those things, at basic Q&A, at document summarization, and they can generate text in general across all, or at least across a lot of, various industries, yeah. But when it comes to your risky business cases, you don’t want answers given based on some data that flies around on the internet. A tangible example: you are a big healthcare provider and your virtual agent is powering your customer support channels. You can route everything that is small talk, or collecting data from clients and so on, through a large language model, through generative AI. But when a client asks, “How many pills of aspirin can I take a day to not die?”
You don’t want an answer that could be not accurate enough or that could be a hallucination. And by the way, a typical issue here would be missing context. If I ask, “How many pills of aspirin can I take?”
I might get back something like six, because this is maybe what a grown-up human being can take. But it could also be a kid asking in this moment, or a woman weighing 40 kilograms or something, where the answer should be completely different. And therefore, what I see working pretty well at the moment is the so-called few-shot model. Few-shot means you take this generic pre-trained version of an LLM and you fine-tune it for your own purpose, meaning you fine-tune it for your risky business cases. And this works out pretty well because, due to the nature of how these LLMs are built, they’re just super good at small talk, at keeping the context and so on, yeah, but not at answering these high-risk questions that you have. Versus this dream of the industry, let’s call it the zero-shot model, where you do absolutely no training on your own, no fine-tuning, and everything works just seamlessly. I don’t see this happening at the moment. Maybe in the future, but at the moment this simply doesn’t work, due to these hallucinations and these risks we were talking about before.
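The pattern Christoph describes — let the generic LLM handle small talk, but never let it free-generate on high-risk questions — can be sketched as a simple router. This is a minimal illustration under assumptions, not Cyara’s implementation: the keyword triage and all function names are made up for the example, and a real system would use a trained classifier.

```python
# Sketch: route small talk to a generic LLM, but send high-risk questions
# (e.g. medication dosage) to a curated answer or a human handover.
# All names and the keyword list are illustrative, not a real product API.

RISKY_KEYWORDS = {"aspirin", "dosage", "pills", "overdose"}

def classify(utterance: str) -> str:
    """Naive keyword-based risk triage; a real system would use a classifier."""
    tokens = set(utterance.lower().split())
    return "high_risk" if tokens & RISKY_KEYWORDS else "small_talk"

def answer(utterance: str) -> str:
    if classify(utterance) == "high_risk":
        # Never let the generic model free-generate here: use a vetted,
        # fine-tuned answer or hand over to a human agent.
        return "Let me connect you with a qualified pharmacist."
    # Placeholder for a call to a generic LLM (e.g. an API client).
    return f"[generic LLM reply to: {utterance}]"

print(answer("Hello, how are you today?"))
print(answer("How many pills of aspirin can I take a day?"))
```

The point of the split is exactly the few-shot argument above: the generic model stays where it is strong, and the risky business cases go through a path you control and can test.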
Mike Vizard: Do you think, as we go along, it will become easier to build LLMs on smaller data sets, and we might wind up with more accurate generative AI platforms using a narrower range of data?
Christoph Börner: Yeah, I totally think so. I think in the future there will be a lot of, let’s call it, smaller subsets of these LLMs. If only because, if you take something like GPT-4 or LaMDA or BLOOM or whatever, they’re just huge, it’s petabytes of data. And this introduces, first of all, high costs. If I think about our big clients who have a few hundred thousand conversations in their bots, some of them every day to be honest, yeah, routing all of that through GPT-4 will just cost a lot of money. And the other thing is, it’s super resource-intensive if you have petabytes of data there. Therefore, in applications where you need real-time answers, real-time feedback, it’s also a challenge to use these big models.
And therefore, I really see something like domain-specific LLMs in the future, something that is pre-trained for telecommunications, for finance, for insurance, or let’s say for customer support overall. That will be way smaller and will already be fine-tuned for your domain. And yeah, I still believe that you have to do some fine-tuning in addition, but it will reduce the amount of work you have to do by a lot. Actually, if we look at the big bot-building platforms, the big NLP engines and so on, everyone has in the meantime introduced a feature going in this direction, yeah: we have generative AI included, push one button and it will give you some pre-trained model.
Mike Vizard: What do you think ultimately the customer experience is going to be then when we have these generative AI capabilities? Is it going to be just everything is going to be a natural language summarization? Or am I going to want to engage with something that looks more like a human? And where will the humans be in this process?
Christoph Börner: I think it will highly depend on the quality assurance. This is actually something where the Google spokesman played a lot into our cards at Cyara. We are a CX assurance platform for IVR systems, for agent-based phone systems, and especially also for conversational AI. And with Bard, and I guess all of our listeners have seen this, in production the answer was not super accurate, let’s call it like this to phrase it in a nice way. In one tweet it was simply wrong, and the market immediately reacted, Alphabet shares going down 6% or something, which I think means 150 billion dollars or something. The market immediately lost trust there. And the next day, the Google spokesman said something very important. He said, “It’s a great technology and a great tool that we have here, but we will need rigorous testing. We will need to make sure that these things come back with the right answers and high accuracy.”
And this will highly depend on the CX that bots built with LLMs and generative AI will deliver. Whether it’s then more natural-language based or less highly depends on the use case. We have clients where you are in a very strict and serious domain. I just tested a bot for a car brand, and they have this SOS system: you have an accident, the car immediately calls the support center, and the conversational AI picks up the phone there. And in this situation, imagine you just had an accident, maybe you are hurt or your family in the car is hurt or whatever, you don’t want a lot of natural language, small talk or whatever, yeah. You want something that is very straightforward, very strict, no joking, a straight conversation. And it might be a situation where you want to speak to a human being instead of a machine, in such a critical moment of your life. This would be a case where a human handover would be very important to work seamlessly. To answer your question, I don’t think that humans will be gone. There will always be cases that are still too complex, and we’re going back to this famous 80/20 rule in information technology in general: aim for the 80%, because the last 20% will cost you way more than the other 80%.
And that’s actually something I see a lot, one of the big issues our clients have: they’re trying to make the bots do too many things at the same time, yeah. If you cover your top 10 intents, usually you already cover 80% of your incoming requests. And the stuff that is super complex, let the AI route it to a human being. Because answering all of this completely automated will cost you too much, and conversational AI is not there to get rid of human beings or support centers and so on, it’s there to support them, to scale, and to make their life better, yeah. When we started to test bots back in 2018, our first client was actually a big telecom company. And I went there to see how the support center was working, and I was sitting next to a young lady and she had to answer, I think I stopped counting at 500 or something, she had to answer 500 times a day, “Is the new iPhone … ”
I don’t know, it was six or seven or something back in those days, ” … in the store available?”
It was completely insane, a job you don’t want to have, yeah. And this is something that a bot can answer so easily, so quickly. It can answer it at scale. And with generative AI, it can also answer it in a charming way, using natural language.
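The 80/20 point above is easy to check against your own logs: count requests per intent and see what share the top handful covers. A small sketch with made-up request counts; the intent names and numbers are illustrative only.

```python
# Sketch: how much of the incoming traffic do the top N intents cover?
# The request log below is fabricated illustration data.
from collections import Counter

request_log = (
    ["check_iphone_stock"] * 500
    + ["billing_question"] * 300
    + ["reset_password"] * 150
    + ["cancel_contract"] * 50
    + ["rare_edge_case"] * 20
)

def top_n_coverage(log, n):
    """Fraction of all requests covered by the n most frequent intents."""
    counts = Counter(log)
    return sum(c for _, c in counts.most_common(n)) / len(log)

print(f"Top 3 intents cover {top_n_coverage(request_log, 3):.0%} of requests")
```

With numbers like these, a handful of intents already dominates the traffic, which is exactly why automating the long tail rarely pays off.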
Mike Vizard: What’s your best advice then for folks who are thinking about playing with this and implementing it? Because we have seen a lot of implementations of bots over the years that didn’t necessarily result in a better customer experience. How do we not make the same mistakes over again?
Christoph Börner: Yeah, my biggest advice is don’t trust them. It sounds weird because I said before I’m a big fan, but that’s the thing, yeah. Your critical business cases have to be tested; you want them to work. And if you’re not a healthcare provider, if you are a bank, well, then your critical business case may be, “I want to lock my credit card because it was stolen.”
This is something that has to work, and you don’t want to simply trust a trained model that might not be accurate enough. And therefore, my advice is always testing, testing, testing. Like the Google spokesman said, “We need rigorous testing in place to make sure that LLMs and generative AI are doing what we want them to do.”
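“Testing, testing, testing” for a critical business case can look like an ordinary automated test: call the bot, assert on the reply. A minimal sketch with a stubbed bot client; in reality `bot_reply` would call the deployed bot’s API, and the canned response here is purely illustrative.

```python
# Sketch: treat the bot like any system under test and assert on its replies.
# `bot_reply` is a stub standing in for a real conversational-AI client.

def bot_reply(utterance: str) -> str:
    """Stub; in reality this would call the deployed bot's API."""
    canned = {
        "i want to lock my credit card because it was stolen":
            "I have locked your credit card. A replacement is on its way.",
    }
    return canned.get(utterance.lower(), "Sorry, I did not understand that.")

def test_lock_stolen_card():
    reply = bot_reply("I want to lock my credit card because it was stolen")
    # The critical path must confirm the lock, never hallucinate around it.
    assert "locked your credit card" in reply

test_lock_stolen_card()
print("critical-path test passed")
```

Run in CI against every bot release, a suite of tests like this catches a regression on the “lock my stolen card” path before a customer does.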
And apart from that, I’m clearly a fan of this few-shot model; zero-shot is simply not working out. These things are just too big, too wide. And also in terms of testing, our clients are coming in and asking me, “If we use zero-shot, can you test it?”
Well, the answer is, it’s a bit like testing the internet; everyone can imagine it’s not so easy to test the entire internet, yeah. But what we can do is test your critical business cases. We are implementing at the moment something that we call a fact checker, for example, where the healthcare provider could upload terabytes of documentation for all the pharmaceuticals they offer, and we would, again using an LLM, generate test cases out of this data, send them to the bot, and check if the generative AI comes back with the right answers. These things we can really support. We are also working on a bias-testing functionality, so we can tell clients, “Your bot that is leveraging a large language model behind it is biased or not.”
These things. And at the end of the day, again, it’s testing, testing, testing. I’m still shocked when I ask clients the question, “How do you test your conversational AI?”
“Well, we have one guy sitting somewhere a week that sends a few text messages to the bot.”
Yeah, that’s the wrong way to do it, and you can see it in the quality in production.
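The fact-checker approach Christoph describes above — derive question/answer pairs from the provider’s own documentation, replay the questions against the bot, and flag mismatches — can be sketched roughly like this. Everything here is illustrative: in the described approach an LLM would generate the test cases, while this sketch uses a trivial digit-extraction stub, and the deliberately flawed stub bot exists only to show a failure being caught.

```python
# Sketch: generate fact-check test cases from documentation, replay them
# against the bot, and collect the questions it answers wrongly.
# The docs, extraction step, bot, and all names are illustrative stubs.

docs = {
    "aspirin": "Adults may take at most 6 tablets of aspirin per day.",
    "ibuprofen": "Adults may take at most 3 tablets of ibuprofen per day.",
}

def generate_test_cases(documentation):
    """Stub for the LLM-driven test-case generation described above."""
    cases = []
    for drug, fact in documentation.items():
        max_dose = [w for w in fact.split() if w.isdigit()][0]
        cases.append((f"How many tablets of {drug} can I take a day?", max_dose))
    return cases

def check_bot(bot, cases):
    """Return the questions whose replies miss the expected fact."""
    return [q for q, expected in cases if expected not in bot(q)]

# A deliberately flawed stub bot: it gives the same dose for everything.
def stub_bot(question):
    return "You can take 6 tablets a day."

failures = check_bot(stub_bot, generate_test_cases(docs))
print("failed questions:", failures)
```

Here the stub bot gets aspirin right by coincidence and ibuprofen wrong, so the checker flags exactly the dangerous answer, which is the whole point of testing the critical business cases.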
Mike Vizard: All right, folks. Well, even in the age of AI, trust is hard to win and easily lost, so don’t forget that. Christoph, thanks for being on the show.
Christoph Börner: It was a big pleasure for me. Thanks for having me, Mike.
Mike Vizard: And thank you all for watching the latest episode of the Techstrong.ai video series. I’m your host, Mike Vizard. You can find this and other episodes on the Techstrong.ai website; we invite you to check them all out. Until then, we’ll see you next time.