AI Leadership Insights: AI Voice Technology

Synopsis: In this AI Leadership Insights video interview, Amanda Razani speaks with Yan Zhang, COO of PolyAI, about AI voice technology and its role in customer service and beyond.

Amanda Razani: Hello, I’m Amanda Razani with Techstrong.ai, and I’m excited to be here today with Yan Zhang. He is the COO of PolyAI. How are you doing today?

Yan Zhang: I’m doing great, Amanda.

Amanda Razani: Wonderful. So I believe we first met at the VOICE & AI Conference, and there was a lot going on at that conference, but first before we get started, can you share with our audience a little bit about PolyAI and what services you provide?

Yan Zhang: Sure. So PolyAI makes enterprise voice assistants and automates phone calls largely for customer service and guest care. We were founded in 2017, we are based in London, but have about 150 people between the UK and the US. Really about PolyAI, there’s three things that we’re all about. So we’re a conversational AI company. Most people know conversational AI through chatbots, so they’ve probably interacted with chatbots. There are thousands of chatbots in the world. We’re one of the few companies that really specialize in voice, voice automation. Voice automation is a lot more difficult than chat automation just because we speak in a way that’s more complicated than when we type. So obviously some of us speak with accents, some of us use a lot of words to say what we mean. Some of us use just very few words. Some of us get into the middle of a sentence and forget something or change our mind. So there are conversational U-turns.
And for all of these reasons, the way we speak presents a much larger challenge for AI and to understand what people mean. And so we’ve taken on that challenge and we work with some of the leading brands of the world, providing enterprise voice assistants to them.
The second thing is that we’re all about the enterprise. So we work with companies like FedEx, Marriott, Volkswagen, the City of Amsterdam, to automate conversations that otherwise would put the user on hold for a long time to wait to speak to a human. And I think a lot of people listening have probably had experiences calling into automated voice systems. To be honest, it’s not a very good experience for most people. A lot of people hear, “Tell me in a few words what you’re calling about today,” and all they want to do is press zero and say, “I want to speak to a human.”
And that’s because when we call, and we choose the voice channel, we often have the desire to explain the problem in our own terms with our own words, explain context. And so when that greeting is restricting us to speak only in words that the system understands, it is really disappointing, which is why we want to speak to a human. And so table stakes for us is to open up every conversation with “How can I help?” in an open bid to get people to say whatever it is that they want. And you’ve got to pick up what people are throwing at you, so we have to understand the customer turn after turn until we get a conversation sorted for the customer.

Amanda Razani: Absolutely. So at the VOICE & AI Conference, what were some of the key topics that were discussed? What were some of the main reasons that business leaders were there?

Yan Zhang: So as you can imagine this year, it’s all about generative AI and the type of influence that generative AI will have in the enterprise. I think that there is a tremendous amount of excitement. There’s a lot of board mandates or executive level mandates for transformation teams, innovation teams and customer experience teams to experiment with generative AI. I think everyone has a little bit … There’s a lot of hype, there’s a lot of fear of missing out, there’s a lot of FOMO. And so everyone’s excited, but there’s also a lot of trepidation as well. I mean, it is a technology that everyone has experienced through ChatGPT, so you know that it is incredibly smart for certain instances. If you ask ChatGPT to write you a fairytale about a family of foxes living in space in a certain literary style, it will do just that. But then it also gets really basic things wrong as well.
And so we all know how ChatGPT can hallucinate answers as confidently as it ventures a correct one. And one example that I can give is that we have experimented with generative AI with one of our hotel properties. And one of the things that generative AI is really good at is answering long tail questions. So those not so frequently asked questions that in a traditional automation system, you wouldn’t necessarily program the bot to answer. And so this particular bot for this hotel could answer all sorts of questions about the parking garage, which is a very minor part of the hotel. So how tall the parking clearance is, what kind of vehicles can fit underneath, the day rates, the evening rates. But there’s only one problem, which is that that particular hotel didn’t have a parking garage at all. And so it raises our awareness that it’s really, really important to make sure that you not only train automated systems to talk about what they do know, but also train them not to engage in answers on things they know nothing about.

Amanda Razani: So from your experience in dealing with business leaders, what are some of the big roadblocks that they have when they’re trying to implement this technology, speaking to some of those problems you just addressed? And what advice do you have for them?

Yan Zhang: Yeah, I think generally, I’ll talk generally about AI and I’ll talk a little bit about … So I’ll talk a little bit about the technology that’s been on the market for the last five years, and then I’ll talk about the technology that’s been around for the last year or so. So in general, in AI for natural language understanding models, traditionally when enterprises have wanted to build either chatbots or voice assistants, they’ve had to leverage a lot of their data to do it. And the problem with a lot of enterprises is that one, there are regulatory issues that come up when you’re trying to use customer data, your past data to train. And a lot of our customers just don’t have that data. They never transcribed the conversations that they had with their customers, or it would be very, very costly to get that data in place.
And so what’s really important is that you have to make something work with very little data, and that is actually a very hard ask from the technology point of view, and that is what PolyAI is particularly good at. We can solve the cold start problem for a lot of enterprises, which is that we don’t need any of your data to make sure that we can build a bot that mimics the behavior of your best agents. And that’s because our models are data efficient.
I think that in the past year where generative has really upscaled the capabilities of bots, the challenge really has been how do you build the guardrails so that the experience is controllable? A lot of people talk about safety. I think that’s table stakes. You can’t have your bot talk about things that are off-brand or that are irrelevant to the experience that you want to provide to the customer. But at the same time, it’s also about curating … Automated agents are an extension of your brand. And so it’s really important for the enterprise to be able to control that experience. And I think that there’s been not as much control, in the sense that with generative, there’s definitely a trade-off between capability and control. And so enterprises are looking to have that control, and that is what companies like PolyAI are working towards: for enterprise customers to be able to have the capability but also be able to make sure that it’s on brand.

Amanda Razani: So one of the biggest concerns these days, I’m hearing a lot about, and especially with this newer technology and this voice AI technology: deepfakes. So can you weigh in on deepfakes and how do companies ensure safety? How do they ensure their customers are safe with this big risk out there? I mean, I’ve seen some of these deepfakes and I wouldn’t have known that they weren’t the real thing. So that’s a little bit scary.

Yan Zhang: Well, I think that generative AI certainly, I mean, we’re maybe just stepping outside of customer service and guest care a little bit. But just in general, it certainly calls into question our notion of authenticity of content. I think that a lot of other companies outside of our domain of expertise are working on things like watermarks that indicate that something is generated. There’s AI that can now tell you whether an image is generated or not. But I’d like to kind of bring that conversation a little bit back to the customer service space and kind of raise a problem that maybe isn’t getting enough airtime, which is that a lot of companies now, especially banks and financial institutions, use voice biometrics to authenticate their customers. This is a technology that’s been particularly exciting in our field for the last 15 years or so. So the customer’s voice is the customer’s identity. Well, now with deepfakes, that kind of throws a huge wrench into that identification.
So now if you can fake my voice, you can maybe access my Chase bank account on the internet or through the customer service phone line. So I think, on the one hand, you’re talking about deepfakes being able to call into question what’s authentic and what’s not. But on a more practical level, it kind of makes authentication a big challenge. So something that we’re advocating for is for enterprises to kind of look at changing their authentication methods from something that’s based on something that’s really deeply fakeable to a conversational format, how a real human agent would do it, which is asking you a little bit about something that you would only know, plus a multi-factor type of approach where you get sent a message to your phone. That then can do a better job than just having a biometric signature of your voice.

Amanda Razani: Excellent. Sounds like a good idea. And I do know just yesterday the government released some new mandates around AI, so hopefully we’re going to see a little bit more safety measures put in place there. So can you share some use case examples or some business journeys that you could share with us where they saw great success in implementing this technology?

Yan Zhang: Yup. So I think a couple of stories … A couple of years ago we started helping FedEx, and one of the first cases that we worked on was helping FedEx resolve a very tricky situation that they had here in the UK, which is that they were delivering passports on behalf of the government. It was around COVID. And for those people who are familiar with contact centers, imagine you’re taking 10,000 calls a day in your contact center. So you have then enough employees to staff in your contact center to take 10,000 calls. And then if you know you’re going to take 20,000 calls over Christmas, that’s also fine because then you can start hiring folks in October and meet that demand when that demand goes up in November and December.
Now, if all of a sudden you have unexpected increases in your volume, it’s very, very difficult to solve that problem with more employees because who are you going to get to answer those calls? And that’s exactly what happened during COVID when travel policies were changing. “Oh, you can travel now. Now, nobody’s allowed to leave the country, but now you can travel.” And so a lot of people were asking about their passports. They hadn’t traveled for a long time. And so all of a sudden that volume went up and it was really, really difficult to maintain a level of service that was acceptable, with short wait times where you get to speak to people, because they simply didn’t have the people.
So they asked us to help. And one of the things that we did was look at what people were calling about. It’s actually really easy, well, relatively easy for us to, for example, change a delivery date. But people were actually calling in to see if they could have that passport delivered to their loved one or their doorman because they weren’t going to be home, and if the delivery could be scheduled for a certain time, so on and so forth. And unfortunately, the answer to all of those questions was going to be no, because a passport is actually a very sensitive document. It could only be delivered to yourself. You have to sign for it, and it’s kind of an annoying policy, but you could see why it exists from a government policy point of view.
So when we built that bot, it was not just to be transactional. It was also to understand a lot of the desire of the customer to talk about these requests. So we’re not just answering, “No, sorry, we can’t do that.” It’s about, “We understand that you want a certain time, but because of just how the delivery method works, we can’t nominate a particular hour. Sorry, you can’t have the driver’s number. We know that you’d like to get in touch with them to arrange a time, but it is for their safety and your safety.”
And so we would automate these five, six minute long calls where people are just throwing everything to try to bargain for a better outcome. And even if they didn’t leave that call with having all of their requests agreed to, they didn’t ask to speak to a human because the bot was able to understand and sympathize with all of the requests. “We understand why you’re asking but this is something that’s really difficult to do.” And so when you are saying no to a customer, if you do it with sympathy like a human would, you get some understanding from their perspective as well. So this was a temporary pressure on that particular customer at that particular time, but we managed to solve that problem and leave a lot of happy customers from that point of view.

Amanda Razani: That’s awesome. So looking forward, this technology is advancing so quickly, where do you see customer service and the enterprise in a year from today?

Yan Zhang: Yeah. Well, a year is not usually as long as we think it is, but maybe looking a little bit further, but technology moves fast, but maybe a year to five years. I think voice AI is going to transform the way that we interact with our brands, the brands that we love. I think that for the last 15 years, the mobile app has been a principal way of customer and brand engagement. And a lot of brands have invested into the mobile app because it’s just one tap away on your home screen. Well, there’s another channel that’s just also one tap away from your mobile phone to any brand that you might patronize, which is their toll-free customer service number. And actually a lot of people choose to tap that number and call the brands. I think about 70% of all customer interactions still happen over the phone channel, and that number has stayed pretty constant in the last 10 years despite all of the digital investments that brands have made. I think people want to talk to brands.
But it’s not something that a lot of brands put front and center because the voice channel is actually really expensive to serve. It is sometimes frustrating because there’s long wait times, sometimes it’s not the best customer experience. So brands tend to kind of begrudgingly offer an 800 number, but hide it somewhere on their website.
So I think that what generative AI and voice AI will do to that experience is to make the toll-free number, again, one of the principal ways that brands now interact with their customers. So you will have a branded voice assistant or even kind of a voice avatar that can answer any questions about the brand 24/7, and it’ll be a lot smarter than the voice IVR. It’ll be a lot smarter than what even Alexa or Siri can do for you. So you can ask the brand, it’ll remember your profile and what you’ve talked about before, and offer customized service for you, and you can continue that conversation in other channels as well. And so imagine having kind of a personal assistant from every brand that you work with. I think that’s the vision for voice AI in the enterprise space.

Amanda Razani: Wonderful. Yes. I cannot imagine where it’s going to lead us. I know just a few years ago when I got my little Google Dot, I ask that thing so many questions now, and I’m always talking to my phone and asking Siri questions, and now I just can’t imagine not being able to say, “Hey, Siri. Hey, Google, help me with this. Or what is this? Or what’s the weather?”

Yan Zhang: Yes.

Amanda Razani: And there you go. There is an example, and my phone was on silent, so I don’t know even how it did that. But yeah, so as you see, I’m always asking it questions and it answers, so I can’t imagine where the technology is going to go from there, especially in the customer service. If there’s one key takeaway that you can leave our audience with today, what would that be?

Yan Zhang: That’s a great question. So the one key takeaway is that I think that voice is important, and to be bullish on voice as a channel of engagement and service. I think that a lot of attention has been paid in the last, again, 15 years or so on digital channels and digital engagement. The app is great, chatbots are great, websites are great. But at the same time, I’m talking to you, I’m having a great conversation, we’ve exchanged a lot of ideas and content in just these 20 minutes. Because voice, again, is the most natural and it’s the most information rich channel for people to communicate. And the fact that we’ve limited ourselves to tapping on the screen to engage with either our friends or brands is less natural than just actually speaking to them. But voice has really disappointed people because of the limitations of technology. Well, those limitations are getting removed now. And so the way that we interact with machines and we interact with computers might be very, very different five years from now.
And it’s great for brands also, just as a side note, because that conversation that you’re having with your customer carries so much data and insights, and usually that’s not really a channel that brands are tracking very closely for customer sentiment, what people might want. But again, automation is going to be able to capture all of that, digitize all of that, and create a ton of data and insights for brands to work from.

Amanda Razani: Interesting. Thank you for coming on our show today and sharing your insights.

Yan Zhang: Thank you, Amanda.
