We still may not be sure if androids dream of electric sheep, but we do know that generative chatbots like ChatGPT and Bard can hallucinate.
The problem of AI hallucinations – essentially, the large language models (LLMs) that the chatbots are based on, like OpenAI’s GPT-4 and Google’s PaLM, returning inaccurate or false responses and presenting them as accurate – has come to the forefront over the past year as generative AI went mainstream following OpenAI’s release of ChatGPT 10 months ago.
Some of the situations can be comical, but AI hallucinations can also have serious consequences in the professional world. The most high-profile case occurred in May, when a New York attorney faced sanctions for filing a brief that was produced by ChatGPT and contained a host of errors, including citations of court cases that never existed.
“AI hallucinations are part of a growing list of ethical concerns about AI,” workflow automation company Zapier wrote in a blog post earlier this month. “Aside from misleading people with factually inaccurate information and eroding user trust, hallucinations can perpetuate biases or cause other harmful consequences if taken at face value.”
The company added that “despite how far it’s come, AI still has a long way to go before it can be considered a reliable replacement for certain tasks like content research or writing social media posts.”
OpenAI CEO Sam Altman at a tech event in India earlier this year said it will take years to better address the issues of AI hallucinations, adding that “I probably trust the answers that come out of ChatGPT the least of anybody on Earth.”
What AI Users are Seeing
Tidio, a company that sells an AI-powered customer service platform, agrees with Altman that the problem isn’t going away anytime soon.
“Large language models are becoming more advanced, and more AI tools are entering the market,” Maryia Fokina, a content specialist with Tidio, wrote in a blog post this month. “So, the problem of AI generating fake information and presenting it as true is becoming more widespread.”
To get a better handle on how widespread the AI hallucination situation is, Tidio surveyed 974 internet users about their experiences and views regarding the issue. What the company found was that people are well aware of AI hallucinations and have experienced them, though there was some incongruity in the results as well.
According to the survey, 96% of the respondents said they knew of AI hallucinations, 86% have experienced them, and 93% believe they can harm users.
About 46% said they frequently encounter AI hallucinations, 35% occasionally do so, and 77% have been deceived by them. Another 33% of those who haven’t experienced them believe they could be misled and 96% have seen AI content that made them question their perceptions.
Also, the hallucinations are seen in many models. About 53% said they ran into AI hallucinations when using AI-generated art and imaging tools, like Midjourney and OpenAI’s DALL-E, 45% said it was through virtual assistants like Siri and Alexa, and 40% pointed to AI chatbots, like ChatGPT and Bard.
In addition, 72% trust AI to provide reliable and truthful information, even though 75% have been misled by AI at least once.
Spotting the Hallucinations
So how do people spot AI hallucinations, what can be done about them, what are the dangers, and who’s to blame? About 57% said they cross-reference with other resources to smoke out hallucinations, while 32% said they rely on their instincts and 32% look for verification from experts or authorities. In addition, 59% said AI hallucinations carry privacy and security risks and 46% said they can help spread inequality and bias.
Understandably, given today’s political climate, there were some answers with conspiracy overtones. While 27% put the blame for AI hallucinations on those writing prompts, 22% said they are the fault of governments looking to push an agenda. Almost a third of respondents said hallucinations can lead to brainwashing in society.
Tidio noted there is a range of hallucination types, including those that contradict the prompts given, sentences, facts, sources and calculations. The reasons behind them can include insufficient training data for the LLM, output that’s accurate for the training data but not for new data, and improperly encoded prompts.
The idea of AI hallucinations isn’t new – the term was first used in 2000 – though attention to it has accelerated with the rapid adoption of ChatGPT. Tidio said the situation is improving, noting OpenAI’s strategy of rewarding its models for each correct step of reasoning on the way to an answer rather than rewarding only the final answer.
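The difference between rewarding only the answer and rewarding each reasoning step can be made concrete with a toy scoring sketch. This is not OpenAI’s implementation: the function names and the hand-labeled list of step judgments are hypothetical, and a real system would rely on a trained reward model rather than hand labels.

```python
from typing import List


def outcome_reward(final_answer_correct: bool) -> float:
    """Reward only the final answer: 1.0 if it is correct, otherwise 0.0."""
    return 1.0 if final_answer_correct else 0.0


def process_reward(step_correct: List[bool]) -> float:
    """Reward each correct reasoning step; return the fraction of correct steps."""
    if not step_correct:
        return 0.0
    return sum(step_correct) / len(step_correct)


# Hypothetical chain of reasoning: two sound steps, one faulty step,
# and a final answer that happens to be right anyway.
steps = [True, True, False]
print("outcome reward:", outcome_reward(True))
print("process reward:", round(process_reward(steps), 3))
```

In this made-up case the answer-only score is perfect even though one step was wrong, while the step-level score exposes the faulty reasoning.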
Things are Getting Better
Philip Moyer, global vice president of AI and business solutions at Google Cloud, agreed that things are getting better, though he wrote in a blog post in June that humans play an important role in ensuring accuracy.
“Models, and the ways that generative AI apps leverage them, will continue to get better, and many methods for reducing hallucinations are available to organizations,” Moyer wrote. “But general-purpose generative AI assistants and apps are collaborators in most use cases – ways to save time or find inspiration, but not opportunities for full, worry-free automation.”
He added that “a person using AI often outperforms AI alone or humans alone, so the default assumptions for new generative AI use cases should include a ‘human in the loop’ to steer, interpret, refine, and even reject AI outputs.”
The respondents in Tidio’s survey seem to agree, with 48% saying improving user education about AI-generated content is important, though nearly as many (47%) said the responsibility is shared by AI developers.
The company also laid out steps for preventing AI hallucinations, including being more precise with prompts, adjusting the LLM’s parameters to reduce the randomness in the model’s responses, being wary of using AI for calculations, and using multi-shot prompting.
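One parameter commonly used to reduce randomness is the sampling temperature. The article does not name a specific setting, so treating temperature as the relevant parameter is an assumption here, and the word scores below are invented for illustration. The sketch shows how a lower temperature concentrates probability on the highest-scoring next word.

```python
import math
from typing import Dict


def softmax_with_temperature(scores: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Turn raw word scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {word: score / temperature for word, score in scores.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {word: math.exp(value - peak) for word, value in scaled.items()}
    total = sum(exps.values())
    return {word: value / total for word, value in exps.items()}


# Invented scores for three candidate next words.
scores = {"Paris": 3.0, "Lyon": 1.5, "Mars": 0.5}
for temperature in (1.5, 1.0, 0.3):
    probs = softmax_with_temperature(scores, temperature)
    print(temperature, {word: round(p, 3) for word, p in probs.items()})
```

As the temperature drops, the top candidate takes a larger share of the probability, so sampled responses vary less from run to run. Lower randomness does not by itself make an answer correct.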
“LLMs don’t know words or sentences. They know patterns,” Fokina wrote. “They build sentences by assessing the probability of words coming after other words. So, to help them with this task, you can provide a few examples of your desired outputs. This way, the model can better recognize the patterns and offer a quality response.”
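Fokina’s description of building sentences from “the probability of words coming after other words” can be illustrated with a toy word-pair counter. This is a deliberately tiny stand-in: real LLMs use neural networks over tokens rather than raw counts, and the corpus and function name are made up for the example.

```python
from collections import Counter, defaultdict
from typing import Dict


def next_word_probabilities(text: str) -> Dict[str, Dict[str, float]]:
    """Estimate, for each word, the probability of each word that follows it."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return {
        word: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for word, followers in counts.items()
    }


# Made-up training text for illustration only.
corpus = "the cat sat on the mat the cat ran"
model = next_word_probabilities(corpus)
print(model["the"])
```

In this corpus “the” is followed by “cat” twice and “mat” once, so the model assigns them probabilities of two-thirds and one-third. It has learned a pattern, not a fact, which is why a fluent continuation can still be false.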
She also suggested users tell the LLM what they don’t want – “don’t include fictional mentions” – and give the AI system feedback, telling the system when an answer is incorrect.
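The prompting advice above, giving a few examples of the desired output and stating what to avoid, can be sketched as a small prompt builder. The function name, the “Input/Output” layout and the sample text are all hypothetical; no particular chatbot API is assumed, and the resulting string would simply be pasted or sent as the prompt.

```python
from typing import List, Tuple


def build_prompt(task: str, examples: List[Tuple[str, str]],
                 avoid: List[str], question: str) -> str:
    """Assemble a multi-shot prompt: task, things to avoid, examples, then the new input."""
    lines = [task, ""]
    for rule in avoid:
        lines.append(f"Do not {rule}.")
    if avoid:
        lines.append("")
    for example_input, example_output in examples:
        lines.append(f"Input: {example_input}")
        lines.append(f"Output: {example_output}")
        lines.append("")
    lines.append(f"Input: {question}")
    lines.append("Output:")
    return "\n".join(lines)


prompt = build_prompt(
    task="Summarize each product review in one sentence.",
    examples=[
        ("Battery died after two days.", "The reviewer reports poor battery life."),
        ("Setup took five minutes and it just works.", "The reviewer found setup quick and easy."),
    ],
    avoid=["include fictional mentions"],
    question="The screen is bright but the speakers are tinny.",
)
print(prompt)
```

The examples give the model a pattern to continue, and the explicit “Do not” line carries the instruction about what to leave out.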
“Just like humans, AI learns from experience,” she wrote.