When OpenAI released ChatGPT in November 2022, the chatbot became the fastest-growing consumer application in history, reaching a million users in just five days. That unprecedented growth continued over the following two months, when it hit 100 million active users, according to a UBS study. But even as usage keeps climbing and bigger, better models arrive, the level of scrutiny has risen too, especially with regard to bias and inaccuracies.
In a recent survey by Applause, a leader in testing and digital quality, 6,361 consumers, software developers and quality assurance testers were asked about their usage of and attitudes toward AI tools and software. Half of the respondents reported that they experienced bias, and 38% said they saw inaccuracies.
“It’s clear from the survey that consumers are keen to use Gen AI chatbots, and some have even integrated it into their daily lives for tasks like research and search,” said Chris Sheehan, senior VP of strategic accounts and AI at Applause. “Chatbots are getting better at dealing with toxicity, bias and inaccuracy – however, concerns still remain. To gain further adoption, chatbots need to continue to train models on quality data in specific domains and thoroughly test across a diverse user base to drive down toxicity and inaccuracy.”
While there is substantial concern, there is also optimism that things are improving: 75% of the respondents felt that chatbots and other AI tools are getting better at managing toxic or inaccurate responses. ChatGPT remains the market leader, used by 91% of respondents, followed by Gemini at 63% and Microsoft Copilot at 55%. Research is the main reason for use, according to the survey, with 91% turning to the tools for that purpose and a third of those doing so daily.
Minimizing hallucinations, bias and toxicity across all uses is the focus of the U.S. government’s AI Executive Order, signed last October, which pays particular attention to health care.
“The Administration is pulling every lever it has to advance responsible AI in health-related fields,” according to a statement released by Lael Brainard, National Economic Advisor, Neera Tanden, Domestic Policy Advisor, and Arati Prabhakar, the Director of the Office of Science and Technology Policy.
In the release, dated December 14, 2023, the co-authors said there were voluntary commitments from 15 leading AI companies to develop models responsibly, and from 28 health care providers and payers, such as Boston Children’s Hospital and Duke Health, to “align industry action on AI around the FAVES principles.” FAVES is an acronym for Fair, Appropriate, Valid, Effective and Safe. “Under these principles, the companies commit to inform users whenever they receive content that is largely AI-generated and not reviewed or edited by people,” the co-authors stated.
“We must remain vigilant to realize the promise of AI for improving health outcomes. Health care is an essential service for all Americans, and quality care sometimes makes the difference between life and death. Without appropriate testing, risk mitigations and human oversight, AI-enabled tools used for clinical decisions can make errors that are costly at best – and dangerous at worst. Absent proper oversight, diagnoses by AI can be biased by gender or race, especially when AI is not trained on data representing the population it is being used to treat,” the co-authors stated.
As part of the government’s focus on fairness, it has developed a blueprint for an AI Bill of Rights that includes “Algorithmic Discrimination Protections.”
“You should not face discrimination by algorithms and systems should be used and designed in an equitable way,” stated the press release announcing the blueprint, a non-binding guide. “Algorithmic discrimination occurs when automated systems contribute to unjustified different treatment or impacts disfavoring people based on their race, color, ethnicity, sex, religion, age, national origin, disability, veteran status, genetic information or any other classification protected by law.”
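The blueprint describes the harm in legal terms rather than prescribing a specific test, but fairness audits often operationalize “unjustified different treatment or impacts” with simple statistical checks. The sketch below is a hypothetical illustration, not drawn from the blueprint: it computes favorable-outcome rates per group and the disparate-impact ratio, with the group labels, toy data and 0.8 threshold (the “four-fifths rule” commonly used in U.S. employment contexts) all assumed for the example.

```python
# Hypothetical illustration, not part of the AI Bill of Rights blueprint:
# a minimal disparate-impact check comparing favorable-outcome rates across
# demographic groups. Group names, data and the 0.8 threshold are assumptions.
from collections import defaultdict

def selection_rates(outcomes):
    """outcomes: iterable of (group, favorable) pairs -> favorable rate per group."""
    totals, favorable = defaultdict(int), defaultdict(int)
    for group, is_favorable in outcomes:
        totals[group] += 1
        favorable[group] += int(is_favorable)
    return {g: favorable[g] / totals[g] for g in totals}

def disparate_impact_ratio(rates):
    """Lowest group rate divided by the highest; values below 0.8 are commonly flagged."""
    return min(rates.values()) / max(rates.values())

decisions = [("group_a", True), ("group_a", True), ("group_a", False),
             ("group_b", True), ("group_b", False), ("group_b", False)]
rates = selection_rates(decisions)
print(rates)                          # per-group favorable-outcome rates
print(disparate_impact_ratio(rates))  # 0.5 here, well below the 0.8 flag
```

A check like this only surfaces a disparity; deciding whether the difference is “unjustified” in the blueprint’s sense still requires human and legal review.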
Whether the blueprint actually translates into some form of binding legislation remains to be seen.
At the Massachusetts Institute of Technology’s Computer Science and Artificial Intelligence Laboratory, postdoctoral researcher Hongyin Luo said there are emerging ways to reduce bias. “Although stereotypical reasoning is a natural part of human recognition, fairness-aware people conduct reasoning with logic rather than stereotypes when necessary. We show that language models have similar properties. A language model without explicit logic learning makes plenty of biased reasoning, but adding logic learning can significantly mitigate such behavior.”
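Luo’s point is that a model taught to check whether a conclusion actually follows from a premise is less likely to fall back on stereotypes. The sketch below is an illustrative stand-in, not the MIT team’s method: it uses an off-the-shelf natural language inference model (roberta-large-mnli) to score how strongly a hypothesis is entailed by a premise, so a system can decline to act on conclusions that have no logical support.

```python
# Illustrative sketch only: this uses a generic NLI model as a logic check,
# not the models or training recipe described by the MIT researchers.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")
model.eval()

def entailment_probability(premise: str, hypothesis: str) -> float:
    """Probability that the hypothesis logically follows from the premise."""
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = logits.softmax(dim=-1)[0]
    # Label order for roberta-large-mnli: 0=contradiction, 1=neutral, 2=entailment
    return probs[2].item()

# A stereotype-driven inference: nothing in the premise supports the hypothesis,
# so the logic check assigns it a low entailment probability.
print(entailment_probability("The nurse finished a twelve-hour shift.",
                             "The nurse is a woman."))
```

In the research Luo describes, this kind of entailment signal is built into the model’s training rather than bolted on afterward, but the underlying idea is the same: prefer conclusions that are logically supported over ones that merely match a stereotype.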