Google’s Gemini Stumble Should Be a Lesson for AI Companies

Google, having temporarily pulled its generative AI image tool over questionable depictions of people, finds itself at the center of only the latest controversy around a still-emerging technology, one that touches on everything from bias to ethics to so-called “wokeness.”

It’s also sparking conversation around what needs to change with the large language models (LLMs) that underpin these generative AI offerings and the data that is fed into the models. Questions about bias and “hallucinations” – the inaccurate information chatbots and other tools can include when they’re unsure about how to answer a prompt – have haunted the nascent generative AI industry since OpenAI first released its ChatGPT chatbot 15 months ago.

And it’s that fact that people need to keep in mind when a situation like the one Google is now facing crops up, according to Bob O’Donnell, principal analyst with TECHnalysis Research.

“What it says is that it’s very early in this game,” O’Donnell told Techstrong AI. “People are still learning and there are going to be bumps along the way. It’s hard. There’s no way around it. It’s hard.”

Inaccurate Historical Depictions

Google executives late last week pulled the AI tool that creates images of people due to inaccuracies in some historical depictions, just weeks after introducing it as part of its larger Gemini generative AI chatbot. The historical inaccuracies from the Google service included showing people of color in Nazi uniforms. Other criticism, as outlined in a story in The New York Times, included the service’s refusal to generate images of white couples, though it would do so when asked for images of Black or Chinese couples.

The backlash against Google was harsh, prompting the IT giant to pause the image generation tool and apologize multiple times. In a blog post, Prabhakar Raghavan, senior vice president for knowledge and information at Google, wrote that some images returned by the service not only weren’t what users wanted, but also not what the company wanted.

“This wasn’t what we intended,” Raghavan wrote. “We did not want Gemini to refuse to create images of any particular group. And we did not want it to create inaccurate historical – or any other – images.”

A Tale of Two Missteps

There were two key issues that caused the problems, he wrote.

“First, our tuning to ensure that Gemini showed a range of people failed to account for cases that should clearly not show a range,” Raghavan wrote. “And second, over time, the model became way more cautious than we intended and refused to answer certain prompts entirely – wrongly interpreting some very anodyne prompts as sensitive.”

Both problems “led the model to overcompensate in some cases, and be over-conservative in others, leading to images that were embarrassing and wrong.”

Google engineers are working to improve the tool while it’s turned off, and Raghavan said “extensive testing” will be done before it returns. Google DeepMind CEO Demis Hassabis said this week at the Mobile World Congress event that the return should come within a few weeks.

In the meantime, Google will have to take the hit. The timing was unfortunate, coming only weeks after the company’s high-profile rebranding of its Bard chatbot – its answer to ChatGPT – to Gemini. It also comes as Google faces fierce competition in the generative AI field from the likes of Microsoft, Meta, OpenAI and Anthropic, all of which are frantically pushing out more capabilities to a hungry public and business world.

Google’s current headaches also should serve as a cautionary tale to its competitors to keep an eye on possible pitfalls even as they race to get products to market.

A Matter of Trust

TECHnalysis’ O’Donnell said that, given the newness of the technology and the seemingly insatiable demand, users will give these AI companies a little bit of rope.

“It’s about trust,” he said. “They’ll have a little bit of leeway as long as it doesn’t keep happening.”

It also puts more pressure on the companies to be more transparent about their models and the data that goes into them. Right now, most people don’t know whether the image data the models are trained on comes simply from massive web scrapings or from sources such as Getty Images, the analyst said. At this point, “people are trying to figure out where the data is coming from and what’s happening with it and that’s a factor.”

Google and the others likely will have to be more open about showing where the data comes from and how the models use it.

Consider ‘Foreseeable Use’

In a lengthy thread on X (formerly Twitter), Margaret Mitchell, researcher and chief ethics scientist for Hugging Face and a one-time Google research scientist, wrote about the need for developers to consider “foreseeable use,” which includes misuse.

“Once the model we’re thinking of building is deployed, how will people use it? And how can we design it to be as beneficial as possible in these contexts?” Mitchell wrote. “What you first do is figure out ways it’s likely to be used, and by whom. This is harder for some people than others.… When designing a system in light of these foreseeable uses, you see that there are many use cases that should be accounted for.”

For example, historic depictions (what do popes tend to look like?) and diverse depictions (what could the world look like with less white supremacy?), she wrote.

Google erred by taking what Mitchell called a “dream world” approach, in which popes are depicted in myriad shapes, sizes and colors. Defaulting to the historic biases learned by the model could result in the kind of public pushback the company experienced.

She said what’s needed are more experts who can take an ethics-based or responsible AI approach to deploying models, one that relies on expertise rather than public relations concerns. That means looking at a tool like Gemini as a system, not a single model, and building multiple classifiers that can determine the intent behind a user request – and whether that intent is ambiguous – and offer multiple potential responses.

In addition, such a system gives users a way to provide feedback.
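The system-level idea Mitchell describes could be sketched roughly as follows. This is an illustration only, not Google’s or Mitchell’s implementation: the keyword lists stand in for trained classifiers, and every name in it (`classify_intent`, `plan_responses`, `FeedbackLog`) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical cue lists standing in for real, trained intent classifiers.
HISTORICAL_CUES = ("historical", "medieval", "18th century", "pope")
DIVERSITY_CUES = ("diverse", "range of people", "inclusive")


@dataclass
class Intent:
    label: str       # "historical", "diverse" or "unspecified"
    ambiguous: bool  # True when the request does not settle the question


def classify_intent(prompt: str) -> Intent:
    """Guess what kind of depiction the user wants, and flag ambiguity."""
    text = prompt.lower()
    wants_history = any(cue in text for cue in HISTORICAL_CUES)
    wants_range = any(cue in text for cue in DIVERSITY_CUES)
    if wants_history and not wants_range:
        return Intent("historical", ambiguous=False)
    if wants_range and not wants_history:
        return Intent("diverse", ambiguous=False)
    # Both cues or neither: the request alone does not tell us the intent.
    return Intent("unspecified", ambiguous=True)


def plan_responses(prompt: str) -> List[str]:
    """Return one plan for a clear intent, several when intent is ambiguous."""
    intent = classify_intent(prompt)
    historical = f"{prompt} [historically grounded]"
    broad = f"{prompt} [broad range of people]"
    if intent.ambiguous:
        return [historical, broad]
    return [historical] if intent.label == "historical" else [broad]


@dataclass
class FeedbackLog:
    """Collects user feedback so mistakes can be reviewed later."""
    entries: List[dict] = field(default_factory=list)

    def record(self, prompt: str, helpful: bool, note: str = "") -> None:
        self.entries.append({"prompt": prompt, "helpful": helpful, "note": note})


if __name__ == "__main__":
    log = FeedbackLog()
    for prompt in ("a portrait of a medieval pope", "friends at a picnic"):
        print(prompt, "->", plan_responses(prompt))
    log.record("a portrait of a medieval pope", helpful=True)
```

The point of the sketch is the shape, not the rules: a request passes through an intent check before any image is produced, an ambiguous request yields more than one candidate response instead of a single guess, and feedback is captured alongside the request.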

“The high-level point is that it is possible to have technology that benefits users & minimizes harm to those most likely to be negatively affected,” Mitchell wrote. “But you have to have experts that are good at doing this!”

Unfortunately, many of these people aren’t given the power they need, she said, but added that it “doesn’t have to be this way: We can have different paths for AI that empower the right people for what they’re most qualified to help with. Where diverse perspectives are *sought out*, not shut down.”
