AI researchers at Oxford University in England have thrown a red card onto the AI pitch, warning that AI poses a direct threat to science because of its propensity for “hallucinations.” On the other side of the pond, however, AI may be receiving a warmer embrace: DARPA, the research arm of the U.S. Department of Defense, wants an AI-powered “autonomous scientist” on staff and is offering $1 million to anyone who can build one.
In a paper published in Nature Human Behaviour, researchers at the Oxford Internet Institute argue that “large language models (LLMs) are designed to produce helpful and convincing responses without any overriding guarantees regarding their accuracy or alignment with fact. The reason for this is that the data the technology uses to answer questions does not always come from a factually-correct source. LLMs are trained on large datasets of text usually taken from online sources. These can contain false statements, opinions and creative writing among other types of non-factual information.” AI may produce science fiction, in other words.
The issue was brought into stark relief with the November 9 publication of another paper, which charged ChatGPT with creating a fake clinical-trial dataset to support an unverified scientific claim. Published in the medical journal JAMA Ophthalmology, the paper detailed how the AI compared two surgical eye procedures and wrongly indicated that one treatment was better than the other. The AI-produced material was so convincing that only a team of specialists conducting authenticity checks was able to spot the telltale signs of fabrication. One tipoff, however, was readily apparent: a Johnny Cash-style “A Boy Named Sue” discrepancy between a patient’s stated sex and the sex typically expected from that patient’s name. Other clues were far more arcane.
The big worry is that AI hallucinations may be hard to spot, and the creation of fake datasets that look authentic raises the stakes further. An AI may invent scientific reference papers to support its conclusions, fabrications that come to light only when those references are checked in depth. Or, conversely, the reference might be genuine but the summary of it wrong. Many peer-reviewed journals stop short of re-analyzing data. Nature Human Behaviour co-author Professor Brent Mittelstadt said he views ChatGPT and other LLMs as very unreliable research assistants.
“People using LLMs often anthropomorphise the technology, where they just trust it as a human-like information source,” says Mittelstadt. “This is, in part, due to the design of LLMs as helpful, human-sounding agents that converse with users and answer seemingly any question with confident-sounding, well-written text.”
Mittelstadt and fellow professors Chris Russell and Sandra Wachter maintain that to protect science from the spread of bad and biased information, LLMs need to be treated as “zero-shot translators,” with AI prompts containing only vetted and factual information. That horse may have already left the lab, but Russell hopes a rope can still be thrown around it. “It’s important to take a step back from the opportunities LLMs offer and consider whether we want to give those opportunities to a technology, just because we can.”
DARPA, meanwhile, wants to take a fast gallop forward with the development of an AI-enabled “autonomous scientist.” The idea, according to Alvaro Velasquez, DARPA’s program manager for Foundation Models for Scientific Discovery, is to replicate the success seen in automatic code generation and create a tool that helps automate the process of scientific discovery. Ideally, DARPA’s autonomous scientist will be able to generate novel scientific hypotheses and design experiments to test them, all while applying skeptical reasoning. The autonomous scientist isn’t envisioned as a replacement for human scientists but as a specialized assistant for time-consuming, data-heavy fields of inquiry such as climate-change modeling and protein folding in computational biology.
The DARPA proposal was announced in early November, and it remains to be seen whether anyone can deliver this item on the agency’s wish list. In any event, it’s clear that the scientific communities on both sides of the Atlantic are wrestling with AI and its implications for science. The question remains: Can generative AI help science understand the world better, or will it provide science with a false picture of it? And how will we know if it does?