An AI can unintentionally swallow a poison pill of its own making and induce model collapse, concludes a new research paper that raises fresh concerns about AI training and long-term accuracy while highlighting the need for a system that classifies images as real or AI-generated, one that may save AI from itself. Once the poison is consumed, the AI never fully heals.

The research paper, titled “Nepotistically Trained Generative-AI Model Collapse,” is authored by Matyas Bohacek of Stanford University and Hany Farid of the University of California, Berkeley. Its key conclusion: when a generative AI model is trained on even small amounts of data of its own creation, it produces distorted images.

Given that generative AI text-to-image models are trained on data scraped from the internet, future scraping will likely sweep up material created by the AI itself. When this happens, the researchers discovered, it’s as if the AI has consumed a poison pill. They found that training an AI on its own images initially yields a small improvement in quality but quickly produces highly distorted pictures, a process they call “model poisoning.” The model collapse persisted even when the self-generated images the AI inadvertently “retrained” on made up as little as 3 percent of the training data. The researchers found that “the popular open-source model Stable Diffusion (SD) is highly vulnerable to data poisoning.”

The researchers used mixtures of real and AI-generated images, but regardless of the percentages in the mix, model collapse occurred by the fifth iteration, visible as highly distorted versions of the original images.
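The retraining loop the paper describes can be illustrated with a deliberately simplified sketch. This is not the paper’s method: a one-dimensional Gaussian stands in for an image model, each generation trains entirely on the previous generation’s outputs, and the generation count and sample size are arbitrary. Still, it shows the core dynamic, in which diversity (here, the standard deviation) collapses across generations, echoing the distorted, homogeneous images the researchers observed.

```python
import numpy as np

rng = np.random.default_rng(0)

def collapse_demo(generations=100, n=10):
    """Fit a Gaussian 'model' each generation, then train the next
    generation on samples the fitted model itself produced."""
    # Generation 0 trains on "real" data drawn from N(0, 1).
    data = rng.normal(0.0, 1.0, size=n)
    stds = []
    for _ in range(generations):
        mu, sigma = data.mean(), data.std()  # "train" the model
        stds.append(sigma)
        # Next generation's training set is the model's own output.
        data = rng.normal(mu, sigma, size=n)
    return stds

stds = collapse_demo()
print(f"std: generation 0 = {stds[0]:.3f}, generation 99 = {stds[-1]:.3f}")
```

Each fit slightly underestimates the spread of its training data, and those errors compound: after enough generations the model emits near-identical samples, the toy analogue of every generated face looking the same.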

The pair also noted that while model poisoning can occur unintentionally, it can also result from an adversarial attack in which websites are intentionally populated with poisoned data. “Even more aggressive adversarial attacks can be launched by manipulating the image data and text prompt on as little as 0.01% to 0.00001% of the dataset.”

Once the poison is consumed, the AI can partially heal itself by retraining on new “real” images, but artifacts remain visible even after many iterations, as if the AI retained scars from the initial experience. Efforts to repair the AI using techniques like color matching and the replacement of low-quality images with high-quality ones failed to delay the ultimate model collapse. Another side effect was a loss of diversity: the generated faces for an “older Spanish man” all looked similar across the later iterations.

Some open questions remain. Key among them is whether data poisoning generalizes across synthetic engines. For example, will SD retrained on DALL-E or Midjourney images exhibit the same type of model collapse? Another is whether an AI can be trained to resist this type of data poisoning. And while determined actors could circumvent it, some form of labeling for real versus AI-generated images would be very helpful.

That’s an ongoing issue as it becomes increasingly difficult to tell the two apart. AI detection software has a poor record thus far, but image watermarking innovations like Google’s SynthID initiative may help if they scale up effectively. It’s clear, though, that reliably distinguishing real images from AI-created ones may be for AI’s own good in the long term.