Somewhere in all I’ve written about artificial intelligence (AI), I mentioned that as more AI-generated content appears on the internet, models would increasingly be trained on that content, which could degrade the quality of what those AIs generate.
In its simplest terms, it’s like photocopying a photocopy over and over: the quality decreases each time. This isn’t something I made up; it’s just the way things work. But someone has now demonstrated it in the context of artificial intelligence.
...underpinning the growing generative AI economy is human-made data. Generative AI models don’t just cough up human-like content out of thin air; they’ve been trained to do so using troves of material that actually was made by humans, usually scraped from the web. But as it turns out, when you feed synthetic content back to a generative AI model, strange things start to happen. Think of it like data inbreeding, leading to increasingly mangled, bland, and all-around bad outputs. (Back in February, Monash University data researcher Jathan Sadowski described it as “Habsburg AI,” or “a system that is so heavily trained on the outputs of other generative AI’s that it becomes an inbred mutant, likely with exaggerated, grotesque features.”)
It’s a problem that looms large. AI builders are continuously hungry to feed their models more data, which is generally being scraped from an internet that’s increasingly laden with synthetic content. If there’s too much destructive inbreeding, could everything just… fall apart?…
“When AI Is Trained on AI-Generated Data, Strange Things Start to Happen”, Maggie Harrison, Futurism.com, August 2nd, 2023
It’s not a hard conclusion to reach. The article cites the peer-reviewed paper “Self-Consuming Generative Models Go MAD” (PDF), which is an interesting read, and then includes an interview with the paper’s authors.
I think it’s an important article for people to read and understand. It doesn’t get bogged down in technical detail, and what technical material there is can be skimmed without losing the underlying points.
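To make the photocopy idea a little more concrete, here’s a toy sketch of my own; it is not the method from the paper. I’m assuming the simplest possible “model”: a bell curve (a Gaussian with a mean and a spread) fit to some data. Generation 0 is fit to “real” data, and every later generation is fit only to samples drawn from the previous generation’s model. The function name and its parameters are made up for illustration. With small samples, the fitted spread tends to shrink from one generation to the next, which is a rough analogue of the loss of variety the article describes.

```python
import random
import statistics


def self_consuming_loop(generations: int = 300, n_samples: int = 20, seed: int = 0) -> list[float]:
    """Toy 'photocopy of a photocopy' loop (hypothetical illustration).

    The 'model' is just a Gaussian described by a mean and a variance.
    Generation 0 is fit to 'real' data drawn from N(0, 1); each later
    generation is fit only to synthetic samples from the previous model.
    Returns the fitted variance for every generation (generations + 1 values).
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    data = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]  # stand-in for human-made data
    history = []
    for _ in range(generations + 1):
        mu = statistics.fmean(data)
        var = statistics.pvariance(data, mu)  # maximum-likelihood variance estimate
        history.append(var)
        # The next generation trains purely on this model's own output.
        data = [rng.gauss(mu, var ** 0.5) for _ in range(n_samples)]
    return history


if __name__ == "__main__":
    history = self_consuming_loop()
    for gen in (0, 10, 100, 300):
        print(f"generation {gen:3d}: fitted variance = {history[gen]:.3e}")
```

I picked a small sample size so the drift shows up quickly; with larger samples it is slower, but it still pulls the same way. That lines up with the paper’s broader point that, without enough fresh real data in each generation, quality or diversity tends to degrade over time.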