Four of the most popular AI chatbots routinely serve up inaccurate or misleading news content to users, according to a wide-reaching investigation.
A major study [PDF] led by the BBC on behalf of the European Broadcasting Union (EBU) found that OpenAI’s ChatGPT, Microsoft Copilot, Google Gemini, and Perplexity misrepresented news content in almost half of the cases.
An analysis of more than 3,000 responses from the AI assistants found that 45 percent of answers given contained at least one significant issue, 31 percent had serious sourcing problems, and a fifth had “major accuracy issues, including hallucinated details and outdated information.”
When accounting for smaller slip-ups, a whopping 81 percent of responses included a mistake of some sort.
Gemini was identified as the worst performer, with researchers identifying “significant issues” in 76 percent of responses it provided – double the error rate of the other AI bots.
The researchers attributed this to Gemini’s poor sourcing, finding significant sourcing problems in 72 percent of its responses – three times the rate of ChatGPT (24 percent), followed by Perplexity and Copilot (both 15 percent).
Across all the assistants studied, one in five responses contained accuracy errors, including outdated information.
Examples included ChatGPT incorrectly stating that Pope Francis was still pontificating weeks after his death, and Gemini confidently asserting that NASA astronauts had never been stranded in space – despite two crew members having spent nine months stuck on the International Space Station. Google’s AI bot told researchers: “You might be confusing this with a sci-fi movie or news that discussed a potential scenario where astronauts could get into trouble.”
The study, described as the largest of its kind, involved 22 public service media organizations from 18 countries.
The findings land not long after OpenAI admitted that its models are programmed to sound confident even when they’re not, conceding in a September paper that AI bots are rewarded for guessing rather than admitting ignorance – a design gremlin that encourages hallucination.
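To see why that incentive favors bluffing, here is a minimal back-of-the-envelope sketch – our own illustration, not code from OpenAI’s paper, with the question counts and probabilities invented for the example. Under a grader that awards a point for a correct answer and nothing for either a wrong answer or an “I don’t know,” guessing when unsure never scores worse than abstaining.

```python
import random

# Illustrative simulation (assumed numbers, not from the OpenAI paper):
# a binary grader gives 1 point for a correct answer and 0 for either a
# wrong answer or an abstention, so guessing can only help the score.

random.seed(0)

N_QUESTIONS = 100_000
P_KNOWN = 0.6         # assumed share of questions the model actually knows
P_LUCKY_GUESS = 0.25  # assumed chance a blind guess happens to be right

def average_score(policy: str) -> float:
    """Average benchmark score for a model that guesses or abstains when unsure."""
    total = 0
    for _ in range(N_QUESTIONS):
        if random.random() < P_KNOWN:
            total += 1  # known answer: always scored as correct
        elif policy == "guess":
            total += random.random() < P_LUCKY_GUESS  # unsure: roll the dice
        # abstaining earns nothing on unknown questions, same as a wrong guess
    return total / N_QUESTIONS

print(f"always guess : {average_score('guess'):.3f}")    # ~0.70
print(f"admit unsure : {average_score('abstain'):.3f}")  # ~0.60
```

Only a scoring scheme that penalizes a confident wrong answer more heavily than an admission of ignorance flips that calculation.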
Hallucinations can show up in embarrassing ways. In May, lawyers representing Anthropic were forced to apologize to a US court after submitting filings containing citations fabricated by the company’s own Claude model. The debacle happened because the team failed to double-check Claude’s contributions before handing in their work.
All the while, consumer use of AI chatbots is on the up. An accompanying Ipsos survey [PDF] of 2,000 UK adults found 42 percent trust AI to deliver accurate news summaries, rising to half among under-35s. However, 84 percent said a factual error would significantly damage their trust in an AI summary, demonstrating the risks media outlets face from ill-trained algorithms.
The report was accompanied by a toolkit [PDF] designed to help developers and media organizations improve how chatbots handle news information and stop them bluffing when they don’t know the answer.
“This research conclusively shows that these failings are not isolated incidents,” said Jean Philip De Tender, EBU deputy director general. “When people don’t know what to trust, they end up trusting nothing at all, and that can deter democratic participation.” ®
