Summary

Journalists rated more than half of the news summaries produced by AI applications, including Gemini and ChatGPT, as having "significant issues."

AI inaccuracies include false statements about health recommendations, current officeholders, and global events.

An upcoming Google TV feature meant to summarize news with AI will involve human oversight.

From misconstruing jokes and memes as facts to outright hallucinating output that's not grounded in any existing information, artificial intelligence applications are infamously poor arbiters of reality. Today, the BBC published the results of a research project that quantifies the issue. In a review of a handful of AI chatbots including Gemini and ChatGPT, the BBC found that the applications' summaries of news stories had "significant issues of some form" more often than not, frequently including outright factual inaccuracies.

In the study, the BBC allowed ChatGPT, Microsoft Copilot, Gemini, and Perplexity AI access to its news content. It asked for summaries of 100 specific stories, then had "journalists who were relevant experts in the subject of the article" rate those summaries. According to the BBC, 51 percent of the AI summaries were flagged as having "significant issues of some form." Nearly one in five summaries included falsehoods like "incorrect factual statements, numbers and dates."

Specifically, the BBC cites the following flubs, among others:

Gemini incorrectly said the NHS did not recommend vaping as an aid to quit smoking

ChatGPT and Copilot said Rishi Sunak and Nicola Sturgeon were still in office even after they had left

Perplexity misquoted BBC News in a story about the Middle East, saying Iran initially showed "restraint" and describing Israel's actions as "aggressive"

The BBC says that Copilot and Gemini "had more significant issues" than ChatGPT or Perplexity. The outlet notes that it typically blocks AI chatbots from scraping its content, but that it allowed access during these tests, which took place in December.

Not a surprising result

If you've been following AI developments, the results of these tests may not come as a shock. After years of seemingly manic development by some of the most highly funded organizations on the planet, AI is still notoriously unreliable for many purposes. AI-powered chatbot apps like Gemini and ChatGPT all carry a disclaimer to check results for accuracy.

In January, Apple pulled an iOS Apple Intelligence feature meant to summarize news stories after users found results similar to those of the BBC's more controlled study: summaries came through jumbled and, in some of the worst cases, included fabricated details. An upcoming Google TV feature is set to present AI-summarized news stories, but, according to Google, there'll also be human involvement.