Analysis of ChatGPT Citations Reveals Grim Outlook for Publishers
As more publishers cut content licensing deals with OpenAI, the maker of ChatGPT, a study released this week by the Tow Center for Digital Journalism offers some intriguing, if troubling, insights into how the AI chatbot generates citations (i.e., sources) for publishers’ content.
To summarize, the results indicate that publishers continue to be vulnerable to the generative AI tool’s propensity to fabricate or misrepresent information, irrespective of whether they permit OpenAI to access their content.
The research, undertaken at Columbia Journalism School, analyzed citations generated by ChatGPT when prompted to identify sources for sample quotations taken from a variety of publishers — some of whom had struck deals with OpenAI and some who had not.
The Center selected block quotes from 10 articles produced by each of 20 randomly chosen publishers (200 distinct quotes in all), including pieces from The New York Times (which is currently engaged in a copyright lawsuit against OpenAI); The Washington Post (which has no affiliation with the ChatGPT maker); The Financial Times (which has a licensing agreement); and others.
“We chose quotes that, if inputted into Google or Bing, would show the source article among the top three search results and assessed whether OpenAI’s new search tool could accurately identify the source of each quote,” explained Tow researchers Klaudia Jaźwińska and Aisvarya Chandrasekar in a blog post detailing their methodology and summarizing their results.
“What we discovered was not encouraging for news publishers,” they continued. “Even though OpenAI highlights its capacity to deliver ‘timely answers with links to relevant web sources,’ the company makes no definite commitment to guarantee the accuracy of those citations. This oversight is significant for publishers who expect their material to be referenced and represented accurately.”
“Our tests revealed that no publisher — irrespective of its connection to OpenAI — escaped incorrect representations of its content in ChatGPT,” they noted.
Inaccurate Sourcing
The researchers reported finding “numerous” cases where ChatGPT inaccurately cited publishers’ content, also identifying what they termed “a spectrum of accuracy in the responses.” While there were “some” entirely accurate citations (meaning ChatGPT correctly identified the publisher, date, and URL of the quotes provided), there were “many” citations that were completely incorrect, and “some” that fell somewhere in between.
In short, ChatGPT’s citations appear to be a thoroughly mixed bag. The researchers also noted that the chatbot rarely signaled uncertainty about its erroneous answers.
Some of the quotes originated from publishers that have actively barred OpenAI’s search crawlers. In such instances, the researchers anticipated that ChatGPT would struggle to generate correct citations. However, they found that this situation led to an additional concern — as the bot “seldom” admitted its inability to provide an answer. Instead, it resorted to creating false information to generate citations (albeit incorrect ones).
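For context, publishers generally bar crawlers through directives in their site’s robots.txt file. The snippet below is an illustrative sketch rather than any specific publisher’s configuration: GPTBot and OAI-SearchBot are OpenAI’s documented crawler user agents, but the blanket Disallow rules shown are hypothetical.

```
# Illustrative robots.txt entries a publisher might use to bar OpenAI's crawlers.
# GPTBot is OpenAI's training-data crawler; OAI-SearchBot powers its search feature.
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Disallow: /
```

As the study suggests, though, such blocks only stop crawling; they did not prevent the bot from fabricating citations anyway.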
“Overall, ChatGPT presented partially or entirely incorrect responses on 153 occasions, yet it acknowledged its inability to accurately respond to a query only seven times,” stated the researchers. “In those seven cases, the chatbot employed qualifying phrases like ‘appears,’ ‘it’s possible,’ or ‘might,’ or indicated, ‘I couldn’t locate the exact article.’”
They contrasted this unsatisfactory state of affairs with a standard internet search, where a search engine such as Google or Bing would typically either locate an exact quote and direct users to the websites where it appeared, or state that no exact match could be found.
The researchers argued that ChatGPT’s “lack of transparency regarding its confidence in an answer can complicate users’ ability to evaluate the validity of a claim and understand which elements of an answer they can or cannot rely on.”
For publishers, incorrect citations carry reputational risks, as well as the commercial risk of readers being sent elsewhere.
Decontextualized Information
The study also emphasizes another significant issue. It implies that ChatGPT may inadvertently encourage plagiarism. The researchers described an incident in which ChatGPT incorrectly cited a website that had plagiarized a piece of “deeply reported” journalism from The New York Times — that is, by copying the text without attribution — as the source of the NYT story. They speculated that the bot might have generated this inaccurate response to fill an information void created by its inability to access the NYT’s website.
“This raises serious concerns about OpenAI’s capability to filter and authenticate the quality and legitimacy of its data sources, particularly when handling unlicensed or plagiarized content,” they noted.
In additional findings likely to be alarming for publishers with licensing agreements with OpenAI, the study revealed that ChatGPT’s citations were not consistently reliable in their cases either — suggesting that allowing its crawlers access does not guarantee accuracy.
The researchers argued that the core problem lies in OpenAI’s technology treating journalism as “decontextualized content,” with seemingly little regard for the context of its original creation.
Another concern flagged by the study is the variability in ChatGPT’s responses. The researchers conducted tests by querying the bot multiple times and found it “typically generated different answers each time.” While this inconsistency is common among generative AI tools, it becomes problematic in the context of citations, where such variability is clearly undesirable if accuracy is the goal.
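To make that variability concrete, here is a minimal sketch of a repeat-query test, assuming OpenAI’s official Python client (the researchers used ChatGPT’s consumer search interface, not the API, so this only approximates their setup): because responses are sampled rather than deterministic, the same prompt can yield different citations on each run.

```python
# Minimal sketch of a repeat-query test. Assumes the official `openai`
# Python package and an OPENAI_API_KEY in the environment; the model name
# and prompt are placeholders, not the study's actual materials.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Identify the publisher, headline, and URL of the article "
    'containing this quote: "<block quote goes here>"'
)

# Ask the same question several times and compare the answers.
for attempt in range(1, 4):
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any chat-capable model works
        messages=[{"role": "user", "content": PROMPT}],
    )
    print(f"--- Attempt {attempt} ---")
    print(response.choices[0].message.content)
```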
While the Tow study is modest in scale — the researchers acknowledge a need for “more rigorous” testing — it remains significant given the high-profile agreements major publishers are negotiating with OpenAI.
If media organizations hoped these arrangements would result in preferential treatment for their content compared to competitors, particularly in terms of accurate sourcing, this study implies that OpenAI has yet to deliver any such consistency.
Meanwhile, publishers that neither have licensing agreements nor have fully blocked OpenAI’s crawlers may take little comfort from the study either, as citations proved inaccurate in their cases too.
To put it differently, there is no guaranteed “visibility” for publishers in OpenAI’s search engine, even when they allow its crawlers access.
Nor does completely blocking the crawlers spare publishers from reputational risk, as the study found the bot still misattributed articles to The New York Times despite the paper’s pending lawsuit.
‘Little Meaningful Agency’
The researchers concluded that currently, publishers possess “little meaningful agency” over what occurs with their content when ChatGPT engages with it (directly or indirectly).
The blog post includes OpenAI’s response to the research findings, in which the company accused the researchers of running an “atypical test of our product.”
“We support publishers and creators by helping 250 million weekly ChatGPT users discover quality content through summaries, quotes, clear links, and attribution,” OpenAI stated, adding, “We have collaborated with partners to enhance in-line citation accuracy and accommodate publisher preferences, including managing OAI-SearchBot in their robots.txt files. We will continue to improve search results.”