A major study claiming ChatGPT improves student learning has been retracted

Skye Jacobs

Facepalm: A high-profile academic paper that once framed ChatGPT as a clear win for student learning has been pulled, nearly a year after it helped shape early narratives about AI in education. Springer Nature removed it last month over "discrepancies" in the meta-analysis that shook confidence in the results. The publisher also noted that "the authors had not responded to correspondence regarding the retraction."

By the time of the retraction, the paper had already traveled far. Published in May 2025 in Humanities & Social Sciences Communications, it attempted to measure ChatGPT's impact by combining results from 51 separate studies. The authors compared outcomes between students who used the chatbot and those who did not, ultimately reporting what they described as a "large positive impact on improving learning performance," a "moderately positive impact on enhancing learning perception," and "fostering higher-order thinking."

Those claims didn't stay confined to academic circles. The paper picked up hundreds of citations – 262 within Springer Nature journals alone and more than 500 overall – and drew close to half a million readers. It also ranked in the top percentile for attention among journal articles, helped along by steady circulation on social platforms.

That visibility is part of what now concerns researchers.

"The paper's authors made some very attention-grabbing claims about the benefits of ChatGPT on learning outcomes," said Ben Williamson, a senior lecturer at the University of Edinburgh's Centre for Research in Digital Education and Edinburgh Futures Institute. "It was treated by many on social media as one of the first pieces of hard, gold standard evidence that ChatGPT, and generative AI more broadly, benefits learners."

But as the paper spread, so did doubts about how it reached those conclusions. Williamson pointed to problems in how the analysis combined its source material. "In some cases it appears it was synthesizing very poor quality studies, or mixing together findings from studies that simply cannot be accurately compared due to very different methods, populations, and samples," he told Ars Technica. "It really seemed like a paper that should not have been published in the first place."

There were also basic timing questions. ChatGPT only became publicly available in late 2022, leaving a narrow window to produce dozens of rigorous, peer-reviewed studies suitable for a meta-analysis. "It is not feasible that dozens of high-quality studies about ChatGPT and learning performance could have been conducted, reviewed, and published in that time," Williamson said.

Others flagged similar issues early on. Ilkka Tuomi, chief scientist at Meaning Processing Ltd., criticized the premise of combining results across studies that may not be directly comparable. He wrote on LinkedIn that studies like this risk drawing conclusions from unclear or inconsistent outcomes. He also suggested that such analyses can give a misleading sense of scientific rigor, since statistical tools can produce results that appear credible even when the underlying data is weak.

Williamson said that as the study spread on social media, much of its nuance was lost, leaving only the headline claims to circulate widely. He noted that online users amplified those simplified takeaways, driving significant attention even though the underlying research did not fully support the conclusions.

That dynamic may outlast the retraction itself. Researchers who cited or shared the study may not see the update, leaving its core message – that ChatGPT improves learning outcomes – circulating without context.

The episode lands at a moment when schools and universities are still figuring out how to respond to generative AI. Some educators are trying to limit misuse, particularly AI-assisted cheating, while tech companies continue to roll out features designed to position chatbots as study tools. At the same time, there are signs of pushback against fully digital classrooms, with at least one country moving back toward printed materials and handwritten work.

For Williamson, the frustration is less about a single paper and more about what it represents. He said the situation has been exasperating for researchers trying to understand AI's real role in education, noting that while hype has dominated the conversation in recent years, there is still a lack of rigorous evidence showing how these tools actually affect teaching and learning in practice.

I think time itself proved that AI does not help with studying and "higher-order thinking" at all.
 
AI, like most technology, is just a tool, but people insist on using it as a crutch. Sure, it can help solve complex problems, but the answers are useless if people are too dumb to understand their significance.

I am an engineer. By the time I got to college, calculator usage was accepted, and a lot of students relied on them to get the correct answer. Before that, engineers used slide rules and could do complex math in their heads and on paper without a calculator (they really understood what they were doing). I can do complex math in my head, but most of the people I went to college with resorted to calculators for simple addition, subtraction, multiplication, and division. Every few generations, new tools come on the scene and make most of us less capable than the generations before us. I am afraid of what high schools and colleges will turn out in a decade or two: engineers, lawyers, doctors, and future leaders who may no longer be capable of higher-order thinking and will instead rely on a chatbot to do the thinking for them (at which point we become AI's pets).
 
I've talked with high school/college students... unfortunately, you can see it in the way they talk. You can forget about telling them to do maths in their head :)
 
I shudder to think what the level of critical thinking in the Western world will be like in 50 years. All because of a few self-aggrandising techbros. Jesus wept................
 
I kept reading to see why it was flawed. Combining studies that might not be directly comparable is often normal for studies of studies.

Too much research seems to get taken down because it doesn't match the desired narrative of stakeholders.

I learn every day from AI by working with AI.
 
In this case it's the opposite, too many shady flawed studies are being allowed, all with the obvious intent of swaying public perception.

"I learn everyday from AI by working with AI." - You didn't grow up with it. You had a proper education and real social interactions growing up. You don't use AI as a crutch, but as a tool.
 
I've talked with high school/college students... unfortunately, you can see it in the way they talk. You can forget about telling them to do maths in their head :)
I can still do math in my head (I'm about to retire at the maximum age for Social Security, so do your own math, heh heh). I love doing problems, but I don't practice enough to handle more than about two sets of 2- or 3-digit numbers before I start losing results. I also use my fingers to count and keep track of remainders and whatever else I need during the process. Nobody taught me; I just learned it on my own for fun.
 
I kept reading to see why it was flawed. Combining studies that might not be directly comparable is often normal for studies of studies.

Too much research seems to get taken down because it doesn't match the desired narrative of stakeholders.

I learn every day from AI by working with AI.
Ah yes, another person who has lost sight of the forest because they are just staring at the trees.
 