| Reports on a large field experiment in which students made big gains while AI was available, but once it was removed their performance fell to roughly 17 per cent below that of students who had learned without AI throughout. They did not just lose the gains; they ended up worse off than if they had never used AI at all. The OECD describes this as an ‘illusion of learning’. |
Strong |
OECD Digital Education Outlook 2026 |
| Three RCTs with 1,222 participants. After just 10 to 15 minutes of AI-assisted work, participants who then lost access to AI performed worse, and gave up more often, than people who had never used it. Preprint, not yet peer-reviewed. |
Moderate |
Liu et al. (2026) |
| RCT with nearly 1,000 high-school maths students. Students given standard ChatGPT improved their practice scores by 48 per cent but, once AI was removed, scored 17 per cent lower on independent exams than a control group that never had AI access. They tended to use it as an answer machine rather than a learning tool. A redesigned AI tutor that guided reasoning instead of giving answers largely prevented the decline. Students did not notice the reduction in their own learning while it was happening. |
Strong |
Bastani et al. (2025) |
| Three preregistered experiments with 1,372 participants. When AI gave correct answers, accuracy rose 25 percentage points above baseline; when it gave wrong answers, accuracy fell 15 percentage points below that of people who never used AI at all. AI use also raised confidence by 11.7 percentage points, even when the answers were wrong. Introduces the term ‘cognitive surrender’: the condition in which a user accepts AI output without critical evaluation, substituting it for their own reasoning. Participants with higher trust in AI and a lower need for independent thinking showed the greatest surrender. Preprint, not yet peer-reviewed. |
Strong |
Shaw and Nave (2026) |
| Small study (54 adults) finds lower brain connectivity when participants use AI assistance, with some neural effects persisting after AI is removed. It coined the term ‘cognitive debt’. The sample is small, and the proposed mechanism, though plausible, needs replication at scale. It is included because it is the only study here that directly examines neural effects in adults. |
Moderate |
Kosmyna et al. (2025) |
| Survey finds a negative correlation between frequent AI use and critical thinking. Its headline claim, “AI tools can negatively impact our critical thinking skills”, rests on correlational data and cannot establish causation. |
Weak |
Gerlich (2025) |
| RCT with 120 university students. In a surprise retention test 45 days after learning, students who had used ChatGPT scored 57.5 per cent versus 68.5 per cent for those who studied traditionally. The study attributes this faster forgetting to the knowledge never having been deeply encoded. Prior AI experience did not protect against the effect. This is the only study here that tested effects weeks after AI use ended rather than immediately afterwards. |
Strong |
Barcaui (2025) |
| The only fMRI study to scan children during chatbot interaction. Fifteen children aged 6 to 7 and 16 adults used ChatGPT in a creative task. Adults showed stronger connectivity in cognitive control and attention networks; children showed lower engagement across those same networks. Preprint, small sample, replication needed. Included because it is the only neuroimaging evidence specifically from young children. |
Moderate |
Horowitz-Kraus et al. (2025) |
| 35 per cent of children aged 9 to 17 say chatting with AI feels like talking to a friend, rising to 50 per cent among vulnerable children. (Survey data only.) |
Moderate |
Internet Matters (2026) |
| Chatbots needed only minimal prompting to produce harmful content when tested under teen-like conditions. (Not a primary experiment.) |
Moderate |
Common Sense Media / Stanford (2025) |
| A small case series in adults suggests that AI-driven validation can reinforce and worsen obsessive or delusional thought patterns. The research concerns adults, not children, but it is plausible that the effects could be as bad or worse in children. |
Moderate |
Morrin et al. (2026) |
| Word-embedding analysis shows that AI models encode negative associations with autism and ADHD before any interaction, linking them to words associated with danger, disease, badness, and other negative concepts. |
Strong |
Brandsen et al. (2024) |
| 600 identical essays, attributed to different student profiles, were fed to four major LLMs: essays from female-identified students got affective praise, those from low-attaining students got mechanical correction, and those from high-attaining students got expansive intellectual challenge. A clear demonstration of entrenched bias across all four models tested. |
Strong |
Tan et al. (2026) – Marked Pedagogies |
| Brookings report drawing on more than 400 studies and on interviews across 50 countries. It concludes that the risks of using generative AI in children’s education outweigh the benefits. |
Strong |
Burns, Winthrop et al. (2026) |
| Analysis of 1.2 million student AI interactions across 1,300 US school districts found that roughly one in five involved cheating, self-harm, bullying or other problematic behaviour. (US data; no equivalent UK study exists yet.) |
Moderate |
Securly / Education Week (March 2026) |