Will Ellis: The Evidence Against AI in Schools – What the Research Actually Says

May 4, 2026

This article presents the research evidence on pupil-facing AI and AI-assisted tools in schools. The studies cover five areas: cognitive dependency and skill loss, degradation of long-term knowledge, reduction of brain engagement in children, emotional and safeguarding risks, and systemic bias against vulnerable pupils. A new entry in this version documents the concept of cognitive surrender – the condition in which users accept AI output without critical evaluation, substituting it for their own reasoning. The evidence is drawn from the EdTech Accountability Framework compiled by Will Ellis / Reclaim Childhood.

Evidence Ratings

● Strong ● Moderate ● Weak ● None

Rating	What it means
Strong	Peer-reviewed findings with sufficient scale and replication to inform policy, including null results.
Moderate	Independent but limited in scale or context.
Weak	Some independent work but too small or methodologically limited.
None	Vendor data only, or no trial found.

Key Terms

Term	Definition
The Instrument Test	A tool does nothing until the child acts on it (e.g. a word processor remains empty until the child types). An agent responds, adapts or produces output on the child’s behalf. The instrument test asks which is directing whom: if the child directs the tool, it may belong in the classroom; if the tool directs the child, it does not.
Architectural Grounds	A platform can be restricted based on its design, regardless of trial outcomes. Where evidence also shows harm, both grounds apply.
Engagement is Not Learning	Clicking, responding and staying on task is behavioural engagement. Struggling with a concept, making connections and building knowledge is cognitive engagement. Platforms built to maximise the first often undermine the second. A child completing 200 Times Tables Rock Stars questions in a session may be behaviourally engaged throughout and cognitively engaged for almost none of it.

Table 1a: Evidence Against AI in Schools

Key Finding	Evidence Rating	Source
Dependency and Loss of Skills
A large field experiment shows big gains while AI is present, but performance then drops to roughly 17 per cent below that of students who learned without AI throughout. Students who had used AI did not just lose the gains: they ended up worse off than if they had never used it. The OECD describes this as an illusion of learning.	Strong	OECD Digital Education Outlook 2026
Three RCTs with 1,222 participants. After 10-15 minutes of AI-assisted work, participants performed worse and gave up more often when AI was taken away than people who had never used it at all. Preprint, not yet peer-reviewed.	Moderate	Liu et al. (2026)
RCT with nearly 1,000 high school maths students. Students given standard ChatGPT improved practice scores by 48 per cent but scored 17 per cent lower on independent exams once AI was removed. They used it as an answer machine, not a learning tool. A redesigned AI tutor that guided reasoning rather than giving answers largely prevented the decline. Students did not notice any reduction in their own learning while it was happening.	Strong	Bastani et al. (2025)
Three preregistered experiments with 1,372 participants. When AI gave correct answers, accuracy rose 25 percentage points above baseline. When AI gave wrong answers, accuracy fell 15 percentage points below the scores of people who never used AI at all. AI use also raised confidence by 11.7 percentage points even when the answers were wrong. Introduces the term cognitive surrender: the condition in which a user accepts AI output without critical evaluation, substituting it for their own reasoning. Participants with higher trust in AI and lower need for independent thinking showed the greatest surrender. Preprint, not yet peer-reviewed.	Strong	Shaw and Nave (2026)
Small study (54 adults) finds lower brain connectivity when using AI assistance. Some neural effects persist after AI is removed. This study coined the term ‘cognitive debt’. The sample is small and the mechanism, though plausible, needs replication at scale. It is included because it is the only study examining neural effects directly.	Moderate	Kosmyna et al. (2025)
Survey finds a negative link between frequent AI use and critical thinking. “AI tools can negatively impact our critical thinking skills.” Cannot establish causation.	Weak	Gerlich (2025)
Degradation of Long-Term Knowledge
RCT with 120 university students. A surprise retention test 45 days after learning found students who used ChatGPT scored 57.5 per cent versus 68.5 per cent for those who studied traditionally. The AI group forgot faster because knowledge was never deeply encoded. Prior AI experience did not protect against the effect. This is the only study that tested effects weeks after AI use ended rather than immediately after.	Strong	Barcaui (2025)
Reduction of Brain Engagement
The only fMRI study to scan children during chatbot interaction. 15 children aged 6 to 7 and 16 adults used ChatGPT in a creative task. Adults showed stronger connectivity in cognitive control and attention networks. Children showed lower engagement across those same networks. Preprint, small sample, replication needed. Included because it is the only neuroimaging evidence specifically in young children.	Moderate	Horowitz-Kraus et al. (2025)
Emotional Dependency and Safeguarding Risks
35 per cent of children aged 9-17 say chatting with AI feels like talking to a friend, rising to 50 per cent among vulnerable children. (Survey data only.)	Moderate	Internet Matters (2026)
Chatbots needed only minimal prompting to produce harmful content when tested under teen-like conditions. (Not a primary experiment.)	Moderate	Common Sense Media / Stanford (2025)
Small case series in adults shows AI-driven validation can reinforce and worsen obsessive or delusional thought patterns. The research concerns adults, not children, but it is not unreasonable to expect that the effects could be as bad or worse in children.	Moderate	Morrin et al. (2026)
Systemic Bias Against Vulnerable Children
Word-embedding analysis shows AI models encode negative associations with autism and ADHD before any interaction – words associated with danger, disease, badness, and other negative concepts.	Strong	Brandsen et al. (2024)
600 identical essays were fed to four major LLMs: female-identified students got affective praise, low-attaining students got mechanical correction, and high-attaining students got expansive intellectual challenge. Clear demonstration of entrenched bias across all four models tested.	Strong	Tan et al. (2026) – Marked Pedagogies
Evidence of Widespread and Documented Harm
Brookings study drawing on over 400 studies and interviews across 50 countries. Conclusion: the risks of using generative AI in children’s education outweigh the benefits.	Strong	Burns, Winthrop et al. (2026)
Analysis of 1.2 million student AI interactions across 1,300 US school districts found roughly 1 in 5 involved cheating, self-harm, bullying or other problematic behaviour. (US data; no equivalent UK study exists yet.)	Moderate	Securly / Education Week (March 2026)

⚠ Prohibited Platforms in This Category

ChatGPT / Claude / Gemini (pupil-facing) | Khanmigo (Khan Academy AI) | Century Tech | MagicSchool AI (pupil-facing) | Government AI tutoring pilot (Maths and English) | AI science tutors / adaptive platforms | AI writing assistants (e.g. Copilot, Grammarly AI) | AI image generators (e.g. DALL-E, Midjourney)

Source: Will Ellis / Reclaim Childhood – reclaimchildhoodmedia.substack.com

Disclaimer: We’ve created this overview to help busy parents quickly grasp the key findings. It should not be considered a substitute for reading the original studies. For accuracy and complete context, please consult the source documents.