The Hollow Shell Problem - how students accumulate cognitive debt
On what AI does to learning when it gives answers instead of building understanding
Consider two students who have spent a year working with an AI tutoring system. Both have completed hundreds of sessions. Both have seen their grades improve. Now remove the AI and ask them both to solve a problem they have not seen before, without assistance.
One of them can do it. The AI was a sparring partner: it challenged them, scaffolded their thinking, forced them to articulate their reasoning. Working with it made them sharper.

The other cannot. The AI was a crutch. It answered where it should have questioned, completed where it should have prompted, reduced friction where friction was precisely the mechanism through which understanding was supposed to form. Remove it and there is nothing there. Not because this student is less capable, but because the capability was never built.

This is the hollow shell problem. And it is, I think, the most underacknowledged risk in the current enthusiasm for AI in education.
The evidence
The evidence that AI can hollow out learning rather than deepen it is now substantial. A 2025 study published in the Proceedings of the National Academy of Sciences conducted a randomised trial across high school mathematics and found that students using generative AI without appropriate pedagogical guardrails performed significantly worse than control students on subsequent unassisted assessments.¹ The mechanism was not that the AI gave wrong answers. It was that the AI gave answers, removing the productive struggle through which genuine understanding forms.
A separate study, published in 2025, demonstrated something more troubling: that the damage accumulates gradually, in ways invisible to a session-level observer. Using neuroimaging, the researchers showed that repeated AI-assisted essay writing produced measurable reductions in cortical engagement with the material — what they called cognitive debt.² Each session looked productive. The student completed the task. But the cumulative effect of removing the cognitive work from the student was a decline in their capacity to do that work independently.

This is not an argument against AI in education. It is an argument about the conditions under which AI supports learning versus the conditions under which it replaces it. And those conditions are not visible from a single session. They are visible only from a longitudinal model of the learner.
Why this is hard to detect
The hollow shell forms slowly. In any given session, a student who is developing dependence looks like a student who is learning. They get the answers right. They engage with the material. They progress through the curriculum. The divergence between their AI-assisted performance and their genuine independent capacity grows gradually, across many sessions, until it becomes large enough to matter, typically at the moment of an unassisted assessment, or when they encounter a concept that requires the prior material to be genuinely solid.
At that point, the teacher discovers the problem. But the window for easy remediation closed weeks or months earlier, when the pattern of dependence was first establishing itself in the interaction data. The signals were there: increasing reliance on hints, declining engagement with novel problem variants, response patterns that showed procedure-following rather than genuine reasoning. Nobody read them, because no system was designed to read them longitudinally.
This is the structural gap: not that we lack good AI tutoring systems, but that we lack systems capable of monitoring how a student's independent reasoning capacity evolves across their interactions with AI, and of flagging when the trajectory is heading toward dependence rather than competence.
What the difference looks like in data
A student genuinely building understanding shows a specific temporal pattern in interaction data. Over time, they need less scaffolding on familiar problem types. They engage more readily with novel variants. Their response times on standard problems decrease as procedures become fluent, while their willingness to attempt genuinely hard problems increases. The balance between AI assistance and independent work shifts toward independence.
A student developing dependence shows a different pattern. Hint requests remain high or increase even on familiar problem types. Novel variants produce disengagement rather than productive struggle. Performance on AI-assisted tasks remains stable or improves while the gap between assisted and unassisted performance widens. The student is optimising for completing tasks, not for building capacity.

These two patterns are distinguishable in longitudinal data. They are indistinguishable in session-level data. This is why the instrument matters as much as the intervention. Without the longitudinal model, you cannot tell the difference until the exam comes back.
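To make the contrast concrete, here is a minimal sketch of what reading these signals longitudinally might look like. Everything in it is an illustrative assumption, not a validated instrument: the per-session features, the `classify_trajectory` rule, and the idea of periodic unassisted probes are hypothetical choices, and a real system would need calibrated thresholds and far richer signals. The sketch simply fits a least-squares trend to each signal across sessions and flags the dependence pattern when hint reliance fails to decline while the assisted-unassisted gap widens.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Session:
    """Hypothetical per-session features extracted from interaction data."""
    hint_rate: float        # hints requested per problem
    novel_attempts: float   # fraction of novel problem variants attempted
    assisted_score: float   # performance with the AI available
    unassisted_score: float # performance on a periodic unassisted probe

def slope(values: List[float]) -> float:
    """Least-squares slope of a signal over session index (needs >= 2 points)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

def classify_trajectory(history: List[Session]) -> str:
    """Illustrative rule, not a validated classifier:
    dependence = hint reliance not falling while the gap between
    assisted and unassisted performance grows."""
    hint_trend = slope([s.hint_rate for s in history])
    novel_trend = slope([s.novel_attempts for s in history])
    gap_trend = slope([s.assisted_score - s.unassisted_score for s in history])
    if hint_trend >= 0 and gap_trend > 0:
        return "dependence"
    if hint_trend < 0 and novel_trend > 0 and gap_trend <= 0:
        return "competence"
    return "inconclusive"
```

The point of the sketch is the shape of the computation, not the specific rule: every quantity is a trend across sessions, so no single session can produce the flag. A session-level monitor sees only one `Session` record and, by construction, cannot distinguish the two trajectories.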
The design implication
The implication for anyone building AI educational systems is precise. The question is not only whether the AI tutoring session is pedagogically sound, whether it uses Socratic questioning, manages cognitive load, adapts to the learner's level. Those things matter. But they are necessary, not sufficient.

The question that also needs answering, continuously, over months, is: is this student becoming more capable or more dependent? Is the trajectory of their independent reasoning capacity moving toward the student who remains sharp when the AI is removed, or toward the hollow shell?

Answering that question requires a different kind of system from a tutoring system. It requires a longitudinal model of how the learner is developing - one that watches not just individual sessions but the arc of change across them. That system does not yet exist in mainstream deployment. Building it is, I think, one of the most important problems in educational AI.
¹ Bastani, H. et al. (2025). Generative AI can harm learning. PNAS, 122(26), e2422633122.
² Kosmyna, N. et al. (2025). Your brain on ChatGPT: Accumulation of cognitive debt when using an AI assistant for essay writing task. arXiv:2506.08872.