States of Learning

Rethinking how we detect learning gaps

On why educational AI detects failure too late, and what detecting it earlier would require

Picture this: the teacher returns carefully graded exams to the class. A student who seemed to be keeping up, who completed the homework, who raised a hand occasionally and gave reasonable answers, has failed. And looking back, the signs were there: a slight uncertainty three weeks ago, a pattern of errors that seemed random at the time but now, with hindsight, looks systematic. The understanding was becoming fragile long before the exam, yet nobody caught it.

This is not a failure of teaching, but rather a failure of the instrument. The instruments available to most teachers (tests at defined intervals, classroom observation, homework completion) are all retrospective. They reveal what has already happened, not what is developing.

The question that interests me is whether this is a structural limitation or a solvable problem. I think it is solvable, but solving it requires being precise about why current approaches - including the most advanced AI tutoring systems - still fail to catch the problem early.

Why AI tutoring does not solve this

The evidence for AI tutoring is genuinely encouraging. A 2025 randomised controlled trial at Harvard found effect sizes between 0.73 and 1.3 standard deviations for AI tutoring versus active classroom learning.¹ A 2025 Google DeepMind study found that supervised AI tutoring matched or exceeded human tutors on immediate learning outcomes across five UK secondary schools.² These are real results. The technology works, in the short term, under specific conditions. But notice what neither study measures: what happens to those students six weeks later? Had the misconception that was apparently resolved in the tutoring session actually been resolved, or had it merely been suppressed, quietly reforming? The studies cannot measure this because the systems they evaluated have no longitudinal model of the learner: every session begins fresh. There is no way to look at what has happened over time and ask whether the trajectory is heading somewhere good.

A 2025 paper by Weidlich and colleagues in the Journal of Computer Assisted Learning made this point with precision.³ Auditing a widely cited meta-analysis claiming a large positive effect of AI on learning, they found that the vast majority of included studies lacked the methodological structure to draw causal conclusions, because they were measuring performance during AI use, not durable understanding after it. The field is observing effects it cannot yet explain, because it has no model of what is happening inside the learner over time.

The three things you cannot see without longitudinal data

What detecting it earlier would require

The instrument that would catch these three problems earlier is not a better test or a more frequent test. It is a different kind of representation entirely, one that tracks not what a student scored, but how their understanding is structured and how that structure is changing. One that is updated continuously, not at defined intervals. One that is trained to recognise the temporal signature of developing fragility - the specific patterns in longitudinal interaction data that precede performance failure - across thousands of learners with months of outcome data to learn from.
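
To make that concrete, here is one way such a representation could be sketched: a recurrent model that reads a learner's session history and emits a risk score after every session, supervised by a later assessment outcome. This is a minimal illustration under stated assumptions, not a description of any existing system; the feature set, model size, and training signal are all invented for the example.

```python
# A minimal sketch, not any deployed system: a recurrent model that reads a
# learner's session history and emits a fragility risk score after every
# session, supervised by a later assessment outcome. Feature names,
# dimensions, and the loss setup are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F


class FragilityTracker(nn.Module):
    """GRU over per-session interaction features; one risk logit per session."""

    def __init__(self, n_features: int = 8, hidden: int = 64):
        super().__init__()
        self.encoder = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, sessions: torch.Tensor) -> torch.Tensor:
        # sessions: (batch, n_sessions, n_features), e.g. accuracy, response
        # latency, hint usage, and error-type mix for each tutoring session.
        states, _ = self.encoder(sessions)      # (batch, n_sessions, hidden)
        return self.head(states).squeeze(-1)    # (batch, n_sessions) logits


if __name__ == "__main__":
    model = FragilityTracker()
    history = torch.randn(32, 20, 8)             # 32 learners, 20 sessions each
    risk = torch.sigmoid(model(history))         # per-session risk in [0, 1]

    # Assumed training target: did the learner fail the later assessment?
    outcome = torch.randint(0, 2, (32,)).float()
    # Supervising every time step with the eventual outcome rewards the model
    # for flagging fragility weeks before the exam, not only at the end.
    loss = F.binary_cross_entropy(risk, outcome.unsqueeze(1).expand_as(risk))
    loss.backward()
    print(risk.shape, float(loss))
```

The design choice that matters here is the per-session output: a model supervised this way is rewarded for raising the flag in October rather than in the week of the exam.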

This is a harder thing to build than a tutoring system. It requires longitudinal data that most educational platforms do not yet have in usable form. It requires an architecture specifically designed for temporal sequence modelling rather than dialogue generation. And it requires the institutional partnerships that would allow such a system to be trained on real learners in real schools over real time, which is a governance and trust problem as much as a technical one. But it is buildable and the technical primitives exist. The data is there, in school databases across Europe, largely unanalysed. The question is whether anyone builds the infrastructure to use it - before the exam comes back, again, revealing what could have been caught in October.
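
Concretely, "usable form" might look something like the record below: each interaction timestamped, tied to the concept being exercised, and joinable months later to an outcome. The field names are assumptions chosen for illustration, not a description of any existing platform's schema.

```python
# An illustrative sketch of what "usable form" could mean: one timestamped
# record per learner interaction, tied to a concept and joinable, months
# later, to an assessment outcome. Field names are assumptions, not any
# platform's actual schema.

from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class InteractionRecord:
    learner_id: str                         # pseudonymous, stable across months
    timestamp: datetime                     # when the interaction happened
    concept_id: str                         # which skill or concept was exercised
    correct: bool                           # whether the response was correct
    response_time_s: float                  # latency, a cheap signal of uncertainty
    hints_used: int                         # scaffolding consumed before answering
    outcome_label: Optional[float] = None   # later assessment result, joined retrospectively


# Example: a single record from a hypothetical October session.
record = InteractionRecord(
    learner_id="learner-0042",
    timestamp=datetime(2025, 10, 3, 9, 15),
    concept_id="fractions.addition",
    correct=False,
    response_time_s=41.0,
    hints_used=2,
)
```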

¹ Kestin, G. et al. (2025). AI tutoring outperforms in-class active learning: an RCT. Scientific Reports, 15, 17458.
² LearnLM Team & Eedi (2025). AI tutoring can safely and effectively support students: an exploratory RCT in UK classrooms. arXiv:2512.23633.
³ Weidlich, J., Gašević, D., Drachsler, H., & Kirschner, P. (2025). ChatGPT in education: An effect in search of a cause. Journal of Computer Assisted Learning, 41(5), e70105.
⁴ Bastani, H. et al. (2025). Generative AI without guardrails can harm learning. PNAS, 122(26), e2422633122.
⁵ Kosmyna, N. et al. (2025). Your brain on ChatGPT: Accumulation of cognitive debt. arXiv:2506.08872.