
This week we begin our second randomised controlled trial of a constrained AI tutor on Eedi: a four-arm study running with 1,525 students across 10 UK secondary schools in years 8, 9, and 10. The diagnostic engine at the heart of this trial powered our first AI Tutor study in 2025, undertaken in partnership with Google DeepMind's research team. This new trial, also with DeepMind, extends that work and approach: a slow, measured build of evidence rather than a rush to deliver and scale. We'll run it in classrooms over 12 weeks, with learning outcomes measured using Renaissance's STAR Maths assessment.
This trial involves a "constrained" AI tutor, but what does that mean? Our AI tutor is not a general-purpose chatbot that students can engage with for homework help. It is designed to work in a distinctly different way: it activates only when a student answers a diagnostic check-in question incorrectly, and the conversation is bounded to the specific construct the student is learning about. This design is a deliberate response to the growing evidence base on what unconstrained AI does to learning. Bastani and colleagues (2025) found that students using an unconstrained AI tutor improved while the tool was available, but on post-tests without it they performed significantly worse than students who had no AI access at all. They concluded that generative AI without guardrails can harm learning. The risk is cognitive offloading: the tool does the work, the student does not.
At Eedi, we instruct the AI to support the student through one specific moment of difficulty, then return them to the lesson.
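To make the activation rule concrete, here is a minimal sketch of the gating described above: the tutor opens only on an incorrect check-in answer, is scoped to a single construct, and ends by returning the student to the lesson. All names here (`CheckInResponse`, `maybe_start_tutor_session`) are illustrative, not Eedi's actual API.

```python
from dataclasses import dataclass

@dataclass
class CheckInResponse:
    question_id: str
    construct_id: str      # the specific idea this question assesses
    chosen_answer: str
    correct_answer: str

def maybe_start_tutor_session(response: CheckInResponse):
    """Return a tutoring scope bounded to one construct, or None."""
    if response.chosen_answer == response.correct_answer:
        # Correct answer: no tutor; the student simply continues the lesson.
        return None
    # Incorrect answer: open a session bounded to this construct only,
    # with an explicit exit back to the lesson.
    return {
        "construct_id": response.construct_id,
        "question_id": response.question_id,
        "exit_condition": "return student to lesson",
    }
```

The point of the sketch is that constraint lives in the control flow, not in the model: a general-purpose model never sees an open-ended conversation, only a narrowly scoped moment of difficulty.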
The reason we can bound the conversation this tightly is the diagnostic engine beneath our testbed tool: Eedi School. Every question in our diagnostic questions library has one correct answer and three incorrect ones, and each incorrect answer (a ‘distractor’) is mapped to a specific, named misconception. When a student picks a wrong answer, we know something precise about their thinking; not just that they got the question wrong, but why. That diagnostic signal is what we pass to the AI tutor, and it is what turns a generic model into something that can speak to a specific student about a specific misunderstanding. Eedi’s diagnostic engine is the intelligence layer for maths teaching and learning, built and refined over nearly a decade of classroom use.
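A small sketch of how such a distractor-to-misconception mapping turns a wrong answer into a diagnostic signal for the tutor. The schema and prompt wording below are hypothetical, assumed for illustration; only the one-correct-plus-three-mapped-distractors structure comes from the description above.

```python
# Hypothetical question record: one correct option, three distractors,
# each distractor mapped to a named misconception.
QUESTION = {
    "id": "Q123",
    "correct": "C",
    "distractors": {
        "A": "Adds numerators and denominators when adding fractions",
        "B": "Finds a common denominator but does not scale the numerators",
        "D": "Subtracts the fractions instead of adding them",
    },
}

def diagnose(question: dict, chosen: str):
    """Map a chosen answer to a named misconception, or None if correct."""
    if chosen == question["correct"]:
        return None
    return question["distractors"][chosen]

def tutor_context(question: dict, chosen: str):
    """Build the bounded context passed to the AI tutor, or None."""
    misconception = diagnose(question, chosen)
    if misconception is None:
        return None
    # The tutor receives the *named* misconception, not just "wrong":
    # this is what lets it address why the student erred.
    return (
        f"Student chose {chosen} on {question['id']}. "
        f"Likely misconception: {misconception}. "
        "Address only this misconception, then end the session."
    )
```

This is why the same wrong question can produce three different tutoring conversations: each distractor carries its own diagnosis.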
AI tutoring is a crowded marketplace, with a growing number of benchmarks and usability studies, but rigorous causal evidence on efficacy remains scarce. Our first study with Google DeepMind in 2025 took an initial step toward building that evidence base; the Stanford SCALE Initiative recently included it in their 2026 review of AI in K-12 as one of only 20 high-quality causal studies identified from over 800 papers reviewed. We kept that initial study small by design, allowing us to focus on rigour: 165 students, five schools, seven weeks. Supervising tutors approved 74.4% of AI-drafted messages without any edits, the safety audit found zero instances of harmful content, and our Bayesian analysis attributed a 93.6% posterior probability to supervised AI tutoring producing greater knowledge transfer than human tutoring alone. We hold those findings lightly; they are signposts, not conclusions. See the study highlights here.
This second RCT is larger, longer, and asks a more specific question: how much does student-level context matter to the quality of AI tutoring?
The trial compares four conditions:
The questions we’re investigating are practical:
Now, on to actually running the trial. We will share the results when they arrive in Summer 2026.