Transforming the US education system
with CZI AI learning science
RightOn! Education ❋ Product Designer ❋ Jan - Mar (2026)
Overview
In partnership with the Chan Zuckerberg Initiative and Learning Commons, we collaborated with schools across the US to apply groundbreaking AI learning science technologies to the American education system.
I spearheaded the end-to-end design process of the project with one other designer, developing a fully deployed dashboard prototype piloted across 4 classrooms in the US. Throughout this project, I took on both design engineering and project management responsibilities.
Toolkit: Cline, Copilot, VS Code, GitHub
🧩 Challenge
Teachers often lack the time, expertise, and support to draw insights from student work.
Discussions with teachers/education experts taught us that misconceptions surfaced in homework and assessments frequently go unaddressed in upcoming lessons, resulting in missed opportunities to prioritize key learning gaps and apply targeted, evidence-based interventions.
Time-consuming
No time to sift through stacks of student work
Identifying patterns
Difficulty pinpointing the exact learning gap(s) to address
Response-to-data
Lesson planning is both time-consuming and inconsistent due to a lack of structured support
💥 Opportunity
New technology: Knowledge Graph
Learning Commons recently introduced a groundbreaking Knowledge Graph that maps academic standards, skills, and concepts into an interconnected network. By structuring this information in a machine-readable way, it enables AI systems to understand not just individual topics, but how knowledge develops and builds over time.
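To make the idea of an interconnected network concrete, here is a toy sketch of a prerequisite graph. The standard codes are real CCSS identifiers, but the edges between them and the function name `upstream_standards` are illustrative assumptions, not data or APIs from the Learning Commons Knowledge Graph.

```python
# Toy prerequisite graph: each standard maps to the standards it builds on.
# The codes are real CCSS identifiers, but the edges are illustrative
# assumptions, not data from the Learning Commons Knowledge Graph.
PREREQUISITES = {
    "7.RP.A.2": ["6.RP.A.3"],
    "6.RP.A.3": ["6.RP.A.1", "6.RP.A.2"],
    "6.RP.A.2": ["6.RP.A.1"],
    "6.RP.A.1": [],
}


def upstream_standards(standard: str, graph: dict[str, list[str]]) -> list[str]:
    """Return every standard that the given one builds on, nearest first."""
    seen: list[str] = []
    queue = list(graph.get(standard, []))
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue  # already visited through another path
        seen.append(current)
        queue.extend(graph.get(current, []))
    return seen


print(upstream_standards("7.RP.A.2", PREREQUISITES))
```

Walking the graph this way is what would let a system trace a gap in a later standard back to the earlier concepts it depends on.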

We partnered with Learning Commons to bring AI technologies to the classroom. By leveraging the Knowledge Graph, we aimed to standardize education across the US and build the infrastructure to deliver learning science at scale.
Given RightOn’s focus on math education, we centered our dashboard design around the needs and workflows of math classrooms.
How might we design a teacher dashboard that leverages knowledge graph technology and real student work to drive targeted, data-informed instructional decisions in math classes?
🤝 Partnership
Collaborating with schools across the US
We collaborated closely with teachers and leadership at schools across the US to bring the dashboard to life. We spoke bi-weekly with teachers to gain a thorough understanding of their unique education system, allowing us to integrate our dashboard into real classroom settings.

📏 Real-world Constraints
These schools standardize processes for teachers
The schools across the network we worked with follow a unique weekly schedule. In order to pilot our dashboard throughout these classrooms, we had to design with their processes in mind.

💡 Exploration
Iterating fast on AI capabilities⚡
Since we wanted to pilot our dashboard in about a month, we had to dedicate the bulk of our time towards building out the deployable product. We quickly iterated on hand-drawn sketches that we would later feed to AI agents for prototyping.






👷‍♀️ Prototyping
Building AI tools using AI: vibe-coding with Cline, Claude, and Codex
We set up Cline and Copilot in VS Code to build the dashboard’s frontend, while the AI systems and backend infrastructure were developed by our team's developer. To enable our AI model to identify learning gaps and generate targeted intervention activities, we collected and structured real classroom data for training.

🤔 Decisions, decisions… design decisions
Optimizing for early-career teachers
Most teachers do not yet have the expertise to extract actionable insights from student work to address misconception patterns. During our ongoing discussions with the schools, we continuously iterated on our prototype as we learned more about teacher needs and habits.
Helping teachers understand CCSS (Common Core State Standards)
A key component of the dashboard was helping teachers make sense of learning progressions within CCSS math standards. While these codes are widely used in schools, many teachers are not familiar with what they represent in practice. To bridge this gap, we introduced a hover interaction on CCSS tags that surfaces clear, contextual explanations directly within the interface.
Finding the sweet spot of teacher agency
While the dashboard surfaces multiple learning gaps, many teachers don't yet understand how to prioritize them.
Our final design maintains visibility across all identified gaps while introducing a clear “CORE” recommendation. This approach preserves teacher autonomy while reducing cognitive load, guiding them toward the most impactful next step without limiting their ability to explore further.

Guiding teacher decision-making
We intentionally avoided having teachers rely too heavily on precise percentages when prioritizing learning gaps. Instead, we introduced qualitative prevalence labels to communicate impact at a glance.
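A minimal sketch of how a percentage could be turned into a qualitative label. The cut-offs and label wording here are assumptions chosen for illustration, not the values used in the pilot dashboard.

```python
def prevalence_label(students_affected: int, class_size: int) -> str:
    """Map the share of students showing a learning gap to a qualitative label.

    The cut-offs (50% and 25%) and the label wording are assumptions
    chosen for illustration, not the dashboard's actual values.
    """
    if class_size <= 0:
        raise ValueError("class_size must be positive")
    share = students_affected / class_size
    if share >= 0.5:
        return "Most of the class"
    if share >= 0.25:
        return "Some students"
    return "A few students"


print(prevalence_label(14, 24))  # 14 of 24 students is a little over half
```

Keeping the thresholds in one small function makes them easy to adjust as teachers give feedback on what each label should mean.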

🚧 Technical Constraints
LLMs don't generate perfect output
Human-review
We presented teachers with multiple outputs across both learning gaps and suggested activities, leaving the final instructional decisions in their hands.
Quality > quantity
Due to limitations in the volume and diversity of classroom data available for training, we observed that the model would occasionally generate duplicate activities with only variations in wording. We refined our prompts to reduce the number of generated activities and prioritize clearly distinct outputs.
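We handled this through prompt changes. A complementary post-processing check, which was not part of the pilot, could filter near-duplicates before they reach the teacher. The sketch below uses Python's standard `difflib`; the function name and the 0.85 threshold are assumptions.

```python
from difflib import SequenceMatcher


def drop_near_duplicates(activities: list[str], threshold: float = 0.85) -> list[str]:
    """Keep the first of any activities whose text is nearly identical.

    The 0.85 similarity threshold is an arbitrary starting point.
    """
    kept: list[str] = []
    for activity in activities:
        is_duplicate = any(
            SequenceMatcher(None, activity.lower(), other.lower()).ratio() >= threshold
            for other in kept
        )
        if not is_duplicate:
            kept.append(activity)
    return kept


# Hypothetical generated activities; the second only rewords the first.
generated = [
    "Sort the fraction cards by size",
    "Sort fraction cards by size",
    "Draw a number line from 0 to 1",
]
print(drop_near_duplicates(generated))
```

Character-level similarity only catches light rewording, so it would supplement prompt refinement rather than replace it.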
Limiting instructional complexity
We constrained the types of instructional moves generated, narrowing them to whole-class and split-class approaches. Expanding the range of instructional formats would introduce greater variability in model outputs, adding a layer of pedagogical complexity that we were not yet confident deploying in live classroom settings.
De-prioritizing amazing ideas :(
Given that our team only had one developer and less than a month to deploy a working site, the more technically complex features had to be backlogged. Here are some of my favorites:
💡 Pilot Testing
Multiple choice questions are flawed!
PPQs (power practice quizzes) were always made up of five multiple choice questions and one short-answer question. The issue with multiple choice questions is that they don't provide much insight into student thinking, and more importantly, guessed answers falsely reflect an understanding (or lack thereof) of a concept.
Our solution? Leveraging RightOn's existing strategy of implementing confidence checks. We implemented a confidence meter after each question, informing teachers about whether students understood what they were choosing.

What we didn't expect from real-world data
After receiving the first week's quiz results, we were jump-scared by the poor performance across classes.
Why was every other question answered incorrectly?
We quickly realized it was due to the confidence checks: they had been implemented as their own questions, so they were being scored along with the real ones and dragging down the results.
Why were there combined answers?
Some students erased an answer to select a new one, but the erased mark was still picked up by the Scantron, which recorded two answers for the same question. We had to remove these responses from the dataset.
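The two cleanup steps above can be sketched as a single filter. The `ScannedResponse` fields are a hypothetical data shape, not our actual export format, and the sketch assumes confidence checks are simply excluded from scoring.

```python
from dataclasses import dataclass


@dataclass
class ScannedResponse:
    student_id: str
    item_number: int
    marked_choices: str        # e.g. "B", or "AD" when two bubbles were read
    is_confidence_check: bool  # True for the confidence item after a question


def clean_responses(rows: list[ScannedResponse]) -> list[ScannedResponse]:
    """Keep only scorable rows: real questions with exactly one marked choice."""
    return [
        row for row in rows
        if not row.is_confidence_check and len(row.marked_choices) == 1
    ]


# Made-up rows for one hypothetical student.
raw = [
    ScannedResponse("s01", 1, "B", False),
    ScannedResponse("s01", 2, "C", True),    # confidence check, not scored
    ScannedResponse("s01", 3, "AD", False),  # erased answer read as two marks
    ScannedResponse("s01", 5, "A", False),
]
cleaned = clean_responses(raw)
print([row.item_number for row in cleaned])
```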

💡 Insights
More context, please
Teachers need visibility into the specific assessment questions that provide the strongest evidence for each misconception, rather than relying solely on CCSS standards. Without this context, they must reconstruct student thinking on their own, increasing time, effort, and cognitive load during instructional decision making.

Teachers want lesson planning
The dashboard currently functions as a decision-support tool rather than a full workflow replacement. While it provides meaningful insights, teachers were not yet fully self-sufficient and expressed a desire for more structured, step-by-step lesson plans to guide implementation.

🏆 Result
We designed, built, and deployed dashboards that were piloted in 4 classrooms across the US
Teachers had the ability to review surfaced learning gaps, pick and save recommended activities, and track student progress after the re-assessment.

Names are blurred to protect student privacy.
Classes saw a 55.3% average improvement in concept mastery
Before this, schools across the network had never tracked class progress after implementing intervention activities. During our pilot, participating classes saw an average improvement in concept mastery of 55.3% after intervention implementation (concept mastery was defined as scoring 100% on the re-assessment).

Names are blurred to protect student privacy.
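For readers curious how a mastery figure like the one above can be computed, here is a rough sketch. It assumes improvement is the change in the share of students scoring 100% between the initial quiz and the re-assessment, averaged across classes. That formula, the function names, and the example scores are all assumptions for illustration, not our pilot data or exact method.

```python
def mastery_rate(scores: list[float]) -> float:
    """Share of students who scored 100% (the pilot's definition of mastery)."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(1 for s in scores if s == 100) / len(scores)


def average_improvement(classes: dict[str, tuple[list[float], list[float]]]) -> float:
    """Mean change in mastery rate (post minus pre) across classes, in percentage points."""
    changes = [
        (mastery_rate(post) - mastery_rate(pre)) * 100
        for pre, post in classes.values()
    ]
    return sum(changes) / len(changes)


# Made-up scores for two hypothetical classes: (initial quiz, re-assessment).
example = {
    "class_a": ([100, 80, 60, 40], [100, 100, 100, 60]),
    "class_b": ([100, 100, 50, 50], [100, 100, 100, 50]),
}
print(average_improvement(example))
```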
💭 Reflection
Learning to be a designer, engineer, and PM all at once
I communicated directly with leadership at these schools, which meant managing timelines, coordinating data, and aligning on goals. Getting so much data on time was difficult, so we had to push back timelines more than once. This project was also highly technical and required us to be thoughtful about how we designed the LLM-powered features.
Developing domain expertise
Delving into this project was quite overwhelming as I was entering a new design space. There was a steep learning curve, with a significant amount of context, domain knowledge, and specialized terminology; I spent time immersing myself in the education landscape and reviewing existing materials to get up to speed. This helped me move beyond surface-level assumptions and design with greater clarity and intention.