
Flat RAG Is Failing Medical Students

I've been doing UWorld blocks between call rooms and conference tables. You get a question wrong, click "Explanation," skim a paragraph you don't retain, move on. The wrong answer is buried, the review is perfunctory, and you'll probably miss the same concept again two weeks later on the shelf.

The study tools we have are fine. But fine isn't the same as good.

The problem with how we index medical knowledge

Most AI-powered study tools — the ones integrating First Aid, Pathoma, or any major resource — use flat retrieval. They chunk the PDF into paragraphs, embed them, stuff them in a vector database, and call it a day. You ask a question, get the three closest chunks.

The problem? First Aid isn't a bag of words. It's a tree.

Every page encodes a hierarchy:

Organ System
  └── Disease/Condition
        └── Pathophysiology
              └── Clinical Buzzwords
                    └── Treatment/Mnemonic

When you chunk by paragraph, you're severing those relationships. A chunk about "anti-GBM antibodies" has no idea it belongs to "Goodpasture syndrome" under "Pulmonary-Renal Syndromes" under "Renal." The retrieval system can surface the right words but miss the entire clinical framework that makes them stick.

This is the silent failure mode. The AI returns technically correct content that's missing the why and where.

The idea: tree-based retrieval for medical content

I came across a paper on PageIndex — a RAG approach that builds a hierarchical index instead of flat embeddings. The key insight: parse the source document by its section structure first, build a tree of nodes, and retrieve at the right level of the hierarchy.

For First Aid, that would look something like this:

  1. Parse by headers: organ system → disease → subsection
  2. Build a tree: each node contains its own content plus a pointer to its parent context
  3. Index at multiple levels: embed both the leaf nodes (specific facts) and their ancestors (the conceptual scaffold)
  4. Retrieve with context: when a query hits a leaf node, you get the full ancestral path — not just the buzzword, but the disease, the system, and the mechanism

The query "anti-GBM" doesn't just return the antibody fact. It returns: here's what it is, here's the disease it causes, here's where that disease lives in the clinical taxonomy, here's the treatment.
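The four steps above can be sketched with a toy tree. This is a hypothetical illustration, not PageIndex's actual implementation: the node contents are invented, and a naive keyword match stands in for embedding search:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One section of the document tree (organ system, disease, subsection)."""
    title: str
    content: str = ""
    parent: "Node | None" = None
    children: list = field(default_factory=list)

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child

    def ancestral_path(self):
        """Walk parent pointers to recover the full hierarchy for this node."""
        node, path = self, []
        while node:
            path.append(node.title)
            node = node.parent
        return list(reversed(path))

# Step 1–2: build a tiny slice of the hierarchy.
renal = Node("Renal")
prs = renal.add(Node("Pulmonary-Renal Syndromes"))
prs.add(Node("Goodpasture syndrome",
             "Anti-GBM antibodies against type IV collagen; "
             "hemoptysis + hematuria; treat with plasmapheresis."))

# Step 4: retrieve with context. Keyword match stands in for vector search.
def retrieve(root, query):
    hits, stack = [], [root]
    while stack:
        node = stack.pop()
        if query.lower() in node.content.lower():
            hits.append(node)
        stack.extend(node.children)
    # Return each hit with its full ancestral path, not just the leaf text.
    return [(" > ".join(n.ancestral_path()), n.content) for n in hits]

hits = retrieve(renal, "anti-GBM")
# hits[0][0] → "Renal > Pulmonary-Renal Syndromes > Goodpasture syndrome"
```

A real version would embed node text at multiple levels (step 3) and rank by similarity, but the structural point survives the simplification: the hit carries its whole path.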

The UWorld connection

Here's where it gets interesting as a project.

UWorld already tells you what you got wrong. But it doesn't tell you why you keep getting it wrong or where the knowledge gap lives in your conceptual map.

What if you linked wrong answers back to the tree? Each missed question maps to a node, and misses accumulate up the ancestral path, so the system can tell you whether you're missing one stray fact or an entire branch.

This is what spaced repetition would look like if it understood medical hierarchies instead of just correct/incorrect signals.
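A sketch of that linkage, with made-up question IDs and paths: each miss increments a counter at every level of its ancestral path, so repeated misses in the same branch surface a branch-level gap rather than scattered facts:

```python
from collections import Counter

# Hypothetical mapping from UWorld question IDs to tree paths (root -> leaf).
PATHS = {
    "q1043": ("Renal", "Pulmonary-Renal Syndromes", "Goodpasture syndrome"),
    "q2210": ("Renal", "Pulmonary-Renal Syndromes",
              "Granulomatosis with polyangiitis"),
    "q0877": ("Renal", "Nephrotic Syndrome", "Minimal change disease"),
}

def tally_misses(missed_question_ids):
    """Count misses at every level of the hierarchy, not just the leaf."""
    counts = Counter()
    for qid in missed_question_ids:
        path = PATHS[qid]
        for depth in range(1, len(path) + 1):
            counts[path[:depth]] += 1
    return counts

misses = tally_misses(["q1043", "q2210"])
# Two different missed questions, one shared branch:
branch_gap = misses[("Renal", "Pulmonary-Renal Syndromes")]  # → 2
```

Two misses on two different diseases look unrelated to a flat tracker; the tree sees one weak branch and can schedule review at that level.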

Why this matters beyond UWorld

The same architecture applies anywhere you have hierarchical clinical documents.

For something like ChartLens — a project I've been building that does NLP on clinical documents — flat retrieval was always a ceiling. Tree-based indexing could actually surface the clinical reasoning chain, not just the surface-level match.

What I'd build

If I had a free weekend (and I don't, because surgery clerkship starts March 8), the build would be the four steps above: parse First Aid by headers, build the tree, embed nodes at multiple levels, and link UWorld misses back to nodes.

The paper does this more rigorously. But you could hack a working prototype in a weekend.
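The parsing step alone is nearly free if the source is available as markdown with nested headers. A minimal sketch under that assumption (the sample text is invented, not quoted from First Aid):

```python
import re

def parse_headers(markdown_text):
    """Nest '#'-style markdown headers into a tree of dict nodes."""
    root = {"title": "ROOT", "children": [], "body": ""}
    stack = [(0, root)]  # (header depth, node)
    for block in re.split(r"\n(?=#)", markdown_text.strip()):
        header, _, body = block.partition("\n")
        m = re.match(r"(#+)\s*(.*)", header)
        if not m:
            continue
        depth = len(m.group(1))
        node = {"title": m.group(2).strip(), "children": [],
                "body": body.strip()}
        while stack[-1][0] >= depth:  # pop back up to this header's parent
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((depth, node))
    return root

sample = ("# Renal\n"
          "## Pulmonary-Renal Syndromes\n"
          "### Goodpasture syndrome\n"
          "Anti-GBM antibodies against type IV collagen.")
tree = parse_headers(sample)
leaf = tree["children"][0]["children"][0]["children"][0]
# leaf["title"] → "Goodpasture syndrome"
```

From there, each node gets embedded alongside its ancestral path and the retrieval layer from the earlier steps does the rest.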

The broader point

We talk a lot about AI in medicine as if the hard problem is getting models to be accurate. But a lot of the actual failure modes are architectural. Flat retrieval is fast and cheap and loses exactly the structure that makes medical knowledge usable.

Tree-based indexing isn't new. It's just underused in medtech — probably because most medtech tools are built by people who haven't actually used First Aid.

I have. I'm going to build this.


I'm a second-year at UCLA DGSOM, leaning radiology, building health tech on the side. If you're working on something similar or have thoughts on better approaches to medical RAG, I'd love to hear it.