
Flat RAG Is Failing Medical Students

I've been doing UWorld blocks between call rooms and conference tables. You get a question wrong, click "Explanation," skim a paragraph you don't retain, move on. The wrong answer is buried, the review is perfunctory, and you'll probably miss the same concept again two weeks later on the shelf.

The study tools we have are fine. But fine isn't the same as good.

The problem with how we index medical knowledge

Most AI-powered study tools — the ones integrating First Aid, Pathoma, or any major resource — use flat retrieval. They chunk the PDF into paragraphs, embed them, stuff them in a vector database, and call it a day. You ask a question, get the three closest chunks.

The problem? First Aid isn't a bag of words. It's a tree.

Every page encodes a hierarchy:

Organ System
  └── Disease/Condition
        └── Pathophysiology
              └── Clinical Buzzwords
                    └── Treatment/Mnemonic

When you chunk by paragraph, you're severing those relationships. A chunk about "anti-GBM antibodies" has no idea it belongs to "Goodpasture syndrome" under "Pulmonary-Renal Syndromes" under "Renal." The retrieval system can surface the right words but miss the entire clinical framework that makes them stick.

This is the silent failure mode. The AI returns technically correct content that's missing the why and where.

The idea: tree-based retrieval for medical content

I came across a paper on PageIndex — a RAG approach that builds a hierarchical index instead of flat embeddings. The key insight: parse the source document by its section structure first, build a tree of nodes, and retrieve at the right level of the hierarchy.

For First Aid, that would look something like this:

  1. Parse by headers: organ system → disease → subsection
  2. Build a tree: each node contains its own content plus a pointer to its parent context
  3. Index at multiple levels: embed both the leaf nodes (specific facts) and their ancestors (the conceptual scaffold)
  4. Retrieve with context: when a query hits a leaf node, you get the full ancestral path — not just the buzzword, but the disease, the system, and the mechanism

The query "anti-GBM" doesn't just return the antibody fact. It returns: here's what it is, here's the disease it causes, here's where that disease lives in the clinical taxonomy, here's the treatment.
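The four steps above can be sketched with a toy tree. This is a hypothetical illustration, not PageIndex's actual implementation: the node contents are invented, and a naive keyword match stands in for embedding search:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One section of the document tree (organ system, disease, subsection)."""
    title: str
    content: str = ""
    parent: "Node | None" = None
    children: list = field(default_factory=list)

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child

    def ancestral_path(self):
        """Walk parent pointers to recover the full hierarchy for this node."""
        node, path = self, []
        while node:
            path.append(node.title)
            node = node.parent
        return list(reversed(path))

# Step 1–2: build a tiny slice of the hierarchy.
renal = Node("Renal")
prs = renal.add(Node("Pulmonary-Renal Syndromes"))
prs.add(Node("Goodpasture syndrome",
             "Anti-GBM antibodies against type IV collagen; "
             "hemoptysis + hematuria; treat with plasmapheresis."))

# Step 4: retrieve with context. Keyword match stands in for vector search.
def retrieve(root, query):
    hits, stack = [], [root]
    while stack:
        node = stack.pop()
        if query.lower() in node.content.lower():
            hits.append(node)
        stack.extend(node.children)
    # Return each hit with its full ancestral path, not just the leaf text.
    return [(" > ".join(n.ancestral_path()), n.content) for n in hits]

hits = retrieve(renal, "anti-GBM")
# hits[0][0] → "Renal > Pulmonary-Renal Syndromes > Goodpasture syndrome"
```

A real version would embed node text at multiple levels (step 3) and rank by similarity, but the structural point survives the simplification: the hit carries its whole path.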

The UWorld connection

Here's where it gets interesting as a project.

UWorld already tells you what you got wrong. But it doesn't tell you why you keep getting it wrong or where the knowledge gap lives in your conceptual map.

What if you linked wrong answers back to the tree? Each missed question maps to a node, and misses accumulate up the ancestral path, so the system can tell you whether you're missing one stray fact or an entire branch.

This is what spaced repetition would look like if it understood medical hierarchies instead of just correct/incorrect signals.
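A sketch of that linkage, with made-up question IDs and paths: each miss increments a counter at every level of its ancestral path, so repeated misses in the same branch surface a branch-level gap rather than scattered facts:

```python
from collections import Counter

# Hypothetical mapping from UWorld question IDs to tree paths (root -> leaf).
PATHS = {
    "q1043": ("Renal", "Pulmonary-Renal Syndromes", "Goodpasture syndrome"),
    "q2210": ("Renal", "Pulmonary-Renal Syndromes",
              "Granulomatosis with polyangiitis"),
    "q0877": ("Renal", "Nephrotic Syndrome", "Minimal change disease"),
}

def tally_misses(missed_question_ids):
    """Count misses at every level of the hierarchy, not just the leaf."""
    counts = Counter()
    for qid in missed_question_ids:
        path = PATHS[qid]
        for depth in range(1, len(path) + 1):
            counts[path[:depth]] += 1
    return counts

misses = tally_misses(["q1043", "q2210"])
# Two different missed questions, one shared branch:
branch_gap = misses[("Renal", "Pulmonary-Renal Syndromes")]  # → 2
```

Two misses on two different diseases look unrelated to a flat tracker; the tree sees one weak branch and can schedule review at that level.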

Why this matters beyond UWorld

The same architecture applies anywhere you have hierarchical clinical documents.

For something like ChartLens — a project I've been building that does NLP on clinical documents — flat retrieval was always a ceiling. Tree-based indexing could actually surface the clinical reasoning chain, not just the surface-level match.

What I'd build

If I had a free weekend (and I don't, because surgery clerkship starts March 8), the build would be the four steps above: parse First Aid by headers, build the tree, embed nodes at multiple levels, and link UWorld misses back to nodes.

The paper does this more rigorously. But you could hack a working prototype in a weekend.
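The parsing step alone is nearly free if the source is available as markdown with nested headers. A minimal sketch under that assumption (the sample text is invented, not quoted from First Aid):

```python
import re

def parse_headers(markdown_text):
    """Nest '#'-style markdown headers into a tree of dict nodes."""
    root = {"title": "ROOT", "children": [], "body": ""}
    stack = [(0, root)]  # (header depth, node)
    for block in re.split(r"\n(?=#)", markdown_text.strip()):
        header, _, body = block.partition("\n")
        m = re.match(r"(#+)\s*(.*)", header)
        if not m:
            continue
        depth = len(m.group(1))
        node = {"title": m.group(2).strip(), "children": [],
                "body": body.strip()}
        while stack[-1][0] >= depth:  # pop back up to this header's parent
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((depth, node))
    return root

sample = ("# Renal\n"
          "## Pulmonary-Renal Syndromes\n"
          "### Goodpasture syndrome\n"
          "Anti-GBM antibodies against type IV collagen.")
tree = parse_headers(sample)
leaf = tree["children"][0]["children"][0]["children"][0]
# leaf["title"] → "Goodpasture syndrome"
```

From there, each node gets embedded alongside its ancestral path and the retrieval layer from the earlier steps does the rest.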

The broader point

We talk a lot about AI in medicine as if the hard problem is getting models to be accurate. But a lot of the actual failure modes are architectural. Flat retrieval is fast and cheap and loses exactly the structure that makes medical knowledge usable.

Tree-based indexing isn't new. It's just underused in medtech — probably because most medtech tools are built by people who haven't actually used First Aid.

I have. I'm going to build this.


I'm a second-year at UCLA DGSOM, leaning radiology, building health tech on the side. If you're working on something similar or have thoughts on better approaches to medical RAG, I'd love to hear it.