From Guidelines to Guidance: Building Reliable AI That Thinks Clinically
The Problem: Clinical Guidelines Are Not Designed for Real-World Use
The Integrated Management of Neonatal and Childhood Illness (IMNCI) guidelines, like many clinical protocols, are critical tools for frontline healthcare workers—especially in low-resource settings. But in practice, these guidelines are hard to use.
- They are often distributed as static PDFs/tables.
- They involve nonlinear, complex logic that is difficult to follow on paper.
- They require navigating complex, multi-step reasoning—often under time pressure.
This creates a major gap: how can we make expert-level clinical reasoning accessible and actionable for clinicians at the point of care?
Our mission is to close that gap using structured knowledge, interactive graphs, and LLM-powered assistants.
Why LLMs Alone and Traditional RAG Aren’t Enough
It’s tempting to think that a powerful LLM like GPT-4, perhaps with retrieval-augmented generation (RAG), can solve this problem. But in real-world clinical decision-making, vanilla RAG has serious limitations:
- No understanding of structure: Embedding-based retrieval methods miss the hierarchical, conditional logic that defines clinical guidelines.
- No guarantee of completeness or fidelity: A model might retrieve semantically similar text, but skip key constraints or misinterpret dependencies like “IF X AND Y, THEN Z.”
- Risk of hallucination: Without strong grounding, LLMs may confidently generate clinically unsafe or unsupported medical advice.
Even vector databases—while helpful for searching literature or notes—do not handle complex, rule-based reasoning well. They can't natively support multi-hop logic chains, personalized filtering, or transparency into why a decision was made.
A Fundamentally Different Approach
Instead of starting with unstructured documents and embeddings, we begin by building a clinical graph.
In our case, we transformed the IMNCI protocol into an interactive knowledge graph using Neo4j. Each medical concept is represented as a structured node with a clearly defined role (e.g., symptom, treatment, classification). Complex properties and relationships are further decomposed using conditional logic with operators like AND, OR, and IF.
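To make this concrete, here is a minimal sketch of how one such rule could be loaded into Neo4j with its official Python driver. The node labels, relationship types, connection details, and the example rule itself are illustrative assumptions, not our actual schema:

```python
# Minimal sketch: loading one illustrative IMNCI-style rule into Neo4j.
# Labels, relationship types, and the rule are hypothetical examples,
# not the production schema.
from neo4j import GraphDatabase  # pip install neo4j

CYPHER = """
MERGE (s1:Symptom {name: 'chest indrawing'})
MERGE (s2:Symptom {name: 'fast breathing'})
MERGE (c:Classification {name: 'PNEUMONIA', severity: 'moderate'})
MERGE (t:Treatment {name: 'oral amoxicillin'})
// OR-condition: either symptom supports the classification
MERGE (s1)-[:INDICATES {operator: 'OR'}]->(c)
MERGE (s2)-[:INDICATES {operator: 'OR'}]->(c)
MERGE (c)-[:TREATED_WITH]->(t)
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    session.run(CYPHER)
driver.close()
```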
This structure enables:
- Logic-aware retrieval: We query based on patient-specific observations and retrieve only clinically valid reasoning paths.
- Explainability: Every result is grounded in traceable logic from original guidelines.
- Fine-grained control: Queries can be adapted to local constraints (e.g., resource availability, test access) and patient-specific parameters (e.g., age group).
- Multi-hop reasoning: We can follow chains like “Symptom A → Classification B → Treatment C” in a transparent and deterministic way (see the sketch after this list).
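As an illustration of logic-aware, multi-hop retrieval, the sketch below traverses from observed symptoms to candidate treatments and returns the full chain, reusing the hypothetical schema from the loading sketch above:

```python
# Minimal sketch of multi-hop retrieval: given observed symptoms, traverse
# Symptom -> Classification -> Treatment and return each step of the chain
# so the reasoning stays inspectable. Schema names are the same hypothetical
# ones used in the loading sketch.
from neo4j import GraphDatabase

QUERY = """
MATCH (s:Symptom)-[:INDICATES]->(c:Classification)-[:TREATED_WITH]->(t:Treatment)
WHERE s.name IN $observed
RETURN s.name AS symptom, c.name AS classification, t.name AS treatment
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    for record in session.run(QUERY, observed=["fast breathing"]):
        print(f"{record['symptom']} -> {record['classification']} -> {record['treatment']}")
driver.close()
```

Because each answer is a concrete path through the graph, the assistant can show the clinician exactly which rule fired, rather than a similarity score.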
By pairing the knowledge graph with large language models, we get the best of both worlds: structured reasoning and fluent, usable explanations.
From Reactive to Proactive: Agentic Flows
Our early work focused on graph-based RAG: structured retrieval plus LLM generation. But clinical decision-making is not just retrieval—it’s a process.
So we’ve taken the next step: designing an agentic workflow powered by modular assistants and orchestrated with pydantic-graph.
Our diagnostic agent doesn’t just answer questions—it guides clinicians through diagnostic reasoning, step-by-step, by:
- Determining the next best question to ask or procedure to perform, based on the context of the evolving diagnostic process
- Clarifying ambiguous inputs with structured reasoning
- Adapting when information is missing or difficult to obtain
- Explaining its recommendations (e.g., why a particular classification or treatment step was selected)
This turns a static decision tree into a collaborative diagnostic experience—interactive, adaptable, and grounded.
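To sketch what this orchestration looks like, here is a minimal two-node flow built with pydantic-graph. The node names, state fields, and the toy danger-sign rule are illustrative assumptions, not our production agent:

```python
# Minimal sketch of a two-step diagnostic flow with pydantic-graph.
# Node names, state fields, and the toy rule are illustrative assumptions.
from __future__ import annotations
from dataclasses import dataclass, field

from pydantic_graph import BaseNode, End, Graph, GraphRunContext

@dataclass
class DiagnosisState:
    answers: dict[str, bool] = field(default_factory=dict)

@dataclass
class AskDangerSigns(BaseNode[DiagnosisState]):
    async def run(self, ctx: GraphRunContext[DiagnosisState]) -> ClassifyCase:
        # In the real assistant this answer would come from the clinician;
        # here it is stubbed for illustration.
        ctx.state.answers["unable_to_drink"] = False
        return ClassifyCase()

@dataclass
class ClassifyCase(BaseNode[DiagnosisState, None, str]):
    async def run(self, ctx: GraphRunContext[DiagnosisState]) -> End[str]:
        if ctx.state.answers.get("unable_to_drink"):
            return End("General danger sign present: refer urgently")
        return End("No general danger signs recorded; continue assessment")

diagnosis_graph = Graph(nodes=(AskDangerSigns, ClassifyCase))
result = diagnosis_graph.run_sync(AskDangerSigns(), state=DiagnosisState())
print(result.output)
```

In the real workflow, each node can consult the clinical graph and an LLM before deciding which node to run next, which is what turns the flow from a fixed script into guided reasoning.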
Why We Believe This Is Worth Investing In
This approach is not just technically interesting—it’s necessary for frontline workers who need to make fast, accurate decisions in high-pressure environments.
In real-world healthcare, decisions are high-stakes, time is limited, and the margin for error is small. Static PDFs can't adapt. Black-box LLMs can't be trusted to reason clinically. And rule-based systems break under uncertainty.
To address these realities, we combine:
- A structured clinical graph as the source of truth
- LLMs for natural language understanding, information synthesis, and contextualized response generation
- Agentic orchestration for step-by-step clinical reasoning
Together, these form a new kind of diagnostic assistant: interactive, transparent, and tailored to the patient in front of you.
These are tools we’ve built (and continue to improve) specifically for frontline care—especially in low-resource settings, where clinicians need real-time, trustworthy support, not just generic search results or static references.
What’s Next
Our journey so far:
- Modeled IMNCI as a graph
- Used that graph to power logic-aware retrieval and generation
- Built an agentic assistant to guide clinicians through diagnosis
- Generated synthetic data to simulate real-world pediatric encounters using LLMs
- Applied probabilistic reasoning for uncertain or incomplete pediatric cases (a minimal sketch follows this list)
- Evaluated GraphRAG vs. RAG pipelines on real-world pediatric messages
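To give a flavor of the probabilistic piece, here is a minimal Bayesian update for a hypothetical classification when a finding is positive, negative, or simply unrecorded. All priors and likelihoods are made-up illustrative numbers, not IMNCI values:

```python
# Minimal sketch of probabilistic reasoning over an uncertain finding:
# a Bayesian update of P(classification) given a finding that may be
# present, absent, or unobserved. All numbers are illustrative.

def update(prior: float, p_finding_given_class: float,
           p_finding_given_not_class: float, finding: bool | None) -> float:
    """Return the posterior P(class) after observing (or not observing) a finding."""
    if finding is None:
        return prior  # unobserved finding: belief is unchanged
    p_f = p_finding_given_class if finding else 1 - p_finding_given_class
    p_f_not = p_finding_given_not_class if finding else 1 - p_finding_given_not_class
    evidence = p_f * prior + p_f_not * (1 - prior)
    return p_f * prior / evidence

p = 0.10  # illustrative prior for the classification
p = update(p, p_finding_given_class=0.80, p_finding_given_not_class=0.15, finding=True)
print(f"posterior after positive finding: {p:.2f}")  # ~0.37
p = update(p, p_finding_given_class=0.70, p_finding_given_not_class=0.30, finding=None)
print(f"posterior after missing finding:  {p:.2f}")  # unchanged
```

Chaining such updates across several findings implicitly assumes they are conditionally independent, a simplification worth keeping in mind.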
Coming soon:
- Longitudinal patient care: Temporal flows for ongoing care and follow-up plans
- Emerging-trend detection: Identifying trends in patient data over time, including risk stratification and early warning systems
We’re building toward a future where intelligent assistants help clinicians—not just by answering questions, but by reasoning with them.
If you’re building in this space—or looking to fund work that makes clinical AI safer, smarter, and more equitable—we’d love to hear from you!