What It Does

Structured evidence from unstructured charts—on your hardware.

Medicare Advantage risk adjustment requires accurate, evidenced ICD-10 coding from real-world clinical charts, and those charts are messy: scanned pages, inconsistent formatting, OCR-confusable characters, negated findings, and buried diagnoses.

This system processes chart PDFs through a governed pipeline that extracts, detects, maps, and flags—without guessing. Every finding carries a chain of custody back to the source page and character offset.

  • Input: PDF · ZIP batch
  • Extraction: Native + OCR
  • Detection: ICD-10 codes
  • Mapping: CMS-HCC
  • Evidence: MEAT + provenance
  • Deployment: Docker / on-prem
  • Output: Report + CSV
  • PHI transit: None by default

How It Works

A governed pipeline from PDF to auditable report.

1. Chart ingestion: single PDF or batch ZIP

Upload one chart or a ZIP of many. The system queues them for processing via Celery workers. Charts are stored locally; no data leaves the deployment perimeter by default.
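
As a minimal sketch of the batch path, the snippet below lists the PDF members of an uploaded ZIP so each one can be queued as its own job. The function names are hypothetical and the queueing step itself (Celery in the real system) is left out; this is not the product's ingestion code.

```python
import io
import zipfile


def list_chart_pdfs(zip_bytes: bytes) -> list[str]:
    """Return the names of PDF members in a batch ZIP, sorted for stable ordering."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return sorted(
            info.filename
            for info in archive.infolist()
            # Skip directories and anything that is not a PDF.
            if not info.is_dir() and info.filename.lower().endswith(".pdf")
        )


def build_demo_zip() -> bytes:
    """Build a tiny in-memory ZIP so the sketch runs without real charts."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("charts/b_chart.pdf", b"%PDF-1.4 demo")
        archive.writestr("charts/a_chart.PDF", b"%PDF-1.4 demo")
        archive.writestr("charts/readme.txt", b"not a chart")
    return buffer.getvalue()


chart_names = list_chart_pdfs(build_demo_zip())
print(chart_names)
```

Everything happens in memory or on local disk, which matches the "no data leaves the perimeter" default described above.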

2. Text extraction with OCR quality diagnostics

Native PDF text is extracted first. For scanned pages, OCR is applied. OCR quality is assessed per-page; low-confidence pages are flagged with a "needs review" indicator rather than silently passed through.
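
One way to picture the per-page check is sketched below. The 0.80 cutoff, the `PageExtraction` type, and the rule that a missing score counts as low confidence are all assumptions made for illustration; the page does not state the real threshold or data model.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cutoff; the real threshold is operator-configured.
OCR_CONFIDENCE_THRESHOLD = 0.80


@dataclass(frozen=True)
class PageExtraction:
    page_number: int
    text: str
    used_ocr: bool
    ocr_confidence: Optional[float]  # None for native-text pages


def needs_review(page: PageExtraction, threshold: float = OCR_CONFIDENCE_THRESHOLD) -> bool:
    """Flag OCR pages whose confidence is below the threshold."""
    if not page.used_ocr:
        return False
    # Assumption: an OCR page with no score is treated as low confidence.
    return page.ocr_confidence is None or page.ocr_confidence < threshold


pages = [
    PageExtraction(1, "Assessment and plan", used_ocr=False, ocr_confidence=None),
    PageExtraction(2, "HbA1c 7.2% reviewed", used_ocr=True, ocr_confidence=0.93),
    PageExtraction(3, "Hb4lc 7.Z% rev1ewed", used_ocr=True, ocr_confidence=0.41),
]
flagged_pages = [page.page_number for page in pages if needs_review(page)]
print(flagged_pages)
```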

3. ICD-10 detection with negation and confusable handling

The detection layer finds ICD-10 code mentions and clinical descriptions in real-world, messy formatting. It handles OCR character confusables (e.g., "I" vs "1") and explicitly models negation — "ruled out" conditions are not reported as active diagnoses.
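
The confusable handling can be sketched with two regular expressions: a permissive one to find candidates and a strict one for the dotted ICD-10-CM shape. This is an illustrative simplification, not the product's detector. It only repairs the first two positions, where the code format is unambiguous (a letter, then a digit), and it marks every repaired mention so it can be routed to review, since a plain number such as a lab value can also match after repair.

```python
import re
from dataclasses import dataclass

# Permissive candidate: 3 characters, a dot, then 1-4 characters.
CANDIDATE = re.compile(r"\b[A-Za-z0-9]{3}\.[A-Za-z0-9]{1,4}\b")
# Strict dotted shape: letter, digit, alphanumeric, dot, 1-4 alphanumerics.
ICD10_SHAPE = re.compile(r"^[A-Z][0-9][0-9A-Z]\.[0-9A-Z]{1,4}$")

TO_LETTER = {"1": "I", "0": "O"}            # position 0 must be a letter
TO_DIGIT = {"I": "1", "l": "1", "O": "0"}   # position 1 must be a digit


@dataclass(frozen=True)
class CodeMention:
    code: str
    start: int        # character offset of the mention in the input text
    normalized: bool  # True when a confusable character was repaired


def detect_codes(text: str) -> list[CodeMention]:
    mentions = []
    for match in CANDIDATE.finditer(text):
        raw = match.group()
        first = TO_LETTER.get(raw[0], raw[0])
        second = TO_DIGIT.get(raw[1], raw[1])
        candidate = (first + second + raw[2:]).upper()
        if ICD10_SHAPE.match(candidate):
            mentions.append(CodeMention(candidate, match.start(), candidate != raw.upper()))
    return mentions


mentions = detect_codes("Dx: E11.65 and 150.32 noted. HbA1c 7.2%.")
for mention in mentions:
    print(mention)
```

Here `150.32` is read as `I50.32` with `normalized=True`, while `7.2` is never considered because it does not have three characters before the dot.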

4. CMS-HCC mapping per payment year model

Detected ICD-10 codes are mapped to CMS-HCC categories using imported CMS mapping tables. The payment year model is configurable. Per-condition risk summaries are generated using the selected model's coefficients.
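
A rough sketch of the lookup follows. The model key `example_model` is hypothetical, and the two rows simply reuse the HCC categories and RAF values shown in the worked example further down this page; a real deployment would load the CMS-published tables for the selected payment year. `Decimal` is used so the coefficients add up exactly.

```python
from decimal import Decimal

# Illustrative table only: values copied from the example on this page.
HCC_TABLES = {
    "example_model": {
        "E11.65": ("HCC37", Decimal("0.318")),
        "I50.32": ("HCC85", Decimal("0.323")),
    },
}


def summarize(codes: list[str], model: str):
    """Map codes to HCC categories and sum one coefficient per distinct category."""
    table = HCC_TABLES[model]
    mapped: dict[str, Decimal] = {}
    unmapped: list[str] = []
    for code in codes:
        if code in table:
            hcc, coefficient = table[code]
            # A category counts once, however many codes map to it.
            mapped[hcc] = coefficient
        else:
            unmapped.append(code)
    total = sum(mapped.values(), Decimal("0"))
    return mapped, unmapped, total


mapped, unmapped, total = summarize(["E11.65", "I50.32", "Z00.00"], "example_model")
print(mapped, unmapped, total)
```

Codes with no entry in the selected table are returned separately rather than dropped, so they stay visible to a reviewer.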

5. MEAT evidence extraction with page/offset provenance

For each mapped condition, the system extracts evidence supporting Monitoring, Evaluation, Assessment, and Treatment criteria—with the page number and character offset where the evidence was found. Conservative binding means uncertain evidence surfaces as a flag, not a confident assertion.
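
The shape of an evidence record might look like the sketch below. The `Evidence` type and the `cite` helper are hypothetical, and exact substring search stands in for whatever matching the real pipeline uses. What the sketch does show is the provenance contract: an item is only created when its text can be located at a concrete page and offset.

```python
from dataclasses import dataclass
from typing import Optional

MEAT_CRITERIA = ("monitoring", "evaluation", "assessment", "treatment")


@dataclass(frozen=True)
class Evidence:
    criterion: str
    page: int
    offset: int  # character offset within the page text
    text: str


def cite(criterion: str, page: int, page_text: str, phrase: str) -> Optional[Evidence]:
    """Create an evidence item only if the phrase is actually on the page."""
    offset = page_text.find(phrase)
    if offset == -1:
        return None  # nothing to cite, so nothing is asserted
    return Evidence(criterion, page, offset, phrase)


def missing_criteria(evidence: list[Evidence]) -> list[str]:
    found = {item.criterion for item in evidence}
    return [criterion for criterion in MEAT_CRITERIA if criterion not in found]


page_text = "HbA1c 7.2% reviewed. Metformin 1000mg BID continued."
candidates = [
    cite("monitoring", 1, page_text, "HbA1c 7.2% reviewed"),
    cite("treatment", 1, page_text, "Metformin 1000mg BID continued"),
    cite("assessment", 1, page_text, "Diabetes well controlled"),  # not on the page
]
evidence = [item for item in candidates if item is not None]
print(evidence)
print(missing_criteria(evidence))
```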

6. Auditable report + CSV export

Output is a structured report and a CSV with per-condition findings, flags (ambiguous binding, no evidence found, OCR issues, negation detected), and provenance references. Every row is independently reviewable.
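
To make "every row is independently reviewable" concrete, here is a small sketch of a per-condition CSV. The column names are assumptions, not the product's actual export schema, and the offsets are placeholders rather than values from a real chart.

```python
import csv
import io

# Hypothetical column set for illustration.
FIELDNAMES = ["chart_id", "icd10", "hcc", "flag", "page", "offset"]


def findings_to_csv(rows: list[dict]) -> str:
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    # Offsets below are placeholders, not taken from the example chart.
    {"chart_id": "forge_v2_01_dm_chf_full", "icd10": "E11.65", "hcc": "HCC37",
     "flag": "", "page": 1, "offset": 120},
    {"chart_id": "forge_v2_01_dm_chf_full", "icd10": "I50.32", "hcc": "HCC85",
     "flag": "", "page": 1, "offset": 348},
]
csv_text = findings_to_csv(rows)
print(csv_text)
```

Each row carries its own chart ID, page, and offset, so a reviewer can check one finding without reading the rest of the file.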

Conservative by design. When evidence is insufficient, ambiguous, or OCR-degraded, the system flags it. It does not fill gaps with inferences. A "needs review" flag is a feature, not a failure mode.

Flag Types in Output

FLAG Ambiguous binding — code detected but confidence below threshold
FLAG No MEAT evidence — condition mapped but evidence absent from chart
FLAG OCR quality issue — extraction from this page has reduced confidence
INFO Negation detected — condition mentioned as ruled out or historical

Synthetic Evaluation Packs

PHI-safe synthetic charts are included for deterministic regression testing. New releases are validated against these packs before deployment, so a code change that alters detection behavior on these charts is caught rather than shipped silently.
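
A regression check of this kind can be as simple as comparing detected codes with a stored expectation per chart. The sketch below assumes a golden file that maps chart IDs to expected ICD-10 codes; that format and the function name are illustrative, not the shipped test harness.

```python
def diff_against_golden(
    actual: dict[str, list[str]],
    golden: dict[str, list[str]],
) -> dict[str, dict[str, list[str]]]:
    """Return, per chart, the codes that are missing or unexpected."""
    diffs = {}
    for chart_id in sorted(set(actual) | set(golden)):
        got = set(actual.get(chart_id, []))
        want = set(golden.get(chart_id, []))
        if got != want:
            diffs[chart_id] = {
                "missing": sorted(want - got),
                "unexpected": sorted(got - want),
            }
    return diffs


golden = {"forge_v2_01_dm_chf_full": ["E11.65", "I50.32"]}

# A release that still detects both codes produces an empty diff.
clean_diff = diff_against_golden({"forge_v2_01_dm_chf_full": ["I50.32", "E11.65"]}, golden)

# A release that drops one code is reported explicitly.
regressed_diff = diff_against_golden({"forge_v2_01_dm_chf_full": ["E11.65"]}, golden)
print(clean_diff, regressed_diff)
```

A release gate would then fail whenever the diff is non-empty, unless the golden file was deliberately updated in the same change.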

Real Validation Output

From chart PDF to auditable findings — a live example.

The example below is drawn from a synthetic evaluation chart (PHI-free) representing a Medicare Advantage patient with Type 2 Diabetes and Chronic Systolic Heart Failure. The pipeline detects both ICD-10 codes, maps them to HCC categories, and extracts full MEAT evidence — all bound to source offsets in the original document.

Source Chart (Synthetic PDF)

[Figure: synthetic clinical chart PDF showing ICD-10 diagnoses E11.65 and I50.32 with MEAT documentation sections]

Synthetic chart — no real PHI. Page 1 of 1. OCR quality: PASS. Chart ID: forge_v2_01_dm_chf_full.

What the Pipeline Detected

E11.65 · HCC37 · RAF 0.318
Type 2 Diabetes with Hyperglycemia
MEAT criteria met: Monitor, Evaluate, Assess, Treat
Evidence: HbA1c 7.2% reviewed · Metformin 1000mg BID continued · Confidence 0.85

I50.32 · HCC85 · RAF 0.323
Chronic Systolic Heart Failure
MEAT criteria met: Monitor, Evaluate, Assess, Treat
Evidence: Echo EF 35% stable · Furosemide 40mg + ACE inhibitor · NYHA Class II · Confidence 0.85

  • Binding method: scope_match
  • RAF total: 0.641
  • Needs review: None
  • Sentinel flags: None

Pipeline v1.0 (layer_c) · All MEAT criteria met for both conditions · 0 shared evidence across codes.

Full Pipeline Flow — Ingestion to RAF Score

[Figure: HCC Validator AI pipeline visualization showing 5 steps (PDF ingestion, Layer A ICD extraction, Layer C MEAT binding, validation, and RAF scoring), with evidence cards for E11.65 and I50.32]

Each evidence item carries a page number and character offset — every assertion is traceable to its source in the original chart.

Governance & Safety Rails

The strictest safety rails of any product we build.

Healthcare is the domain where a confident wrong answer causes the most harm. Every design decision in this system reflects that asymmetry.

Conservative binding

Detection confidence thresholds are tunable. Below the threshold, a flag is raised—not a binding. Operators can configure threshold levels based on their review capacity and risk tolerance.
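
The rule is small enough to sketch directly. The 0.85 confidence comes from the example above; the two thresholds are hypothetical operator settings, and the `Outcome` names are illustrative.

```python
from enum import Enum


class Outcome(Enum):
    BOUND = "bound"
    AMBIGUOUS_BINDING = "FLAG Ambiguous binding"


def bind(confidence: float, threshold: float) -> Outcome:
    """Bind only at or above the threshold; otherwise raise a flag for review."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be between 0 and 1, got {confidence}")
    return Outcome.BOUND if confidence >= threshold else Outcome.AMBIGUOUS_BINDING


# Same detection, two hypothetical operator settings.
lenient = bind(0.85, threshold=0.80)
strict = bind(0.85, threshold=0.90)
print(lenient.value, "|", strict.value)
```

Raising the threshold sends more findings to reviewers and lowering it sends fewer, which is the trade-off operators tune against their review capacity.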

Explicit negation modeling

Negation patterns ("ruled out," "no evidence of," "history of, resolved") are modeled explicitly. The system does not report negated conditions as present. Negation detections are logged separately.
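
A deliberately simple, cue-based sketch of this idea is shown below. It treats a mention as negated when a cue phrase appears in the same sentence, which is much cruder than a production negation model (it ignores cue scope and word order), and the cue list contains only the phrases named above.

```python
import re

# Cue phrases taken from the text above; a real model would use many more.
NEGATION_CUES = ("ruled out", "no evidence of", "resolved")


def classify_mentions(note: str, term: str) -> list[tuple[str, bool]]:
    """Return (sentence, negated) for every sentence that mentions the term."""
    results = []
    # Split on whitespace that follows sentence-ending punctuation.
    for sentence in re.split(r"(?<=[.!?])\s+", note.strip()):
        lowered = sentence.lower()
        if term.lower() in lowered:
            negated = any(cue in lowered for cue in NEGATION_CUES)
            results.append((sentence, negated))
    return results


note = (
    "Heart failure is stable on furosemide. "
    "Pneumonia was ruled out. "
    "No evidence of deep vein thrombosis."
)
for term in ("heart failure", "pneumonia", "deep vein thrombosis"):
    print(term, classify_mentions(note, term))
```

Negated mentions are kept in the result with their flag set, mirroring the separate negation log described above, instead of being silently discarded.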

Page + offset provenance

Every extracted finding includes the source page number and character offset. A reviewer can navigate directly to the supporting text in the original chart. Nothing is asserted without a citable source.

On-prem by default

The full pipeline — ingestion, OCR, detection, mapping, reporting — runs inside your infrastructure. No chart data, extracted text, or intermediate results leave your perimeter by default.

Deterministic regression testing

PHI-safe synthetic chart packs ship with the system. Releases are validated against these before deployment. Regression-safe by design: detection behavior changes are intentional, not accidental.

Operator-configurable models

CMS-HCC payment year models, detection thresholds, OCR quality flags, and review gate behavior are all configurable by the deploying organization. The system adapts to your policies, not the reverse.
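
As an illustration only, operator settings like these could be collected in one validated object. Every field name and default below is an assumption; the product's actual configuration format is not described on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    payment_year_model: str               # hypothetical key for a CMS-HCC table
    binding_threshold: float = 0.80       # below this, flag instead of bind
    ocr_review_threshold: float = 0.80    # below this, mark the page "needs review"
    require_human_review: bool = True     # keep the review gate on by default

    def __post_init__(self) -> None:
        if not self.payment_year_model:
            raise ValueError("payment_year_model must be set explicitly")
        for name in ("binding_threshold", "ocr_review_threshold"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1, got {value}")


config = PipelineConfig(payment_year_model="example_model", binding_threshold=0.85)
print(config)
```

Validating at construction time means a mistyped threshold fails at startup rather than quietly changing which findings get flagged.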

In healthcare, "Governed intelligence, not guesswork" means the model assists and humans decide.

Detection and extraction are AI-assisted. Binding, review, and any downstream use are operator-controlled. The system is designed to be a rigorous first pass, not a final authority. Read our "Governed intelligence, not guesswork" framework →

Regulatory context for healthcare AI

HIPAA AI vendor guidance, CMS enforcement updates, and the EU AI Act's healthcare provisions shift regularly. If you're operating AI in a regulated healthcare environment, tracking these is the first layer of governance — not an afterthought.

Try RuleBrief free →

Deployment Model

Docker-native. Runs in your environment.

The system is packaged as a Docker Compose stack. It runs entirely within your infrastructure. All components are customer-managed.

  • API layer: FastAPI
  • Workers: Celery
  • Persistence: PostgreSQL
  • Queue: Redis

Full security & privacy details →

Who It's For

Built for organizations managing risk adjustment at scale.

  • Medicare Advantage health plans and managed care organizations
  • Risk adjustment and HCC coding teams processing large chart volumes
  • Organizations that require on-prem data handling for PHI
  • Teams that need auditable, defensible outputs — not black-box results

Talk to an engineer about your chart intelligence needs.

We'll walk through your chart volumes, OCR challenges, and review workflow—and show you what a governed, on-prem deployment looks like in practice.