unspurious.
12 June 2026 · 9 min read

What a positive test result actually means

Your test is 90% accurate and it came back positive. Why the honest answer is ‘about 9%’ — counted out in pictures.

You take a test for a condition you have no symptoms of — a routine screen, a workplace check, an at-home kit. The leaflet says the test is “90% accurate.” It comes back positive. How worried should you be?

Most people, when asked some version of this, answer “about 90% worried.” So, in repeated studies stretching back four decades, do most physicians. For a genuinely rare condition the right answer is usually closer to “mildly concerned, and ready for the follow-up” — often a single-digit percentage. The gap between those two answers is the base-rate fallacy, and closing it is one of the highest-value pieces of statistical literacy a person can own. This post closes it with pictures and a little counting.

Two questions wearing the same words

“Accuracy” quietly bundles together two different promises, and the leaflet only ever makes the first one. Sensitivity says: if you are sick, the test will probably say so. Specificity says: if you are healthy, the test will probably clear you. Both describe the test's behaviour when the truth is already known — the arrow runs from your condition to the result.

But a person holding a positive result needs the arrow to run the other way: given that the test said positive, how likely am I to be sick? That quantity has its own name — the positive predictive value — and it is not printed on the box, because it isn't a property of the box. It depends on who is being tested. When the condition is rare, the two arrows are not even close.
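In symbols, write D for “has the condition” and + for a positive result (this notation is ours, not the leaflet's). Sensitivity and specificity condition on the truth, the positive predictive value conditions on the result, and Bayes' theorem links them. The numbers are this post's running example, with P(+ | ¬D) = 1 − specificity = 0.09:

```latex
\begin{aligned}
\text{sensitivity} &= P(+ \mid D) = 0.90, \qquad
\text{specificity} = P(- \mid \neg D) = 0.91, \qquad P(D) = 0.01 \\
\text{PPV} = P(D \mid +)
  &= \frac{P(+ \mid D)\,P(D)}{P(+ \mid D)\,P(D) + P(+ \mid \neg D)\,P(\neg D)} \\
  &= \frac{0.90 \times 0.01}{0.90 \times 0.01 + 0.09 \times 0.99}
   = \frac{0.009}{0.0981} \approx 0.092
\end{aligned}
```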

[Figure: The same word, two opposite arrows. A test with 90% sensitivity, screening a condition with a 1% base rate. The question the leaflet answers: given you are sick, the test says positive 90% of the time. The question you are actually asking: given the test says positive, you are sick ≈ 9% of the time.]
The leaflet's promise vs. your question. The solid azure arrow is what “90% accurate” means: given sickness, the test catches it. The dashed claret arrow is the question a positive result actually poses — and at a 1% base rate, its answer is about 9%. Same test, same numbers, opposite direction.

Count a thousand people, not percentages

Why is the second arrow so weak? Because the healthy massively outnumber the sick, and even a small error rate applied to an enormous group produces a crowd. The psychologist Gerd Gigerenzer showed that this becomes nearly self-evident the moment you stop multiplying percentages and start counting actual people — a format he calls natural frequencies. So let's count.

Take 1,000 people screened for a condition with a 1% base rate, with a test that catches 90% of true cases and falsely flags 9% of healthy people (numbers in the ballpark of real mammography screening, which is where this example was first studied). Ten of the thousand are truly sick, and the test catches nine of them. But the other 990 are healthy, and 9% of them — about 89 people — get flagged anyway. Follow the branches:

[Figure: The whole calculation as a counting tree. 1,000 people · 1% base rate · 90% sensitivity · 91% specificity. Of 1,000 people screened, 10 are truly sick (1%) and 990 healthy (99%). Of the 10 sick, 9 test positive (90%) and 1 tests negative (10%). Of the 990 healthy, ≈89 test positive (9%) and ≈901 test negative (91%). That makes ≈98 positives, of whom 9 are sick: about 9%. The leaflet said “90% accurate”; both are true.]
Four branches, no algebra. Every number a positive result depends on is on this tree, and the answer falls out by inspection: the circled positive branches collect 9 sick people and about 89 healthy ones. When David Eddy posed essentially this problem to physicians in 1982, most answered around 75%; the tree says 9%.
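The counting tree translates directly into a few lines of arithmetic. Below is a minimal Python sketch; `Tree` and `count_tree` are hypothetical names chosen for this illustration, and the counts are kept as expected values, so 9% of 990 comes out as 89.1 rather than a whole person.

```python
from dataclasses import dataclass


# Hypothetical names for this sketch: Tree and count_tree are not from any library.
@dataclass
class Tree:
    """Expected head-counts on each branch of the counting tree."""
    sick: float
    healthy: float
    true_pos: float   # sick people the test flags
    false_neg: float  # sick people the test misses
    false_pos: float  # healthy people the test flags anyway
    true_neg: float   # healthy people the test clears

    @property
    def positives(self) -> float:
        """Everyone in the positive pile."""
        return self.true_pos + self.false_pos

    @property
    def ppv(self) -> float:
        """Share of the positive pile that is truly sick."""
        return self.true_pos / self.positives


def count_tree(population: float, base_rate: float,
               sensitivity: float, false_positive_rate: float) -> Tree:
    """Split a population into the tree's four leaves, as expected counts."""
    sick = population * base_rate
    healthy = population - sick
    true_pos = sick * sensitivity
    false_pos = healthy * false_positive_rate
    return Tree(sick, healthy, true_pos, sick - true_pos,
                false_pos, healthy - false_pos)


tree = count_tree(1_000, base_rate=0.01, sensitivity=0.90,
                  false_positive_rate=0.09)
print(f"{tree.true_pos:.0f} sick + {tree.false_pos:.0f} healthy "
      f"= {tree.positives:.0f} positives -> {tree.ppv:.1%} truly sick")
# prints: 9 sick + 89 healthy = 98 positives -> 9.2% truly sick
```

The exact share is 9 out of 98.1, about 9.2%, which the tree rounds to “about 9%”.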

It is worth dwelling on what just happened, because nothing about the test was bad. It caught nine of the ten sick people, exactly the 90% promised. The problem is purely that it was hunting something rare: the nine true positives are simply outnumbered by the residue of error from the 990 healthy people. Gather everyone holding a positive result into one room and the room looks like this:

[Figure: The room full of positives. Everyone the screen flagged, gathered: the 98 people in the positive pile, 9 truly sick (azure squares) and ≈89 false alarms (ochre squares). The result that frightened you is, by itself, ≈91% likely to be one of the ochre squares.]
The ratio in this picture is your answer. Nine azure squares among ninety-eight. A positive result moved your chances from 1-in-100 to roughly 1-in-11: a genuinely informative update, nine times the background risk, but nowhere near the near-certainty the word “positive” smuggles in.
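The same update has a compact form that some readers find easier to carry around: Bayes' theorem in odds. The test's positive likelihood ratio (sensitivity divided by false-positive rate) is the factor by which a positive result multiplies your prior odds. A sketch with this post's numbers:

```latex
\begin{aligned}
\text{LR}^{+} &= \frac{P(+ \mid D)}{P(+ \mid \neg D)} = \frac{0.90}{0.09} = 10 \\
\text{posterior odds} &= \text{prior odds} \times \text{LR}^{+}
  = \frac{1}{99} \times 10 = \frac{10}{99} \\
P(D \mid +) &= \frac{10}{10 + 99} = \frac{10}{109} \approx 0.092
\end{aligned}
```

The odds rise tenfold, from 1:99 to about 1:10, which as a probability is the roughly ninefold jump from 1% to about 9%.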

The same test, everywhere on one curve

Here is the deeper point the counting hides: the worth of a positive is not one number. Hold the test fixed and let only the base rate vary, and the positive predictive value sweeps along a curve — nearly worthless when screening the symptomless general population, a coin flip somewhere in the middle, and genuinely decisive among high-risk patients in a specialist clinic.

[Figure: What a positive is worth, by base rate. Sensitivity 90% and false-positive rate 9%, held fixed throughout. Horizontal axis: how common the condition is (the base rate), 0% to 20%. Vertical axis: chance a positive is true, 0% to 100%, with a coin-flip line at 50%. Marked points: 0.1% → 1%, 1% → 9%, 5% → 34%, 10% → 53%.]
The curve nobody prints on the box. The identical test delivers a positive that is only about 1% likely to be true at a 0.1% base rate, 9% at 1%, and doesn't reach coin-flip territory until the condition affects roughly one person in eleven. This is why the same assay can be an excellent diagnostic tool and a misleading mass screen.
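The curve is one formula evaluated at many base rates. This minimal Python sketch uses hypothetical helper names (`ppv`, `coin_flip_base_rate`) and fixes the sensitivity and false-positive rate at the figure's 90% and 9%; it reproduces the marked points and solves for the coin-flip crossover.

```python
# Hypothetical helper names; the defaults are the figure's fixed test.
def ppv(base_rate: float, sensitivity: float = 0.90,
        false_positive_rate: float = 0.09) -> float:
    """Chance that a positive result is a true positive (Bayes' theorem)."""
    true_pos = sensitivity * base_rate
    false_pos = false_positive_rate * (1 - base_rate)
    return true_pos / (true_pos + false_pos)


def coin_flip_base_rate(sensitivity: float = 0.90,
                        false_positive_rate: float = 0.09) -> float:
    """Base rate p where PPV is 50%, i.e. sensitivity * p == fpr * (1 - p)."""
    return false_positive_rate / (sensitivity + false_positive_rate)


# The four points marked on the curve.
for rate in (0.001, 0.01, 0.05, 0.10):
    print(f"base rate {rate:5.1%} -> a positive is true {ppv(rate):.0%} of the time")

p_half = coin_flip_base_rate()
print(f"coin flip at a base rate of {p_half:.2%}, about 1 in {1 / p_half:.0f}")
```

The printed values match the marked points (1%, 9%, 34% and 53%), and the crossover lands at about 9.1%, one person in eleven.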

This curve explains a lot of otherwise confusing medical practice. It is why doctors ask about symptoms, family history and risk factors before testing — each “yes” slides you rightward along the curve, raising the base rate and therefore the meaning of any positive. It is why a positive result in a 25-year-old with no risk factors is treated differently from the same result in a symptomatic 60-year-old. And it is why mass screening programmes agonise over who to invite: screen too broadly and the programme manufactures false alarms by the thousand, each carrying real anxiety, follow-up procedures and cost.

So what do you actually do with a positive?

Not nothing — and not panic. The practical reading is that a screening positive is the start of a diagnostic process whose entire design anticipates this arithmetic. The screen's job is to concentrate the search: it takes a population where the condition is 1-in-100 and produces a much smaller group where it is 1-in-11. Inside that group, a second, more specific test now operates at a far friendlier base rate — the curve above, entered further to the right — which is why confirmatory testing works so much better than the first screen did.

[Figure: Screen, then confirm. The process a positive result is designed to trigger, illustrative, at a 1% base rate. 10,000 people with no symptoms are screened; 981 flag positive (90 truly sick, 891 healthy). A follow-up, more specific test then runs on this group, where the base rate is ≈9%, sorting it into ≈90 confirmed and ≈891 cleared: the screen did its job. A positive screen is the start of a process; it concentrates the search, it doesn't conclude it.]
Concentration, not conclusion. The screen narrows 10,000 people to 981, and the follow-up sorts those 981 at a base rate roughly nine times richer than the original population's (about 9% instead of 1%). Each stage is an honest filter doing the one job filters can do: changing the base rate for the next stage.
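The two stages can be chained by feeding the first stage's positives in as the second stage's population. This Python sketch assumes a hypothetical follow-up test with 95% sensitivity and a 1% false-positive rate (the post doesn't name one), and it assumes the two tests err independently given a person's true status, which real confirmatory tests only approximate. The helper name `stage` is likewise hypothetical.

```python
def stage(sick: float, healthy: float, sensitivity: float,
          false_positive_rate: float) -> tuple[float, float]:
    """Return the expected (sick, healthy) counts among those a test flags."""
    return sick * sensitivity, healthy * false_positive_rate


# Stage 1: the screen from the figure, 10,000 people at a 1% base rate.
sick, healthy = stage(100, 9_900, sensitivity=0.90, false_positive_rate=0.09)
print(f"after screen: {sick:.0f} sick, {healthy:.0f} healthy, "
      f"base rate {sick / (sick + healthy):.1%}")

# Stage 2: HYPOTHETICAL follow-up test (95% sensitivity, 1% false positives),
# assumed to err independently of the screen.
sick, healthy = stage(sick, healthy, sensitivity=0.95, false_positive_rate=0.01)
print(f"after follow-up: {sick:.1f} sick, {healthy:.1f} healthy, "
      f"base rate {sick / (sick + healthy):.1%}")
```

Under these assumptions the positive pile goes from about 9% sick to about 91% sick. The figure idealises the follow-up as perfect; with a real second test, the final number depends entirely on that test's actual accuracy.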

The same logic, run in reverse, also tells you when to be sceptical of reassurance: a negative result on a rare condition was overwhelmingly likely anyway, so it carries less information than it feels like it does — though happily, for rare conditions negatives are also almost always right.
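The negative branches of the same tree can be counted the same way. This short Python sketch (the function name `npv`, for negative predictive value, is a hypothetical helper) uses the post's 1% base rate, 90% sensitivity and 91% specificity.

```python
def npv(base_rate: float, sensitivity: float = 0.90,
        specificity: float = 0.91) -> float:
    """Chance that a negative result is a true negative."""
    true_neg = specificity * (1 - base_rate)   # healthy and cleared
    false_neg = (1 - sensitivity) * base_rate  # sick but missed
    return true_neg / (true_neg + false_neg)


prior_healthy = 1 - 0.01
print(f"healthy before the test: {prior_healthy:.1%}")
print(f"healthy after a negative: {npv(0.01):.2%}")
```

Per 1,000 people that is about 901 correctly cleared against 1 missed case: the negative lifts an already-high 99% to about 99.9%, a small move in absolute terms, yet almost always right.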

The courtroom version

Before closing, one sobering note: this exact confusion of conditionals has a name in law — the prosecutor's fallacy — and a body count of miscarried justice. “The chance of this evidence arising by innocent coincidence is one in a million” is the leaflet's arrow; “the chance the defendant is innocent is one in a million” is the reversed one, and equating them ignores the base rate exactly as our test-taker did. In the British case of Sally Clark, an expert's “one in 73 million” figure for two natural infant deaths helped convict a grieving mother before the Royal Statistical Society's public protest and the eventual quashing of the verdict. DNA database trawls raise the same spectre: search several million profiles and a one-in-a-million match probability statistically guarantees innocent hits. The full entry treats both cases properly.
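The database arithmetic takes only a few lines. In this Python sketch the database size of 5 million is an assumption standing in for “several million”, and every unrelated profile is treated as matching independently with probability one in a million.

```python
match_prob = 1e-6           # chance an unrelated person's profile matches
database_size = 5_000_000   # ASSUMPTION: stands in for "several million"

# Expected number of innocent matches, and the chance of at least one.
expected_innocent_hits = database_size * match_prob
p_at_least_one = 1 - (1 - match_prob) ** database_size

print(f"expected innocent matches: {expected_innocent_hits:.1f}")
print(f"chance of at least one:    {p_at_least_one:.1%}")
```

About five innocent hits are expected and at least one is all but certain, so a match found by trawling is, on its own, far weaker evidence than “one in a million” suggests.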

The one-sentence habit

Before reading any test result, ask how common the thing was before the test. Then count a concrete thousand.

That habit — base rate first, people not percentages — defuses the fallacy in medicine, security, hiring filters and spam folders alike. If you want it in your hands rather than your head, the base-rate explorer runs the whole calculation live: set the rarity and the accuracy, and watch a thousand people sort themselves into exactly the piles drawn above.

This is an explainer about statistical reasoning, not medical advice. A real result always deserves a conversation with a clinician who knows your history, your risk factors, and which test you actually took — all the things that move you along the curve.
