unspurious.
12 June 2026 · 9 min read

Survivorship bias: the evidence is in the graveyard

From Wald's bombers to falling cats and founder folklore — how filters write datasets, drawn out in five figures.

Every dataset you will ever meet has been through a doorway, and the doorway had rules. Companies had to stay solvent to appear in the database. Buildings had to stay standing to be admired. Patients had to return for the follow-up. Manuscripts had to be published, songs replayed, founders interviewed. Survivorship bias is what happens when we read the room and forget the door — when conclusions are drawn from whatever passed a filter, while the filter and everything it removed stay invisible.

It is, by a comfortable margin, the most intuitive illusion in this compendium and also the most quietly expensive. This post walks through its anatomy and four of its habitats — a war, a fund database, a veterinary journal and an airport bookshop — each with a picture of where the missing data went.

Wald and the planes that didn't come home

The founding story is from 1943. The mathematician Abraham Wald, working with the Statistical Research Group in New York, was handed a military damage problem: armour is heavy, bombers can't be plated everywhere, and surveys of returning aircraft showed exactly where the bullet holes accumulated — wings, tail, rear fuselage. The instinct, reasonable on its face, was to reinforce the places taking the most damage.

Wald's contribution was to name the doorway. The survey was not a sample of “where bombers get hit”; it was a sample of “where bombers get hit and survive to be surveyed.” Anti-aircraft fire doesn't aim for wingtips — hits were spread roughly evenly — which means the clean patches over the engines and cockpit weren't lucky. They were the wounds nobody came home from. His memos worked out how to estimate each section's true vulnerability from the survivors alone, and the recommendation inverted the instinct entirely: armour where the holes aren't.

Figure: The damage map, and what it doesn't show. Recorded hits on returning aircraft (schematic; the polished story traces to Wald's real SRG memos). Each dot is a hit logged on an aircraft that returned to base. Labels: “ENGINES: no recorded hits”, “COCKPIT: no recorded hits”, “the planes hit here never made it home”.
Read the holes, or read the silence. Every dot is a hit an aircraft survived. The claret outlines mark the regions with no recorded hits at all — not because shells avoided them, but because the aircraft hit there are missing from the sample. The data's most informative regions are the empty ones.
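
To watch the doorway at work, here is a minimal sketch of the survey under loudly stated assumptions. It is a toy, not Wald's actual estimation method: every aircraft takes exactly one hit, hits land evenly across five sections, and the per-section chance of being downed (the LETHALITY table) is invented for illustration.

```python
import random

# Hypothetical probability that one hit in a section downs the aircraft.
# These numbers are invented for illustration, not taken from Wald's memos.
LETHALITY = {"engines": 0.6, "cockpit": 0.5, "fuselage": 0.1, "wings": 0.05, "tail": 0.05}

def survey_returning_aircraft(n_sorties=100_000, seed=0):
    """Count hits per section for all aircraft, and for those that returned."""
    rng = random.Random(seed)
    sections = list(LETHALITY)
    all_hits = dict.fromkeys(sections, 0)
    surveyed_hits = dict.fromkeys(sections, 0)
    for _ in range(n_sorties):
        section = rng.choice(sections)          # hits are spread evenly
        all_hits[section] += 1
        if rng.random() >= LETHALITY[section]:  # the aircraft makes it home
            surveyed_hits[section] += 1
    return all_hits, surveyed_hits

all_hits, surveyed_hits = survey_returning_aircraft()
for section in LETHALITY:
    print(f"{section:9s} true hits {all_hits[section]:6d}   seen in survey {surveyed_hits[section]:6d}")
```

The true hit counts come out roughly equal, while the survey column is thinnest exactly where a hit is most dangerous. Reading the survey column as a vulnerability map would point the armour at the wings and tail.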

The anatomy of the trap

Strip the war story to its skeleton and you get a structure worth memorising, because every case in this post — and most you'll meet in the wild — is the same machine with different labels. A filter sits between the world and your data. The filter's output is vivid, countable and easy to study. Its discards are silent, uncounted, and usually exactly where the answer lives.

Figure: One machine, many costumes. The structure beneath every case in this post. THE WORLD (everything that happened) → THE FILTER (survival · publication · memory) → YOUR DATASET (vivid, countable, biased). The silent discards: closed funds, demolished buildings, unread manuscripts, cats nobody brought to the vet; the part of the world that carries the answer.
Every dataset has an admissions policy. Survival, publication, memory, fame — something decided what got through. The discards don't merely shrink the sample; when leaving it correlates with the outcome under study, they bend every conclusion drawn from what remains.

Note the crucial condition in that caption: the bias bites when exiting the sample is correlated with the thing being measured. A filter that discards at random merely shrinks your data. A filter that discards the failures, the dead, the demolished and the forgotten — while you study success, longevity, durability and greatness — manufactures conclusions wholesale.
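
The difference between the two kinds of filter is easy to check numerically. The sketch below is illustrative only: the outcomes are standard-normal noise, the random filter drops half the records by coin flip, and the selective filter drops everything below an arbitrary threshold of -0.5. All three choices are assumptions made for the example.

```python
import random
from statistics import mean

def filtered_means(n=50_000, seed=1):
    """Mean outcome in the full sample, after random loss, and after selective loss."""
    rng = random.Random(seed)
    outcomes = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Random attrition: each record is dropped with probability 0.5, whatever its value.
    random_kept = [x for x in outcomes if rng.random() < 0.5]
    # Selective attrition: records below the threshold leave (the failures exit).
    selective_kept = [x for x in outcomes if x > -0.5]
    return mean(outcomes), mean(random_kept), mean(selective_kept)

full, after_random, after_selective = filtered_means()
print(f"full sample:            {full:+.3f}")
print(f"after random filter:    {after_random:+.3f}")
print(f"after selective filter: {after_selective:+.3f}")
```

The random filter halves the sample and leaves the mean essentially where it was. The selective filter keeps more of the data and still moves the mean well above zero, because what it removed was chosen by the outcome itself.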

The graveyard in your portfolio

Finance runs Wald's problem every day, in reverse. Fund databases list the funds that exist now; the ones that performed badly were closed or quietly merged away, taking their track records with them. Compute “the average fund's historical return” from such a list and you are averaging over survivors — the corpses have been removed from the denominator before you arrived. Academic studies of US mutual funds have put the resulting inflation at roughly one to two percentage points a year, which is easily the difference between “active management earns its fees” and “it doesn't.”

The simulation below makes the mechanism visible with no fraud anywhere in it: one hundred funds launched, identical rules, a closure threshold for sustained losses, and ten years of honest noise.

Figure: A simulated cohort: 100 funds launched, 10 years on. Seeded simulation; funds close when cumulative losses breach a floor; illustrative. Legend: still trading; closed along the way.
The cohort: 86 alive, 14 closed.
Average annual return, the 86 survivors: +5.8% / yr.
Average annual return, all 100 funds launched: +3.7% / yr.
The graveyard gap: 2.1 points / yr.
Same cohort, two averages. Ask only the 86 funds still standing and the typical annual return is +5.8%. Put every fund ever launched back in the denominator and it drops to +3.7%. Nothing was hidden by anyone; the dead simply stopped being asked. Real-world estimates of this gap run one to two points a year.
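
A cohort like this takes only a few lines to sketch. The version below is not the simulation behind the figure: the return distribution (mean 4%, standard deviation 12%), the closure floor (75% of starting value) and the seed are all assumptions picked for illustration, so its numbers will differ from the 86/14 split above. Each fund's figure is the plain average of its annual returns over the years it traded.

```python
import random
from statistics import mean

def simulate_cohort(n_funds=100, years=10, mu=0.04, sigma=0.12, floor=0.75, seed=42):
    """Return (survivor_avgs, closed_avgs): each fund's mean annual return while it traded."""
    rng = random.Random(seed)
    survivor_avgs, closed_avgs = [], []
    for _ in range(n_funds):
        value, returns, closed = 1.0, [], False
        for _ in range(years):
            r = rng.gauss(mu, sigma)  # every fund draws from the same distribution
            returns.append(r)
            value *= 1.0 + r
            if value < floor:         # cumulative losses breach the floor: the fund closes
                closed = True
                break
        (closed_avgs if closed else survivor_avgs).append(mean(returns))
    return survivor_avgs, closed_avgs

survivors, closed = simulate_cohort()
print(f"{len(survivors)} alive, {len(closed)} closed")
print(f"survivors only: {mean(survivors):+.1%} / yr")
print(f"all launched:   {mean(survivors + closed):+.1%} / yr")
```

Every fund plays by identical rules, so any gap between the two averages comes from the closure rule alone. Changing the floor or the volatility changes the size of the gap, which is a useful way to get a feel for how sensitive the bias is to the filter.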

The same arithmetic haunts backtests (strategies that would have failed were never brought to market), indices (failing companies are removed and replaced), and every “stocks always recover” argument built on the markets that happen to still exist. The investors of 1900 could have made the same argument about the St. Petersburg exchange.

The cat that fell eight storeys

In 1987, two New York veterinarians published a study of 132 cats brought in after falling from high-rise buildings, and reported one of the most delightfully counterintuitive curves in the literature: injuries rose with the height of the fall up to about six or seven storeys — and then declined. Cats falling from nine storeys arrived in better shape than cats falling from six. The proposed explanation was elegant physics: past a few storeys a cat reaches terminal velocity, relaxes, and spreads itself to land more softly.

It might even be true. But the dataset has a doorway, and the doorway is a veterinary clinic. A cat that dies on the pavement — or is so obviously beyond help that its owner never makes the journey — does not become a data point. The higher the fall, the more selective the doorway plausibly becomes, and the survivors of nine-storey falls may simply be the sturdy, lucky tail of a much grimmer distribution. The study can't tell the two stories apart, because the data it would need is precisely the data the filter removed.

Figure: High-rise syndrome and the miraculous dip. Average injury severity by storeys fallen (schematic, after Whitney & Mehlhaff, 1987). Horizontal axis: storeys fallen, 2 to 9+. Vertical axis: average injuries per cat (schematic), 0 to 4. Annotations: “the miraculous dip”; “do high falls protect cats? or do the worst-hurt cats from high falls never reach the vet?”
Two readings of one dip. The relaxed-cat physics is charming and possibly real. The survivorship reading is grimmer: the worst-hurt cats from the highest falls never enter the sample. The curve alone cannot referee between them — which is itself the lesson.
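
It is possible to build a toy world in which the dip comes from the filter alone. The sketch below is deliberately crude and every number in it is invented; none comes from the 1987 study. It assumes a cat lands either well or badly with equal probability, that injuries scale with height in both cases, and that any cat above a fixed severity never reaches the clinic. It shows that a filter can produce such a curve, not that it did.

```python
# All numbers here are invented for illustration; none come from the 1987 study.
GOOD_LANDING_FACTOR = 0.3  # injuries per storey when the cat lands well
BAD_LANDING_FACTOR = 1.0   # injuries per storey when it lands badly
P_BAD_LANDING = 0.5        # chance of a bad landing
FATAL_SEVERITY = 7.0       # above this, the cat never reaches the clinic

def mean_severity(storeys):
    """Return (true mean severity, mean severity among cats that reach the vet)."""
    outcomes = [
        (1 - P_BAD_LANDING, GOOD_LANDING_FACTOR * storeys),
        (P_BAD_LANDING, BAD_LANDING_FACTOR * storeys),
    ]
    true_mean = sum(p * s for p, s in outcomes)
    # The clinic only records cats at or below the fatal severity.
    seen = [(p, s) for p, s in outcomes if s <= FATAL_SEVERITY]
    seen_mean = sum(p * s for p, s in seen) / sum(p for p, _ in seen)
    return true_mean, seen_mean

for storeys in range(2, 10):
    true_mean, seen_mean = mean_severity(storeys)
    print(f"{storeys} storeys: true {true_mean:.2f}, recorded at the clinic {seen_mean:.2f}")
```

In this toy the true severity climbs with every storey, while the clinic's average climbs and then drops once bad landings become fatal and vanish from the records. The dip depends on the two-outcome assumption; a smoother severity distribution with the same cutoff would flatten the recorded curve rather than bend it downward.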

Once you see the cats, you see their cousins everywhere. Old buildings seem better built than today's — because a century of demolition removed the flimsy ones, leaving a curated exhibition of the sturdiest. The music of past decades seems uniformly great — because radio replays only the survivors of a brutal forgetting. “They don't make them like they used to” is, almost always, a true statement about the filter and a false one about the making.

Advice from the survivors

The most lucrative habitat of all is the airport bookshop. Success literature is survivorship bias with a publishing deal: study a hundred celebrated founders and you will reliably find they took bold risks, ignored the doubters, dropped out, persisted past all reason. Every word true. The problem is the bar that never gets drawn — the thousands who took the same bold risks, ignored the same doubters, and quietly vanished, unprofiled and uninterviewed.

Figure: The control group you never meet. Why traits of winners are uninterpretable alone. Bar chart of “bold risk-taking” as a secret of success, on a 0% to 100% axis. Bar 1: profiled founders who “bet everything”, 92%. Bar 2 (dashed): failed founders who did exactly the same, marked “?”.
One bar is not a comparison. That 92% of profiled founders “bet everything” sounds like a secret of success — until you ask what fraction of the failed founders did exactly the same. If the answer is also high, the trait predicts nothing but the willingness to gamble. The dashed bar is unmeasured in every airport book, and it is the entire question.

This is why “what do successful people have in common?” is the wrong question, no matter how rigorously it is answered. The right question — what distinguishes them from the equally bold failures — requires data the filter destroyed. Absent that, treat founder folklore the way Wald treated the bullet holes: as a map of what is survivable, not of what works.
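
The role of the missing bar can be made explicit with Bayes' rule. In the sketch below, the 92% figure is the one from the chart; the 1% base rate of success and the candidate values for the unmeasured bar are hypothetical, chosen only to show how the answer moves.

```python
def p_success_given_trait(p_trait_given_success, p_trait_given_failure, base_rate):
    """Bayes' rule: probability of success among founders who show the trait."""
    with_trait_and_success = p_trait_given_success * base_rate
    with_trait_and_failure = p_trait_given_failure * (1 - base_rate)
    return with_trait_and_success / (with_trait_and_success + with_trait_and_failure)

BASE_RATE = 0.01  # hypothetical share of all founders who succeed

# Try several guesses for the bar nobody measured.
for p_failed in (0.92, 0.50, 0.10):
    p = p_success_given_trait(0.92, p_failed, BASE_RATE)
    print(f"failed founders with the trait: {p_failed:.0%} -> P(success | trait) = {p:.2%}")
```

When the failed founders share the trait at the same 92%, the probability of success given the trait collapses to the base rate: the trait carries no information. Only as the unmeasured bar falls does bold risk-taking begin to predict anything, and that bar is the one number the bookshop never supplies.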

Three questions that find the graveyard

What was the full starting cohort? “Funds in the database” and “funds ever launched” are different denominators. “Buildings from 1900” and “buildings built in 1900” are different populations. Always ask which one you are holding.

Who left, and was leaving correlated with the outcome? Random attrition shrinks a sample; selective attrition bends it. If exiting the data meant failing, dying, closing or being forgotten — and you are studying success, survival, longevity or greatness — the sample is biased by construction.

Would the celebrated trait also appear among the missing? Before crediting grit, risk, daily wine or terminal-velocity relaxation, ask whether the graveyard plausibly shares the trait. If you cannot observe the graveyard at all, hold the conclusion as folklore.

Survivorship bias is the rare trap with a reliable tell: a dataset that sounds like an achievement. If the sample had to accomplish something to be counted, the missing data is the data.

The full entry has the interactive bomber and the fund cohort live; its mirror image — Berkson's paradox, where the question is not who left the sample but how anyone got in — completes the pair of selection illusions in the compendium.
