What is survivorship bias?

Survivorship bias is the error of drawing conclusions only from the people or things that passed through a filter — the survivors — while the failures that were removed stay invisible. Because the discards are usually exactly the informative cases, the surviving sample is skewed, and any “secret of success” read from it can be pure folklore. The classic example is Abraham Wald reinforcing the parts of returning warplanes that had no bullet holes, because the planes hit elsewhere never came back.

Survivorship Bias — unspurious: a compendium of statistical illusions

01 · The inversion

Wald asked about the planes nobody could see

In 1943 the mathematician Abraham Wald was working for the Statistical Research Group in New York when the military brought him a damage problem. Armour is heavy; you cannot plate an entire bomber. Surveys of returning aircraft showed where the bullet holes accumulated — so reinforce the most-hit areas, surely?

Wald's contribution was to notice what the dataset was: not "where bombers get hit", but "where bombers get hit and survive". Anti-aircraft fire doesn't aim for wingtips. Hits were probably spread roughly evenly — which means the regions with no recorded holes were precisely the regions where a hit was fatal. His memos worked out how to estimate the vulnerability of each section from the survivors alone, and the recommendation followed: armour where the holes aren't.

Every survivorship problem has this same shape. A filter sits between the world and your data. The filter's output is vivid and countable; its discards are silent. And the silence is usually where the answer lives.

Every dataset has an admissions policy. Before trusting any pattern in the data, find out what the policy was — and what it rejected.

02 · The quiet version

The graveyard in the denominator

The financial industry runs Wald's problem in reverse, daily. Fund databases list the funds that exist today; funds that performed badly get closed or merged away, taking their track records with them. Compute "the average fund's historical return" from such a list and you are averaging over survivors — the corpses have been quietly removed from the denominator.

Studies of US mutual funds put the resulting inflation at roughly one to two percentage points a year — easily the difference between "active management works" and "it doesn't". The simulation below makes the mechanism visible: identical maths, no fraud, and the brochure number still floats well above the truth.

A simulated cohort: 100 funds launched, 10 years on Funds close when losses breach their floor · Seeded simulation, illustrative

Still trading Closed along the way

Fig. 2 — Same cohort, two averages. Ask only the funds still standing and the typical annual return looks handsome. Put every fund ever launched back in the denominator — including the ones that blew up in year two — and the figure drops. Nothing was hidden; the dead simply stopped being asked.

03 · Sampling the winners

Advice from the eight who made it

Success literature is survivorship bias with a book deal. Study a hundred celebrated founders and you'll find they took bold risks, ignored the doubters, dropped out, persisted. The problem is the control group you never meet: the thousands who did exactly the same things and quietly failed. A trait shared by winners tells you nothing until you know how common it is among the losers.

The same filter shapes gentler intuitions. Old buildings seem better built — because the flimsy ones were demolished a century ago. The music of past decades seems uniformly great — because radio only replays the survivors. Even a famous veterinary finding, that cats falling from above six storeys arrived with fewer injuries than cats falling from below, carries the same asterisk: the study could only count the cats somebody brought to a vet.

One startup cohort, and the slice that gets studied Areas drawn to scale

Fig. 3 — The interview pool. Of a thousand startups founded in one cohort, a handful exit big — and the books, keynotes and podcasts are built almost entirely from that sliver. Whatever habits the eight share, most of the other 992 probably shared them too.

04 · Field notes

Three questions that find the graveyard

What was the full starting cohort? "Average return of funds in the database" and "average return of funds ever launched" are different numbers. Always ask which one you're holding.

Who exited, and why? If leaving the sample is correlated with the very outcome you're measuring — failure, death, dropping out, going bust — the remaining sample is biased by construction, no matter how large it is.

Would this trait also appear among the missing? Before crediting grit, risk-taking or open-plan offices for anyone's success, ask whether the failures had them too. If you can't observe the failures at all, treat the lesson as folklore, not evidence.

Survivorship bias is the rare statistical trap with a reliable tell: a dataset that sounds like an achievement. Companies still trading. Patients who returned for follow-up. Planes that came home. Whenever the sample had to accomplish something to be counted, the missing data is the data.

The evidence that never came home.