01 · The inversion
Wald asked about the planes nobody could see
In 1943 the mathematician Abraham Wald was working for the Statistical Research Group in New York when the military brought him a damage problem. Armour is heavy; you cannot plate an entire bomber. Surveys of returning aircraft showed where the bullet holes accumulated — so reinforce the most-hit areas, surely?
Wald's contribution was to notice what the dataset was: not "where bombers get hit", but "where bombers get hit and survive". Anti-aircraft fire doesn't aim for wingtips. Hits were probably spread roughly evenly — which means the regions with no recorded holes were precisely the regions where a hit was fatal. His memos worked out how to estimate the vulnerability of each section from the survivors alone, and the recommendation followed: armour where the holes aren't.
Every survivorship problem has this same shape. A filter sits between the world and your data. The filter's output is vivid and countable; its discards are silent. And the silence is usually where the answer lives.
Every dataset has an admissions policy. Before trusting any pattern in the data, find out what the policy was — and what it rejected.
02 · The quiet version
The graveyard in the denominator
The financial industry runs Wald's problem in reverse, daily. Fund databases list the funds that exist today; funds that performed badly get closed or merged away, taking their track records with them. Compute "the average fund's historical return" from such a list and you are averaging over survivors — the corpses have been quietly removed from the denominator.
Studies of US mutual funds put the resulting inflation at roughly one to two percentage points a year — easily the difference between "active management works" and "it doesn't". The simulation below makes the mechanism visible: identical maths, no fraud, and the brochure number still floats well above the truth.
03 · Sampling the winners
Advice from the eight who made it
Success literature is survivorship bias with a book deal. Study a hundred celebrated founders and you'll find they took bold risks, ignored the doubters, dropped out, persisted. The problem is the control group you never meet: the thousands who did exactly the same things and quietly failed. A trait shared by winners tells you nothing until you know how common it is among the losers.
The same filter shapes gentler intuitions. Old buildings seem better built — because the flimsy ones were demolished a century ago. The music of past decades seems uniformly great — because radio only replays the survivors. Even a famous veterinary finding, that cats falling from above six storeys arrived with fewer injuries than cats falling from below, carries the same asterisk: the study could only count the cats somebody brought to a vet.
04 · Field notes
Three questions that find the graveyard
What was the full starting cohort? "Average return of funds in the database" and "average return of funds ever launched" are different numbers. Always ask which one you're holding.
Who exited, and why? If leaving the sample is correlated with the very outcome you're measuring — failure, death, dropping out, going bust — the remaining sample is biased by construction, no matter how large it is.
Would this trait also appear among the missing? Before crediting grit, risk-taking or open-plan offices for anyone's success, ask whether the failures had them too. If you can't observe the failures at all, treat the lesson as folklore, not evidence.
Survivorship bias is the rare statistical trap with a reliable tell: a dataset that sounds like an achievement. Companies still trading. Patients who returned for follow-up. Planes that came home. Whenever the sample had to accomplish something to be counted, the missing data is the data.