01 · The sharpshooter's method
First shoot, then draw the target
The fallacy is named for a Texan marksman who sprays a barn wall with bullets, finds the tightest cluster of holes, and paints a target around it. Inspected after the fact, any spray of randomness contains clusters — the deceit is in pretending the bullseye came first.
A p-value below .05 is a promise about a single, pre-aimed shot: if there were nothing here, results at least this extreme would arise by luck less than one time in twenty. Fire twenty shots (twenty subgroups, twenty outcome measures, twenty foods, twenty trading rules) and "one time in twenty" stops being reassuring and starts being a schedule. The figure above is nothing but that schedule running on schedule.
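The arithmetic of that schedule fits in a few lines of Python. This sketch assumes twenty independent tests with no real effect behind any of them. In that case each p-value is uniformly distributed between 0 and 1; correlated tests would shift the numbers somewhat. The chance of at least one "hit" is then 1 − 0.95²⁰, about 64%, and the expected number of hits is exactly one. The simulation only cross-checks the formula.

```python
import random

ALPHA = 0.05      # conventional per-test threshold
N_TESTS = 20      # twenty subgroups, outcomes, foods or trading rules

# Exact calculation, assuming independent tests and no real effect anywhere.
p_at_least_one = 1 - (1 - ALPHA) ** N_TESTS
expected_hits = ALPHA * N_TESTS

def simulate(trials: int = 100_000, seed: int = 1) -> float:
    """Fraction of simulated null studies in which at least one test 'hits'."""
    rng = random.Random(seed)
    studies_with_a_hit = 0
    for _ in range(trials):
        # Under the null hypothesis a p-value is uniform on [0, 1].
        if any(rng.random() < ALPHA for _ in range(N_TESTS)):
            studies_with_a_hit += 1
    return studies_with_a_hit / trials

print(f"P(at least one p < {ALPHA}) = {p_at_least_one:.3f}")   # 0.642
print(f"Expected false positives     = {expected_hits:.1f}")    # 1.0
print(f"Simulated                    = {simulate():.3f}")
```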
The trap rarely announces itself, because nobody publishes the nineteen misses. A paper, press release or pitch deck shows you the painted target — left-handed smokers respond to the drug, p = .03 — and the wall full of stray holes stays in a drawer.
A p-value answers "how surprising is this shot?" It cannot answer "how many shots were fired?" — and the second question is the one that matters.
02 · The most instructive corpse in neuroscience
The salmon that passed the test
In 2009 the neuroscientist Craig Bennett and colleagues placed a whole Atlantic salmon (purchased at a market, definitively dead) into an fMRI scanner, showed it photographs of people in social situations, and "asked" it to judge their emotions. Then they analysed the scan the way much of the field then did. They tested roughly 130,000 voxels, the tiny cubes of volume that make up a brain scan, each separately for task-related activity, with no correction for the number of tests.
A small cluster of voxels in the dead fish's brain cavity lit up as statistically significant. The salmon, by the conventional threshold, was processing human emotion.
The study, which earned an Ig Nobel Prize in 2012, was a deliberate piece of statistical theatre. At any conventional per-test false-positive rate, 130,000 uncorrected tests can be expected to produce a scatter of spurious hits (even at a strict one in a thousand, about 130 of them), and adjacent spurious voxels will sometimes form convincing-looking clusters. Its serious legacy is that multiple-comparison corrections, once skipped in a sizeable fraction of imaging papers, became impossible to omit politely.
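Here is a rough Python sketch of the same effect at scanner scale. It makes two simplifying assumptions. First, it treats the roughly 130,000 voxels as independent pure-noise tests. Second, it uses an illustrative uncorrected threshold of p < .001, which is not a claim about the salmon study's exact settings. Real voxels are spatially correlated, which is partly why spurious hits clump into clusters. Independence is still enough to show how the count behaves with and without a Bonferroni correction.

```python
import random

N_VOXELS = 130_000
UNCORRECTED = 0.001               # illustrative per-voxel threshold (assumption)
BONFERRONI = 0.05 / N_VOXELS      # keeps the family-wise error rate at 5% across all voxels

def count_hits(threshold: float, seed: int = 2009) -> int:
    """Number of pure-noise voxels whose p-value falls below `threshold`."""
    rng = random.Random(seed)
    # Nothing is there, so every voxel's p-value is uniform on [0, 1].
    return sum(rng.random() < threshold for _ in range(N_VOXELS))

print("expected uncorrected hits: ", N_VOXELS * UNCORRECTED)   # 130.0
print("simulated uncorrected hits:", count_hits(UNCORRECTED))
print("simulated Bonferroni hits: ", count_hits(BONFERRONI))
```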
03 · The honest researcher's version
The garden of forking paths
The sharpshooter needn't be cynical. As Andrew Gelman and Eric Loken argued, a researcher who runs one analysis can still be implicitly choosing it from a garden of forking paths: exclude that outlier or keep it, adjust for age or don't, take the mean or the median, report endpoint A or endpoint B. Each choice is defensible; together they multiply into dozens of analyses that could have been run — and the one that reaches print is, naturally, one that "worked". No single dishonest step is taken, and the wall still ends up painted.
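The forking-paths problem can be simulated without a single dishonest step. In this hypothetical Python sketch every study is pure noise: the "treatment" does nothing to either of two outcome measures. The researcher faces three defensible choices:

- outcome A or outcome B;
- keep or drop outliers beyond two standard deviations;
- whole sample or one subgroup.

Together these give eight possible analyses, and the sketch keeps only the most impressive one. The outcomes, the subgroup flag and the outlier rule are all invented for illustration. The p-values use a large-sample z approximation rather than an exact t-test.

```python
import random
from itertools import product
from statistics import NormalDist

def mean_var(xs):
    """Sample mean and (n - 1) sample variance."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def two_sample_p(a, b):
    """Two-sided p-value for a difference in means (large-sample z approximation)."""
    (ma, va), (mb, vb) = mean_var(a), mean_var(b)
    z = (ma - mb) / (va / len(a) + vb / len(b)) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical forks: which outcome, whether to drop outliers, whole sample or subgroup.
FORKS = list(product(("A", "B"), (False, True), (False, True)))

def one_study(rng, explore=True, n=120):
    """Simulate one pure-noise study; return the smallest p-value among the paths tried."""
    people = [{"treated": rng.random() < 0.5, "subgroup": rng.random() < 0.5,
               "A": rng.gauss(0, 1), "B": rng.gauss(0, 1)} for _ in range(n)]
    paths = FORKS if explore else FORKS[:1]   # FORKS[0]: outcome A, keep outliers, everyone
    best = 1.0
    for outcome, drop_outliers, subgroup_only in paths:
        rows = [p for p in people
                if (not subgroup_only or p["subgroup"])
                and not (drop_outliers and abs(p[outcome]) > 2)]
        treated = [p[outcome] for p in rows if p["treated"]]
        control = [p[outcome] for p in rows if not p["treated"]]
        best = min(best, two_sample_p(treated, control))
    return best

def false_positive_rate(explore, studies=2000, seed=0):
    """Share of null studies that end up reporting p < .05."""
    rng = random.Random(seed)
    return sum(one_study(rng, explore) < 0.05 for _ in range(studies)) / studies

print("one pre-registered path:", false_positive_rate(explore=False))
print("best of eight paths:    ", false_positive_rate(explore=True))
```

With the single pre-registered path, the false-positive rate should stay close to the nominal 5%. When the data are allowed to pick among the eight paths, the rate typically lands well above that, even though each individual test is valid on its own terms.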
The known remedies all amount to fixing the target before firing. Pre-register the analysis. Correct the threshold for the number of comparisons: divide it by the number of tests, Bonferroni-style, or control the false-discovery rate instead. Hold out data the choices never touched. And the cheapest remedy of all: treat any surprising, slice-specific finding as a hypothesis for the next study, not a conclusion from this one.
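Both corrections are short enough to write out in full. This Python sketch implements two procedures:

- the Bonferroni rule, which compares each p-value with α divided by the number of tests;
- the Benjamini–Hochberg step-up procedure, which controls the false-discovery rate.

The six p-values are made up for the example. For real work, a tested implementation such as `multipletests` in statsmodels is the safer choice.

```python
def bonferroni(pvalues, alpha=0.05):
    """Indices of tests that survive Bonferroni (family-wise error rate <= alpha)."""
    m = len(pvalues)
    return [i for i, p in enumerate(pvalues) if p <= alpha / m]

def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices of tests declared discoveries by the Benjamini-Hochberg step-up rule."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])   # test indices, smallest p first
    # Find the largest rank k whose p-value is at most (k / m) * alpha; keep ranks 1..k.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            cutoff = rank
    return sorted(order[:cutoff])

# Six made-up p-values from six comparisons in one study.
p_values = [0.001, 0.008, 0.012, 0.040, 0.045, 0.300]

print("uncorrected:", [i for i, p in enumerate(p_values) if p < 0.05])   # [0, 1, 2, 3, 4]
print("Bonferroni: ", bonferroni(p_values))                              # [0, 1]
print("BH (FDR):   ", benjamini_hochberg(p_values))                      # [0, 1, 2]
```

Bonferroni guards against even one false positive across the whole family, which makes it strict. Benjamini–Hochberg instead limits the expected share of false positives among the findings it reports, so it keeps more genuine hits when many tests are run.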
04 · Field notes
Painted targets in the wild
Clinical trials taught the lesson on purpose. The landmark ISIS-2 heart-attack trial (1988) demonstrated convincingly that aspirin saves lives — and its authors, to dramatise the perils of subgroup slicing, also reported that the benefit vanished for patients born under Gemini or Libra. Peter Austin's group later ran the joke in reverse, trawling Ontario hospital records by star sign and duly "finding" zodiac-linked diagnoses. Both were warnings, dressed as findings, about how easily slicing manufactures significance.
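The ISIS-2 joke cuts both ways, and a small simulation shows why. In this hypothetical Python sketch, a drug lowers the risk of death from 12% to 9% for every patient alike, and the patients are then sliced by star sign. All numbers are invented, and each comparison uses a standard pooled two-proportion z-test. Pooled, the trial is overwhelmingly significant. Each sign on its own, however, has only about a three-in-four chance of clearing p < .05, so most runs leave a few signs in which the benefit seems to vanish.

```python
import random
from statistics import NormalDist

SIGNS = ["Aries", "Taurus", "Gemini", "Cancer", "Leo", "Virgo",
         "Libra", "Scorpio", "Sagittarius", "Capricorn", "Aquarius", "Pisces"]

def two_proportion_p(deaths_a, n_a, deaths_b, n_b):
    """Two-sided p-value for a difference in death rates (pooled two-proportion z-test)."""
    pooled = (deaths_a + deaths_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (deaths_a / n_a - deaths_b / n_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def simulate_trial(n_per_arm=1500, risk_placebo=0.12, risk_drug=0.09, seed=1988):
    """Return the overall p-value and one p-value per star sign (hypothetical numbers)."""
    rng = random.Random(seed)
    deaths = {}
    for sign in SIGNS:
        # The drug's effect is identical for every sign: star signs are pure labels.
        placebo = sum(rng.random() < risk_placebo for _ in range(n_per_arm))
        drug = sum(rng.random() < risk_drug for _ in range(n_per_arm))
        deaths[sign] = (placebo, drug)
    n_total = n_per_arm * len(SIGNS)
    overall = two_proportion_p(sum(p for p, _ in deaths.values()), n_total,
                               sum(d for _, d in deaths.values()), n_total)
    per_sign = {s: two_proportion_p(p, n_per_arm, d, n_per_arm)
                for s, (p, d) in deaths.items()}
    return overall, per_sign

overall, per_sign = simulate_trial()
print(f"all patients: p = {overall:.1e}")
print("signs where the benefit 'vanishes' (p >= .05):",
      [s for s, p in per_sign.items() if p >= 0.05])
```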
Finance backtests. Try a thousand trading rules on the same price history and dozens will be wildly "profitable" in-sample, for exactly the reason the dead salmon thought about photographs. The strategies that get sold are the painted targets; the out-of-sample future is the part of the wall nobody shot yet.
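A backtest of skill-free rules makes the point concrete. In this hypothetical Python sketch, the market is a driftless random walk. Each of 1,000 "trading rules" is nothing more than a random long or short position each day, so no rule can have an edge. The sketch ranks the rules on the first half of the data and then checks the top 5% on the second half. That contrast is the gap between the painted target and the wall nobody has shot yet. Performance is measured as summed daily returns, a simplification that ignores compounding.

```python
import random

def backtest(n_rules=1000, n_days=500, seed=42):
    """Rank skill-free rules in-sample, then see how the top 5% do out of sample."""
    rng = random.Random(seed)
    # Market: driftless random walk with 1% daily volatility; first half is in-sample.
    market = [rng.gauss(0.0, 0.01) for _ in range(2 * n_days)]
    results = []
    for _ in range(n_rules):
        # Hypothetical "rule": a random long (+1) or short (-1) position each day.
        position = [1 if rng.random() < 0.5 else -1 for _ in range(2 * n_days)]
        daily = [p * r for p, r in zip(position, market)]
        # Summed daily returns, a simple stand-in for compounded performance.
        results.append((sum(daily[:n_days]), sum(daily[n_days:])))
    results.sort(reverse=True)               # best in-sample performers first
    top = results[: max(1, n_rules // 20)]   # the rules that would get sold
    avg_in = sum(r[0] for r in top) / len(top)
    avg_out = sum(r[1] for r in top) / len(top)
    return avg_in, avg_out

avg_in, avg_out = backtest()
print(f"top 5% in-sample:      {avg_in:+.1%}")
print(f"same rules afterwards: {avg_out:+.1%}")
```

Expect the in-sample average of the top rules to look impressive and their out-of-sample average to hover around zero. The ranking selected luck, and luck does not carry over.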
Cancer clusters and headline nutrition. The term "Texas sharpshooter" comes from epidemiology, where apparent disease clusters around some landmark must be weighed against the thousands of neighbourhoods in which nobody went looking. Food-and-health headlines run the same engine: survey hundreds of foods against dozens of outcomes and chance alone keeps the front pages stocked.
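The cluster arithmetic can be done exactly. Assume, hypothetically, 1,000 neighbourhoods that share the same underlying risk and each expect five cases, with case counts following a Poisson distribution. This Python sketch does two things:

- it finds the smallest count that would look alarming in isolation (p < .05 for that neighbourhood alone);
- it estimates how many neighbourhoods should reach that count by chance.

```python
from math import exp, factorial

def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed from the finite lower sum."""
    return 1 - sum(exp(-lam) * lam ** i / factorial(i) for i in range(k))

def chance_clusters(n_areas=1000, expected_cases=5.0, alpha=0.05):
    """Smallest case count that looks alarming in one area (p < alpha), and how many
    of n_areas identical-risk areas should reach it by chance alone (hypothetical setup)."""
    k = 0
    while poisson_tail(k, expected_cases) >= alpha:
        k += 1
    return k, n_areas * poisson_tail(k, expected_cases)

k, expected = chance_clusters()
print(k, round(expected, 1))   # 10 31.8
```

In other words, ten or more cases looks like a cluster wherever it happens. Even so, around thirty of these identical-risk neighbourhoods should produce one, with nobody aiming at anything.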
The protective question never changes: how many shots were fired before this target was painted? Ask how many subgroups, endpoints, foods or strategies were tested; whether the hypothesis predates the data; whether the threshold was corrected; whether it replicates on a wall the shooter hasn't seen. A genuine bullseye survives all four questions. A painted one rarely survives the first.