Reference
The glossary.
Every illusion, bias and term in the compendium, defined in a sentence — each linked to the entry or tool that explains it in full.
Anscombe's quartet
Four datasets built by Francis Anscombe (1973) with nearly identical mean, variance, correlation and regression line, but completely different shapes when plotted — the classic case for graphing data rather than trusting summary statistics alone.
Base-rate fallacy
Judging the probability of a condition from a test's accuracy while ignoring how rare the condition is — confusing P(positive | sick) with P(sick | positive).
Benford's Law
In data spanning many orders of magnitude, the leading digit is small far more often than large — a 1 about 30% of the time. Used to flag possible fraud, and often misapplied to data it doesn't fit.
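As a rough sketch, the expected leading-digit shares can be computed from the standard Benford formula P(d) = log10(1 + 1/d). The function name `benford_probability` is an illustrative choice, not something the compendium defines.

```python
import math

def benford_probability(digit: int) -> float:
    """Expected share of values whose leading digit is `digit` (1-9) under Benford's Law."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)

# Expected frequency for each leading digit, 1 through 9.
expected = {d: benford_probability(d) for d in range(1, 10)}

for d, p in expected.items():
    print(f"{d}: {p:.1%}")
```

Under this formula a 1 leads about 30.1% of the time and a 9 about 4.6%. Comparing observed digit shares with these figures only makes sense for data that plausibly spans several orders of magnitude.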
Berkson's paradox
A spurious negative association created between two independent traits when the sample is selected on something both traits influence — a collider.
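A minimal simulation of the effect, under invented assumptions: two traits drawn independently and uniformly from [0, 1], and a hypothetical selection rule that admits only cases whose combined score exceeds 1. The helper `pearson` and the seed are illustrative choices.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable

# Two independent traits, each uniform on [0, 1] (invented for illustration).
population = [(rng.random(), rng.random()) for _ in range(20_000)]

# Hypothetical selection rule: only cases whose combined score exceeds 1 are sampled.
selected = [(a, b) for a, b in population if a + b > 1]

corr_population = pearson(*zip(*population))
corr_selected = pearson(*zip(*selected))

print(f"whole population: {corr_population:+.2f}")
print(f"selected sample:  {corr_selected:+.2f}")
```

In the whole population the correlation should sit near zero. Inside the selected sample it turns clearly negative, even though neither trait influences the other.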
Collider
A variable that two others both point into. Conditioning on a collider — by selecting or filtering on it — opens a false association between its causes.
Confounder
A variable that influences both the supposed cause and the supposed effect, creating an association between them that is not causal and that shrinks or vanishes once the confounder is held fixed.
Ecological fallacy
Inferring something about individuals from statistics about the groups they belong to. What is true of populations can be false of every person in them.
Friendship paradox
On average, your friends have more friends than you do — because sampling people through their friendships favours the highly-connected.
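A toy illustration with a hypothetical five-person network: one hub who knows everyone, and four people who know only the hub. The names and the network shape are invented.

```python
# Hypothetical friendship network: one hub connected to four people who know only the hub.
friends = {
    "hub": ["a", "b", "c", "d"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
    "d": ["hub"],
}

def friend_count(person):
    return len(friends[person])

# Average number of friends per person.
mean_own = sum(friend_count(p) for p in friends) / len(friends)

# For each person, the average friend count among their friends, then averaged over people.
mean_of_friends = sum(
    sum(friend_count(f) for f in friends[p]) / friend_count(p) for p in friends
) / len(friends)

print(mean_own)         # 1.6
print(mean_of_friends)  # 3.4
```

Four of the five people see a friend with four friends, because the hub appears in everyone's friend list while each of the others appears in only one.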
Gambler's fallacy
The belief that independent random events are “due” to correct: that after a run of one outcome, the other becomes more likely. A fair coin or wheel has no memory, so each outcome keeps the same probability on every trial. Also called the Monte Carlo fallacy.
Goodhart's law
When a measure becomes a target, it ceases to be a good measure. Rewarding a proxy metric makes people optimise the metric rather than the goal it stood for, so the two come apart.
Inspection paradox
Observations made by arriving during an interval, or by sitting inside a group, oversample the large intervals and groups in proportion to their size.
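A short worked example with invented class sizes shows the gap between the average class and the class the average student sits in.

```python
# Hypothetical class sizes at a small school.
class_sizes = [10, 10, 100]

# The school's view: average over classes.
mean_per_class = sum(class_sizes) / len(class_sizes)

# The students' view: each student reports the size of their own class,
# so a class of size s is counted s times.
mean_per_student = sum(s * s for s in class_sizes) / sum(class_sizes)

print(mean_per_class)    # 40.0
print(mean_per_student)  # 85.0
```

Both numbers are correct. They answer different questions, and surveying students instead of classes silently swaps one for the other.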
Lord's paradox
Analysing change scores and analysing baseline-adjusted outcomes can give contradictory verdicts from the same before-and-after data.
Monty Hall problem
A probability puzzle in which switching your choice after the host reveals a losing option wins two-thirds of the time — because the host's reveal is constrained, and constraint carries information.
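The two-thirds figure can be checked by exact enumeration rather than simulation. This sketch assumes the usual three-door setup, with the car and the first pick each equally likely to be any door, and a host who always opens a losing door other than the pick. The function name is illustrative.

```python
from fractions import Fraction
from itertools import product

DOORS = range(3)

def win_probability(switch: bool) -> Fraction:
    """Exact win chance, assuming the car and the first pick are each uniform over three doors."""
    wins = 0
    cases = 0
    for car, pick in product(DOORS, DOORS):
        cases += 1
        if switch:
            # The host must open a losing door that is not the pick, so the
            # remaining closed door hides the car whenever the first pick was wrong.
            wins += pick != car
        else:
            wins += pick == car
    return Fraction(wins, cases)

print(win_probability(switch=True))   # 2/3
print(win_probability(switch=False))  # 1/3
```

Switching wins exactly when the first pick was wrong, which happens in six of the nine equally likely cases.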
P-hacking
Trying many analyses of one dataset and reporting whichever crosses the significance threshold — whether deliberately or through a garden of forking paths.
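A back-of-the-envelope sketch of why this matters, under a simplifying assumption: every analysis is an independent test of a true null hypothesis at threshold alpha. Real analyses of one dataset overlap, so the figures are illustrative rather than exact.

```python
def chance_of_false_positive(n_analyses: int, alpha: float = 0.05) -> float:
    """Probability that at least one of n independent null tests falls below alpha."""
    return 1 - (1 - alpha) ** n_analyses

for n in (1, 5, 20):
    print(n, round(chance_of_false_positive(n), 3))
```

Under that assumption, twenty analyses give roughly a 64% chance that at least one crosses the 0.05 threshold with no real effect present.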
Positive predictive value (PPV)
The probability that a positive test result is correct. It depends on the test's accuracy and on how common the condition is in the population tested.
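The dependence on prevalence follows from Bayes' rule. A minimal sketch with invented test characteristics (the function name and all three numbers are assumptions for illustration):

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """P(condition | positive test) via Bayes' rule."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

# Hypothetical test: 99% sensitive, 95% specific, condition present in 1 person in 1,000.
ppv = positive_predictive_value(sensitivity=0.99, specificity=0.95, prevalence=0.001)
print(f"{ppv:.1%}")
```

With these figures only about 2% of positive results are correct, because false positives from the large healthy majority swamp the true positives from the rare sick minority.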
Regression to the mean
Extreme measurements tend to be followed by less extreme ones, because part of any extreme is luck that does not repeat. Often mistaken for a real effect.
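A small simulation under an assumed model, in which each measurement is a stable skill plus fresh luck, both standard normal. The model, the seed, and the 10% cutoff are all illustrative choices.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

# Assumed model: each score = stable skill + fresh luck, both standard normal.
skills = [rng.gauss(0, 1) for _ in range(10_000)]
first = [s + rng.gauss(0, 1) for s in skills]
second = [s + rng.gauss(0, 1) for s in skills]

# Pick the top 10% on the first measurement, then look at the same people again.
cutoff = sorted(first)[int(0.9 * len(first))]
top = [i for i, score in enumerate(first) if score >= cutoff]

mean_first = sum(first[i] for i in top) / len(top)
mean_second = sum(second[i] for i in top) / len(top)

print(f"top group, first measurement:  {mean_first:.2f}")
print(f"top group, second measurement: {mean_second:.2f}")
```

Nothing was done to the top group between measurements, yet its average falls back toward the population mean while staying above it: the skill persists and the luck does not.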
Relative vs. absolute risk
Relative risk states a change as a proportion of a baseline (“+50%”); absolute risk gives the real change in cases. A relative figure is uninterpretable without the baseline it omits — the press's favourite omission.
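A worked example with invented figures: a baseline risk of 2 cases per 1,000 people and a reported relative increase of 50%.

```python
# Hypothetical figures: a baseline risk of 2 cases per 1,000 people and a reported "+50%".
baseline_risk = 2 / 1000
relative_increase = 0.50

new_risk = baseline_risk * (1 + relative_increase)
absolute_increase = new_risk - baseline_risk

print(f"risk rises from {baseline_risk:.1%} to {new_risk:.1%}")
print(f"extra cases per 1,000 people: {absolute_increase * 1000:.0f}")
```

The same "+50%" would mean 100 extra cases per 1,000 if the baseline were 200 per 1,000, which is why the relative figure alone cannot be interpreted.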
Sensitivity & specificity
Sensitivity is the share of truly positive cases a test catches; specificity is the share of truly negative cases it clears. Neither tells you what a positive result is worth.
Simpson's paradox
A trend present in every subgroup of data reverses or disappears when the subgroups are combined.
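A numeric sketch of a reversal, using invented recovery counts for two arms across two severity groups. The treatment arm does better inside each group but worse once the groups are pooled.

```python
# Invented counts: (recovered, total) for each arm within two severity groups.
data = {
    "mild":   {"treatment": (8, 10),   "control": (70, 100)},
    "severe": {"treatment": (30, 100), "control": (2, 10)},
}

def rate(recovered, total):
    return recovered / total

# Within each group the treatment arm does better.
for group, arms in data.items():
    print(group, rate(*arms["treatment"]), rate(*arms["control"]))

# Pooled over groups the ordering flips, because the treatment arm is mostly severe cases.
pooled = {
    arm: rate(
        sum(data[g][arm][0] for g in data),
        sum(data[g][arm][1] for g in data),
    )
    for arm in ("treatment", "control")
}
print("pooled", pooled)
```

The reversal comes entirely from the unequal group sizes: most treated cases are severe, most control cases are mild, and severity drives recovery more than the arm does.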
Specification curve
A plot of the result of every reasonable analysis of a dataset at once, used to see whether a finding survives defensible changes or vanishes with them.
Spurious correlation
A statistical association between two variables that are not causally related — usually because both are driven by a third factor, or because both trend over time and drift in step. Strong correlations arise from unrelated random data surprisingly often.
Survivorship bias
Drawing conclusions only from the people or things that passed a survival filter, while the failures — the informative ones — remain invisible.
Texas sharpshooter fallacy
Firing many shots, then drawing the target around the tightest cluster — treating chance patterns found after the fact as if they were predicted.
Truncated axis
A chart whose quantitative axis does not begin at zero, visually exaggerating differences. A genuine distortion for length-based encodings like bars.
Will Rogers phenomenon
Moving an element from one group to another can raise the average of both groups at once — progress that exists only on paper.
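A tiny arithmetic example with invented scores: reclassifying the weakest member of the higher group raises both averages, although no individual value changes.

```python
def average(values):
    return sum(values) / len(values)

# Invented scores for two groups.
low = [1, 2, 3, 4]
high = [5, 6, 7, 8, 9]

before = (average(low), average(high))

# Reclassify the weakest member of the high group into the low group.
moved = min(high)
low_after = low + [moved]
high_after = [x for x in high if x != moved]

after = (average(low_after), average(high_after))

print(before)  # (2.5, 7.0)
print(after)   # (3.0, 7.5)
```

The moved value sits below the high group's average and above the low group's, so removing it lifts one mean and adding it lifts the other.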
Winner's curse
The tendency for a value selected for being most extreme — a winning auction bid, or a statistically significant result — to overestimate the truth. With publication bias, it inflates the effects reported in the scientific literature.
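A simulation sketch of the statistical version, under assumed numbers: many hypothetical studies estimate the same true effect with equal noise, and only those more than 1.96 standard errors above zero are kept. The effect size, standard error, and seed are invented.

```python
import random

rng = random.Random(2)  # fixed seed so the run is repeatable

TRUE_EFFECT = 0.2      # assumed true effect, in arbitrary units
STANDARD_ERROR = 0.2   # assumed noise in each study's estimate

# Each hypothetical study reports the true effect plus independent noise.
estimates = [rng.gauss(TRUE_EFFECT, STANDARD_ERROR) for _ in range(10_000)]

# Keep only "significant" studies: estimate more than 1.96 standard errors above zero.
significant = [e for e in estimates if e > 1.96 * STANDARD_ERROR]

mean_all = sum(estimates) / len(estimates)
mean_significant = sum(significant) / len(significant)

print(f"all studies:         {mean_all:.2f}")
print(f"significant studies: {mean_significant:.2f}")
```

Averaged over all studies the estimate is close to the true effect, but the studies that pass the filter overstate it, because only estimates that noise happened to push upward clear the threshold.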