01 · What just happened
The summary is not the data
A summary statistic is a compression — it throws away almost everything and keeps a single number. Usually that is exactly what we want. Anscombe's quartet is the unforgettable demonstration of what compression can hide: four datasets that agree on every standard summary a regression report would print, yet describe four completely different situations. One is a fair linear relationship. One is a perfect curve. One is a clean line knocked askew by a single outlier. One has no relationship at all, propped up entirely by one stray point.
If you had been handed only the table of statistics — mean, variance, correlation, the fitted line y = 3.00 + 0.500x, an R² of 0.67 — you could not tell which world you were in. You would fit a straight line to all four and report the same confident result for each. Three of those four reports would be wrong, and the numbers would never warn you. The only thing that does is the picture.
Anscombe's own framing, in his 1973 paper, was a rebuke to a belief he saw among statisticians: that “numerical calculations are exact, but graphs are rough.” His quartet flips it — here the calculations are the rough instrument and the graph is the exact one.
02 · What is held constant
Seven numbers, frozen solid
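The trick is engineered with care. Across all four datasets, seven of the most-reported quantities in applied statistics (the mean and spread of each variable, their correlation, the fitted regression line and its R²) agree to the precision shown, two or three decimal places. Past that precision they drift apart slightly, but the match is by construction, not coincidence.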
These are not obscure quantities; they are the backbone of a first-pass analysis. Mean and spread describe each variable, correlation and R² claim to describe their relationship, and the regression line claims to summarise it. Anscombe's point is that this entire battery can be satisfied by data that violates every assumption underneath it.
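The frozen numbers are easy to check by hand. The sketch below is a minimal illustration using only the standard library; the data values are the ones printed in Anscombe's 1973 paper, while the names `QUARTET`, `summarize` and `SUMMARIES` are made up for this example.

```python
from math import sqrt

# Anscombe's quartet as published in 1973; datasets I-III share one x column.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return the first-pass summary a regression report would print."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx                      # ordinary least squares slope
    r = sxy / sqrt(sxx * syy)              # Pearson correlation
    return {
        "mean_x": mean_x,
        "var_x": sxx / (n - 1),            # sample variance
        "mean_y": mean_y,
        "var_y": syy / (n - 1),
        "r": r,
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
        "r2": r * r,                       # R² of a simple linear fit equals r²
    }


SUMMARIES = {name: summarize(x, y) for name, (x, y) in QUARTET.items()}

for name, s in SUMMARIES.items():
    print(name, {key: round(value, 2) for key, value in s.items()})
```

Rounded to two decimals the four rows print alike; printing more digits shows where the match stops.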
03 · Four different realities
What the eye sees that the numbers can't
Switch through the four panels in the explorer above and each tells its own story. Dataset I is the honest one — a noisy straight line, exactly the situation linear regression is built for. Dataset II is a clean curve; the relationship is real and strong, but it bends, so a straight line and a correlation coefficient are answering the wrong question. Dataset III is ten points in a near-perfect line plus one outlier that tilts the fitted line and drags the correlation down from a perfect 1.00 to 0.816. Dataset IV is the most unsettling: every point shares the same x value except one, and that single high-leverage point at x = 19 fabricates the entire correlation out of nothing.
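The claims about datasets III and IV can be tested by deleting the one suspicious point from each and recomputing the correlation. This is a rough sketch, not a recommended workflow (dropping points to improve a fit is its own illusion); the helper names `pearson_r` and `drop` are invented for the example.

```python
from math import sqrt

X3 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
Y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]


def pearson_r(x, y):
    """Pearson correlation, or None when a variable has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None                        # correlation is undefined
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sqrt(sxx * syy)


def drop(values, index):
    """Copy of `values` without the element at `index`."""
    return values[:index] + values[index + 1:]


OUTLIER_3 = Y3.index(12.74)                # the lone outlier in dataset III
LEVER_4 = X4.index(19)                     # the lone x = 19 point in dataset IV

R3_ALL = pearson_r(X3, Y3)
R3_TRIMMED = pearson_r(drop(X3, OUTLIER_3), drop(Y3, OUTLIER_3))
R4_ALL = pearson_r(X4, Y4)
R4_TRIMMED = pearson_r(drop(X4, LEVER_4), drop(Y4, LEVER_4))

print("III with / without outlier:", R3_ALL, R3_TRIMMED)
print("IV  with / without x = 19: ", R4_ALL, R4_TRIMMED)
```

Without its outlier, dataset III is a straight line to within rounding. Without its stray point, dataset IV has no spread in x at all, so there is no correlation left to compute.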
Here is the part most retellings leave out, and the part Anscombe actually cared about. His paper was an argument for graphical regression diagnostics — and the sharpest of those is the residual plot, which shows how far each point sits from the fitted line. Plot the residuals and the four datasets stop hiding.
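A residual is just the observed y minus the fitted y, so the diagnostic needs very little machinery. The sketch below fits each dataset by ordinary least squares and lists the residuals in order of x; it stands in for a residual plot, and the function names are illustrative only.

```python
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def fit_line(x, y):
    """Ordinary least squares fit; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(x, y):
    """List of (x, observed y minus fitted y), sorted by x."""
    intercept, slope = fit_line(x, y)
    return sorted((a, b - (intercept + slope * a)) for a, b in zip(x, y))


RESIDUALS = {name: residuals(x, y) for name, (x, y) in QUARTET.items()}

for name, pairs in RESIDUALS.items():
    print(name, [(a, round(e, 2)) for a, e in pairs])
```

Read left to right, dataset II's residuals trace an arch (negative, positive, negative), dataset III has one residual that dwarfs the rest, and dataset IV's lone point at x = 19 has a residual of essentially zero because the line is forced through it.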
This is why “always plot your data” is too weak a moral. The richer one is: plot the right thing. A scatter reveals shape; a residual plot reveals whether your chosen model fits; a leverage check reveals whether any single point is quietly running the show.
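One way to make the leverage check concrete: for a straight-line fit, the leverage of point i is h_i = 1/n + (x_i − x̄)² / Σ(x_j − x̄)², which depends only on the x values. The sketch below computes it for the quartet's two x columns. The cut-off of 2p/n (with p = 2 fitted parameters) is a common rule of thumb, used here as an assumption rather than something the text prescribes.

```python
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]    # x for datasets I-III
X_FOUR = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]          # x for dataset IV


def leverages(x):
    """Hat values for a simple linear regression with an intercept."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    return [1 / n + (a - mean_x) ** 2 / sxx for a in x]


def flag_high_leverage(x, parameters=2):
    """Return (x, leverage) pairs above the 2p/n rule-of-thumb cut-off."""
    cutoff = 2 * parameters / len(x)
    return [(a, h) for a, h in zip(x, leverages(x)) if h > cutoff]


LEV_SHARED = leverages(X_SHARED)
LEV_FOUR = leverages(X_FOUR)

print("I-III max leverage:", round(max(LEV_SHARED), 3))
print("IV leverages:", [round(h, 3) for h in LEV_FOUR])
print("IV flagged:", flag_high_leverage(X_FOUR))
```

A leverage of 1 is the ceiling: the fitted line must pass through that point whatever its y value is, which is exactly the situation of dataset IV's point at x = 19.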
04 · Feel the leverage
One point can own the whole line
Datasets III and IV are really about influence — the unequal power of individual points over a fit. The cleanest way to understand it is to do it. Below is a small linear cloud; drag the red point and watch the regression line, the correlation and R² lurch in response. Pull it far out along the x-axis, the way dataset IV's stray point sits, and you will find you can set the slope to almost anything you like, single-handed.
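The same experiment can be run without a mouse. The sketch below uses a made-up ten-point cloud (roughly y = 2 + 0.5x; these values are invented for illustration, not taken from the page) and refits the line with one extra "red" point placed at different positions.

```python
from math import sqrt

# Hypothetical linear cloud, roughly y = 2 + 0.5x with a little wobble.
CLOUD_X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
CLOUD_Y = [2.4, 3.1, 3.4, 4.2, 4.4, 5.1, 5.4, 6.2, 6.4, 7.1]


def slope_and_r(x, y):
    """Least squares slope and Pearson correlation of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx, sxy / sqrt(sxx * syy)


def with_red_point(red_x, red_y):
    """Refit after adding one movable point to the cloud."""
    return slope_and_r(CLOUD_X + [red_x], CLOUD_Y + [red_y])


BASE_SLOPE, BASE_R = slope_and_r(CLOUD_X, CLOUD_Y)
CENTRE = with_red_point(5.5, 20.0)      # far off in y, but at the mean of x
FAR_UP = with_red_point(40.0, 40.0)     # far out in x, high
FAR_DOWN = with_red_point(40.0, -20.0)  # far out in x, low

print("cloud alone:", BASE_SLOPE, BASE_R)
for label, (slope, r) in [("centre", CENTRE), ("far up", FAR_UP), ("far down", FAR_DOWN)]:
    print(label, slope, r)
```

A point sitting at the mean of x can wreck the correlation but cannot tilt the line at all. Move the same point far out along the x-axis and it decides the sign of the slope on its own.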
05 · Field notes
From a quartet to a dinosaur
The modern sequel. For decades nobody knew how Anscombe built his datasets. Then in 2017 Justin Matejka and George Fitzmaurice showed how to generate any shape you like with a fixed set of summary statistics, using simulated annealing to nudge points around while holding the numbers still. Their showpiece starts from the “Datasaurus,” Alberto Cairo's scatter plot of a dinosaur whose mean, variance and correlation match a boring blob's, and morphs it into a dozen other shapes that all share the same statistics.
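The idea of nudging points while holding the numbers still can be sketched in a few lines. What follows is a loose illustration of that idea, not Matejka and Fitzmaurice's implementation: the circle target, the step size, the linear cooling schedule, the use of rounding to two decimals, and every function name are assumptions made for the example.

```python
import math
import random


def stats(points, decimals=2):
    """Mean x, mean y, sd x, sd y and Pearson r, rounded to `decimals`."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points)
    syy = sum((p[1] - mean_y) ** 2 for p in points)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points)
    raw = (mean_x, mean_y, math.sqrt(sxx / (n - 1)),
           math.sqrt(syy / (n - 1)), sxy / math.sqrt(sxx * syy))
    return tuple(round(value, decimals) for value in raw)


def distance_to_circle(point, centre=(50.0, 50.0), radius=30.0):
    """How far a point sits from the outline of the target circle."""
    return abs(math.hypot(point[0] - centre[0], point[1] - centre[1]) - radius)


def morph(points, iterations=5000, step=0.3, seed=0):
    """Nudge points toward the circle while the rounded stats stay fixed."""
    rng = random.Random(seed)
    current = list(points)
    target_stats = stats(current)
    for i in range(iterations):
        temperature = 0.4 * (1 - i / iterations)   # assumed linear cooling
        k = rng.randrange(len(current))
        old = current[k]
        new = (old[0] + rng.gauss(0, step), old[1] + rng.gauss(0, step))
        closer = distance_to_circle(new) < distance_to_circle(old)
        # Annealing: occasionally allow a move that is not an improvement.
        if not (closer or rng.random() < temperature):
            continue
        current[k] = new
        if stats(current) != target_stats:
            current[k] = old                       # would change the summary
    return current


def mean_distance(points):
    return sum(distance_to_circle(p) for p in points) / len(points)


_rng = random.Random(1)
START = [(_rng.gauss(50, 15), _rng.gauss(50, 15)) for _ in range(30)]
MORPHED = morph(START)

print("summary before:", stats(START))
print("summary after: ", stats(MORPHED))
print("mean distance to circle:", mean_distance(START), "->", mean_distance(MORPHED))
```

Every accepted move is checked against the rounded summary, so the printed statistics cannot change however far the points travel. How closely the cloud ends up hugging the circle depends on the iteration count and step size chosen here.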
Why it still matters. It would be comforting to think Anscombe's quartet is a museum piece from before everyone had plotting software. It isn't. Automated pipelines, dashboards and machine-generated reports increasingly hand people summary numbers — a correlation here, an R² there — with no plot attached, at a scale Anscombe never imagined. The failure mode he diagnosed is more common now, not less.
So the checklist question is almost insultingly simple, and it is the one professionals still skip under deadline: have you actually plotted it? Not summarised it, not correlated it — plotted the raw points, and then plotted the residuals. The rest of the compendium is full of subtle illusions; this is the one with the simplest cure, and it is still the one most often left undone.