01 · What you're seeing
A high correlation is the easiest thing in the world to find
The two series the machine draws share no cause, no link, no contact of any kind: they are separate calls to a random number generator. And still, on press after press, their correlation lands in territory that in a research paper would read as a strong, publishable finding. The lesson is uncomfortable: a large correlation coefficient, on its own, is almost worthless as evidence that two things are related. It is a starting point for investigation, not a conclusion.
The trick the machine exploits is that we instinctively read a correlation as if it were rare and meaningful — as if two lines tracking each other must be connected. They needn't be. Given enough wiggle room, unrelated numbers cohere all the time, and our pattern-hungry eyes supply the story for free.
Switch the machine to flat “coin-flip noise” and the spurious correlations grow much rarer. That single toggle is the whole insight: it is trends, not randomness in general, that make fake correlations so easy.
02 · Why trends are the trap
Anything that drifts over time will seem to move together
The default setting uses random walks — series where each step nudges up or down from the last, so the line wanders and drifts the way real-world time series do: populations, prices, temperatures, your follower count. Two independent random walks are notorious for looking correlated, because each one tends to spend long stretches drifting in a single direction. When both happen to drift upward over the same window — as trending data so often does — the correlation coefficient lights up, despite there being no link whatsoever.
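The contrast between drifting series and flat noise can be checked with a few lines of code. What follows is a minimal sketch, not the machine's actual implementation: the function names, the series length of 100, the 2,000 trials and the |r| ≥ 0.5 cutoff for "strong" are all assumptions chosen for illustration.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def noise(n, rng):
    """Flat noise: every point is an independent draw."""
    return [rng.gauss(0, 1) for _ in range(n)]

def random_walk(n, rng):
    """Random walk: every point is the previous point plus a random step."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0, 1)
        path.append(level)
    return path

def share_strong(make_series, trials=2000, n=100, threshold=0.5, seed=0):
    """Fraction of independent pairs whose |r| reaches the threshold.

    trials, n and threshold are illustrative assumptions, not values
    taken from the machine.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        r = pearson(make_series(n, rng), make_series(n, rng))
        if abs(r) >= threshold:
            hits += 1
    return hits / trials

walk_share = share_strong(random_walk)
noise_share = share_strong(noise)
print(f"random walks with |r| >= 0.5: {walk_share:.1%}")
print(f"flat noise with |r| >= 0.5:   {noise_share:.1%}")
```

Both generators draw from the same source of randomness; the only difference is whether each point remembers the one before it. That memory is what lets a series drift, and drift is what inflates the correlation.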
This is why correlating two time series is one of the most treacherous things you can do with data, and why careful analysts detrend first, or model the change from one period to the next rather than the level. The famous galleries of absurd correlations (cheese consumption against bedsheet fatalities, a film star's releases against drowning rates) are funny precisely because so many of the series in them are trending, which all but guarantees a few will line up.
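One way to see why modelling the change helps is to correlate the same pairs of random walks twice: once on their levels and once on their step-to-step differences. The sketch below does that under assumed settings (1,000 pairs, 200 steps each, hypothetical helper names); differencing is only one of several detrending approaches, picked here because it is the simplest to show.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def random_walk(n, rng):
    """Each point is the previous point plus an independent random step."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0, 1)
        path.append(level)
    return path

def diff(series):
    """Step-to-step changes: the series with its level stripped out."""
    return [b - a for a, b in zip(series, series[1:])]

rng = random.Random(1)
trials = 1000  # illustrative assumption
level_rs, change_rs = [], []
for _ in range(trials):
    a, b = random_walk(200, rng), random_walk(200, rng)
    level_rs.append(abs(pearson(a, b)))              # correlate the levels
    change_rs.append(abs(pearson(diff(a), diff(b))))  # correlate the changes

mean_level = sum(level_rs) / trials
mean_change = sum(change_rs) / trials
print(f"average |r| on levels:  {mean_level:.2f}")
print(f"average |r| on changes: {mean_change:.2f}")
```

The changes of an independent random walk are just independent noise, so once the levels are differenced away the apparent relationship should largely vanish.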
03 · The hunt makes it worse
Search enough pairs and a stunner is guaranteed
Press the machine's Hunt button and it stops drawing one pair at a time and instead rifles through hundreds, keeping the most extreme correlation it stumbles on. Within a second or two it will hand you something near-perfect. That isn't luck — it's arithmetic. Each pair has some modest chance of looking strongly correlated; check enough of them and finding at least one becomes a near-certainty.
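The arithmetic is short. If each pair independently has probability p of looking strongly correlated, the chance that at least one of n pairs does is 1 − (1 − p)^n. The sketch below tabulates that formula and then runs a small hunt of its own. The 5% per-pair chance, the 500 pairs and the function names are assumptions for illustration; the machine's own settings are not stated here.

```python
import random
from math import sqrt

def chance_of_at_least_one(p, n_pairs):
    """Probability that at least one of n_pairs independent pairs looks
    'strong', if each pair does so with probability p."""
    return 1 - (1 - p) ** n_pairs

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def random_walk(n, rng):
    """Each point is the previous point plus an independent random step."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0, 1)
        path.append(level)
    return path

def hunt(n_pairs, n_steps=100, seed=0):
    """Draw n_pairs independent pairs and keep the most extreme r."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(n_pairs):
        r = pearson(random_walk(n_steps, rng), random_walk(n_steps, rng))
        if abs(r) > abs(best):
            best = r
    return best

# p = 0.05 is a made-up per-pair chance, used only to show the shape of the curve.
for n_pairs in (1, 10, 100, 500):
    print(n_pairs, round(chance_of_at_least_one(0.05, n_pairs), 3))

best = hunt(500)
print(f"most extreme r among 500 unrelated pairs: {best:.2f}")
```

Even a small per-pair chance compounds quickly: the formula approaches 1 as the number of pairs grows, which is why the hunt so rarely comes back empty-handed.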
Anyone with a large enough pile of variables can therefore “discover” a jaw-dropping correlation and present just that one, with the hundreds of failures politely omitted. The correlation is real in the data; the impression it gives — that something meaningful was found — is the illusion.
04 · How to use it
Three questions for any correlation you meet
The machine is a vaccine. Once you have watched random numbers produce a dozen gorgeous correlations, the next “study finds X linked to Y” headline lands differently. Three questions defuse most of them:
Is there a plausible mechanism? Correlation is only the beginning of a causal claim; without a story for how one thing could move the other — one that was proposed before the data was dredged — a strong r is just a coincidence with good lighting.
Are both things trending over time? If so, treat the correlation as guilty until proven innocent. Shared drift is one of the commonest sources of spurious relationships, and it lies behind many “surprising link” stories.
How many things were compared? One correlation reported from a search of thousands is not a finding; it is the survivor of a lottery. Ask what else was tested and quietly set aside.
That is the whole of it. Correlation is a question, not an answer; trends make the question lie; and a long enough search will always turn up a beauty. The rest of the compendium is built on the same habit — distrust the pattern until you understand the process that made it.