01 · What just happened
The averages moved because the labels did
The phenomenon takes its name from a line attributed to the humorist Will Rogers during the Depression-era westward migration: “When the Okies left Oklahoma and moved to California, they raised the average intelligence level in both states.” It is a joke about two groups at once, and it is also a precise piece of statistics. If the people who leave are below their old state's average but above their new state's average, then their departure lifts the average they leave and their arrival lifts the average they join. Both go up. Nobody got smarter.
The machine above is that joke made literal. There is a band between the two group means, and any value sitting in it has a peculiar property: it is dragging down the higher group and propping up the lower one. Move it across and you relieve both burdens at once. Crucially, the overall average — every value pooled together — cannot change, because no value changed; only its group membership did. That gap, between healthy-looking group averages and an unmoved total, is where the illusion lives.
It is a close cousin of Simpson's paradox: both are about how regrouping the same numbers tells a different story. Simpson's reverses a trend by pooling; Will Rogers improves every group by reshuffling.
02 · The anatomy
One band, two improvements
The condition is exact and worth stating plainly. Moving a value out of a group raises that group's average only if the value was below it. Adding a value to a group raises that group's average only if the value is above it. So to lift both at once you need a value that is below the average of the group it leaves and above the average of the group it joins — a value living in the gap between the two means.
Nothing here is a trick of arithmetic gone wrong; every average is computed correctly. The deception is in the comparison. When you compare “Group A before” with “Group A after,” you are quietly assuming the two groups contain comparable members. The moment membership is allowed to shift, that assumption fails, and a rising average can mean improvement, reshuffling, or — most treacherously — pure reclassification with no improvement at all.
03 · The case that named it
How better scanners 'cured' cancer on paper
In 1985 the physician Alvan Feinstein gave the phenomenon its medical name in The New England Journal of Medicine, after spotting it in lung-cancer survival data. He compared a group of patients treated in 1977 with an earlier group from the 1950s and 60s at the same hospitals, and found that survival had improved — not just overall, but within every single stage of the disease. It looked like a triumph. It was an artefact.
The newer patients had been examined with better imaging. Those scanners revealed small metastases that the older technology had missed — metastases that had always been there, silently, in patients previously filed as “early stage.”
In Feinstein's own words, the migrants' prognosis was “worse than that for other members of the good-stage group” but “better than that for other members of the bad-stage group,” so survival rose in each group “without any change in individual outcomes.” When he re-classified both cohorts by symptoms — a yardstick the new scanners couldn't shift — the apparent progress evaporated. The two eras had nearly identical survival all along.
04 · The signature
Every stage improves; nobody is saved
Stage migration leaves a fingerprint so distinctive it has become a diagnostic test for the illusion itself: survival improves in every subgroup while the survival of the whole population barely moves. Logically that should feel impossible — if every part got better, surely the whole did too? It doesn't, because the parts were repopulated. Patients shuffled from the good group to the bad group lift the statistics of both while changing the total not at all.
The stakes are not academic. Stage migration can make a useless new test or therapy look effective, justify expensive screening that never extended a life, and corrupt any comparison of cancer outcomes across hospitals, countries or decades whenever diagnostic standards differ. It also rides alongside its relative, lead-time bias — detecting disease earlier makes survival from diagnosis look longer even when the date of death is unchanged. Both inflate the same hopeful statistics; both are illusions of the clock and the label, not the cure.
05 · Field notes
Wherever the boxes can be redrawn
Beyond medicine. Any time a population is split into labelled groups and the labelling can shift, the Will Rogers phenomenon is available. Move a mid-tier student from the top set to a lower one and both sets' averages can rise. Reclassify a struggling fund from “growth” to “value” and both categories' returns can tick up. Promote a mediocre player from the minors to the majors and you may lift the standard of both leagues. Redraw the boundary of a credit rating, a tax bracket, a diagnostic threshold, and the averages on both sides can move with nobody underneath them changing.
The structural twin. Feinstein found his example while thinking hard about how categories deceive, and the lesson generalises the one behind Simpson's paradox and the ecological fallacy: a statistic computed within a group is only comparable across time if the group is the same group. Change the membership and you have changed the question, however identical the label looks.
So the question to keep in your pocket is the one that separates them: did the groups improve, or did someone change groups? If the boundary moved — a new scanner, a new rule, a new definition — between the two numbers you are comparing, treat any improvement as suspect until you have ruled out the migration. The rest of the compendium is full of honest numbers behaving badly; this is the one where they march in two directions at once.