01 · What just happened
A correlation changed its mind between floors
In Figure 1 there are two honest relationships, one stacked on top of the other. At the level of states, money and the Democratic vote move together — the wealthier the state, the bluer it tends to be. At the level of voters, money and the Republican vote move together — within any given state, the richer the individual, the redder they tend to lean. Both are real. They point in opposite directions, and they live on different floors of the same building.
The ecological fallacy is the act of taking the lift between those floors without paying the fare. You observe the state-level pattern — because state-level data is what gets published — and you conclude something about individuals: “so wealthy people vote Democratic.” The data never said that. It couldn't have, because it never looked at a single person.
The unit you measured is the unit you may speak about. Measure states, and you have learned something about states — not about the people inside them, however natural the leap feels.
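A small simulation can make the two floors concrete. The sketch below uses entirely made-up numbers, not the data behind Figure 1: each hypothetical state gets an average income, richer states are assumed to lean more Democratic, and inside every state a richer voter is assumed to lean less Democratic. The function names, the slopes (+2 and -1) and the noise levels are all illustrative assumptions.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_states(n_states=8, voters_per_state=400, seed=0):
    """Return one (incomes, dem_leans) pair per made-up state."""
    rng = random.Random(seed)
    states = []
    for s in range(n_states):
        state_income = 1.0 + s  # hypothetical average income of state s
        incomes, dem_leans = [], []
        for _ in range(voters_per_state):
            income = rng.gauss(state_income, 1.0)
            # Assumption: richer states lean more Democratic (+2 per unit),
            # while inside a state a richer voter leans less so (-1 per unit).
            lean = 2.0 * state_income - (income - state_income) + rng.gauss(0.0, 0.5)
            incomes.append(income)
            dem_leans.append(lean)
        states.append((incomes, dem_leans))
    return states


states = simulate_states()

# Upper floor: one point per state (state averages only).
state_level_r = pearson([mean(x) for x, _ in states], [mean(y) for _, y in states])

# Lower floor: one correlation per state, computed from its individual voters.
within_state_r = [pearson(x, y) for x, y in states]

print(f"state-level r: {state_level_r:+.2f}")
print("within-state r:", ", ".join(f"{r:+.2f}" for r in within_state_r))
```

Under these assumptions the state-level correlation comes out strongly positive while every within-state correlation is negative. Someone handed only the state averages would have no way to see the second line of output.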
02 · The paper that named it
Robinson, 1950: literacy and the foreign-born
The sociologist William S. Robinson exposed the error in a 1950 paper on what he called ecological correlations, the phrase from which the later label “ecological fallacy” derives, and the paper has been quietly terrifying social scientists ever since. He took the 1930 US census and asked a simple question: are immigrants less literate? At the level of the 48 states, the answer looked like a clear and surprising no: states with more foreign-born residents had higher literacy, a strong positive correlation of about +0.53.
Then he looked at individuals. Person by person, foreign-born residents were slightly less likely to be literate than the native-born: a small negative correlation, about −0.12. The state-level figure hadn't measured immigrants' literacy at all. It had measured where immigrants chose to settle: the industrial northern states, which happened to have high native literacy for reasons that had nothing to do with immigration.
Robinson's verdict was deliberately deflating: an ecological correlation, he argued, “cannot validly be used as a substitute for an individual correlation.” Not even usually, and not even as an approximation: the two numbers are answers to different questions, and the group-level one carries essentially no guaranteed information about the individual-level one.
03 · The uncomfortable part
The aggregate barely constrains the individual
It would be reassuring if the group pattern were at least a hint — usually right, occasionally reversed. It isn't even that. A given set of group averages is compatible with an enormous range of individual realities, including ones that slope the opposite way, ones that are perfectly flat, and ones that match. The averages pin down where each group's cloud sits; they say almost nothing about the shape inside each cloud.
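For yes/no traits the range of compatible individual realities can be computed directly, an approach sometimes called the method of bounds. The sketch below assumes a single group for which only two published shares are known (the fraction belonging to some subgroup and the fraction with some outcome) and returns the lowest and highest outcome rate inside the subgroup that those two shares allow. The function name and the example percentages are hypothetical.

```python
def bounds_on_rate_in_subgroup(subgroup_share, outcome_share):
    """Range of P(outcome | subgroup) consistent with the two marginal shares.

    Both arguments are fractions of one group's population, in [0, 1].
    """
    if not 0.0 < subgroup_share <= 1.0:
        raise ValueError("subgroup_share must be in (0, 1]")
    if not 0.0 <= outcome_share <= 1.0:
        raise ValueError("outcome_share must be in [0, 1]")
    outside_share = 1.0 - subgroup_share
    # Lowest rate: people outside the subgroup account for as much of the
    # outcome as they possibly can, and the subgroup covers only the remainder.
    lowest = max(0.0, (outcome_share - outside_share) / subgroup_share)
    # Highest rate: the subgroup accounts for as much of the outcome as it can.
    highest = min(1.0, outcome_share / subgroup_share)
    return lowest, highest


# Hypothetical state: 30% of voters are high-income, 55% voted Democratic.
print(bounds_on_rate_in_subgroup(0.30, 0.55))

# Hypothetical state: 80% of voters are high-income, 90% voted Democratic.
print(bounds_on_rate_in_subgroup(0.80, 0.90))
```

In the first case the bounds run from 0 to 1: the published shares are compatible with no high-income voter choosing the Democrat and with every one of them doing so. Only lopsided shares, as in the second case (roughly 0.875 to 1), narrow the range at all.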
This is the formal heart of the matter. The relationship between a group-level correlation and the individual-level one depends on hidden quantities that the aggregate data simply does not contain: how much variation lives within groups versus between them, and how the two variables move together inside each group. Statisticians since Leo Goodman have built ecological inference methods to recover individual relationships from aggregate data under extra assumptions, but the assumptions do the work, and where they fail the recovered numbers can be confidently wrong.
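One standard way to write the hidden quantities down is the law of total covariance. The notation is an assumption of this sketch, not taken from any of the papers mentioned: X and Y are the two individual-level measurements (say income and vote) and G is the group each person belongs to (say the state).

```latex
\operatorname{Cov}(X, Y)
  = \underbrace{\mathbb{E}\big[\operatorname{Cov}(X, Y \mid G)\big]}_{\text{within groups: absent from the aggregate data}}
  + \underbrace{\operatorname{Cov}\big(\mathbb{E}[X \mid G],\ \mathbb{E}[Y \mid G]\big)}_{\text{between groups: computable from group means}}
```

Group averages and group sizes determine only the second term. The first term can take either sign and any size without moving a single group average, and the variances of X and Y split in the same way, so the individual-level correlation cannot be reconstructed from the between-group piece alone.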
04 · A cousin, not a twin
How this differs from Simpson's paradox
Readers of entry № 1 will feel a strong family resemblance, and it is real — both illusions involve a relationship flipping as you move between levels of aggregation. But they are different animals, and confusing them muddies both.
Simpson's paradox is about data you have: you hold the individual records, complete with group labels, and the puzzle is whether to report the pooled trend or the within-group trends. It is a question of which true number to present.
The ecological fallacy is about data you lack: you hold only the group averages, and the error is leaping to a conclusion about individuals you never observed. It is not a question of which number to report — it is an inference that may be flatly false, with no individual data anywhere in sight to catch it.
Simpson's is a choice; the ecological fallacy is a guess. The same machinery of aggregation underlies both, which is why they share the family, but only one of them is a mistake you can make while looking at the full data.
05 · Field notes
Where the leap gets taken
Nutrition and the “ecological study.” Much of mid-century diet science compared countries: nations eating more saturated fat had more heart disease, so fat was indicted for individuals. Cross-country correlations are cheap and seductive, but nations differ in a thousand correlated ways, and the individual-level story — teased out later by cohort studies — proved far messier. The “French paradox” was largely a paradox of levels.
Politics and the map. “Counties that voted for X have more Y” is an ecological claim, and the cable-news inference that X-voting people have more Y is the fallacy in real time. The income–voting reversal in Figure 1 is the textbook case, and the misreading survived in serious commentary for years before Andrew Gelman's group disentangled the levels.
Crime, immigration, deprivation. The finding that neighbourhoods with more of some characteristic have more of some outcome is endlessly reported as though it described the residents. Sometimes the individual relationship agrees; sometimes, as with immigration and crime, the aggregate and individual stories point different ways. The map cannot tell you which.
The mirror-image error. Leaping the other way — from an individual relationship to a confident claim about groups or regions — is the individualistic (or atomistic) fallacy, and it is just as unlicensed. And a close structural cousin, the modifiable areal unit problem, shows that even the group-level correlation isn't fixed: redraw the boundaries — merge the precincts, re-bin the ages — and the ecological number itself can swell, shrink or reverse.
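The boundary-redrawing effect is easy to reproduce with a toy map. The sketch below assumes six made-up precincts of equal population laid out on a 2x3 grid, each carrying two invented measurements, and merges them into three districts in two different ways. The precinct values and both zonings are hypothetical and were chosen so that the flip is exact; real data would rarely be this tidy.

```python
from statistics import mean


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Made-up precincts on a 2x3 grid, each with equal population:
#   a b c
#   d e f
# Each value is (characteristic, outcome) for that precinct.
precincts = {
    "a": (1, 1), "b": (5, 7), "c": (4, 2),
    "d": (3, 1), "e": (7, 3), "f": (4, 4),
}

# Two ways of merging neighbouring precincts into three districts.
zoning_one = [("a", "d"), ("b", "e"), ("c", "f")]
zoning_two = [("a", "b"), ("d", "e"), ("c", "f")]


def ecological_r(units, zoning):
    """Correlation between district averages under a given zoning."""
    xs = [mean(units[u][0] for u in district) for district in zoning]
    ys = [mean(units[u][1] for u in district) for district in zoning]
    return pearson(xs, ys)


print(f"zoning one: r = {ecological_r(precincts, zoning_one):+.2f}")
print(f"zoning two: r = {ecological_r(precincts, zoning_two):+.2f}")
```

The same six precincts give a district-level correlation of +1 under the first zoning and -1 under the second. Nothing about the residents changed between the two lines of output, only where the lines on the map were drawn.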
The defence is a single question, asked whenever a statistic about places, periods or populations is used to describe people: does this pattern hold for individuals, or only for their groups? If the data were collected on groups, the honest answer is that you do not yet know — and the aggregate, however striking, is a hypothesis about individuals, never a measurement of them.