unspurious.

The aggregation illusions · The ecological fallacy

What's true of the group can be false of the person.

The ecological fallacy: reading a pattern in group averages — states, regions, countries — and assuming it must hold for the individuals inside them. It needn't. It can reverse.

[Figure 1: Income and the vote, by state and by voter. Each state's average is a diamond; its individual voters sit inside it. Illustrative, after Gelman et al.]
Fig. 1 — The reversal across levels. Across states the trend runs downward: the richer a state's average income, the more it leans Democratic. The tempting leap: “wealthier voters vote Democratic.” Look at the voters, though, and within every state the trend runs upward: richer individuals lean more Republican. The state-level slope was never a statement about people. (Illustrative data in the spirit of Gelman et al., Red State, Blue State, Rich State, Poor State, 2008.)
The short answer

What is the ecological fallacy?

The ecological fallacy is inferring something about individuals from statistics measured only at the level of the groups they belong to — and it can be flatly wrong, even reversed. Richer US states lean Democratic while richer individuals lean Republican; the state-level pattern was never a fact about people. Group averages constrain individual relationships hardly at all, so a correlation across regions cannot be read as a correlation across persons.

The fast check: “Does the pattern hold for individuals, or only for their groups?”

01 · What just happened

A correlation changed its mind between floors

In Figure 1 there are two honest relationships, one stacked on top of the other. At the level of states, money and the Democratic vote move together — the wealthier the state, the bluer it tends to be. At the level of voters, money and the Republican vote move together — within any given state, the richer the individual, the redder they tend to lean. Both are real. They point in opposite directions, and they live on different floors of the same building.

The ecological fallacy is the act of taking the lift between those floors without paying the fare. You observe the state-level pattern — because state-level data is what gets published — and you conclude something about individuals: “so wealthy people vote Democratic.” The data never said that. It couldn't have, because it never looked at a single person.

The unit you measured is the unit you may speak about. Measure states, and you have learned something about states — not about the people inside them, however natural the leap feels.
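The two floors can be reproduced with a few lines of simulation. What follows is a minimal sketch on synthetic data, not a reconstruction of Gelman's analysis: the helper names (`ols_slope`, `simulate_states`), the income figures and the two slopes (−1 between states, +0.5 within them) are all assumptions chosen to make the mechanism visible.

```python
import random
from statistics import fmean


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate_states(n_states=8, voters_per_state=500, seed=0):
    """Return one (incomes, leans) pair per synthetic state."""
    rng = random.Random(seed)
    states = []
    for s in range(n_states):
        state_income = 40.0 + 5.0 * s  # hypothetical mean income, in $k
        incomes = [rng.gauss(state_income, 10.0) for _ in range(voters_per_state)]
        # Assumed mechanism: a state's baseline Republican lean falls with its
        # mean income, while each voter's lean rises with their own income.
        leans = [
            (100.0 - state_income) + 0.5 * (inc - state_income) + rng.gauss(0.0, 5.0)
            for inc in incomes
        ]
        states.append((incomes, leans))
    return states


states = simulate_states()

# Floor one: a single point per state, as in published state-level tables.
state_slope = ols_slope(
    [fmean(incomes) for incomes, _ in states],
    [fmean(leans) for _, leans in states],
)

# Floor two: the voters themselves, one regression per state.
within_slopes = [ols_slope(incomes, leans) for incomes, leans in states]

print(f"slope across state averages: {state_slope:+.2f}")
print("slopes within each state:", [f"{b:+.2f}" for b in within_slopes])
```

By construction the slope across state averages comes out close to −1 while every within-state slope sits close to +0.5. Someone handed only the eight state averages would see the first number and have no way to recover the second.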

02 · The paper that named it

Robinson, 1950: literacy and the foreign-born

The sociologist William S. Robinson gave the error its name in a 1950 paper that has been quietly terrifying social scientists ever since. He took the 1930 US census and asked a simple question: are immigrants less literate? At the level of the 48 states, the answer looked like a clear and surprising no — states with more foreign-born residents had higher literacy, a strong positive correlation of about +0.53.

Then he looked at individuals. Person by person, foreign-born residents were slightly less likely to be literate than the native-born — a small negative correlation, about −0.11. The state-level figure hadn't measured immigrants' literacy at all. It had measured where immigrants chose to settle: the industrial northern states, which happened to have high native literacy for reasons that had nothing to do with immigration.

[Figure 2: Robinson’s reversal: literacy and nativity, 1930. Left panel, by state (48 US states, 1930): literacy rate against the share of a state’s residents who are foreign-born, r = +0.53. Right panel, by person (individual literacy): 94.1% native-born, 88.4% foreign-born, r = −0.11. After Robinson (1950).]
Fig. 2 — Same census, opposite signs. Left: across states, the foreign-born share and literacy rise together. Right: individually, the foreign-born were a little less literate. The ecological correlation reflected where immigrants lived, not who they were.

Robinson's verdict was deliberately deflating: an ecological correlation, he argued, “cannot validly be used as a substitute for an individual correlation.” Not even usually, and not even as an approximation: the two numbers are answers to different questions, and the group-level one carries essentially no guaranteed information about the individual-level one.

03 · The uncomfortable part

The aggregate barely constrains the individual

It would be reassuring if the group pattern were at least a hint — usually right, occasionally reversed. It isn't even that. A given set of group averages is compatible with an enormous range of individual realities, including ones that slope the opposite way, ones that are perfectly flat, and ones that match. The averages pin down where each group's cloud sits; they say almost nothing about the shape inside each cloud.

[Figure 3: One set of averages, three different worlds. Top: the aggregate, five group averages, all you ever observe. Below, three panels that each produce the identical descending line of group means: individuals slope up (a full reversal), individuals flat (no individual link), individuals slope down (matches the aggregate). The aggregate cannot tell them apart.]
Fig. 3 — Under-determination. The five claret diamonds — the only thing you observe — are the same in all three panels. The individuals inside them slope up, flat, or down with equal ease. Nothing in the aggregate lets you choose. This is why the leap is not “risky” but unlicensed.
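The three panels can be built directly. The sketch below is illustrative only: the five group means, the helper names (`make_world`, `group_means`, `pooled_within_slope`) and the noise levels are assumptions. It fixes the group averages and then treats the within-group slope as a free parameter.

```python
import random
from statistics import fmean

GROUP_X = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical group means of x
GROUP_Y = [5.0, 4.0, 3.0, 2.0, 1.0]  # hypothetical group means of y (descending)


def centred_noise(rng, sd, n):
    """n Gaussian draws, shifted so their mean is zero (up to rounding)."""
    draws = [rng.gauss(0.0, sd) for _ in range(n)]
    m = fmean(draws)
    return [d - m for d in draws]


def make_world(within_slope, n=200, seed=0):
    """Groups with fixed means whose inner slope is whatever we choose."""
    rng = random.Random(seed)
    world = []
    for gx, gy in zip(GROUP_X, GROUP_Y):
        dx = centred_noise(rng, 1.0, n)
        eps = centred_noise(rng, 0.5, n)
        xs = [gx + d for d in dx]
        ys = [gy + within_slope * d + e for d, e in zip(dx, eps)]
        world.append((xs, ys))
    return world


def group_means(world):
    """The only thing an aggregate dataset would report."""
    return [(round(fmean(xs), 6), round(fmean(ys), 6)) for xs, ys in world]


def pooled_within_slope(world):
    """Slope of y on x after subtracting each group's own means."""
    sxy = sxx = 0.0
    for xs, ys in world:
        mx, my = fmean(xs), fmean(ys)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx += sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


for label, slope in [("up", 1.0), ("flat", 0.0), ("down", -1.0)]:
    world = make_world(slope)
    print(label, group_means(world), round(pooled_within_slope(world), 2))
```

All three worlds report the same five averages, while the slope among individuals is roughly +1, 0 and −1 in turn. The function that produces the aggregate, `group_means`, never touches the argument that distinguishes the worlds.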

This is the formal heart of the matter. The relationship between a group-level correlation and the individual-level one depends on hidden quantities — how much variation lives within groups versus between them — that the aggregate data simply does not contain. Statisticians since Leo Goodman have built ecological inference methods to recover individual relationships from aggregate data under extra assumptions, but the assumptions do the work, and where they fail the recovered numbers can be confidently wrong.
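One standard way to write the hidden quantities down is the law of total covariance. The notation here is an assumption introduced for illustration, not the article's own: X and Y are the two individual-level variables and G is the group each person belongs to.

```latex
\operatorname{Cov}(X, Y)
  = \underbrace{\operatorname{Cov}\bigl(\mathbb{E}[X \mid G],\ \mathbb{E}[Y \mid G]\bigr)}_{\text{between groups: what the averages show}}
  + \underbrace{\mathbb{E}\bigl[\operatorname{Cov}(X, Y \mid G)\bigr]}_{\text{within groups: never observed}}
```

Aggregate data fixes only the first term. The second term can be positive, zero or negative without moving a single group average, and the variances needed to turn a covariance into a correlation split into between and within parts in the same way.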

04 · A cousin, not a twin

How this differs from Simpson's paradox

Readers of entry № 1 will feel a strong family resemblance, and it is real — both illusions involve a relationship flipping as you move between levels of aggregation. But they are different animals, and confusing them muddies both.

Simpson's paradox is about data you have: you hold the individual records, complete with group labels, and the puzzle is whether to report the pooled trend or the within-group trends. It is a question of which true number to present.

The ecological fallacy is about data you lack: you hold only the group averages, and the error is leaping to a conclusion about individuals you never observed. It is not a question of which number to report — it is an inference that may be flatly false, with no individual data anywhere in sight to catch it.

Simpson's is a choice; the ecological fallacy is a guess. The same machinery of aggregation underlies both, which is why they share the family, but only one of them is a mistake you can make while looking at the full data.

[Figure 4: Two ways aggregation bites. The ecological fallacy: you observe the groups (states, regions, averages); the people are hidden and you never see inside; the illegal leap is “groups → individuals”, an inference that may simply be false. Simpson’s paradox: you observe the people (individuals, with labels) and build the groups by aggregating; whether to pool or split is a choice about which true number to report.]
Fig. 4 — Which rung do you stand on? The ecological fallacy starts at the group rung with the person rung hidden, and leaps down. Simpson’s starts at the person rung and chooses how to climb up.

05 · Field notes

Where the leap gets taken

Nutrition and the “ecological study.” Much of mid-century diet science compared countries: nations eating more saturated fat had more heart disease, so fat was indicted for individuals. Cross-country correlations are cheap and seductive, but nations differ in a thousand correlated ways, and the individual-level story — teased out later by cohort studies — proved far messier. The “French paradox” was largely a paradox of levels.

Politics and the map. “Counties that voted for X have more Y” is an ecological claim, and the cable-news inference, that X-voting people have more Y, is the fallacy in real time. The income–voting reversal in Figure 1 is the textbook case, and the mistaken reading persisted in serious commentary for years before Andrew Gelman's group disentangled the levels.

Crime, immigration, deprivation. The finding that neighbourhoods with more of some characteristic have more of some outcome is endlessly reported as though it described the residents. Sometimes the individual relationship agrees; sometimes, as with immigration and crime, the aggregate and individual stories point different ways. The map cannot tell you which.

The mirror-image error. Leaping the other way — from an individual relationship to a confident claim about groups or regions — is the individualistic (or atomistic) fallacy, and it is just as unlicensed. And a close structural cousin, the modifiable areal unit problem, shows that even the group-level correlation isn't fixed: redraw the boundaries — merge the precincts, re-bin the ages — and the ecological number itself can swell, shrink or reverse.
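The boundary-redrawing effect is easy to demonstrate on a toy map. In the sketch below the eight precinct values and both zonings are made-up numbers chosen to make the effect stark, and every precinct is assumed to hold the same number of residents, so a zone's mean is the plain mean of its precincts.

```python
from statistics import fmean

# Hypothetical precinct-level averages of two variables, x and y.
PRECINCT_X = [1, 2, 3, 4, 5, 6, 7, 8]
PRECINCT_Y = [4, 1, 6, 3, 8, 5, 10, 7]


def correlation(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def zone_correlation(zoning):
    """Correlation of zone means; a zoning is a list of precinct-index tuples."""
    zx = [fmean(PRECINCT_X[i] for i in zone) for zone in zoning]
    zy = [fmean(PRECINCT_Y[i] for i in zone) for zone in zoning]
    return correlation(zx, zy)


zoning_a = [(0, 1), (2, 3), (4, 5), (6, 7)]  # merge neighbouring precincts
zoning_b = [(0, 6), (1, 7), (2, 4), (3, 5)]  # same precincts, different lines

print(f"precinct level: {correlation(PRECINCT_X, PRECINCT_Y):+.2f}")
print(f"zoning A:       {zone_correlation(zoning_a):+.2f}")
print(f"zoning B:       {zone_correlation(zoning_b):+.2f}")
```

With these numbers the precinct-level correlation is about +0.69, zoning A raises it to +1, and zoning B flips it to −1. No resident moved and no value changed; only the boundaries did.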

The defence is a single question, asked whenever a statistic about places, periods or populations is used to describe people: does this pattern hold for individuals, or only for their groups? If the data were collected on groups, the honest answer is that you do not yet know — and the aggregate, however striking, is a hypothesis about individuals, never a measurement of them.
