unspurious.

Forensic tools · An interactive tool

The Benford detector.

In real-world numbers, the leading digit is a 1 about 30% of the time — not 11%. Paste your own data and see whether it obeys the strangest regularity in statistics, and what it means when it doesn't.

[Interactive chart: first significant digit, observed vs predicted. Bars = your data; the line = what Benford's Law predicts. The readout reports the mean absolute deviation from Benford.]
How to read it. The claret line is the Benford prediction — 30.1% for a leading 1, falling to 4.6% for a 9. The bars are your data's actual leading digits. Naturally occurring numbers that span several orders of magnitude hug the line; bounded, assigned, or invented numbers wander off it. The conformity score (a mean absolute deviation, after Nigrini) puts a number on the gap — but read the cautions below before calling anything fraud.
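For readers who want to reproduce the score by hand, here is a minimal sketch of a first-digit mean absolute deviation. It is not the tool's own code: the function names `leading_digit` and `benford_mad` are made up for illustration, and the score is assumed to be the plain average of the nine absolute gaps between observed and predicted shares.

```python
import math
from collections import Counter


def leading_digit(x):
    """Return the first significant digit (1-9) of a nonzero number."""
    # Scan the printed form and take the first nonzero digit, so that
    # 0.00321 -> 3, -4500 -> 4 and 1e-05 -> 1.
    for ch in str(abs(x)):
        if ch in "123456789":
            return int(ch)
    raise ValueError("value has no leading digit")


def benford_mad(values):
    """Average absolute gap between observed and Benford first-digit shares."""
    digits = [leading_digit(v) for v in values if v != 0]  # zeros are skipped
    counts = Counter(digits)
    n = len(digits)
    gaps = [abs(counts[d] / n - math.log10(1 + 1 / d)) for d in range(1, 10)]
    return sum(gaps) / 9


# Powers of 2 hug the curve; nine evenly spread digits do not.
print(f"powers of 2:  {benford_mad([2 ** k for k in range(1, 1001)]):.4f}")
print(f"even digits:  {benford_mad(list(range(1, 10))):.4f}")
```

A smaller score means the bars sit closer to the line. What counts as "small enough" depends on the sample size and on the kind of data, which is why the cautions further down matter.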

01 · The strangest regularity in statistics

Why a leading 1 beats a leading 9, six to one

Pick a county's population, a company's revenue, the length of a river, the number on your last electricity bill. Intuition says the first digit should be anything from 1 to 9 with equal odds — about 11% each. Intuition is wrong. Across a startling range of real-world data, the leading digit is a 1 roughly 30% of the time and a 9 less than 5% of the time, following a precise curve: the probability of leading digit d is log₁₀(1 + 1/d).
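The curve is easy to check for yourself. The short sketch below evaluates log₁₀(1 + 1/d) for each digit; the helper name `benford_probability` is illustrative only.

```python
import math


def benford_probability(d):
    """Predicted share of numbers whose leading digit is d (1 through 9)."""
    if d not in range(1, 10):
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


probs = {d: benford_probability(d) for d in range(1, 10)}

for d, p in probs.items():
    print(f"{d}: {p:.1%}")

# The nine shares add up to 1, because the logs telescope to log10(10).
print(f"total: {sum(probs.values()):.3f}")
print(f"leading 1 vs leading 9: {probs[1] / probs[9]:.2f} to 1")
```

The shares run from 30.1% for a leading 1 down to 4.6% for a leading 9, and the last line shows where the "six to one" comes from.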

It was first noticed by the astronomer Simon Newcomb in 1881 — the early pages of logarithm tables, the ones beginning with 1, were grubbier and more worn than the later pages — and rediscovered and popularised by the physicist Frank Benford in 1938, who checked it against 20,000-odd numbers from river areas to baseball statistics to the atomic weights of elements. Hence Benford's Law, occasionally the Newcomb–Benford law, and the name no one uses, the first-digit law.

The law isn't about any single number — it's about the company a number keeps. Ask not “is this digit suspicious?” but “does this collection of leading digits sit on the curve?”

02 · Why it happens

It's a fact about logarithms, not numbers

The cleanest way to see why is to stop thinking about a number line and start thinking about a log ruler. On a logarithmic scale, the distance from 1 to 2 is the same as from 2 to 4 or 4 to 8 — each is one doubling. Crucially, the stretch occupied by numbers beginning with 1 (from 1 up to 2) is far longer than the stretch beginning with 9 (from 9 up to 10).

Where the 30% comes from: the leading-digit bands on a base-10 log scale
A log ruler from 1 to 10, where each leading digit gets a band:
1: 30.1% · 2: 17.6% · 3: 12.5% · 4: 9.7% · 5: 7.9% · 6: 6.7% · 7: 5.8% · 8: 5.1% · 9: 4.6%
How much ruler a digit occupies = how often it leads.
Fig. 1 — The log ruler. The band for a leading 1 occupies 30.1% of the ruler; the band for a 9, just 4.6%. Any process that scatters values smoothly across orders of magnitude — and most multiplicative, growing, or naturally combined quantities do — lands in the wide bands more often. That is the whole law.

This also explains the law's deepest property: it is scale-invariant. Take a dataset that follows the law and convert every figure from dollars to euros, miles to kilometres, or square feet to hectares (that is, multiply the whole dataset by any positive constant), and the leading-digit distribution stays the same. On the log ruler, multiplying by a constant simply slides every value the same distance, and a spread that is even along the ruler stays even. A regularity that survives changing the units can't be an artefact of the units, which is a strong hint it is real. It is also why the powers of 2 and the Fibonacci numbers in the tool above follow it almost perfectly: multiplicative sequences slide along the log ruler at a steady rate, and over a long run they visit each band in proportion to its width.
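Both claims can be tried out in a few lines. The sketch below is an illustration, not the tool's code: it takes the first 2,000 powers of 2 and the first 2,000 Fibonacci numbers, then multiplies the powers of 2 by 7 as a stand-in for a change of units. The sample size and the constant 7 are arbitrary choices.

```python
import math
from collections import Counter

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit_shares(values):
    """Share of each leading digit 1-9 among positive integers."""
    counts = Counter(int(str(v)[0]) for v in values)
    return [counts[d] / len(values) for d in range(1, 10)]


def max_gap(values):
    """Largest absolute gap between an observed share and Benford's share."""
    shares = first_digit_shares(values)
    return max(abs(s - b) for s, b in zip(shares, BENFORD))


powers = [2 ** k for k in range(1, 2001)]

fib = [1, 1]
while len(fib) < 2000:
    fib.append(fib[-1] + fib[-2])

rescaled = [7 * v for v in powers]  # arbitrary constant standing in for a unit change

for name, data in [("powers of 2", powers), ("Fibonacci", fib), ("7 x powers of 2", rescaled)]:
    print(f"{name:<16} worst digit gap = {max_gap(data):.4f}")
```

Exact integers are used on purpose, so the leading digit never depends on floating-point rounding.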

03 · What it's good for

The auditor's smoke alarm

Because honest, naturally arising figures tend to obey the law and many invented figures don't, Benford's Law became a staple of forensic accounting. People fabricating numbers reach for digits too evenly, round too neatly, cluster below psychological thresholds, and repeat themselves, all of which bend the first-digit curve. The accounting scholar Mark Nigrini turned this into a practical audit technique, and Benford tests now sit inside tax-authority software, expense-fraud screens and corporate audit tools worldwide. Suspicious patterns in macroeconomic data, including the figures Greece reported before its debt crisis, have been flagged the same way.

The key phrase, though, is smoke alarm. A failed Benford test is a reason to look closer, not a verdict. Which brings us to the part of the story this site exists to tell.

04 · Where it fails — and gets weaponised

Most data does not follow Benford's Law

Benford's Law applies to a specific kind of data: figures that range freely across several orders of magnitude. Vast amounts of perfectly honest data don't qualify, and quietly fail the test for reasons that have nothing to do with fraud. Adult heights in centimetres cluster between 150 and 200, so nearly every leading digit is a 1. Assigned numbers — phone numbers, postcodes, invoice IDs — follow whatever rule assigned them. Anything bounded, rounded, or capped breaks the spell.

Two datasets, one Benford curve: both are honest; only one is the kind of data the law describes
Spans many orders of magnitude (populations, file sizes, market caps): ✓ conforms
A narrow, bounded range (vote counts per precinct, heights, ages): ✗ fails
Fig. 2 — The law has preconditions. Data spanning many orders of magnitude tracks the curve. Data confined to a narrow band — like vote counts per precinct, which rarely stray outside a few hundred to a few thousand — piles onto the low digits and misses badly. Neither is fraudulent. One simply isn't a Benford dataset.
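The contrast is easy to simulate. In the sketch below every dataset is synthetic and honest by construction. The ranges are assumptions chosen to mimic the examples in the text: values spread evenly across six orders of magnitude, heights drawn uniformly from 150 to 199 cm, and precinct-style counts drawn uniformly from 300 to 3,000. Real heights and vote counts are not uniform, so treat this as a crude stand-in.

```python
import math
import random
from collections import Counter


def leading_digit(x):
    """First nonzero digit in the printed form of a nonzero number."""
    for ch in str(abs(x)):
        if ch in "123456789":
            return int(ch)
    raise ValueError("value has no leading digit")


def benford_mad(values):
    """Average absolute gap between observed and Benford first-digit shares."""
    counts = Counter(leading_digit(v) for v in values)
    n = len(values)
    return sum(abs(counts[d] / n - math.log10(1 + 1 / d)) for d in range(1, 10)) / 9


rng = random.Random(0)  # fixed seed so the run is repeatable
N = 20_000

wide = [10 ** rng.uniform(0, 6) for _ in range(N)]       # even on the log ruler, 1 to 1,000,000
heights = [rng.randint(150, 199) for _ in range(N)]      # assumed range: every value leads with 1
precincts = [rng.randint(300, 3000) for _ in range(N)]   # assumed range: a single order of magnitude

for name, data in [("wide", wide), ("heights", heights), ("precincts", precincts)]:
    print(f"{name:<10} MAD = {benford_mad(data):.3f}")
```

Nothing in the two bounded datasets was fabricated, yet both land far from the curve. The gap measures the shape of the range, not the honesty of whoever recorded the numbers.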

This is exactly where Benford's Law gets misused, and the most public example arrived in November 2020, when viral posts “proved” US election fraud by showing that certain candidates' precinct-level vote counts didn't follow Benford's Law. The argument was statistically empty. Precinct vote totals occupy a narrow, bounded range set by how precincts are sized — precisely the conditions under which honest data fails the first-digit test. Election-forensics researchers had warned for years that Benford's Law is unreliable for vote counts, and the 2020 claims were a textbook case of running a valid tool far outside its valid range and reading the failure as a smoking gun.

A failed Benford test on the wrong kind of data proves nothing about fraud — only that you applied the test to data it was never meant for.

That is the unspurious lesson in miniature. Benford's Law is real, elegant, and genuinely useful. It becomes an illusion the instant its preconditions are dropped: applied to bounded data, tiny samples, or assigned numbers, it manufactures false suspicion as readily as it once caught real fraud. Use it as a screen that raises questions for a human to investigate — never as a proof that closes them.
