Flip a fair coin a hundred times. Look only at the flips that landed right after a heads. On average, fewer than half of them are heads. Nothing is wrong with the coin.
That sentence sounds false. It is exactly true, and it is the reason a decades-old textbook fact, that the “hot hand” in basketball is a cognitive illusion, was reversed in 2018. The reversal turned on a bias so quiet that three brilliant psychologists, a generation of statisticians, and every popular retelling missed it. This page does not ask you to believe any of that. It computes the bias in front of you: exactly, by enumerating sequences and by an exact recursion, then again by flipping real (pseudo-)coins. Then it shows precisely how much of the original result the bias eats.
In 1985 Thomas Gilovich, Robert Vallone, and Amos Tversky published The Hot Hand in Basketball: On the Misperception of Random Sequences. They took the Philadelphia 76ers’ field goals, the Celtics’ free throws, and — the cleanest test — a controlled experiment in which 26 Cornell players each shot 100 times from a distance calibrated to their personal 50%. Then they asked the natural question: after a player hits several shots in a row, is the next shot more likely to go in?
Their answer was no. The sequences looked, statistically, “analogous to coin tossing.” Belief in the hot hand, they concluded, was “a powerful and widely shared cognitive illusion” — people inventing streaks in what was really just noise. It became a canonical example, cited everywhere from intro psychology to behavioral economics: the human mind, fooled by randomness. Tversky reportedly said of it: “I’ve been in a thousand arguments over this topic. I’ve won them all, and I’ve convinced no one.”
The argument rested on a specific number: the proportion of hits among the shots that immediately followed a streak of hits, compared against the proportion after a streak of misses. If the hand runs hot, the first should beat the second. In the data it didn’t. Case closed — for thirty-three years.
Here is the thing nobody checked: that estimator is biased even for a coin that has no memory at all. Take the simplest possible case, the opening example of the 2018 paper by Joshua Miller and Adam Sanjurjo that did the reversing. Flip a fair coin three times. In each sequence, find every flip that came right after a heads, and record what fraction of those flips were heads. There are eight equally likely sequences. Six of them contain at least one flip after a heads, so the statistic is defined for those six. Average it over them.
| sequence | flips after an H | of those, H | proportion H after an H |
|---|---|---|---|
| HHH | 2 | 2 | 1 |
| HHT | 2 | 1 | 1⁄2 |
| HTH | 1 | 0 | 0 |
| HTT | 1 | 0 | 0 |
| THH | 1 | 1 | 1 |
| THT | 1 | 0 | 0 |
| TTH | 0 | — | undefined |
| TTT | 0 | — | undefined |
| expected proportion, averaged over the six defined sequences | | | 5⁄12 ≈ 0.4167 |
Not one-half. Five-twelfths. A fair coin, examined this way, comes up heads only about 42% of the time on the flips that follow a heads — and by symmetry comes up heads about 58% of the time on the flips that follow a tails. The coin is innocent. The bias lives entirely in the act of looking: by choosing to inspect a flip because the one before it was heads, you have quietly stacked the deck against finding another heads. (The technical name is a finite-sample selection bias — a cousin of Berkson’s paradox.)
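The table is small enough to check by hand and just as easy to check by machine. The sketch below is a minimal brute-force version in Python; `expected_prop` is an illustrative name, not something from the paper or from a library. It enumerates every fair-coin sequence of length n, takes the within-sequence proportion of heads on flips preceded by k copies of a chosen face, and averages over the sequences where that proportion is defined. Exact fractions mean nothing gets rounded away.

```python
from fractions import Fraction
from itertools import product

def expected_prop(n, k, after="H"):
    """Average, over all length-n fair-coin sequences where it is defined, of the
    within-sequence share of heads on flips preceded by k consecutive `after` faces."""
    props = []
    for seq in product("HT", repeat=n):            # all 2**n equally likely sequences
        chosen = [seq[i] for i in range(k, n)
                  if all(c == after for c in seq[i - k:i])]
        if chosen:                                 # undefined sequences are dropped
            props.append(Fraction(chosen.count("H"), len(chosen)))
    return sum(props) / len(props)

print(expected_prop(3, 1))               # 5/12: heads after a heads
print(expected_prop(3, 1, after="T"))    # 7/12: heads after a tails
```

The second line is the roughly 58% from the paragraph above, computed directly rather than argued by symmetry.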
A streak doesn’t make the next flip less likely. Conditioning on a streak makes the flips you chose to count less likely to extend it. Those are not the same statement, and the difference is the whole story.
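The two-flip case shows where the bias does not come from. Of the four equally likely pairs, HH HT TH TT, only HH and HT have an H in front, and the second flip is heads in exactly one of them, so the average is exactly ½. The bias needs room for runs. In a longer sequence, a long run of heads pours many correlated “opportunities,” mostly heads, into a single sequence, yet that sequence still casts only one vote in the average, while a sequence like HTT casts a full vote of 0 on the strength of a single opportunity. The bookkeeping always tilts the same way, and for heads after a heads it never reaches zero for any finite sequence of three or more flips.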
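The one-vote-per-sequence bookkeeping can be checked directly. This sketch (the helper name `one_vote_vs_pooled` is hypothetical) enumerates every sequence of length n and returns two exact numbers for heads after a heads: the per-sequence average, where each qualifying sequence casts one vote, and the pooled rate, where each qualifying flip casts one vote. The flips are identical in both; only the voting rule differs.

```python
from fractions import Fraction
from itertools import product

def one_vote_vs_pooled(n):
    """Heads-after-a-heads over all 2**n fair-coin sequences:
    return (per-sequence average, pooled rate), both exact."""
    per_sequence = []                  # one vote per qualifying sequence
    pooled_heads = pooled_opps = 0     # one vote per qualifying flip
    for seq in product("HT", repeat=n):
        after_h = [seq[i] for i in range(1, n) if seq[i - 1] == "H"]
        if after_h:
            per_sequence.append(Fraction(after_h.count("H"), len(after_h)))
            pooled_heads += after_h.count("H")
            pooled_opps += len(after_h)
    return sum(per_sequence) / len(per_sequence), Fraction(pooled_heads, pooled_opps)

# prints "2 1/2 1/2", then "3 5/12 1/2", then "4 17/42 1/2", then n = 5 and 6
for n in range(2, 7):
    avg, pooled = one_vote_vs_pooled(n)
    print(n, avg, pooled)
```

The pooled column stays at exactly ½ for every n; only the per-sequence column moves, and it first drops below ½ at three flips.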
Dial the sequence length n and the streak length k. For each setting the panel reports the exact expected proportion of heads on the flip following a run of k heads, for a perfectly fair coin (p = ½) — computed two independent ways and shown to agree: by an exact recursion over all sequences, drawn as the curve, and (when you press the button) by flipping millions of real pseudo-random coins. Watch the gap below ½ grow as the streak gets longer, and shrink as the sequence gets longer — exactly as Miller & Sanjurjo proved.
At the famous three-flip setting it reads 5⁄12. Push k up and the gap yawns open: condition on a long hot streak and the very next flip looks badly stacked against you. In a 100-flip sequence, a run of three heads is followed by heads only ~46% of the time; a run of six, only ~37%. Push n up and the gap narrows toward ½, because in a long sequence the selection effect is diluted. It never quite vanishes for any finite n longer than k + 1 (at n = k + 1 each qualifying sequence has exactly one opportunity, so there is nothing to mis-weight). This is the bias. It is real, it is exact, and it has nothing to do with basketball, at least not yet.
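The coin-flipping road takes only a few lines. A minimal sketch, assuming Python's standard `random` module as the pseudo-random source (the page's own generator is not specified); `prop_after_streak` and `simulate` are illustrative names. A flip qualifies when the k flips immediately before it are all heads, the same rule the enumeration uses.

```python
import random

def prop_after_streak(seq, k):
    """Share of heads among flips whose k predecessors are all heads; None if there are none."""
    opportunities = heads = run = 0
    for flip in seq:                  # True means heads
        if run >= k:                  # the previous k flips were all heads
            opportunities += 1
            heads += flip
        run = run + 1 if flip else 0
    return heads / opportunities if opportunities else None

def simulate(n, k, trials, seed=0):
    """Monte Carlo average of the per-sequence share, over sequences where it is defined."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        p = prop_after_streak([rng.random() < 0.5 for _ in range(n)], k)
        if p is not None:
            values.append(p)
    return sum(values) / len(values)

print(simulate(3, 1, 100_000))    # should land near 5/12, about 0.4167
print(simulate(100, 3, 10_000))   # compare with the ~46% quoted above
```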
The recursion involves no Monte Carlo: it is the exact rational expectation, computed by tracking, position by position, the joint distribution of (trailing-run length, number of qualifying opportunities, number of those that were heads). It reproduces the hand-checked 5⁄12 at n = 3 to the last digit, agrees with brute-force enumeration wherever enumeration is feasible, and agrees with the live coin-flipping above. Three roads, one number.
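Here is one way to write such a recursion; it is a sketch, not the page's actual code, and the function names are illustrative. It walks left to right and keeps, for every (trailing-run length, opportunities so far) pair, the number of sequences in that state and their total count of heads-after-a-streak. Carrying that total instead of a third coordinate for the heads count is our simplification, and it gives the same answer: for a fixed number of opportunities m the per-sequence proportion is h/m, which is linear in h. Integer counts keep everything exact, since all 2ⁿ fair-coin sequences are equally likely.

```python
from fractions import Fraction
from itertools import product

def exact_expectation(n, k):
    """Exact E[share of heads on flips preceded by k heads] for a fair coin,
    averaged over length-n sequences with at least one such flip (needs n > k)."""
    # (trailing heads capped at k, opportunities so far) -> [sequences, total heads on opportunities]
    states = {(0, 0): [1, 0]}
    for _ in range(n):
        nxt = {}
        for (run, m), (count, heads_total) in states.items():
            is_opp = run == k                      # this flip follows k straight heads
            for flip_is_heads in (True, False):
                key = (min(run + 1, k) if flip_is_heads else 0, m + is_opp)
                slot = nxt.setdefault(key, [0, 0])
                slot[0] += count
                slot[1] += heads_total + (count if is_opp and flip_is_heads else 0)
        states = nxt
    qualifying = sum(count for (_, m), (count, _) in states.items() if m > 0)
    # within a state every sequence has the same m, so the sum of h/m is heads_total/m
    numerator = sum(Fraction(total, m) for (_, m), (_, total) in states.items() if m > 0)
    return numerator / qualifying

def brute_force(n, k):
    """Reference answer by enumerating all 2**n sequences (small n only)."""
    props = []
    for seq in product("HT", repeat=n):
        opps = [seq[i] for i in range(k, n) if "T" not in seq[i - k:i]]
        if opps:
            props.append(Fraction(opps.count("H"), len(opps)))
    return sum(props) / len(props)

# Cross-check where enumeration is cheap, then go where it is not.
assert exact_expectation(3, 1) == Fraction(5, 12)
assert all(exact_expectation(n, k) == brute_force(n, k)
           for n in range(2, 11) for k in range(1, n))
print(float(exact_expectation(100, 3)))   # compare with the ~46% quoted above
```

The state space grows only with k and n, not with 2ⁿ, which is why n = 100 is instant here while brute force would never finish.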
Now put the pieces together. Gilovich, Vallone, and Tversky measured, for each player, the difference
P(hit | 3 previous hits) − P(hit | 3 previous misses)
and tested it against zero — the value they expected if shooting were “analogous to coin tossing.” But we have just shown that for a genuinely memoryless coin, the first term is biased below ½ and (by symmetry) the second is biased above it. Their difference is therefore biased negative. The honest benchmark for “no hot hand” in their design isn’t zero at all.
How big is the error? Set the instrument above to their design (sequences of n = 100, streak length k = 3) and the bias in the first term alone is about −4 percentage points. The second term is pushed up by about as much, so the difference statistic they actually used roughly doubles the bias, to about −8 points. A player whose shooting is utterly streak-free should therefore still look, in this estimator, like a cold hand: about 8 points worse after three hits than after three misses. GVT measured a difference near zero and read “near zero” as “no effect,” when near zero was already 8 points hotter than randomness predicts.
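The −8 can be reproduced by simulation. A sketch, assuming a memoryless shooter who hits each of 100 shots with probability ½, and keeping only records in which both conditional shares exist (the difference is undefined otherwise); `streak_props` and `expected_difference` are hypothetical names.

```python
import random

def streak_props(shots, k):
    """Within one record: (share of hits after k straight hits, share of hits after
    k straight misses). Either entry is None if that streak never precedes a shot."""
    opp_h = hit_h = opp_m = hit_m = 0
    run_hits = run_misses = 0
    for shot in shots:                       # True means a hit
        if run_hits >= k:
            opp_h += 1
            hit_h += shot
        if run_misses >= k:
            opp_m += 1
            hit_m += shot
        if shot:
            run_hits, run_misses = run_hits + 1, 0
        else:
            run_hits, run_misses = 0, run_misses + 1
    return (hit_h / opp_h if opp_h else None,
            hit_m / opp_m if opp_m else None)

def expected_difference(n=100, k=3, trials=10_000, seed=1):
    """Monte Carlo mean of the per-record difference for a memoryless 50% shooter,
    over records in which both conditional shares are defined."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(trials):
        after_hits, after_misses = streak_props([rng.random() < 0.5 for _ in range(n)], k)
        if after_hits is not None and after_misses is not None:
            diffs.append(after_hits - after_misses)
    return sum(diffs) / len(diffs)

estimate = expected_difference()
print(f"{estimate:+.3f}")   # compare with the roughly -0.08 quoted above
```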
Put those numbers side by side and that is the reversal. The observed difference of about +3 points was never the thing to compare to zero; it sat roughly 11 points above the −8 the bias predicts. Correct each player’s benchmark individually and re-run GVT’s own test on GVT’s own data, and the average jumps from “+3, indistinguishable from zero” to +13 percentage points, p < .01: a gap Miller & Sanjurjo note is about the size of the difference between the median and the very best three-point shooter in the NBA. The canonical evidence for the cognitive illusion, once debiased, becomes evidence for the hot hand.
That is why the claim sits behind a verifier instead of a headline. The reversal is real but bounded, and the page would be lying if it let the satisfying story (“science was wrong, streaks are real!”) outrun the arithmetic. The arithmetic says exactly this much: a famous null result was built on a biased ruler, and the corrected ruler points the other way.
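A corrected ruler can be built one player at a time. One standard way, sketched here as an illustration rather than as the authors' code, is a permutation test: shuffle the player's own shots many times, which destroys any streakiness while keeping the number of hits fixed, and see what the difference statistic does under shuffling. The shuffled average is that player's “no hot hand” benchmark, bias included. The shot record below is synthetic, not GVT's data, and all names are hypothetical.

```python
import random

def diff_after_streaks(shots, k=3):
    """P(hit | k straight hits) - P(hit | k straight misses) within one record; None if undefined."""
    opp_h = hit_h = opp_m = hit_m = run_h = run_m = 0
    for shot in shots:                       # True means a hit
        if run_h >= k:
            opp_h, hit_h = opp_h + 1, hit_h + shot
        if run_m >= k:
            opp_m, hit_m = opp_m + 1, hit_m + shot
        run_h, run_m = (run_h + 1, 0) if shot else (0, run_m + 1)
    if not (opp_h and opp_m):
        return None
    return hit_h / opp_h - hit_m / opp_m

def permutation_benchmark(shots, k=3, shuffles=2_000, seed=7):
    """Observed difference vs. its distribution over shuffles of the same shots."""
    rng = random.Random(seed)
    observed = diff_after_streaks(shots, k)
    pool = list(shots)
    null = []
    for _ in range(shuffles):
        rng.shuffle(pool)                    # same hits and misses, order randomized
        d = diff_after_streaks(pool, k)
        if d is not None:
            null.append(d)
    benchmark = sum(null) / len(null)        # expected difference with no streakiness
    p_value = sum(d >= observed for d in null) / len(null)   # one-sided: hotter than chance?
    return observed, benchmark, observed - benchmark, p_value

# Hypothetical record: 100 independent 50% shots, so there is no hot hand to find.
gen = random.Random(3)
record = [gen.random() < 0.5 for _ in range(100)]
observed, benchmark, corrected, p = permutation_benchmark(record)
print(f"observed {observed:+.3f}  benchmark {benchmark:+.3f}  corrected {corrected:+.3f}  p = {p:.2f}")
```

The benchmark comes out clearly negative even though this shooter is streak-free by construction, which is the whole point: the right comparison is the observed difference against the benchmark, not against zero.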
What makes the bias so durable is that it is invisible to the usual sanity checks. Pool all the flips-after-a-heads across many sequences into one big proportion and you get exactly ½ — no bias at all. The bias appears only when you average a ratio computed within each finite sequence, which is the natural and almost universal way to ask “what does this player do after a streak?” one player at a time. The estimator everyone reaches for is the one that lies; the estimator nobody reaches for is fine. That is why it survived so long, and why it has since been found lurking in other places streaks are studied — finance, performance, anywhere a short personal sequence gets summarized by a conditional rate.
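The two estimators can be run side by side on simulated players. A sketch, assuming many memoryless 50% shooters with 100 shots each and k = 3; `per_player_vs_pooled` is a hypothetical name. Averaging each player's own rate is the estimator everyone reaches for; pooling every qualifying shot from every player is the one nobody reaches for.

```python
import random

def per_player_vs_pooled(players=10_000, shots=100, k=3, seed=11):
    """Simulate memoryless 50% shooters. Return (average of per-player rates after
    k straight hits, pooled rate over every such shot by every player)."""
    rng = random.Random(seed)
    rates = []                                   # one rate per player with a chance
    pooled_opps = pooled_hits = 0
    for _ in range(players):
        opps = hits = run = 0
        for _ in range(shots):
            shot = rng.random() < 0.5
            if run >= k:                         # this shot follows k straight hits
                opps += 1
                hits += shot
            run = run + 1 if shot else 0
        if opps:
            rates.append(hits / opps)
        pooled_opps += opps
        pooled_hits += hits
    return sum(rates) / len(rates), pooled_hits / pooled_opps

per_player, pooled = per_player_vs_pooled()
print(f"average of per-player rates: {per_player:.3f}   pooled rate: {pooled:.3f}")
```

Same shots, same players; the pooled rate sits at ½ up to simulation noise, while the per-player average sits a few points below it.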
It belongs on this ground with its kin. The Farthest Point showed one word, tallest, splitting into three exact, incompatible answers; here one word, average, splits into a pooled rate that is unbiased and a per-sequence rate that is not, and a celebrated finding was built on the wrong one. Both teach the lesson the Wasteland keeps relearning: the question hides a choice of instrument, and the instrument decides the answer. Check the instrument before you trust the number.