Artificial Wasteland — Ground Truth

The Cold Hand

verification  ·  the famous proof that the hot hand is an illusion was running a biased estimator

Flip a fair coin a hundred times. Look only at the flips that landed right after a heads. On average, fewer than half of them are heads. Nothing is wrong with the coin.

That sentence sounds false. It is exactly true, and it is the reason a forty-year-old textbook fact — that the “hot hand” in basketball is a cognitive illusion — was reversed in 2018. The reversal turned on a bias so quiet that three brilliant psychologists, a generation of statisticians, and every popular retelling missed it. This page does not ask you to believe any of that. It computes the bias in front of you — exactly, by enumerating sequences and by an exact recursion, then again by flipping real (pseudo-)coins — and then shows precisely how much of the original result it eats.

I · The fact that everyone repeated

In 1985 Thomas Gilovich, Robert Vallone, and Amos Tversky published The Hot Hand in Basketball: On the Misperception of Random Sequences. They took the Philadelphia 76ers’ field goals, the Celtics’ free throws, and — the cleanest test — a controlled experiment in which 26 Cornell players each shot 100 times from a distance calibrated to their personal 50%. Then they asked the natural question: after a player hits several shots in a row, is the next shot more likely to go in?

Their answer was no. The sequences looked, statistically, “analogous to coin tossing.” Belief in the hot hand, they concluded, was “a powerful and widely shared cognitive illusion”: people inventing streaks in what was really just noise. It became a canonical example, cited everywhere from intro psychology to behavioral economics: the human mind, fooled by randomness. Tversky reportedly said of it: “I’ve been in a thousand arguments over this topic. I’ve won them all, and I’ve convinced no one.”

The argument rested on a specific number: the proportion of hits among the shots that immediately followed a streak of hits, compared against the proportion after a streak of misses. If the hand runs hot, the first should beat the second. In the data it didn’t. Case closed — for thirty-three years.

II · The coin that looks unfair to itself

Here is the thing nobody checked: that estimator is biased even for a coin that has no memory at all. Take the simplest possible case — Joshua Miller and Adam Sanjurjo’s own opening example. Flip a fair coin three times. In each sequence, find every flip that came right after a heads, and record what fraction of those flips were heads. There are eight equally likely sequences. Six of them contain at least one flip-after-a-heads, so the statistic is defined for those six. Average it over them.

sequence   flips after a H   of those, H   proportion H after a H
HHH        2                 2             1
HHT        2                 1             1⁄2
HTH        1                 0             0
HTT        1                 0             0
THH        1                 1             1
THT        1                 0             0
TTH        0                 -             undefined
TTT        0                 -             undefined

Expected proportion, averaged over the six defined sequences: 5⁄12 ≈ 0.4167

Not one-half. Five-twelfths. A fair coin, examined this way, comes up heads only about 42% of the time on the flips that follow a heads — and by symmetry comes up heads about 58% of the time on the flips that follow a tails. The coin is innocent. The bias lives entirely in the act of looking: by choosing to inspect a flip because the one before it was heads, you have quietly stacked the deck against finding another heads. (The technical name is a finite-sample selection bias — a cousin of Berkson’s paradox.)
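The table can be checked by brute force. The sketch below is one way to do it, not the page's own code: the function name `mean_proportion_after_streak` and its parameters are illustrative. It enumerates every fair-coin sequence of length n, computes the within-sequence proportion of heads among flips that follow k heads, and averages over the sequences where that proportion is defined.

```python
from fractions import Fraction
from itertools import product


def mean_proportion_after_streak(n, k=1):
    """Average, over all 2**n equally likely fair-coin sequences, of the
    within-sequence proportion of heads on flips that follow k heads in a row.
    Sequences with no qualifying flip are skipped, as in the table.
    Returns (average, number of sequences where the proportion is defined)."""
    total = Fraction(0)
    defined = 0
    for seq in product("HT", repeat=n):
        # The flip at index i qualifies if the k flips before it are all heads.
        followers = [seq[i] for i in range(k, n)
                     if all(f == "H" for f in seq[i - k:i])]
        if followers:
            total += Fraction(followers.count("H"), len(followers))
            defined += 1
    return total / defined, defined


value, defined = mean_proportion_after_streak(3)
print(value, defined)  # 5/12 6
```

Exact rational arithmetic is used on purpose: the point is that the answer is 5⁄12 exactly, not a rounded simulation result.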

A streak doesn’t make the next flip less likely. Conditioning on a streak makes the flips you chose to count less likely to extend it. Those are not the same statement, and the difference is the whole story.

The cleanest way to feel it: of the four ways to write down two consecutive flips — HH HT TH TT — exactly the ones with a H in front qualify, and among those, the second flip is heads in HH but not in HT. Across a whole sequence the bookkeeping is subtler, because long runs of heads pour many correlated “opportunities” into a single sequence’s average — but it always tilts the same way, and it never reaches zero for any finite sequence.

III · The engine

Dial the sequence length n and the streak length k. For each setting the panel reports the exact expected proportion of heads on the flip following a run of k heads, for a perfectly fair coin (p = ½) — computed two independent ways and shown to agree: by an exact recursion over all sequences, drawn as the curve, and (when you press the button) by flipping millions of real pseudo-random coins. Watch the gap below ½ grow as the streak gets longer, and shrink as the sequence gets longer — exactly as Miller & Sanjurjo proved.

[Interactive instrument: streak-selection bias, p = ½, shown at n = 100, k = 3. The curve plots the exact E[proportion of H after a run of k] versus n, for the current k. The dashed line is ½, what an unbiased estimator would give. At this setting the exact (recursion) value is 0.4603; a button runs the Monte Carlo check and reports the observed (flips) value alongside it.]

At the famous three-flip setting (n = 3, k = 1) it reads 5⁄12. Push k up and the gap yawns open: condition on a long hot streak and the very next flip looks badly stacked against you. In a 100-flip sequence, a run of three heads is followed by heads only ~46% of the time; a run of six, only ~37%. Push n up and the gap narrows toward ½, because in a long sequence the selection effect is diluted. It never quite vanishes for any finite n. This is the bias. It is real, it is exact, and it has nothing to do with basketball. Yet.
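The Monte Carlo side of the instrument can be approximated in a few lines. This is a minimal sketch under stated assumptions, not the page's own implementation: the function name, the seed handling, and the trial count are all illustrative, and Python's standard `random` module stands in for whatever generator the page uses.

```python
import random


def simulate_mean_proportion(n, k, trials, seed=0):
    """Flip `trials` fair-coin sequences of length n. In each one, take the
    proportion of heads among flips that follow at least k heads in a row,
    then average those proportions over sequences where it is defined."""
    rng = random.Random(seed)
    total = 0.0
    defined = 0
    for _trial in range(trials):
        run = 0            # length of the current trailing run of heads
        opportunities = 0  # flips that followed k or more heads
        hits = 0           # how many of those flips were heads
        for _flip in range(n):
            heads = rng.random() < 0.5
            if run >= k:
                opportunities += 1
                hits += heads
            run = run + 1 if heads else 0
        if opportunities:
            total += hits / opportunities
            defined += 1
    return total / defined


# With n = 100 and k = 3 the estimate should land near the 0.4603 quoted above.
print(simulate_mean_proportion(100, 3, trials=20_000))
```

The estimate wobbles with the seed and the trial count, which is exactly why the page pairs it with an exact computation.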

THE CHECK — the exact recursion, runnable

    

The recursion above carries no Monte Carlo — it is the exact rational expectation, computed by tracking, across positions, the joint distribution of (trailing-run length, number of qualifying opportunities, number of those that were heads). It reproduces the hand-checked 5⁄12 at n=3 to the last digit, agrees with brute-force enumeration wherever enumeration is feasible, and agrees with the live coin-flipping above. Three roads, one number.
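The state-tracking idea described in the paragraph above can be sketched as follows. This is an assumed reconstruction, not the page's printed code: the name `exact_mean_proportion` and the choice to count sequences with integers (rather than carry probabilities) are illustrative. The state is (trailing run of heads capped at k, qualifying opportunities so far, heads among them).

```python
from collections import defaultdict
from fractions import Fraction


def exact_mean_proportion(n, k):
    """Exact expected within-sequence proportion of heads after k heads in a
    row, for a fair coin, averaged over sequences where it is defined."""
    # Maps state -> number of length-t sequences that reach it.
    states = {(0, 0, 0): 1}
    for _ in range(n):
        nxt = defaultdict(int)
        for (run, opps, hits), count in states.items():
            qualifies = int(run == k)  # the next flip follows k heads
            # Next flip is tails: the run resets, opportunity counted, no hit.
            nxt[(0, opps + qualifies, hits)] += count
            # Next flip is heads: the run grows (capped at k), hit counted.
            nxt[(min(run + 1, k), opps + qualifies, hits + qualifies)] += count
        states = nxt
    total = Fraction(0)
    defined = 0
    for (_run, opps, hits), count in states.items():
        if opps:
            total += count * Fraction(hits, opps)
            defined += count
    return total / defined


print(exact_mean_proportion(3, 1))           # 5/12
print(float(exact_mean_proportion(100, 3)))  # compare with the 0.4603 above
```

Because the number of states grows polynomially in n rather than as 2^n, this reaches n = 100 where plain enumeration cannot.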

IV · Why the proof of the illusion was itself an illusion

Now put the pieces together. Gilovich, Vallone, and Tversky measured, for each player, the difference

P(hit │ 3 previous hits)  −  P(hit │ 3 previous misses)

and tested it against zero — the value they expected if shooting were “analogous to coin tossing.” But we have just shown that for a genuinely memoryless coin, the first term is biased below ½ and (by symmetry) the second is biased above it. Their difference is therefore biased negative. The honest benchmark for “no hot hand” in their design isn’t zero at all.

How big is the error? Set the instrument above to their design — sequences of n = 100, streak length k = 3 — and the one-sided bias is about −4 percentage points. The two-sided difference statistic they actually used roughly doubles it, to about −8 points. So a player whose true ability is utterly streak-free should still look, in this estimator, like the cold hand: about 8 points worse after a hot streak. GVT measured a difference near zero — and read “near zero” as “no effect,” when near zero was already 8 points hotter than randomness predicts.
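The two-sided benchmark can be estimated by simulation as well. The sketch below is an illustration under assumptions, not GVT's or Miller & Sanjurjo's code: `simulate_difference` is a hypothetical name, the coin is fair, and a sequence counts only if both conditional proportions are defined.

```python
import random


def simulate_difference(n, k, trials, seed=0):
    """Average, over simulated fair-coin sequences, of
    (proportion of heads after k heads) - (proportion of heads after k tails),
    using only sequences where both proportions are defined."""
    rng = random.Random(seed)
    total = 0.0
    defined = 0
    for _trial in range(trials):
        run_len = 0     # length of the current run of identical outcomes
        run_val = None  # outcome of that run (True = heads)
        opp = {True: 0, False: 0}  # flips following k heads / k tails
        hit = {True: 0, False: 0}  # heads among those flips
        for _flip in range(n):
            heads = rng.random() < 0.5
            if run_len >= k:
                opp[run_val] += 1
                hit[run_val] += heads
            if heads == run_val:
                run_len += 1
            else:
                run_val, run_len = heads, 1
        if opp[True] and opp[False]:
            total += hit[True] / opp[True] - hit[False] / opp[False]
            defined += 1
    return total / defined


# For n = 100, k = 3 the estimate should sit near the -8 points quoted above.
print(simulate_difference(100, 3, trials=20_000))
```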

What an iid, no-hot-hand coin should yield in GVT's design (k=3, n=100): −8 pp. The biased benchmark for "no effect," reproduced above.
What GVT measured in the Cornell data and read as "≈ zero / no hot hand": +4 pp. That is .49 after a hit-streak vs .45 after a miss-streak.
The effect once each player's bias is removed (Miller & Sanjurjo, 2018): +13 pp. p < .01, S.E. 4.7: a large effect, not zero.

Read in order, that is the reversal. The observed +4 was never the thing to compare to zero; it was sitting a full 12 points above the −8 the bias predicts. Correct each player’s benchmark and re-run GVT’s own test on GVT’s own data, and the average jumps from “+4, indistinguishable from zero” to +13 percentage points, p < .01, a gap Miller & Sanjurjo note is about the size of the difference between the median and the very best three-point shooter in the NBA. The canonical evidence for the cognitive illusion, once debiased, becomes evidence for the hot hand.

What this does and does not establish

The honest claim, and the only one this page makes, is the one that is checkable above: GVT’s estimator was biased against the hot hand, by enough to swallow their result, and correcting it reverses the sign of their conclusion in their own controlled-shooting data. That is settled and reproduced here. What remains genuinely debated is the magnitude in live NBA game play (where defenses adapt to a hot shooter, confounding the in-game data) and how universal the effect is across players. Miller & Sanjurjo call their estimates conservative and find significant effects for only some players; Gilovich has not fully conceded the broad interpretation. “The hot hand is not a pure cognitive illusion” is now well supported. “The hot hand is large and everywhere” is not what was shown, here or anywhere.

That distinction is the point of putting it behind a verifier instead of a headline. The reversal is real and bounded, and the page would be lying if it let the satisfying story (“science was wrong, streaks are real!”) outrun the arithmetic. The arithmetic says exactly this much: a famous null result was built on a biased ruler, and the corrected ruler points the other way.

V · The shape this keeps

What makes the bias so durable is that it is invisible to the usual sanity checks. Pool all the flips-after-a-heads across many sequences into one big proportion and you get exactly ½ — no bias at all. The bias appears only when you average a ratio computed within each finite sequence, which is the natural and almost universal way to ask “what does this player do after a streak?” one player at a time. The estimator everyone reaches for is the one that lies; the estimator nobody reaches for is fine. That is why it survived so long, and why it has since been found lurking in other places streaks are studied — finance, performance, anywhere a short personal sequence gets summarized by a conditional rate.
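The contrast between the two averages is small enough to enumerate. The sketch below is illustrative, and the function name is hypothetical. It computes both on the same set of fair-coin sequences: one pooled proportion across all flips-after-a-streak, and the mean of the per-sequence proportions.

```python
from fractions import Fraction
from itertools import product


def pooled_and_per_sequence(n, k=1):
    """Return (pooled proportion, mean of per-sequence proportions) of heads
    on flips following k heads, over all 2**n fair-coin sequences."""
    pooled_hits = pooled_opps = 0
    ratios = []
    for seq in product((1, 0), repeat=n):  # 1 = heads, 0 = tails
        # The flip at index i qualifies if the k flips before it are all heads.
        followers = [seq[i] for i in range(k, n) if all(seq[i - k:i])]
        pooled_hits += sum(followers)
        pooled_opps += len(followers)
        if followers:
            ratios.append(Fraction(sum(followers), len(followers)))
    pooled = Fraction(pooled_hits, pooled_opps)
    per_sequence = sum(ratios, Fraction(0)) / len(ratios)
    return pooled, per_sequence


print(pooled_and_per_sequence(3))  # (Fraction(1, 2), Fraction(5, 12))
```

The pooled rate weights every qualifying flip equally, so sequences with long runs contribute many flips. The per-sequence rate weights every sequence equally, and that reweighting is where the bias enters.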

It belongs on this ground with its kin. The Farthest Point showed one word — tallest — splitting into three exact, incompatible answers; here one word — average — splits into a pooled rate that is unbiased and a per-sequence rate that is not, and a celebrated science was built on the wrong one. Both are the same lesson the Wasteland keeps relearning: the question hides a choice of instrument, and the instrument decides the answer. Check the instrument before you trust the number.

Apparatus

Gilovich, T., Vallone, R., & Tversky, A. (1985). “The Hot Hand in Basketball: On the Misperception of Random Sequences.” Cognitive Psychology 17, 295–314. The canonical study. 9 Philadelphia 76ers (in-game field goals, 1980–81), Celtics free throws, and 26 Cornell players × 100 controlled shots. Concluded shooting is “analogous to coin tossing” and the hot hand a “cognitive illusion.”
Miller, J. B., & Sanjurjo, A. (2018). “Surprised by the Hot Hand Fallacy? A Truth in the Law of Small Numbers.” Econometrica 86(6), 2019–2047. DOI: 10.3982/ECTA14943. onlinelibrary.wiley.com/doi/abs/10.3982/ECTA14943. Proves the streak-selection bias; the n=3 → 5/12 example is their Table 1 (pp. 2–3). The GVT reanalysis (k=3): observed ≈ +4 pp, iid benchmark ≈ −8 pp, per-player debiased effect ≈ +13 pp, p < .01, S.E. 4.7 (§3.2–3.3, pp. 11–12).
Open access. Final-version full text (Wharton) and arXiv preprint. arxiv.org/abs/1902.01265
Tversky quote. “I’ve been in a thousand arguments over this topic. I’ve won them all, and I’ve convinced no one.” Attributed to Amos Tversky. Widely reported, e.g. ESPN, “He’s heating up… Klay Thompson and the truth about the hot hand.” The scholarly provenance is usually given as Bar-Eli, Avugos & Raab (2006), Psychology of Sport and Exercise.
On the remaining debate. A. Gelman, “Gilovich doubles down on hot hand denial” (2017); Data Colada #88 (an accessible explainer of the bias); in-game vs controlled-shooting disagreement persists in the post-2018 literature. datacolada.org/88
The check. Every figure on this page is computed client-side. The exact recursion is printed above and runs on the button; it was independently cross-checked against (i) brute-force enumeration of all 2^n sequences for n ≤ 24 and (ii) a Monte Carlo over millions of sequences. All three agree (n=3 → 5/12 exactly; n=100,k=3 → −3.97 pp one-sided; the two-sided GVT statistic → −7.93 pp ≈ the −8 pp Miller & Sanjurjo report).