Correlation Does in Fact Imply Causation
Perhaps the single most-quoted phrase about statistics[1] is that 'correlation does not imply causation.' It's a phrase I've spoken hundreds of times, even after the ideas behind this essay had largely taken shape. It's a useful teaching tool for beginners, and it's convenient shorthand for a failure of scientific reasoning that is disturbingly common: just because A correlates with B doesn't mean that A causes B. The classic example is that ice cream sales correlate with violent crime rates, which doesn't mean ice cream fuels crime. That is of course true, and anyone still making this kind of basic error is well served by the catchphrase 'correlation does not imply causation'.
The thing is, our catchphrase is wrong: correlation does in fact imply[2] causation. More precisely, if two things are correlated, there exists a relatively short causal chain linking them, and the smaller the p-value of the correlation, the more confident we can be that it isn't a fluke. Far too many smart people think the catchphrase is literally true, and end up dismissing correlation as uninteresting. Of course, things can be correlated by chance, in the same way that you can flip a coin and get 10 heads in a row[3], but as sample size increases this becomes less and less likely. That's the whole point of calculating a p-value when testing for correlation. In other words, there are only two explanations for a correlation: coincidence or causation.
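To make the sample-size point concrete, here is a minimal simulation sketch in Python (standard library only). It repeatedly draws two independent standard-normal samples, so any correlation between them is pure coincidence, and counts how often their Pearson correlation reaches an arbitrary illustrative threshold of |r| >= 0.2. The helpers `pearson` and `chance_rate`, the threshold, and the trial count are hypothetical choices for illustration, not a standard test.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def chance_rate(n, threshold=0.2, trials=500, seed=0):
    """Fraction of trials in which two independent samples of size n
    show |r| >= threshold purely by coincidence (hypothetical helper)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]
        if abs(pearson(xs, ys)) >= threshold:
            hits += 1
    return hits / trials

# Ten heads in a row with a fair coin: 1 in 1024.
print(0.5 ** 10)  # 0.0009765625

# The chance of a "large" coincidental correlation shrinks as n grows.
for n in (10, 100, 1000):
    print(n, chance_rate(n))
```

With only 10 points per sample, more than half of the trials should clear the threshold by chance; at 1,000 points essentially none should. A formal test would replace the fixed threshold with a p-value, but the trend is the same.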
Let's return to the ice cream example. It doesn't take long to guess what's really going on: warm weather draws more people into public spaces and makes them more irritable, which drives spikes in violent crime, and it also creates cravings for a cold treat. So no, ice cream does not cause violent crime. But the two are causally linked, through a short causal pathway. There are three possible architectures for that pathway: A causes B, B causes A, or some C causes both, either directly or indirectly[4].
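The three architectures can be illustrated with a toy simulation. This is a minimal sketch assuming a linear-Gaussian model with unit coefficients and unit noise; the variable roles (C as temperature, A as ice cream sales, B as violent crime), the helper names `pearson`, `partial_corr`, and `simulate`, and all parameters are hypothetical and say nothing about real data. Each architecture produces a clear correlation between A and B, but only in the confounded case does that correlation disappear once we linearly control for C via partial correlation.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def partial_corr(a, b, c):
    """Correlation of a and b after linearly controlling for c."""
    r_ab, r_ac, r_bc = pearson(a, b), pearson(a, c), pearson(b, c)
    return (r_ab - r_ac * r_bc) / ((1 - r_ac ** 2) * (1 - r_bc ** 2)) ** 0.5

def simulate(arch, n=5000, seed=42):
    """Generate (a, b, c) under one hypothetical architecture:
    'a->b', 'b->a', or 'c->both'. In the first two, c is unrelated noise."""
    rng = random.Random(seed)
    c = [rng.gauss(0, 1) for _ in range(n)]  # e.g. temperature
    if arch == "a->b":
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [x + rng.gauss(0, 1) for x in a]  # A drives B
    elif arch == "b->a":
        b = [rng.gauss(0, 1) for _ in range(n)]
        a = [y + rng.gauss(0, 1) for y in b]  # B drives A
    elif arch == "c->both":
        a = [t + rng.gauss(0, 1) for t in c]  # e.g. ice cream sales
        b = [t + rng.gauss(0, 1) for t in c]  # e.g. violent crime
    else:
        raise ValueError(f"unknown architecture: {arch}")
    return a, b, c

for arch in ("a->b", "b->a", "c->both"):
    a, b, c = simulate(arch)
    print(arch, round(pearson(a, b), 2), round(partial_corr(a, b, c), 2))
```

In this model the confounded case gives a raw correlation near 0.5 that drops to roughly zero after controlling for C, while the two direct cases give a correlation near 0.71 that survives. The a->b and b->a rows look the same because the correlation coefficient is symmetric, which is why correlation alone cannot settle the direction of causation. Partial correlation only removes linear confounding by a measured C; it is an illustration here, not a general causal-discovery method.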
I would hate to push anyone back to the truly naive position that A correlating with B means A causes B, but let's not say false things: correlation does in fact imply causation[5]. It just doesn't show you which direction that causation flows.
Why do I care about correcting this phrase? Two reasons: it is bad for a community to have catchphrases that are factually false, and "correlation does not imply causation" can be, and has been, used for the dark arts. Rather famously, Ronald Fisher spent the last years of his career arguing that there was insufficient evidence to conclude that smoking causes lung cancer, because correlation does not imply causation. The tobacco industry was grateful. Meanwhile, the correlation was telling us exactly what we should have been doing: not dismissing it, but designing studies to determine which of the three causal architectures explained it. The answer, of course, was the obvious one: smoking causes lung cancer. Correlation was trying to tell us something, and we spent decades pretending it wasn't allowed to.