An often-repeated mantra is this: correlation does not imply causation. This is wrong.
In An Enquiry Concerning Human Understanding, David Hume lays out his famous argument against induction. In particular, he writes:
It is confessed, that the utmost effort of human reason is to reduce the principles, productive of natural phenomena, to a greater simplicity, and to resolve the many particular effects into a few general causes, by means of reasonings from analogy, experience, and observation. But as to the causes of these general causes, we should in vain attempt their discovery; nor shall we ever be able to satisfy ourselves, by any particular explication of them. These ultimate springs and principles are totally shut up from human curiosity and enquiry. Elasticity, gravity, cohesion of parts, communication of motion by impulse; these are probably the ultimate causes and principles which we shall ever discover in nature; and we may esteem ourselves sufficiently happy, if, by accurate enquiry and reasoning, we can trace up the particular phenomena to, or near to, these general principles. The most perfect philosophy of the natural kind only staves off our ignorance a little longer: as perhaps the most perfect philosophy of the moral or metaphysical kind serves only to discover larger portions of it. Thus the observation of human blindness and weakness is the result of all philosophy, and meets us at every turn, in spite of our endeavours to elude or avoid it.
He rightly observes that our best attempts to make sense of the world consist in reducing causes and effects to simpler parts, thus staving off our ignorance for a while, but what we’re doing is ultimately only pushing our ignorance onto ever simpler reactions. These elementary cause-effect relationships, whether they be “elasticity, gravity, cohesion of parts, communication of motion by impulse” or Quantum Mechanics, are still assumptions that we make. We don’t make them arbitrarily, but we still assume them, because no one has ever observed a cause. The question is, how do you establish causation if not by correlation? How do you come to know that one billiard ball causes another to move in a particular direction, if not by observing that the correlation holds at all times? Using modern physics, you may be able to explain it in terms of the motions and interactions of elementary particles; but how do you know that one particular elementary force or particle caused some other state of affairs to obtain? How do you know that photon caused that electron to make a quantum leap? (Fun fact: in common parlance, “quantum leap” means “huge leap”, while in physics, it is one of the smallest leaps you can possibly make, between two discrete energy states of an electron.) Ultimately, you know it because you observed the correlation and you ruled out other explanations.
Rod Knowlton writes:
Correlation alone can be a reason to look into something, but cannot be the basis for a conclusion.
I don’t want to say that he’s wrong, because I understand perfectly well what he means, which is that just because you observe a correlation between two kinds of events, that doesn’t mean there is a causal relationship between the two. It’s a perfectly useful phrase for describing that fact. The vast majority of people who hear that “correlation does not imply causation” will take from it that useful lesson without getting into serious philosophical territory. But I like to take things all the way to their logical conclusions, and unfortunately, the absence of inferences based on correlation would mean that we could never know that one thing caused another.
Consider a case that practically begs for the “correlation doesn’t imply causation” mantra. Say someone observes that every time there’s a military coup in Latin America, there’s a great year for oranges in China; there are twice as many ripe oranges in a given year if there was a Latin American military coup that year. There is no logical reason why orange harvests in China and military coups in Latin America would be related, so naturally, we’d be skeptical that there was a causal relationship between the two; even if we could clearly point out that one of them always preceded the other, that would hardly prove that one event was the cause of the other. How would we go about investigating this?
We’d do it by varying parameters. By looking at different conditions, we can observe whether or not the correlation always holds. If we have exhausted every possible cause, and found that the only thing that always precedes a good orange year in China is a military coup in Latin America, so that there is never a good orange year without a preceding coup, never a coup without a following good orange year, and no other factor that also consistently correlates with the orange year, then we can tentatively conclude that the military coup is somehow causing the good orange year. Since we’re unable to reduce this cause-effect relationship to a combination of simpler operations that we already trust, we may still be skeptical and quite open to the possibility that we’ve simply missed something — that there’s something else that also correlates, that we didn’t see. For all we know, there could even be ten different causes for ten different good orange years, and the correlation is simply accidental, but as long as we assume that the future resembles the past (a claim we cannot ultimately prove either, as Hume also reminds us), that’s exceedingly unlikely and becoming unlikelier every time we observe the correlation to hold.
But what if we could find some plausible combination of already accepted causal relationships that together add up to make the military coup affect the Chinese orange growth? This interaction here, this interaction there, and together, we get this? We may have guessed that the processes in the motor of a car make it move, because we see that every time we start the motor and perform the necessary steps to control it, the car moves; but we can be sure because every process that goes on in a car’s motor is well understood (not necessarily by you and me, not even necessarily by a skilled mechanic, but certainly by the aggregate of human knowledge) and we can see how they add up to the combined effect: the car moves. If we could find a chain of well-understood and trusted causal relationships that together forge a connection between cause and effect, military coup and oranges, we would have to accept the relationship as a truly causal one, and not just accidental correlation.
All this is just what Hume more concisely says in the quote above: we unload our uncertainty about causal relations on simpler, more fundamental causal relationships, but this can only stave off our uncertainty for so long. At some point, we have to ask: but how do we know that this elementary interaction causes that one? How do we know that the fundamental cause-effect relationships that we take for granted aren’t accidental? The answer is: we observe correlations! We know this because we have observed the correlation to hold many times, and by varying circumstances we can see that no other correlation holds as well; thus, to find out the cause of something that happened, we look for the strongest correlation, the one that holds most consistently. That is all. We cannot see, touch, smell, hear or taste a cause. Causes cannot be observed directly. The only thing we can actually observe is correlation, and the only way to protect ourselves from accidental correlations is to keep varying circumstances until only one correlation remains. Any number of things could occur before an eight ball rolls into the right pocket; by observing what happens in a variety of circumstances, if we find that the only thing that always happens before the ball starts rolling is that another ball strikes it, we conclude that the cause of one ball’s movement is the force of another that strikes it. All we can ever do is reduce events to Hume’s “few general causes”, and regardless of the nature of these causes, our trust in them comes only from observing strong correlations.
For this reason, correlation implies causation. In fact, correlation is ultimately the only thing that implies causation. I was deliberately provocative when I said that “correlation does not imply causation” was wrong, because in the sense that “not every correlation implies causation”, it’s obviously true. I did this to highlight this interesting and possibly troubling epistemological fact (fact about how we come to know things): that the only way to establish knowledge of cause and effect is to observe correlations, because causes aren’t directly observable. Not any kind of correlation, of course. We must rule out alternative explanations. That’s actually a good nutshell version of the scientific method, which for understandable reasons is usually phrased differently: observe correlations of the form “A happens, then B”. Repeat the experiment to confirm. Systematically vary circumstances until you start seeing correlations disappear. Continue until you have found one correlation that stubbornly refuses to go away, no matter what wacky edge cases you throw at it. You may now, provisionally, conclude that A causes B. Wash, rinse, repeat.
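If a sketch helps, here is a minimal toy version of that recipe in Python. Everything in it is invented for illustration (the factor names, the pretend world in which only the coup matters; no real data): it simply enumerates varied circumstances and discards every candidate cause whose correlation with the effect ever breaks down.

    # A toy sketch (hypothetical, invented factor names, no real data) of the
    # recipe above: vary circumstances, and discard every candidate cause whose
    # correlation with the effect ever breaks down.

    def good_orange_year(coup: bool, rainy: bool, new_fertilizer: bool) -> bool:
        """A pretend world in which only the coup actually drives the effect."""
        return coup  # the hidden causal rule we are trying to recover

    candidates = {"coup", "rainy", "new_fertilizer"}
    surviving = set(candidates)

    # Systematically vary circumstances: every combination of the candidate factors.
    for coup in (False, True):
        for rainy in (False, True):
            for new_fertilizer in (False, True):
                effect = good_orange_year(coup, rainy, new_fertilizer)
                observed = {"coup": coup, "rainy": rainy, "new_fertilizer": new_fertilizer}
                # Keep only candidates that track the effect perfectly:
                # present whenever it happens, absent whenever it doesn't.
                surviving = {c for c in surviving if observed[c] == effect}

    print(surviving)  # {'coup'} -- the one correlation that refuses to go away

Even then, the surviving candidate is only a provisional conclusion, exactly as above: a circumstance we failed to vary could still unseat it.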
Why is this troubling? Well, for one thing, we may fall prey to false causes, precisely the kind of thing the “correlation doesn’t imply causation” mantra warns against. No matter how many times one thing happens (or doesn’t happen) after another, we cannot be absolutely certain that it will (or won’t) happen again in the future. I’m reminded of the chemistry lab that exploded in Florida in 2007. They had done the chemical reaction that blew up the lab 175 times before without incident; then, suddenly, something went wrong and the lab went boom and real, actual people died. That particular correlation (between running the reaction and a safe outcome) wasn’t causal after all. Turns out, luck was what had been keeping them safe all that time.
Not only false positives (accidental correlations), but also false negatives (causes that don’t correlate) could trip us up because of our reliance on correlations. Maybe an accidental correlation is masking the fact that there are different causes of the same phenomenon; if none of the causes are repeatable, because of unique circumstances that we cannot duplicate, then we may never be able to confirm that one time, it really was the dog that farted and not your sorry, excuse-making ass. If God’s invisible hand caused something to happen, that event might not be replicable, and we may never be able to confirm such miracles. And how the hell do you disambiguate between perfect correlations? If both A and B always happen before C, but nothing else is constant every time C happens, how the hell do we know which is the cause? How do we know it wasn’t the Invisible Pink Unicorn’s Invisible Pink Horn that caused C? We may never know, because all we have to help us establish causal relationships is correlation.