Correlation and Causation
“Correlation is not causation. Correlation is not causation. Correlation is not causation...”. At times during my statistics studies I felt like Jack Nicholson in the film The Shining. We witness his descent into madness as he types the same sentence over and over again: “All work and no play makes Jack a dull boy. All work and no play makes Jack a dull boy...”.
“Correlation is not causation” is a statistics mantra. It’s ingrained, military school-style, into every budding statistician.
But what does it actually mean? Well, correlation is a measure of how strongly two things are related. Think of it as a number describing how one thing tends to change when the other changes, with 1 being a perfect positive relationship between two sets of numbers, -1 being a perfect negative relationship and 0 being no linear relationship at all. So, the phrase means that a correlation between two things does not necessarily mean that one causes the other. For a festive example, we cannot say, just because people go gift shopping when it’s cold, that the cold weather causes high street spending.
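To make that number a little more concrete, here is a minimal sketch of how the usual (Pearson) correlation coefficient can be worked out by hand. The function name `pearson` and the temperature and spending figures are made up purely for illustration; they are not real data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two series move together around their means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each series moves on its own
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical figures, invented for this example only
temperature = [12, 9, 7, 4, 2, 0]      # degrees Celsius
spending = [20, 25, 31, 38, 44, 52]    # pounds per shopper

r = pearson(temperature, spending)
print(f"correlation between temperature and spending: {r:.2f}")
```

With these invented figures the result is strongly negative: spending goes up as the temperature goes down. The number alone says nothing about why.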
This phrase has not caught on in the wider world yet, though, and it’s easy to see why. It is seductive to suppose causation when there is no hard evidence for it. Our preconceptions and suspicions creep into our scientific reasoning, and we can be too quick to make the leap from correlation to causation without justification. Sometimes it may just seem to make sense to do this, but it’s important not to be too eager to wrap up the case and go home.
When a suspected causation turns out to be only the weaker relation, correlation, this can be because a third factor was affecting both things. This sneaky hidden third wheel is called a confounder.
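A small simulation can show how a confounder produces a correlation out of thin air. This is only a sketch under assumed conditions: the variables `confounder`, `a` and `b` are hypothetical, every value is randomly generated, and `a` and `b` are built so that neither has any effect on the other.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


random.seed(0)  # fixed seed so the run is repeatable
n = 5000

# The hidden third factor
confounder = [random.gauss(0, 1) for _ in range(n)]

# a and b each depend on the confounder plus their own independent noise.
# Neither one appears in the other's formula, so there is no causal link.
a = [c + random.gauss(0, 1) for c in confounder]
b = [c + random.gauss(0, 1) for c in confounder]

r_raw = pearson(a, b)

# Take the confounder's contribution back out and compare what is left
a_rest = [x - c for x, c in zip(a, confounder)]
b_rest = [y - c for y, c in zip(b, confounder)]
r_adjusted = pearson(a_rest, b_rest)

print(f"correlation of a and b:               {r_raw:.2f}")
print(f"correlation once confounder removed:  {r_adjusted:.2f}")
```

In this setup the first figure should come out at around 0.5 and the second close to 0. In real studies the confounder is rarely measured this neatly, if it is known at all, which is what makes it so sneaky.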
Arguably the best-known and most important case involving the correlation and causation twins is the connection between smoking and lung cancer. It was not in dispute that there was a correlation between the two, but proving that one causes the other is no mean feat. It left some experts fuming that something they thought was obvious could not easily be scientifically proved. The counterargument was that there might be a confounder responsible for the correlation between smoking and lung cancer. Perhaps people who were genetically predisposed to want to smoke were also more susceptible to getting cancer?
Of course, this causation is no longer in dispute. It is held as fact that smoking does cause cancer.
But with this conclusion done and dusted, it’s on to the next smoking issues. Smoking indoors in public places is now banned, and some propose banning smoking outdoors too, for example in parks. The current debate is whether to ban smoking in cars carrying children or whether, for example, an information campaign would be better.
Bringing in a law to prohibit something does not necessarily cause people to stop doing it. That may depend on just how enforceable the law is. Banning smoking in public places is relatively easy to do. Or perhaps, as good law-abiding citizens, whether or not we get caught isn’t the issue.
Maybe there is a correlation between people who smoke and people who flout the law. Clearly, this is not a causal relationship. Perhaps, in this situation, there’s a particular rebellious personality trait acting as a confounder? I am not saying that these things are true but merely pointing out that things like this ought to be considered.
The results of these smoking debates remain to be seen.
In the meantime, the drill of “correlation is not causation” may drive statistics students round the bend, but will any of them be allowed to light up in the process?