Combating Data Manipulation VIII: Correlation vs. Causation

“Correlation” and “Causation” are two words that are thrown around a lot when talking about data analysis and interpretation…but what do they actually mean?

What’s correlation?

Correlation is a statistical term that describes how closely two variables are related to each other.  This is easiest to see on a graph with a line drawn through the data:

[Figure: Loi d'Okun.png]

The above graph shows a bunch of data points (ignore what the data points actually mean…this is just an example), and the line drawn through them is a best fit line, or a line that the people who drew this graph thought would best represent their data.  If the variables have a strong correlation, the data points sit close to that line.  If the variables have a weak correlation, the data points are scattered loosely around it.  Measures of correlation are used to quantify how closely two variables are related.
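To put a number on "how well the points fit the line," one common measure is Pearson's correlation coefficient, usually written r. It runs from -1 to 1, and values near either end mean the points hug a straight line. This post doesn't name a specific measure, so picking Pearson's r is my assumption, and the numbers below are made up purely for illustration:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much the two variables move together around their averages
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable moves on its own
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# ASSUMPTION: invented data, not taken from the graph above.
x = [1, 2, 3, 4, 5]
tight = [2.1, 3.9, 6.2, 7.8, 10.1]  # points that hug a straight line
loose = [5.0, 1.0, 6.0, 2.0, 7.0]   # points scattered around a rising trend

r_tight = pearson_r(x, tight)
r_loose = pearson_r(x, loose)
print(f"tight: {r_tight:.2f}, loose: {r_loose:.2f}")
```

The first set should come out very close to 1 (strong correlation) and the second much closer to 0 (weak correlation), even though both trend upward.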

Here’s a list of weird correlations:  These variables have a relationship, as is clear from the graphs provided.

What’s causation?

Causation means that a change in one variable causes a change in the other variable.  If you took a look at the BuzzFeed article, lemon imports are correlated with highway deaths, meaning that there is a relationship between the two, but anyone who said more lemons save lives on the road would be laughed at, because it's silly to think that one causes the other.

So, what do people mean when they say correlation doesn’t imply causation?

They mean that just because two variables have a relationship does not mean that one causes the other.  One of the most common examples of this is the debate over whether vaccines cause autism.  While there is very little evidence to back this up, there are some graphs floating around the internet that show a clear correlation between diagnoses of autism and amounts of vaccinations.  But just because these graphs show a positive linear slope does not mean that vaccines cause autism.  It could be that as medicine advances we are able to understand and diagnose more cases of autism and, separately, those same advances lead us to develop more vaccines, which increases vaccination rates (for more information about vaccines, check out my recent post).
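Here is a rough simulation of that idea: a hidden third factor drives two variables that never touch each other, and they end up strongly correlated anyway. Everything in it is hypothetical. The names `progress`, `diagnoses`, and `vaccinations` are just labels I chose, and the numbers are randomly generated, not real medical data:

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(42)  # fixed seed so the run is repeatable

# ASSUMPTION: every number here is invented; this is not real medical data.
progress = [rng.uniform(0, 10) for _ in range(500)]     # hidden common cause
diagnoses = [p + rng.gauss(0, 1) for p in progress]     # driven by progress only
vaccinations = [p + rng.gauss(0, 1) for p in progress]  # driven by progress only

# Neither series appears in the other's formula, yet they move together.
r_raw = pearson_r(diagnoses, vaccinations)

# Subtract the common cause from each series and correlate what is left over.
left_d = [d - p for d, p in zip(diagnoses, progress)]
left_v = [v - p for v, p in zip(vaccinations, progress)]
r_adjusted = pearson_r(left_d, left_v)

print(f"raw correlation: {r_raw:.2f}")
print(f"after removing the common cause: {r_adjusted:.2f}")
```

The raw correlation should be strong, and it should drop to roughly zero once the shared cause is taken out. In real life we usually can't subtract the hidden factor this neatly, which is part of why this mistake is so easy to make.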

Another example was presented by Leonard Koppett, a sportswriter, who showed that there is a correlation between who wins the Super Bowl and changes in the stock market.  However, despite what some people believed after he stated this, this does not mean that the team that wins the Super Bowl can directly influence how the stock market works.

So, how do we determine which are causations?

One way is to perform controlled experiments to shed some light on whether one causes the other; if we can change one variable ourselves and watch how the other variable responds, without any other factors affecting the outcome, then we can say with more certainty that there is a causation.
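As a sketch of why this works, here is a toy experiment in which we set one variable by coin flip, so nothing else can influence it, and then check whether the outcome follows. The function `run_experiment`, the background factor `z`, and the effect size of 3.0 are all assumptions I made up for the example:

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def run_experiment(x_matters, n=1000, seed=0):
    """Hypothetical experiment: return the correlation between treatment and outcome."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.uniform(0, 10)   # background factor we do not control
        x = rng.choice([0, 1])   # treatment assigned by coin flip, independent of z
        # The outcome always depends on z; it depends on x only if x_matters.
        y = z + (3.0 * x if x_matters else 0.0) + rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    return pearson_r(xs, ys)

r_no_effect = run_experiment(x_matters=False)
r_effect = run_experiment(x_matters=True)
print(f"x has no effect on y: r = {r_no_effect:.2f}")
print(f"x really affects y:   r = {r_effect:.2f}")
```

Because the coin flip is unrelated to the background factor, a correlation that survives in this setup is much harder to blame on a hidden third variable. Real experiments are messier than this, but the logic is the same.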

An easier way is to ask whether we can explain logically and rationally why there would be a causation.  It's easy to dismiss lemon imports vs. highway deaths as just a correlation because it doesn't make any sense for lemon imports to affect highway accidents, unless somehow lemons were repeatedly falling off trucks and causing lots of accidents.

The autism vs. vaccines example is a little tougher, because to someone who hasn't studied this extensively there could potentially be an explanation for causation, so we have to ask whether there could be other reasons for the correlation.  Once we do this, we realize that medical advancements can independently cause both, and then we have some rationale for this being just a correlation.  Unfortunately, a lot of misunderstandings about correlation vs. causation have led to misguided mistrust in vaccines, which can become a huge problem (for more information, see my recent post).

This is a very important concept for making educated decisions: given a graph of two variables with a positive correlation, it is important to step back and think about what the creators of the graph want it to say vs. what the graph actually says.  Sometimes it requires a little extra understanding and research, and sometimes it requires relying on experts, like those who research and publish papers on autism and vaccines.  However, everyone has the ability to interpret data and distinguish between correlation and causation.

Up next in Combating Data Manipulation – a little more about voting!
