We’ve spent the last 9 posts exploring the good and the bad of data. We’ve seen examples of data being skewed or manipulated to advance a particular agenda, and we’ve seen examples of math being used to model aspects of life. But how can people recognize when numbers are being manipulated? Here are a few steps to take when presented with a number:
(1) Ask where the number comes from: know who came up with the number and what methods they used. For example, if given polling data, ask what the sample size of the poll was and how the pollsters found the people to poll. You may also want to ask who conducted the poll, to see what kind of incentive they may have had to twist the data.
(2) Ask yourself if you think they used correct methodology: do you really think that the pollsters adequately captured a representative sample of the whole population? Do you think that makeup companies really conduct proper tests before determining how much strength or lift their mascara gives? A lot of the time, catching data manipulation is all about thinking and using your own intuition before taking a statistic or number as truth. Using an example from an earlier post, it doesn’t take too much math or previous knowledge to think about the statement of the Blue Dog Coalition: “Throughout the first 224 years (1776-2000) of our nation’s history, 42 U.S. presidents borrowed a combined $1.01 trillion from foreign governments and financial institutions according to the U.S. Treasury Department. In the past four years alone (2001-2005), the Bush Administration has borrowed a staggering $1.05 trillion” and recognize that it doesn’t make a whole lot of sense when you factor in inflation and the changing value of the dollar.
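The inflation point can be made concrete with a few lines of arithmetic. This is only a sketch: the `to_real_dollars` function and the index values in it are illustrative placeholders, not actual Treasury or CPI figures.

```python
# Sketch: comparing dollar amounts from different eras in "real" terms.
# The price-index numbers below are made up for illustration; real
# comparisons would use an actual price index like the CPI.

def to_real_dollars(nominal: float, index_then: float, index_now: float) -> float:
    """Convert a nominal amount into today's dollars via a price-index ratio."""
    return nominal * (index_now / index_then)

# A dollar borrowed when the index stood at 10 represents ten times as much
# purchasing power as a dollar borrowed when the index stands at 100:
print(to_real_dollars(1.0, 10.0, 100.0))  # 10.0
```

Summing two centuries of borrowing in nominal dollars, as the quote does, treats an 1800s dollar and a 2005 dollar as equal; converting each year’s borrowing to a common price level before summing would tell a very different story.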
(3) Know the error: there is error inherent in every measurement, but that error is very rarely taken into account when a number or statistic is reported. Sometimes the error is so large that the statistic is essentially meaningless, so it is important to keep this in mind when evaluating data.
(4) Interpret the data yourself whenever possible: many times when statistics are reported, they are followed by an interpretation of the data. A fictional example could be “This study has shown that 64% of people who drank lemonade could run faster than before the lemonade, therefore drinking lemonade makes people fitter.” 64% is the statistic, but the rest of the sentence is interpreting the data for us. However, the interpretation is a little silly and is not necessarily what the data is telling us. Whenever possible, it is always safest to examine the data yourself, come to your own conclusion, and then compare it to the interpretation given. This way, you can catch silly interpretations that aren’t really supported by the data.
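One way to interpret the lemonade number yourself is to ask how surprising 64% would be if lemonade did nothing at all. The post’s fictional example gives no sample size, so the 50 runners below are an assumption for illustration; the calculation is an exact binomial tail probability.

```python
import math

def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more
    'successes' in n trials, each succeeding with probability p."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical: 32 of 50 drinkers (64%) ran faster after the lemonade.
# If lemonade did nothing and each runner were equally likely to run
# faster or slower, how often would 32+ happen by luck alone?
print(round(prob_at_least(32, 50), 3))
```

The chance turns out to be a few percent, so the “ran faster” result might not be pure luck; but even then, the leap from “ran faster once” to “lemonade makes people fitter” is interpretation layered on top of the data, not the data itself.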
(5) Make sure that the data is in the right context: this is very applicable to election and polling results. It is very common, especially with presidential elections, for people to start collecting polling data to predict the outcome of an election almost a year in advance. In that year, the candidates will campaign, undecided voters will decide based on speeches and debates, and sometimes decided voters will hear something they do or don’t like in a speech or debate and change their minds. A lot can happen in that year, so the chances that those poll results will accurately predict the election outcome are very low. Nate Silver, in his book The Signal and the Noise (which I highly recommend for anyone interested), actually does the math and determines that the likelihood that a candidate shown ahead in the polls a year in advance will actually win is between 52% and 81%, depending on the size of their lead. This is an example of data that may have been correctly collected, but just isn’t in the right context yet.
(6) Make sure that the data and the interpretation make sense: this one is a little tricky, because sometimes data that has been correctly collected and interpreted can give us very surprising information about the world. However, checking that a conclusion obeys physical laws and is, to some extent, realistic can weed out a lot of silly and tampered data. For example, in the Correlation vs. Causation post I said that there is a correlation between lemon imports and highway accidents. But just thinking about this can quickly rule it out as a correct causal interpretation; why would lemon imports have anything to do with highway accidents? Asking the question “why are these two variables connected?” can shed a lot of light on whether the interpretation of the data, or the data itself, is accurate.
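It is easy to see how unrelated quantities end up “correlated”: any two series that both trend in the same direction over time will show a strong Pearson correlation. A minimal sketch with made-up numbers (the import and accident figures below are hypothetical, not real data):

```python
# Two series that merely both rise over the years will correlate strongly,
# with no causal link between them.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

lemon_imports = [100, 120, 150, 170, 210, 240]  # hypothetical yearly totals
highway_accidents = [50, 55, 61, 70, 74, 80]    # hypothetical yearly totals
print(round(pearson(lemon_imports, highway_accidents), 2))  # close to 1.0
```

Both series simply grow with population and trade over time, which is enough to produce a correlation near 1 despite there being no plausible mechanism connecting them.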
It’s always important to question any data you are presented with, and relying on your own knowledge and instincts can go a long way toward combating data manipulation.
Nate Silver, The Signal and the Noise
Charles Seife, Proofiness