Every measurement has some error associated with it. Most people would read that statement and say, “I knew that”; on a basic level, it is widely understood that measurements contain error. However, the extent to which this error affects our lives is not widely appreciated. Error has implications in almost every aspect of life, particularly in polling and voting (topics that will be discussed in more detail in the next post). This post is dedicated simply to understanding the mathematics of error.
So what is error? Error is a measure of the uncertainty inherent in a measurement, and every measurement has some amount of uncertainty. Think about reading a ruler: you can measure 4.3 inches, but you can’t be sure whether the true length is 4.32 inches or 4.33 inches, because the ruler does not have tick marks fine enough to show the difference. That ambiguity is the uncertainty in the measurement.
Understanding the nature of error and data requires knowing the findings of Daniel Bernoulli in the late 1700s. He took sets of data from completely different contexts (e.g., astronomy data and archery data) and hypothesized that their distributions would look very similar. This turned out to be the case: the “real” value always fell somewhere in the middle, with fewer and fewer data points the farther you move from the center. This shape is known today as the “bell curve,” or a normal distribution, and its center represents the mean.
So how do the findings of Bernoulli (along with other mathematicians, such as Abraham de Moivre) relate to error? The spread of the bell curve is what reflects the error: the more spread out the curve, the higher the standard deviation, and thus the greater the uncertainty. Standard deviation is simply a mathematical measure of how spread out the data is. Ratings of movies, politicians, and restaurants, and even grades, are all subject to error; if the “measurement” is repeated, there is a high chance you will get different numbers, and the standard deviation quantifies how varied those numbers are.
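To make this concrete, here is a minimal sketch in Python using made-up numbers: ten hypothetical people rating the same movie. The mean is the center of the bell curve, and the standard deviation measures how spread out the ratings are.

```python
import statistics

# Hypothetical repeated "measurements": ten people rating the same movie (0-10 scale).
ratings = [7.5, 8.0, 6.5, 7.0, 9.0, 7.5, 8.5, 6.0, 7.0, 8.0]

mean = statistics.mean(ratings)    # center of the distribution
spread = statistics.stdev(ratings) # sample standard deviation: how varied the ratings are

print(f"mean rating: {mean:.2f}")         # 7.50
print(f"standard deviation: {spread:.2f}")  # 0.91
```

If every rater had given exactly 7.5, the standard deviation would be zero; the more the ratings disagree, the larger it gets.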
For example (from Mlodinow’s book, cited below): a few years ago it was announced that the unemployment rate in the United States was 4.7%, and a few months later it had changed to 4.8%. News sources declared that unemployment was slowly rising. However, this is not necessarily true; these measurements are subject to error, so there is no way to tell whether that 0.1% reflected a true rise in unemployment or was just error. Mlodinow notes that if the unemployment rate were measured at noon and then remeasured at 1 PM, the two numbers would likely differ slightly due to error, but that would not mean unemployment rose in an hour.
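Mlodinow’s noon-versus-1-PM point can be simulated. In the sketch below (my own illustration, not from the book), the “true” unemployment rate is fixed at 4.7% and never changes; two independent surveys of the same population still report slightly different numbers, purely because each survey samples different people. The sample size is an assumption chosen for illustration.

```python
import random

random.seed(0)  # fixed seed so the illustration is reproducible

TRUE_RATE = 0.047      # the "true" rate: held constant throughout
SAMPLE_SIZE = 60_000   # assumed survey size, for illustration only

def survey() -> float:
    """Estimate the rate from one random sample: each respondent
    is unemployed with probability TRUE_RATE."""
    unemployed = sum(random.random() < TRUE_RATE for _ in range(SAMPLE_SIZE))
    return unemployed / SAMPLE_SIZE

noon = survey()
one_pm = survey()
print(f"noon estimate: {noon:.3%}")
print(f"1 PM estimate: {one_pm:.3%}")
# The two estimates typically differ by a small amount,
# even though the true rate never moved.
```

Both estimates land close to 4.7%, but rarely on exactly the same number; reading meaning into the gap between them would be a mistake, which is exactly the trap with the 4.7% vs. 4.8% headline.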
In the next post, I will go into more detail about how the idea of error applies to polls and voting!
Leonard Mlodinow, The Drunkard’s Walk: How Randomness Rules Our Lives