Combating Data Manipulation III: Polling

Polls are incredibly important and widely used, particularly during election season, and many people take those numbers to heart without thinking about where they came from.  People assume that when a poll says Obama’s approval rating is 47%, 47% of the US population supports Obama.

A fascinating subset of science has been dedicated to examining the error in polls, especially the portion of error that is ignored or misrepresented.  There are a few parts to error in polls:

(1) Margin of error: this is the figure polls report when they state their error.  Every poll rests on the assumption that the people sampled represent the entire population, which is never exactly true.  The margin of error captures the error that comes from polling a sample instead of everyone: the smaller the sample, the less precisely it can reflect the entire population.  This can be statistically measured and is almost always mentioned when the poll is presented.

(2) Systematic Error (as described by Charles Seife in Proofiness; Wayne Journell refers to this as sampling bias): this kind of error is very rarely brought up and can alter poll data a huge amount.  The best example, described by both Charles Seife and Wayne Journell, involves the 1936 presidential election of Franklin Delano Roosevelt vs. Alf Landon.  The Literary Digest had become famous for having the most accurate polls in the country (they claimed their error was “within a fraction of 1 per cent” (Seife 105)), and before the election they published a poll saying Alf Landon was going to win.  They had a massive sample size, but their result was incredibly wrong.  Why was it so wrong?  First, it has to do with what Journell refers to as convenience sampling, where pollsters contact the part of the population that is easiest to get in touch with.  For the 1936 poll, the Literary Digest mailed ballots to people whose addresses came from telephone directories and car registration data.  Unfortunately, this occurred during the Great Depression, when a huge portion of America did not have cars or phones, and those who did were richer and tended, at that time, to vote overwhelmingly Republican.  Second, the poll fell victim to volunteer bias: the people with the strongest opinions tend to respond to polls because they want their voices heard.  In this election, Roosevelt was the incumbent, so the people who supported him were happy with the state of politics as it was and tended not to feel the need to respond.  Those unhappy with the state of the country contributed more to the poll.  Thus, the Literary Digest had a teeny margin of error, but a systematic error so huge that it completely threw their poll off.
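To make the difference between the two kinds of error concrete, here is a rough simulation sketch.  Every number in it is made up for illustration (the 30/70 split between people who appear on phone and car lists and people who don't, and the 35% and 70% support levels); none of it comes from Seife, Journell, or the real 1936 data.  It compares a huge sample drawn only from the "easy to reach" group with a small random sample of everyone.

```python
import math
import random

def margin_of_error(p_hat, n, z=1.96):
    """Approximate 95% margin of error for a proportion (simple random sample)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

def simulate_polls(seed=0):
    rng = random.Random(seed)
    # Hypothetical electorate (all figures are assumptions for illustration):
    # 30,000 people appear on phone/car lists and 35% of them back candidate A;
    # 70,000 people do not appear on any list and 70% of them back candidate A.
    owners = [rng.random() < 0.35 for _ in range(30_000)]
    others = [rng.random() < 0.70 for _ in range(70_000)]
    everyone = owners + others
    true_share = sum(everyone) / len(everyone)

    # Convenience sample: very large, but drawn only from the listed group.
    big_biased = rng.sample(owners, 20_000)
    biased_share = sum(big_biased) / len(big_biased)

    # Simple random sample: small, but drawn from the whole electorate.
    small_random = rng.sample(everyone, 1_000)
    random_share = sum(small_random) / len(small_random)

    return {
        "true": true_share,
        "biased": biased_share,
        "biased_moe": margin_of_error(biased_share, len(big_biased)),
        "random": random_share,
        "random_moe": margin_of_error(random_share, len(small_random)),
    }

if __name__ == "__main__":
    for name, value in simulate_polls().items():
        print(f"{name:>10}: {value:.3f}")
```

Under these assumptions the big sample reports a tiny margin of error and still lands far from the true share, while the small random sample has a wider margin but ends up much closer.  The margin of error formula only knows about the sample size; it has no way of knowing who was left out.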

[Image: The Literary Digest] (http://commons.wikimedia.org/wiki/File:LiteraryDigest-19210219.jpg)

Another example involves the 1948 presidential election between Dewey and Truman.  Dewey had such a significant lead in Gallup’s polls that Gallup, confident in his results, stopped polling weeks before the election.  However, the prediction was again incorrect: Gallup assumed that the undecided voters would vote just as the decided voters did.  This turned out not to be the case, and Truman won an election that the poll had given to Dewey.

[Image: Truman] (http://www.archives.gov/exhibits/running-for-office/index.php?page=45)

It is important to note that polls are also subject to pure randomness: if a poll samples a population one hour and then again the next hour, there is a good chance the results will differ (maybe because some people changed their minds in that hour, or simply because different people were reached).  It is also almost guaranteed that if you sample two different groups, even if they are equally varied and diverse, you will get different results.  This is just due to random factors that affect these polls and, while these are usually not strong enough to completely throw off a poll (as in the previous examples), they do play a role in error.
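Here is a small sketch of that randomness, assuming a made-up population in which exactly 47% of people approve and nobody ever changes their mind.  The function names and the numbers (1,000 respondents, 20 repeated polls) are my own choices for illustration.

```python
import random

def run_poll(true_support, sample_size, rng):
    """Share of 'approve' answers in one simple random sample."""
    approvals = sum(rng.random() < true_support for _ in range(sample_size))
    return approvals / sample_size

def repeated_polls(true_support=0.47, sample_size=1000, n_polls=20, seed=1):
    """Run the same poll several times on the same unchanging population."""
    rng = random.Random(seed)
    return [run_poll(true_support, sample_size, rng) for _ in range(n_polls)]

if __name__ == "__main__":
    results = repeated_polls()
    print("lowest result: ", min(results))
    print("highest result:", max(results))
```

Even though opinion never changes in this toy population, the individual polls do not all report 47%.  They scatter around it by a point or two, which is the kind of variation the margin of error is meant to describe.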

Here’s another example (again from Seife’s book) that demonstrates response bias: what people tell pollsters does not accurately reflect what they believe or what is true.  In 2007 the CDC conducted a poll about the sex lives of Americans; the results said that the average man has sex with seven women, but the average woman has sex with four men.  But…that makes no mathematical sense.  Assuming that they only polled heterosexual people and that there are about as many women as men, the female average cannot be different from the male average, because every partnership adds one to a man’s count and one to a woman’s count.  Even though they really calculated the median and not the mean, the probability of the two numbers being so different is still very small.  Current analysts suggest that the discrepancy could be due to societal pressures causing people to change their answers from the true number.
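The counting argument behind "that makes no mathematical sense" can be checked with a quick sketch.  It assumes a closed, made-up population with equal numbers of men and women and random heterosexual pairings; the sizes and the function name are hypothetical and have nothing to do with the CDC data.

```python
import random

def simulate_partnerships(n_men=1000, n_women=1000, n_pairs=5000, seed=2):
    """Create distinct random man-woman pairs and count partners per person."""
    rng = random.Random(seed)
    pairs = set()
    while len(pairs) < n_pairs:
        pairs.add((rng.randrange(n_men), rng.randrange(n_women)))

    men_counts = [0] * n_men
    women_counts = [0] * n_women
    for man, woman in pairs:
        # Each partnership adds exactly one to each side.
        men_counts[man] += 1
        women_counts[woman] += 1
    return men_counts, women_counts

if __name__ == "__main__":
    men, women = simulate_partnerships()
    print("mean partners per man:  ", sum(men) / len(men))
    print("mean partners per woman:", sum(women) / len(women))
```

However the pairs are distributed, the two totals are both equal to the number of partnerships, so with equal group sizes the two means have to match.  A survey that reports very different numbers for men and women is therefore telling us something about the answers, not about the arithmetic.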

While a lot of the previous example is speculation on the part of analysts (we can’t really show that people were lying, although in 2003 some researchers apparently showed that if you ask people how many sexual partners they have had and then ask them again once they are attached to a lie detector machine, the number changes), it does bring some interesting concepts to light.  Polls are definitely not foolproof, and these examples illustrate different ways that polls can be inaccurate despite having really good sample sizes (and thus a small margin of error).

A pretty funny story Seife mentions involves an Associated Press poll on what Americans expected for 2007.  AP published two articles on this one poll.  Headlines?  “Americans optimistic for 2007” and “Americans see doom, gloom for 2007.”  Here are the links: http://www.freerepublic.com/focus/f-news/1760662/posts and http://www.washingtonpost.com/wp-dyn/content/article/2006/12/30/AR2006123000502_2.html.  As you can see, the margin of error of 3 percent was mentioned at the bottom along with the sample size (because the Associated Press made the false assumption that the overall relevant error can be calculated from just the sample size).  This just shows that polls can be really deceptive and sometimes outright wrong, and it is very important to understand the parts that go into a sound poll so we know what questions to ask the next time we encounter an article based on poll results.
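For anyone curious how a figure like "3 percent" falls out of nothing but the sample size, here is a minimal sketch using the standard textbook approximation for a simple random sample at 95% confidence.  The function names are mine, and the worst-case assumption of a 50/50 split is a common convention, not something stated in the AP articles.

```python
import math

def worst_case_margin(n, z=1.96):
    """Approximate 95% margin of error for n respondents, assuming a 50/50 split."""
    return z * math.sqrt(0.25 / n)

def sample_size_for(margin, z=1.96):
    """Smallest n whose worst-case 95% margin of error is at most `margin`."""
    return math.ceil(z ** 2 * 0.25 / margin ** 2)

if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        print(f"n = {n:>6}: +/- {worst_case_margin(n):.1%}")
    print("respondents needed for +/- 3%:", sample_size_for(0.03))
```

Notice that nothing in this calculation looks at who was asked, how the question was worded, or whether people answered honestly.  That is exactly why a small margin of error on its own says so little about whether a poll is right.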

I encourage everyone to take a look at the links in the works cited section (it’s always important to verify information, and they’re pretty interesting) and, if this stuff has interested you, pick up Seife’s Proofiness.

Works Cited

http://math.arizona.edu/~jwatkins/505d/Lesson_12.pdf

http://www.npr.org/templates/transcript/transcript.php?storyId=96432107

Wayne Journell, Interdisciplinary Education: “Lies, Damn Lies, and Statistics: Uncovering the Truth Behind Polling Data”

Robert W. Pearson, Statistical Persuasion

Charles Seife, Proofiness

Leonard Mlodinow, The Drunkard’s Walk: How Randomness Rules Our Lives
