Combating Data Manipulation V: Law of Large Numbers

You conduct two polls to determine the percentage of people in a town who like Italian food.  For one poll, you collect data from 5 people and determine that 80% of the people in that town like Italian food.  For the second poll, you collect data from 1,000 people and determine that 60% of the people in that town like Italian food.  Which poll is more correct?

We explored this idea a little bit in a previous post about polling, but almost everyone will agree that the poll that sampled 1,000 people is more accurate than the poll that sampled 5 people.  But why is that?  The answer lies in a theorem developed by Jacob Bernoulli, now known as Bernoulli’s Theorem, or the Law of Large Numbers.

The idea is that the more trials you run, the closer your observed proportion will get to the true probability or, according to Wolfram, “as the number of trials of a random process increases, the percentage difference between the expected and actual values goes to zero.”
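This convergence is easy to see in a quick simulation.  A minimal sketch, assuming (for illustration only) that the true proportion of Italian-food fans in the town is 62%, as in the example below:

```python
import random

def poll(n, p=0.62, seed=0):
    """Simulate polling n people when the true proportion is p.

    Each simulated respondent likes Italian food with probability p;
    returns the fraction of the sample who said yes.
    """
    rng = random.Random(seed)
    likes = sum(1 for _ in range(n) if rng.random() < p)
    return likes / n

# As the sample grows, the observed proportion settles toward 0.62.
for n in (5, 100, 10_000, 1_000_000):
    print(f"{n:>9} people polled -> {poll(n):.3f}")
```

A 5-person poll can easily land at 0.80 or 0.40, while a million-person poll will sit within a fraction of a percentage point of the true value.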

So what does this NOT mean?  Say we somehow know that exactly 62% of the people in our town from the example above like Italian food, and after sampling 1,000 people, we have 60%.  Some may say (understandably) that the next 1 or 2 or even 100 respondents will skew above 62% to correct the difference and bring us closer to the actual value.  This is commonly known as the gambler’s fallacy – the faulty idea that after 100 or 200 (or even more) tries at a slot machine, the gambler is “due” for a win.  But this is not correct: the probability that any given person in that town likes Italian food is always 62%, so there is a 62% chance that the 1,001st person will like Italian food, but also a 38% chance that they won’t.
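The independence of each trial can be checked directly.  A sketch, again assuming an illustrative true proportion of 62%: if the gambler’s fallacy were right, a “yes” would be overdue after a streak of “no” answers and would show up more than 62% of the time.

```python
import random

rng = random.Random(42)
p = 0.62  # assumed true proportion who like Italian food

# Simulate many respondents, then look only at the answers that
# immediately follow a run of three consecutive "no" answers.
answers = [rng.random() < p for _ in range(200_000)]

after_three_nos = [answers[i] for i in range(3, len(answers))
                   if not any(answers[i - 3:i])]
rate = sum(after_three_nos) / len(after_three_nos)
print(f"P(yes after three no's) is still close to p: {rate:.3f}")
```

No streak of unlucky answers changes the odds for the next person; the law works through sheer volume of future trials diluting past deviations, not by compensating for them.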

This is related to another fallacy, called the law of small numbers: people tend to assume that a small sample of a population or of trials is representative of the larger population or probability, which is not necessarily true.  If you take respondents 500 through 600 from our poll, there is no guarantee that 62% of that slice will like Italian food.  The Law of Large Numbers simply states that as you take more and more observations, the overall proportion will get closer and closer to 62%.
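This scatter in small slices is also easy to demonstrate.  A sketch under the same illustrative 62% assumption: split 1,000 simulated respondents into ten groups of 100 and compare each group’s proportion to the overall one.

```python
import random

rng = random.Random(7)
p = 0.62  # assumed true proportion who like Italian food

# Individual groups of 100 scatter noticeably around 0.62,
# while the pooled sample of 1,000 lands much closer to it.
answers = [rng.random() < p for _ in range(1000)]
groups = [answers[i:i + 100] for i in range(0, 1000, 100)]
group_rates = [sum(g) / len(g) for g in groups]

print("group proportions:", group_rates)
print("overall proportion:", sum(answers) / 1000)
```

Any one group of 100 is a “small number” and can sit several points away from the truth; only the aggregate earns the law’s guarantee.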

This is an important law to keep in mind when interpreting polling results and voting returns, when insurance companies estimate the probability that some event may happen, and in many other situations.

The next post will be about Bayes’ Theorem and its legal applications 🙂

Works Cited

http://mathworld.wolfram.com/LawofLargeNumbers.html

http://math.arizona.edu/~jwatkins/J_limit.pdf

http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/Chapter8.pdf

http://hspm.sph.sc.edu/COURSES/J716/a01/stat.html

http://www.logicalfallacies.info/relevance/gamblers/

http://pirate.shu.edu/~hovancjo/exp_read/tversky.htm


