Combating Data Manipulation X: How to Combat Data Manipulation

We’ve spent the last 9 posts exploring the good and the bad of data.  We’ve seen examples of data being skewed or manipulated to advance a particular agenda, and we’ve seen examples of math being used to model aspects of life.  But, how can people recognize when numbers are being manipulated?  Here are a few steps to take when presented with a number:

(1) Ask where the number comes from: know who came up with the number and what methods they used.  For example, if given polling data, ask what the sample size of the poll was and how they found the people to poll.  You also may want to ask who conducted the poll, to see what kind of incentive they may have to twist the data.

(2) Ask yourself if you think they used correct methodology: do you really think that the pollsters adequately captured a representative sample of the whole population?  Do you think that makeup companies really conduct proper tests before determining how much strength or lift their mascara gives?  A lot of the time, catching data manipulation is all about thinking and using your own intuition before taking a statistic or number as truth.  Using an example from an earlier post, it doesn’t take too much math or previous knowledge to think about the statement of the Blue Dog Coalition: “Throughout the first 224 years (1776-2000) of our nation’s history, 42 U.S. presidents borrowed a combined $1.01 trillion from foreign governments and financial institutions according to the U.S. Treasury Department. In the past four years alone (2001-2005), the Bush Administration has borrowed a staggering $1.05 trillion” and recognize that the comparison doesn’t make a whole lot of sense once you factor in inflation and the changing value of the dollar.

(3) Know the error: there is error inherent in every measurement conducted, but the error is very rarely taken into account when reporting a number or statistic.  Sometimes, the error can be so large that the statistic basically doesn’t mean anything, and so it is important to keep this in mind when evaluating data.

(4) Interpret the data yourself whenever possible: many times when statistics are reported, they are followed by an interpretation of the data.  A fictional example could be “This study has shown that 64% of people who drank lemonade could run faster than before the lemonade, therefore drinking lemonade makes people fitter.”  64% is the statistic, but the rest of the sentence is interpreting the data for us.  However, the interpretation is a little silly and is not necessarily what the data is telling us.  Whenever possible, it is always safest to examine the data yourself, come to your own conclusion, and then compare it to the interpretation given.  This way, you can catch silly interpretations that aren’t really supported by the data.

(5) Make sure that the data is in the right context: this is very applicable to election and polling results.  It is very common, especially with presidential elections, for people to start collecting polling data to predict the outcome of an election almost a year in advance.  In that year, the candidates will campaign, undecided voters will decide based on speeches and debates, and sometimes decided voters will hear something they do or don’t like in a speech or debate and will change their mind.  A lot can happen in that year, so those early poll results are an unreliable predictor of the election outcome.  Nate Silver, in his book The Signal and the Noise (which I highly recommend for anyone interested), actually does the math and determines that the likelihood that a candidate who is ahead in the polls a year in advance will actually win is between 52% and 81%, depending on the size of their lead.  This is an example of when the data may have been correctly collected, but it just isn’t in the right context yet.

(6) Make sure that the data and the interpretation make sense: this one is a little tricky, because sometimes data that has been correctly collected and interpreted can give us very surprising information about the world.  However, looking at data and making sure that the conclusion obeys physical laws and is, to some extent, realistic can weed out a lot of silly and tampered data.  For example, in the Correlation vs. Causation post I said that there is a correlation between lemon imports and highway accidents.  But, just thinking about this can quickly rule it out as being a correct interpretation; why would lemon imports have anything to do with highway accidents?  Asking that question “why are these two variables connected” can shed a lot of light on whether the interpretation of data, or the data itself, is accurate.
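
To make step (3) a little more concrete, here is a minimal sketch of how the error on a poll result could be estimated.  It assumes a simple random sample and uses the standard normal approximation for a proportion; the function name and the 52% example figure are made up for illustration, not taken from any real poll.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Approximate 95% margin of error for a proportion.

    Assumes a simple random sample of size n (an idealization;
    real polls usually need further corrections).
    """
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)

# Hypothetical poll: 52% support, measured with different sample sizes.
for n in (100, 1000, 10000):
    moe = margin_of_error(0.52, n)
    print(f"n={n:>5}: 52% +/- {100 * moe:.1f} points")
```

With only 100 respondents the margin is close to 10 points, so a reported 52% “lead” cannot be told apart from a tie.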

It’s always important to question any data that you are presented with, and relying on your own knowledge and instinct can really help combat any forms of data manipulation.

Works Cited

Nate Silver, The Signal and the Noise

Charles Seife, Proofiness


Consequences of Climate Change: Ocean Acidification

Besides the temperature getting hotter, there are many other negative consequences of increased carbon dioxide in our atmosphere.  One of them is ocean acidification, where carbon dioxide actually dissolves in the seawater and makes it more acidic.

How does this work?

CO2 (carbon dioxide) reacts with H2O (water) to create carbonic acid, H2CO3.  As the carbon dioxide concentration in the air increases, this reaction happens more often and generates more acid, which makes the seawater more acidic.  According to the National Oceanic and Atmospheric Administration (NOAA), the acidity of the ocean has risen 30% since the Industrial Revolution (a 0.1 unit decrease in pH – a decrease in pH means an increase in acidity), which is definitely significant.
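
Why does such a small change in pH translate into such a large change in acidity?  pH is the negative base-10 logarithm of the hydrogen ion concentration, so each unit of pH is a factor of ten.  The sketch below (the function name is just for illustration) converts a pH drop into a percent increase in hydrogen ion concentration.

```python
def acidity_increase_percent(ph_drop):
    """Percent increase in hydrogen ion concentration for a given drop in pH.

    Uses the definition pH = -log10([H+]), so a drop of d units
    multiplies [H+] by 10**d.
    """
    return (10 ** ph_drop - 1.0) * 100.0

print(f"{acidity_increase_percent(0.1):.0f}% more hydrogen ions for a 0.1 pH drop")
```

A drop of exactly 0.1 works out to roughly a quarter more hydrogen ions, which is in the same ballpark as the roughly 30% figure quoted above (the quoted pH change is itself rounded).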

Why is this a problem?

Most marine organisms are very sensitive to temperature changes and changes in pH, since naturally the ocean’s temperature and pH stay relatively constant.  Therefore, drastic changes in pH can make certain environments unsuitable for the animals that live in them.

[Figure: coral reef in the Red Sea]

But, the biggest problem has been for animals like coral, those that build skeletons made of calcium carbonate.  These animals also include shelled animals like oysters and clams.  Carbonic acid actually reacts with the calcium carbonate to form calcium and bicarbonate ions that stay dissolved in the water – meaning that the acid dissolves these skeletons.  In the case of coral, it can actually break down the coral reefs that provide a habitat to many of the species in the oceans.  This has implications for humans in an economic sense (it would severely hurt the fishing industry) but also for the stability of ecosystems around the world.

Evil Chlorine: The Science of DDT, PCBs, and CFCs

A lot of chlorine compounds (molecules with at least one chlorine atom) are considered “evil” or toxic, and they have subsequently been banned.  These include polychlorinated biphenyls (PCBs), DDT (dichlorodiphenyltrichloroethane), and the very well-known chlorofluorocarbons (CFCs).  But what makes these compounds so dangerous?  Here’s a little bit about each of them:

a) Chlorofluorocarbons (CFCs): CFCs are not found in nature; they were originally manufactured to make refrigeration more efficient and less expensive.  Refrigeration works by having a liquid evaporate, absorbing heat from its surroundings as it does so; the vapor then condenses, returning to the liquid state, and the cycle repeats.  This process causes cooling and keeps the inside of refrigerators cold.  The ideal refrigerant has to vaporize within the proper temperature range and absorb lots of heat while doing so.  This means it has to be a very stable compound so it absorbs a lot of heat before it evaporates, instead of very quickly evaporating.  The discovery of CFCs revolutionized the refrigeration business because they were stable, nontoxic, nonflammable, almost odorless, and inexpensive – they functionally made the refrigerator a standard home appliance and made air conditioning an industry.

But…CFCs are so stable that they don’t break down by ordinary reactions.  These compounds can stay in the atmosphere for 40 to 150 years.  When factories and homes spew out CFCs, they get released into the lower atmosphere and drift around for years, eventually reaching the stratosphere.  Solar radiation then ruptures the CFCs into individual atoms or smaller molecules, which can be extremely damaging to our ozone layer.

Why?  Well, normally solar radiation breaks up O2 (what we breathe) into single oxygen atoms, each of which can then react with another O2 to make O3 (ozone).  The reverse reaction occurs as well, creating an equilibrium where ozone is created as fast as it is destroyed, so we have a constant amount of ozone and we are all happy.

Then chlorine comes in.  Chlorine reacts with ozone to make ClO and O2.  The ClO then reacts with another ozone molecule to make Cl and O2, functionally taking ozone and breaking it down to stable compounds that won’t create more ozone.  Even worse, because the chlorine atom comes out of this cycle intact, it can stay in the atmosphere for decades, and one single chlorine atom can destroy 100,000 ozone molecules before being deactivated.  And, scientists have found that for every 1% of ozone depletion, 2% more UV radiation penetrates the atmosphere.  This was a huge problem before CFCs were banned in many countries (though not all).

[Figure: area of the ozone hole.  The hole in our ozone before CFCs were banned in some countries.]

Now, of course, we use other compounds for refrigeration and air conditioning.

b) Polychlorinated Biphenyls (PCBs): This is another extremely stable molecule used in electrical insulators and coolants in reactors, capacitors, and transformers.  Then, people started hearing about health problems among workers at PCB plants, including a skin condition called chloracne, which is among the first symptoms of PCB poisoning, followed by damage to immune, nervous, endocrine, and reproductive systems.

So what is a PCB?  It is a manufactured molecule in which some or all of the hydrogen atoms in a biphenyl molecule have been replaced by chlorine atoms, which makes the molecule incredibly stable, so it persists for a long time in the environment.  The problem is that this stability causes biomagnification.  Biomagnification is a phenomenon where the concentration of a toxin increases as you go up trophic levels (or the food chain); so if a small rat eats something with PCB in it, then a snake eats the rat, then a hawk eats the snake, the hawk has a much higher concentration of PCB than the rat.  This PCB can build up in fat cells and cause many health problems.
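
Here is a toy sketch of biomagnification along the rat, snake, hawk chain described above.  The starting concentration and the tenfold factor per trophic level are made-up numbers chosen only to show the compounding; real factors vary by compound and by ecosystem.

```python
def biomagnify(base_concentration, factor_per_level, chain):
    """Return (organism, concentration) pairs up a food chain.

    Assumes, purely for illustration, that the toxin concentration is
    multiplied by the same factor at every step up the chain.
    """
    concentration = base_concentration
    result = []
    for organism in chain:
        result.append((organism, concentration))
        concentration *= factor_per_level
    return result

# Hypothetical values: 1.0 unit in the rat, 10x per trophic level.
levels = biomagnify(1.0, 10.0, ["rat", "snake", "hawk"])
for organism, concentration in levels:
    print(f"{organism:>5}: {concentration:g} units")
```

Because a stable compound is stored rather than broken down, the multiplication repeats at every level, which is why the top predator ends up with by far the highest dose.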

c) DDT: The same biomagnification problem is present in DDT.  DDT is a very stable pesticide molecule that was used in World War II as a delousing powder to stop typhus and to kill disease-carrying mosquitoes.  It was actually put in an aerosol can (which used CFCs to work), which doubled the bad effect on the environment.

The big problem with DDT is that it also accumulates in animal tissues.  When this toxin reaches birds, the DDT actually inhibits an enzyme that supplies calcium to eggshells.  Then, birds lay eggs with really fragile shells that end up breaking before the offspring can hatch out.  This caused a significant decline in eagle, hawk, and falcon populations.

So, the common theme among all these chlorine compounds is that they are dangerous mostly due to their incredible stability; the bonds that the chlorine atoms form are very strong and do not break down or react easily.

But, these chlorine compounds also did wonders for the world – they revolutionized the idea of home refrigeration and air conditioning and opened up a lot of trade opportunities.  So, are these chlorine compounds good or bad?  A little of both: they’ve done a lot of good, but for now they are better off being left out of our environment.

Works Cited

Chemistry of PCBs and PBBs – by I. Pomerantz, J. Burke, D. Firestone, J. McKinney, J. Roach, W. Trotter – Environmental Health Perspectives, Vol. 24, June 1978, pp. 133-146

Chlorofluorocarbons and the Depletion of Stratospheric Ozone – by F. Sherwood Rowland, American Scientist Vol. 77 No. 1 (January-February 1989), pp. 36-45

Napoleon’s Buttons – by Penny Le Couteur, 2004

Combating Data Manipulation IX: Does Voting Work?

[Figure: vote with check mark]

Does voting work?  This seems like an obvious question: all you do is have a bunch of people select a candidate and then count up the ballots.  But, there is actually a lot of mathematics behind figuring out the most effective method of voting so that the results accurately reflect what the population believes.

Kenneth Arrow, a prominent economist, developed a list of standards that a good voting method should have:

(1) Decisive: there should always be one winner.

(2) Pareto Principle: if all the voters vote for candidate A, then candidate A should win.

(3) Nondictatorship: no single voter should be able to decide the election.

(4) Independence of Irrelevant Alternatives: If candidate A wins the election over B and C, then removing candidates B or C should not change that outcome.  Candidates B and C are considered “irrelevant” because they didn’t win the election.

This list seems pretty reasonable, but once we start applying these principles to find a voting system that meets them, things get far more complicated…

Plurality vs. Majority

There are many different ways to determine the winner of an election.  Two of them are known as plurality and majority rule.  The difference between them is extremely subtle:

a) Plurality: this rule says that the winner of an election is the candidate who receives more votes than any other candidate, even if that is less than half of all the votes.  So, if there are candidates A, B, and C and candidate A gets 45% of the votes, B gets 35%, and C gets 20%, candidate A wins.  This is the rule used in most United States elections.

b) Majority: this rule says that the winner of an election is preferred over each of the other candidates by the majority of the voters in the population.  The ballot would allow the voter to rank the candidates to show what their order of preference is, instead of just voting for one candidate.  The nice thing about this method is that if you really love candidate A, but would much rather see candidate B win than C if A is not going to win, you can state that in your ballot.  So, if we look at candidates A, B, and C again, we get something like the following data:

45% of the voters have the following preference: A>B>C

35% of the voters think the following: B>C>A

and 20% of the voters think the following: C>B>A

In this case, B is the winner of this election, because 35%+20% = 55% of the voters prefer B to A and 45%+35%=80% of the voters prefer B to C, which are all majorities.  So, in this case, you look at the preferences of one candidate to another.

As you can see, it is the same election, but determining the election winner in different ways will give two different winners.
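
The two counts above can be checked with a short sketch.  The ballots are the ones from the example (45% A>B>C, 35% B>C>A, 20% C>B>A); the function names are only illustrative, and the “majority” rule is implemented here as head-to-head comparisons between every pair of candidates.

```python
# Each ranking (most to least preferred) with its share of voters, in percent.
ballots = {
    ("A", "B", "C"): 45,
    ("B", "C", "A"): 35,
    ("C", "B", "A"): 20,
}

def plurality_winner(ballots):
    """Candidate with the most first-place votes."""
    first_place = {}
    for ranking, share in ballots.items():
        first_place[ranking[0]] = first_place.get(ranking[0], 0) + share
    return max(first_place, key=first_place.get)

def prefers(ballots, x, y):
    """Share of voters who rank x above y."""
    return sum(share for ranking, share in ballots.items()
               if ranking.index(x) < ranking.index(y))

def majority_winner(ballots):
    """Candidate preferred to every other candidate by more than half
    of the voters, or None if no such candidate exists."""
    candidates = sorted({c for ranking in ballots for c in ranking})
    total = sum(ballots.values())
    for x in candidates:
        if all(2 * prefers(ballots, x, y) > total for y in candidates if y != x):
            return x
    return None

print("plurality:", plurality_winner(ballots))  # A
print("majority: ", majority_winner(ballots))   # B
```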

So, how do we determine which method is better?

How do Arrow’s principles let us determine which of the rules described above (plurality or majority) is fairer?  Well, Arrow’s impossibility theorem states that no voting system can actually meet all of these principles.  Here are some problems with the two rules:

a) The plurality rule tends to break rule #4 a lot; the candidates who did not win can end up affecting the outcome, as occasionally happens in United States elections.  For example, in the 2000 Bush v. Gore election, the result in Florida was so close that had Ralph Nader not been on the ballot, Gore could have won decisively.  In this case, the voting method did not meet Arrow’s principles of fair voting.

b) The majority rule falls prey to the Condorcet Paradox, which states that the preferences of a population can end up being irrational.

Say there are three voters and three candidates, our previous A, B, and C.  They have the following preferences:

Voter 1: A>B>C

Voter 2: B>C>A

Voter 3: C>A>B

So now we use the majority rule by pitting the candidates against one another in pairs:

A vs. B: A is preferable to B for 2/3 of the voters, so A wins.

B vs. C: B is preferable to C for 2/3 of the voters, so B wins.

C vs. A: C is preferable to A for 2/3 of the voters, so C wins.

So, A is preferred to B, B is preferred to C, and C is preferred to A, or A > B > C > A.

But wait…that doesn’t make any sense.  This is an example where, even though the preferences of individual voters are very rational, when using the majority rule the preferences of the population become suddenly irrational.  This is a big problem for the majority rule.
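
The cycle can be verified mechanically.  This sketch uses the three voters from the example, runs every head-to-head contest, and then looks for a candidate who beats all the others; the helper name is illustrative only.

```python
from itertools import permutations

# The three voters' rankings from the example, most to least preferred.
voters = [("A", "B", "C"), ("B", "C", "A"), ("C", "A", "B")]

def pairwise_wins(voters):
    """Map each ordered pair (x, y) to True if a majority ranks x above y."""
    candidates = sorted(set(voters[0]))
    wins = {}
    for x, y in permutations(candidates, 2):
        votes_for_x = sum(1 for r in voters if r.index(x) < r.index(y))
        wins[(x, y)] = votes_for_x > len(voters) / 2
    return wins

wins = pairwise_wins(voters)
print(wins[("A", "B")], wins[("B", "C")], wins[("C", "A")])  # True True True

# A candidate who wins every head-to-head contest would be the majority winner.
beats_everyone = [c for c in "ABC"
                  if all(wins[(c, other)] for other in "ABC" if other != c)]
print(beats_everyone)  # [] : every candidate loses to someone
```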

So, does voting work?

Each one of Kenneth Arrow’s principles seems very logical and appealing, but at least one is violated in every voting system.  There are more preferable systems, but according to Arrow, there does not exist a perfect voting system that effectively takes the beliefs of all individual voters and consolidates them into one final decision.  What is interesting is how the number of parties affects these principles.  What if we had only two parties?  Would the voting system be fairer?  What if we had no parties at all?  All food for (scientific) thought…

Combating Data Manipulation VIII: Correlation vs. Causation

“Correlation” and “Causation” are two words that are thrown around a lot when talking about data analysis and interpretation…but what do they actually mean?

What’s correlation?

Correlation is a statistical term that describes how closely two variables are related to each other.  This can most easily be shown on a linear graph:

[Figure: scatter plot of data points with a best fit line (Okun's law)]

The above graph shows a bunch of data points (ignore what the data points actually mean…this is just an example), and the line drawn through them is a best fit line, or a line that the people who drew this graph thought would best represent their data.  If the variables have a strong correlation, that means that the data points fit that line very well.  If the variables have a weak correlation, it means the data points are only loosely scattered around the line.  Measures of correlation are used to determine how closely related two variables are.
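
One common measure of correlation is the Pearson correlation coefficient, which runs from -1 to 1, with values near 1 or -1 meaning the points sit close to a straight line.  Here is a minimal sketch of how it is computed; the two small data sets are invented purely to contrast a strong and a weak correlation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

xs = [1, 2, 3, 4, 5]
tight = [2.1, 3.9, 6.2, 7.8, 10.1]   # made-up points hugging a line
loose = [5.0, 1.0, 6.0, 3.0, 7.0]    # made-up points scattered widely

print(f"strong: r = {pearson_r(xs, tight):.2f}")
print(f"weak:   r = {pearson_r(xs, loose):.2f}")
```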

Buzzfeed has published a list of weird correlations.  The variables in it have a relationship, as is clear from the graphs provided.

What’s causation?

Causation shows that a change in one variable causes a change in the other variable.  If you took a look at the Buzzfeed article, lemon imports are correlated with highway deaths, meaning that there is a relationship between the two, but anyone who said more lemons save lives on the road would be laughed at, because it’s silly to think that there is a causation.

So, what do people mean when they say correlation doesn’t imply causation?

They mean that just because two variables have a relationship does not mean that one causes the other.  One of the most common examples of this is the debate over whether vaccines cause autism.  While there is very little evidence to back this up, there are some graphs floating around the internet that show a clear correlation between diagnoses of autism and amounts of vaccinations.  But, just because these graphs show a positive linear slope does not mean that vaccines cause autism; this could be because as medicine advances we are able to understand and diagnose more cases of autism, and separately medical advances lead us to develop more vaccines, which increases vaccination rates.

Another example was presented by Leonard Koppett, a sportswriter, who showed that there is a correlation between who wins the Super Bowl and changes in the stock market.  However, despite what some people believed after he stated this, this does not mean that the team that wins the Super Bowl can directly influence how the stock market works.

So, how do we determine which are causations?

One way is to perform controlled experiments to shed some light on whether one causes the other; if we can change one variable and see its changes on the other variable without any other factors affecting the outcome, then we can say with more certainty that there is a causation.

An easier way is to see if we can explain logically and rationally why there would be a causation.  It’s easy to throw out the lemon vs. highway accidents example as just correlation because it doesn’t make any sense for lemon imports to affect highway accidents, unless somehow lemons were repeatedly falling off trucks and causing lots of accidents.

The autism vs. vaccines example is a little tougher, because to someone who hasn’t studied this extensively there could potentially be an explanation for causation, so then we have to wonder whether there could be other reasons for the correlation.  Once we do this, we realize that medical advancements can independently cause both, and then we have some rationale for this being just a correlation.  Unfortunately, a lot of misunderstandings about correlation vs. causation have caused misguided mistrust in vaccines, which can become a huge problem.
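
The idea that a third factor can independently drive two variables is easy to simulate.  In the sketch below, everything is invented: a hidden factor that grows over time, and two series that each depend on it (plus random noise) but never on each other.  They still come out strongly correlated, and the correlation shrinks once the hidden factor's contribution is subtracted out.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical hidden factor (think "medical advances") growing over 30 years.
hidden = [float(t) for t in range(30)]
# Two made-up series that each depend only on the hidden factor plus noise.
series_a = [2.0 * h + rng.gauss(0, 3) for h in hidden]
series_b = [0.5 * h + rng.gauss(0, 1) for h in hidden]

r_raw = pearson_r(series_a, series_b)

# Subtract the hidden factor's known contribution and correlate what is left.
resid_a = [a - 2.0 * h for a, h in zip(series_a, hidden)]
resid_b = [b - 0.5 * h for b, h in zip(series_b, hidden)]
r_resid = pearson_r(resid_a, resid_b)

print(f"raw correlation:           {r_raw:.2f}")
print(f"after removing the factor: {r_resid:.2f}")
```

In real data the hidden factor is usually not known exactly, which is part of why controlled experiments are so valuable.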

This is a very important concept for making educated decisions, because given a graph of two variables with a positive correlation it is important to step back and think about what the creators of the graph want it to say vs. what the graph actually says.  Sometimes it requires a little extra understanding and research, and sometimes it requires relying on experts, like those who research and publish papers on autism and vaccines.  However, everyone has the ability to interpret data and distinguish between correlation and causation.

Up next in Combating Data Manipulation – a little more about voting!

Cancer as an Ecosystem: How Cancer Cells and Invasive Species are Similar

For many people, cancer is a disease.  However, for certain scientists and mathematicians, cancer is more than that: it is a small ecosystem inside the body that has interesting connections to the ecology of invasive species.  These scientists are actually using modeling of invasive species to further examine the dynamics of cancer spreading.

[Figure: picture of a breast cancer cell from a scanning electron microscope]

Here’s how it works:

What’s an invasive species?

An invasive species is a species that originates from another area in the world, but is introduced by humans to a different ecosystem, either by accident or on purpose.  Invasive species are generalists, meaning they can easily adapt to new environments; they have no natural predators in their new environment; and they spread and reproduce very quickly.  Eventually, invasive species take over their new ecosystem, either eating the native species or eating the food of the native species so that the native species can no longer survive.

Invasive species have become such a problem with increasing travel around the world that scientists and mathematicians have been modeling the dynamics of invasive species for a long time.  Invasive species are introduced, usually at ports or coasts, and they establish an initial population.  Sometimes, there is a lag period between the initial population and when they begin to spread, but eventually the species will venture into new environments and establish new populations in new areas until they have taken over.

How is this similar to cancer?

The dynamics of invasive species have been shown to be similar to those of metastasizing cancer cells (cancer cells that spread from one area of the body to another).  These cells have similar properties to invasive species: they have to adapt easily to new environments in the body, they spread and reproduce very quickly, and they have no natural “predators” in the body.  Metastasizing cancer cells originate in one location of the body, set up an initial population, and begin to spread to other regions of the body.

[Figure: labeled illustration of the differences between cancer cells and normal cells]

So, why is this important?

Modeling cancer cells as similar to invasive species can give scientists further insights into how cancer moves throughout the body.  For example, it has been shown that cancer cells stay in their initial population without spreading for a while, very similar to what happens in invasive species. If we can apply our vast understanding of invasive species to cancer, according to the paper we can better model “initial density, metastatic seeding into the bone marrow and growth once the cells are present, and movement of cells out of the bone marrow niche and apoptosis of cells” (Chen and Pienta 2011).

Looking at cancer as a species in an ecosystem could be both an interesting and informative perspective!  If you want to know more about this, see the paper by Chen and Pienta (2011).

Works Cited


The Truth Behind Chaos

What is chaos theory?

Chaos theory looks into nonlinear systems whose future behavior cannot be predicted from previous data alone.  If you look at a graph of chaos, it could look completely random (see the previous post about randomness).  As mentioned in that post, random processes will occur completely differently even if you use the exact same conditions again.  Chaotic systems, on the other hand, can be generated from an equation and change greatly based on initial conditions.  But, if you play the same chaotic system twice with exactly the same initial conditions, you get the same process both times.

[Figure: PWL Duffing chaotic attractor plot]

This graph may seem completely random, but it is dependent on initial conditions and can be described using an equation.  If you run that same equation many times with the same initial conditions, it will always turn out looking like that graph.
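
The difference between chaotic and random can be seen in a few lines of code.  The sketch below does not use the system in the figure; it swaps in the logistic map, a much simpler textbook equation that is chaotic when its parameter r is 4.  Python's random module is itself only pseudo-random, so it stands in here for a truly random process.

```python
import random

def logistic_orbit(x0, steps, r=4.0):
    """Iterate the logistic map x -> r * x * (1 - x), starting from x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

# Chaotic but deterministic: same equation, same start, same result.
run_1 = logistic_orbit(0.2, 50)
run_2 = logistic_orbit(0.2, 50)
print("chaotic runs identical:", run_1 == run_2)

# Random stand-in: two draws of 50 numbers will (almost surely) differ.
noise_1 = [random.random() for _ in range(50)]
noise_2 = [random.random() for _ in range(50)]
print("random runs identical: ", noise_1 == noise_2)
```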

The big idea about chaos is that it is unpredictable in practice; chaotic systems such as the weather and some aspects of the economy, among other examples, cannot be predicted completely accurately.  This is why we have economic models and weather models; these are approximations because we can never measure the starting conditions of these systems precisely enough to predict them exactly.

But, isn’t chaos theory just the butterfly effect?

With regard to chaos theory, many people have heard of the butterfly effect.  The butterfly effect states that a butterfly can flap its wings in one part of the world and cause a hurricane in another part of the world; this is supposed to represent a chaotic system.  This effect illustrates that even very small changes in the initial conditions of a chaotic system can cause huge results.  More specifically, this effect describes the chaotic system of the weather, suggesting that the weather is so unpredictable that it is impossible to model exactly, because no one could ever keep track of all the butterflies flapping their wings and generating wind which could affect the weather.  This makes the weather especially complex and unpredictable.

The butterfly effect itself is a little bit of an exaggeration, because just a butterfly flapping its wings could not solely cause a hurricane.  But the general idea holds that those small changes can cause much bigger events later on.
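
Here is a small numerical version of the butterfly effect.  As an assumption for illustration, it uses the logistic map (a simple chaotic equation, not a weather model) and starts two runs that differ by only one part in ten billion, then tracks how far apart they drift.

```python
def logistic_orbit(x0, steps, r=4.0):
    """Iterate the logistic map x -> r * x * (1 - x), starting from x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

# Two starting points that differ by a "butterfly-sized" amount.
a = logistic_orbit(0.2, 60)
b = logistic_orbit(0.2 + 1e-10, 60)

gaps = [abs(x - y) for x, y in zip(a, b)]
# First step at which the two runs differ by more than 0.1 (None if never).
first_big = next((i for i, g in enumerate(gaps) if g > 0.1), None)

print(f"gap at step 0: {gaps[0]:.1e}")
print(f"largest gap:   {max(gaps):.2f}")
print(f"runs visibly diverge around step {first_big}")
```

The gap roughly doubles at each step, so a difference far too small to measure ends up as large as the signal itself after a few dozen steps.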

Misconceptions about Chaos

Chaos colloquially means disorder, but this is not necessarily true in mathematics and with regard to chaos theory.  Chaos is ordered in the sense that the processes are described by an equation and can be modeled, even if those models are approximations.

As far as media representations of chaos theory go, one of the most well-known is the movie Jurassic Park.  For those of you who have not seen this movie, there is a character who is a mathematician who uses chaos theory to explain why it is impossible to predict the outcome of a dynamic system like an unknown ecosystem and therefore the park cannot be deemed to be safe.  He does an interesting, albeit not entirely correct, demonstration of chaos theory in one scene.  He puts a drop of water on the back of his hand and lets it roll off, and then repeats the same procedure and sees that the droplet rolls in a completely different direction the second time.  He then declares that one cannot predict the path of the droplet of water and that this is an example of a chaotic system.

While we have seen that chaotic systems definitely cannot be predicted, so in a sense the character is correct, his demonstration is a bit flawed.  Instead of demonstrating a chaotic process, he demonstrated a random one.  As mentioned previously, chaotic processes will be largely the same if repeated many times with the same initial conditions, while random processes will be entirely different every time they are repeated.  What the character should have done is ask the woman he is demonstrating chaos theory to where the droplet of water would run on his hand and then do the experiment once, showing that her prediction was inaccurate and that there are too many variables in this system to predict the path of the water.

Despite this little misrepresentation, Jurassic Park definitely did popularize the concept of chaos theory, so that’s a good thing!  Also it’s a great movie.

[Figure: Jurassic Park]


Works Cited