Statistics 101

bob the goat
November 6th, 2006, 03:08 PM
Due to a recent illness, I have done some digging for medical facts, and have been like a kid in a candy store with all the new things that I didn't even know that I didn't know about. Most of the things were trivia, and probably not that useful to anyone in real life, but interesting to me nonetheless. The one thing that I did strike upon that riled me somewhat was the blatant abuse of statistics. To somewhat qualify myself, I have a Bachelor's degree in Manufacturing Engineering and an Associate's in Mechanical Engineering. I have taken more statistics classes than any human should have to endure (so many that I began to like it, as disgusting as that thought is).

The thing that I want to talk about is the difference between correlation and cause.

First, a definition of correlation: a measure of the strength of the linear association between two variables. Put another way, it describes a relationship between two sets of data such that when one changes, the other is likely to make a corresponding change.
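To make the definition concrete, here is a minimal Python sketch of the Pearson correlation coefficient, the most common way to put a number on "strength of linear association." The name `pearson_r` is just an illustrative choice; Python 3.10+ also ships `statistics.correlation`, which computes the same quantity. The last line shows why the word "linear" matters: y is completely determined by x, yet r comes out as 0.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: how strongly xs and ys move together along a straight line."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Cross-products and squared deviations from each mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))          # 1.0: perfect positive linear relationship
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))          # -1.0: perfect negative linear relationship
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # 0.0: y = x**2 exactly, but not linearly
```

A value near +1 or -1 means a strong straight-line relationship; near 0 means no straight-line relationship. None of these numbers says anything about cause.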

What drove me nuts when researching all these different things was how the people writing the articles would interpret a correlation as a cause. Just because there is a correlation between two things does not mean that one is the cause of the other.

When doing research involving correlations, there are two (or more) factors, and a cause for each of those factors. Ideally, you will discover that one factor is the cause of the other, allowing you to change one thing and affect both.

Let me give you some examples. I'm going to list the theory, then the data, then the conclusion... then I'm going to evaluate that conclusion based on sound statistical practice. Note: I'm making up numbers here; I didn't actually do any of these experiments.

Situation 1:
Theory: There is a correlation between people that wear cotton jumpsuits and crime.
Data: I observed 500 people. 250 in a mall and 250 in the county jail. In the mall there was only one person in a cotton jumpsuit, but in the jail nearly 98% wore jumpsuits.
Conclusion: Clearly, criminals are made so by the wearing of cotton jumpsuits.
My analysis: The conclusion is wrong. This is a great example of how there can be a correlation (criminals are forced to wear jumpsuits), but the cause is neither of the factors. You are not a criminal because you wear a jumpsuit, and crime is not caused by jumpsuit usage.
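For yes/no data like Situation 1, the correlation can be measured with the phi coefficient, which is Pearson's r applied to two binary variables. The sketch below plugs in the made-up counts, reading "nearly 98%" of the 250 people in jail as 245 (an assumption for illustration); the function name is hypothetical.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi coefficient (Pearson r for two yes/no variables) from a 2x2 table:

                     jumpsuit   no jumpsuit
        in jail          a           b
        at the mall      c           d
    """
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator

# Made-up counts from Situation 1; "nearly 98%" of 250 is taken to be 245 (assumption)
phi = phi_coefficient(a=245, b=5, c=1, d=249)
print(round(phi, 3))  # 0.976
```

A phi that close to 1 means jumpsuits and jail go together almost perfectly. The formula gives exactly the same number whether jumpsuits turned people into criminals or jails hand out jumpsuits, so the statistic alone cannot tell those stories apart.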

Situation 2:
Theory: The number of people shoveling their driveway is correlated with how much snow is received.
Data: For 10 days I observed my neighborhood. On 4 days no one was shoveling and there was no snow. On 4 days there were 8 people shoveling and there was 2" of snow. On 2 days there were 12 people shoveling and there was 6" of snow.
Conclusion: Clearly, the more people there are shoveling, the more snow we get. Seeing as it begins to snow before the people begin to shovel, nature clearly has a way of sensing how many people are going to shovel, and increases snow output accordingly.
My analysis: Clearly there is a correlation. You could even go so far as to say that one causes the other; however, the researcher got the direction backwards. The snow causes the shoveling, not the other way around.
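Running the made-up shoveling numbers through Pearson's r shows both halves of the point: the correlation is strong, and it is perfectly symmetric, so the number itself carries no information about which variable drives the other. This is a minimal sketch; `pearson_r` is a small helper written for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Ten made-up days from Situation 2: people shoveling, and inches of snow
shovelers = [0, 0, 0, 0, 8, 8, 8, 8, 12, 12]
snow_in   = [0, 0, 0, 0, 2, 2, 2, 2, 6, 6]

print(round(pearson_r(shovelers, snow_in), 2))  # 0.91
print(round(pearson_r(snow_in, shovelers), 2))  # 0.91: same number with the roles swapped
```

Knowing the direction (snow first, shoveling second) comes from outside the data, here from the order in which things happen, not from the correlation.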

A great example of this type of mistake came up when I looked up the dangers of Diet Coke. The researchers observed people across different age brackets and levels of pop consumption, and found a strong correlation between weight gain and diet pop consumption. Many diet sites listed this as strong proof that diet pop makes you fat. But the site that did the research said in its conclusions that, when the data were compared over time, it seemed more likely that people who were already overweight had switched to diet soda. Therefore, there is a correlation, but it is that fat people drink diet pop, not that diet pop makes you fat.
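The Diet Coke mix-up is easy to reproduce with a toy simulation. In the sketch below, all numbers are invented for illustration and are not taken from any study: weight influences whether someone drinks diet pop, while diet pop has no effect on weight whatsoever. The correlation still comes out clearly positive, which is exactly what a careless reader would quote as "diet pop makes you fat."

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(42)
weights = []      # body weight in kg (hypothetical population)
drinks_diet = []  # 1 if the person drinks diet pop, else 0
for _ in range(10_000):
    weight = random.gauss(80, 15)
    # Causation runs one way only: heavier people are more likely to switch to diet pop...
    p_switch = 0.7 if weight > 90 else 0.2
    drinks_diet.append(1 if random.random() < p_switch else 0)
    # ...and nothing in this model lets diet pop change anyone's weight.
    weights.append(weight)

r = pearson_r(drinks_diet, weights)
print(f"correlation between drinking diet pop and weight: {r:.2f}")
```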

Situation 3:
Theory: The number on your thermostat and the temperature in your house are correlated.
Data: The thermostat was set at 50, and the house was 50. The thermostat was set to 80 and the house was 80.
Conclusion: There is a perfect correlation between the two, and the number on the thermostat is the cause of the temperature in the house. If you manipulate the number on the thermostat, you are thereby manipulating the temperature in the house.
My analysis: That is good statistics. Bad statistics would be seeing the correlation and concluding that the temperature in the house controlled the number on the thermostat.

Situation 4:
Theory: There is a correlation between video game violence and violent children.
Data: A high percentage of violent children play video games. A high percentage of non-violent children play video games. Overall, childhood violence is on a decline, despite the rise in violence in video games.
Conclusion: Given that there are equal percentages of violent and non-violent children that play video games, and that game usage is on the rise, and that violence levels are falling, there is an inverse correlation. That means that when one goes up, the other goes down (think temperature vs. snowfall: the lower the temperature, the more snow).
My analysis: This is good statistics.

So. The moral of the story is:
When you see data from research, identify what the factors are, and identify what the causes of those factors are. Do not assume that because there is a correlation, one automatically causes the other; and if one does cause the other, make sure to identify which factor controls which.

November 6th, 2006, 04:10 PM
Like the Special K commercial which says women who eat breakfast weigh less. This doesn't mean that breakfast makes you weigh less. Damn Correlation.

November 6th, 2006, 05:11 PM
Bob... you've hit the nail on the head on what drives me insane about the majority of today's media (and politics, for that matter)... THE SPIN!!

The foolish manner in which facts are twisted or misunderstood is maddening. And trying to clear it up and explain the actual situation to someone is even more so.

I feel your pain. Ask my wife how often I rant at articles and blurbs in the news. These things rely on lazy mindsets... they know that only a very small minority will actually go back and check the facts.


November 6th, 2006, 05:12 PM
Correlations could exist because of causality, a third factor affecting the two measured ones, or just randomness (the least likely).
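The "third factor" case can be sketched in a few lines. Here a hidden variable `z` (hypothetical, purely for illustration) drives both `x` and `y`, which never influence each other. They still correlate at about 0.5, and the correlation disappears once `z` is taken out.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(0)
n = 5_000
z = [random.gauss(0, 1) for _ in range(n)]  # hidden third factor
x = [zi + random.gauss(0, 1) for zi in z]   # x is driven by z plus its own noise
y = [zi + random.gauss(0, 1) for zi in z]   # y is driven by z plus its own noise; x never touches y

r_xy = pearson_r(x, y)
print(f"r(x, y) = {r_xy:.2f}")  # in theory about 0.5

# Remove z's influence. Subtracting z directly works only because we built x and y with a
# coefficient of exactly 1; with real data you would regress each variable on z instead.
x_resid = [xi - zi for xi, zi in zip(x, z)]
y_resid = [yi - zi for yi, zi in zip(y, z)]
r_resid = pearson_r(x_resid, y_resid)
print(f"r(x, y) after removing z = {r_resid:.2f}")  # close to 0
```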

Correlation is often used in place of causality when causation is hard to discern or prove conclusively, especially in chaotic systems like biochemistry, the economy, the weather, etc.

For example, financial analysts often look for negatively correlated industries or countries to put their money in, a process called hedging. This is done so that losses in one holding tend to be offset by gains in another, keeping returns steadier despite changing market conditions. Sometimes there is no discernible or provable reason why things are this way, yet they are, and people take advantage of it all the time.
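Why negatively correlated holdings smooth things out follows from the standard two-asset portfolio variance formula, variance = (w1*sigma1)^2 + (w2*sigma2)^2 + 2*w1*w2*rho*sigma1*sigma2, where rho is the correlation between the two holdings. The sketch below evaluates it for a hypothetical 50/50 split of two holdings that each swing 20% a year; the numbers are made up purely to show the effect of rho.

```python
import math

def portfolio_volatility(w1, sigma1, sigma2, rho):
    """Standard deviation of a two-asset portfolio holding w1 in asset 1 and 1 - w1 in asset 2."""
    w2 = 1.0 - w1
    variance = (w1 * sigma1) ** 2 + (w2 * sigma2) ** 2 + 2 * w1 * w2 * rho * sigma1 * sigma2
    # Clamp tiny negative values that floating-point rounding can produce when rho = -1
    return math.sqrt(max(variance, 0.0))

# Hypothetical: two holdings, each with 20% annual volatility, split 50/50
for rho in (1.0, 0.0, -0.5, -1.0):
    vol = portfolio_volatility(0.5, 0.20, 0.20, rho)
    print(f"correlation {rho:+.1f}: portfolio volatility {vol:.1%}")
```

With perfectly correlated holdings (rho = +1) nothing is gained; as rho falls, the combined swings shrink, and at rho = -1 they cancel entirely. The formula only needs the correlation, not an explanation of why it exists.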

Like the Special K commercial which says women who eat breakfast weigh less. This doesn't mean that breakfast makes you weigh less. Damn Correlation.

Actually it makes sense. Eating breakfast means you are less likely to pig out at lunch or eat unhealthy snacks before lunch. Breakfast doesn't make you thin by itself; rather, there are many intervening influences, causes, and effects between them. It is not a simple, straight-line correlation.

November 6th, 2006, 05:15 PM
Sometimes things just happen that have no effect on each other. I painted snowflakes onto my fingernails, and the weather got warmer. Two totally unrelated events, but if I wanted to do stupid statistics, I would link the two in a heartbeat.

November 6th, 2006, 05:21 PM
Now you are just misunderstanding what statistics is. One could say everyone who has eaten carrots has died, yet there is no causality there. Same with the snowflakes.

This is not to say the discipline is wrong; you just need to see how a thing works over time. See the comment I added to my post above, below the quote.

November 6th, 2006, 05:46 PM
I don't misunderstand, I was making a point. Sometimes they link things that are totally unrelated, like my nails and the weather.

Andara Bledin
November 6th, 2006, 06:03 PM
It's not uncommon for those who go into a study looking for a specific result to (either unintentionally or otherwise) misinterpret the statistical data in a manner not necessarily supported by the results.

It's actually very easy to use statistical data to support bald-faced lies by presenting it in a biased manner. The "video games cause violence" crowd has been doing just that (some, unwittingly) for years.


November 6th, 2006, 08:27 PM
I don't misunderstand, I was making a point. Sometimes they link things that are totally unrelated, like my nails and the weather.

Like what? Video games and violence? Those things have a correlation, but the correlation does not imply causation.

bob the goat
November 7th, 2006, 07:46 AM
There is a third term, coincidence. That is the random chance that things will appear similar. That is usually overcome by taking a lot of samples.

A good example of coincidence: Bob flips a coin and gets heads, then Tom flips a coin and gets heads. Bob flips and gets tails, then Tom flips and gets tails. Bob flips and gets heads again, then Tom gets heads also. The conclusion could be made that whatever Bob flips is what Tom is going to flip. The reality is that on each flip there is a 50-50 chance that they will get the same thing, so it is not impossible that they match every time (there is a 1 in 8 chance of matching on all three flips). To rule out coincidence, you repeat the same test many times. If you were to do the same test with 15 flips, there would be only a 1 in 32,768 (about 0.00305%) chance of getting matching results every single time. With 100 flips, the chance of it still being a coincidence is (1/2)^100, about 7.9 × 10^-29 percent (and that is a real number).
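The arithmetic here is just (1/2)^n: each extra flip halves the chance that the matching is pure luck. A minimal Python sketch, with the helper name `p_all_match` chosen for illustration, computes the exact figures (1/8 for 3 flips, 1/32,768 or about 0.00305% for 15, roughly 7.9 × 10^-29 percent for 100) and checks the 3-flip case by simulation.

```python
import random
from fractions import Fraction

def p_all_match(flips):
    """Exact chance that Tom matches Bob on every one of `flips` independent fair coin flips."""
    return Fraction(1, 2) ** flips

for flips in (3, 15, 100):
    p = p_all_match(flips)
    print(f"{flips:>3} flips: {float(p):.3g} chance ({float(p) * 100:.3g}%)")

# Sanity check by simulation: about 1 run in 8 should match on all three flips
random.seed(1)
trials = 100_000
hits = sum(
    all(random.randint(0, 1) == random.randint(0, 1) for _ in range(3))
    for _ in range(trials)
)
print(f"simulated: {hits / trials:.4f} vs exact {float(p_all_match(3)):.4f}")
```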

Most research involves hundreds if not thousands of participants or samples just to rule out coincidence.
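Large samples help because chance correlations shrink as the sample grows, roughly in proportion to 1/sqrt(n). The sketch below correlates pairs of completely unrelated random sequences and averages the size of the "relationship" it finds; the helper names, the uniform random data, and the 200 trials per size are all arbitrary choices for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def average_chance_r(n, trials, rng):
    """Average |r| between two sequences of n independent random numbers (no real relationship)."""
    total = 0.0
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        ys = [rng.random() for _ in range(n)]
        total += abs(pearson_r(xs, ys))
    return total / trials

rng = random.Random(7)
results = {n: average_chance_r(n, 200, rng) for n in (5, 30, 1000)}
for n, avg_r in results.items():
    print(f"sample size {n:>4}: average |r| from pure chance = {avg_r:.3f}")
```

With only a handful of samples, pure noise routinely produces correlations that look impressive; with a thousand, they all but vanish.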

Andara Bledin
November 7th, 2006, 11:37 AM
Like what? Video games and violence? Those things have a correlation, but the correlation does not imply causation.


Bobthegoat was giving an example of what would be coincidence that could be misleadingly stated as correlation or causation.

Of course, when you're dealing with the games & violence question, you first have to overcome the fact that there is no rise in violence to begin with, so the rest of any argument based on a rise in violence is fundamentally flawed. Like trying to divide by zero.