Wednesday, February 9, 2011

Median Girls

"Figures often beguile me, particularly when I have the arranging of them myself; in which case the remark attributed to Disraeli would often apply with justice and force: 'There are three kinds of lies: lies, damned lies and statistics.'"
- Mark Twain

I love statistics. Understandably, this makes me quite popular with the ladies. It also means I have no friends. But dammit, if loving statistics is wrong, then I don't want to be within three standard deviations of right.

In college, I took a stats course that focused on, get this, understanding how statistics works. Radical, I know. I never had to memorize any formulas or equations, and tests were open-note. This may sound like I went to school in Alabama, but I assure you, my memory and my diploma tell me I went to U.C. Irvine.

The course was invaluable. I learned the "how" of lying with statistics. I gained the ability to determine, at a glance, what flaws there might be in a particular use of statistics to build an argument. In keeping with the random, bite-sized chunks of knowledge I'm trying to share with the denizens of the intertoobs who pass through here, I'll discuss one concept that came up the other day, one that I believe is essential to grasp: the difference between "mean" and "median."

"Average" is a red-flag word to a stats geek like me. "Mean" is what most people have in mind when they say "average." Mean is calculated by adding up all the data points (numbers) and dividing by the number of data points. For my example, I'll use made-up incomes of made-up people. I chose this because the misuse of averages that inspired this post came up in the context of trying to figure out what people in this country make, income-wise.

So, take five hypothetical people. They make, in gross annual income, $25,000; $35,000; $55,000; $100,000; and $1,000,000.

Their mean income is $243,000 (add up the numbers and divide by five).
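For anyone who would rather let a computer do the adding, here is a minimal Python sketch of the same calculation on the made-up incomes above (the variable names are my own, purely for illustration):

```python
# The five made-up annual incomes from the example, in dollars.
incomes = [25_000, 35_000, 55_000, 100_000, 1_000_000]

# Mean: add up all the data points, then divide by how many there are.
mean_income = sum(incomes) / len(incomes)

print(f"${mean_income:,.2f}")  # $243,000.00
```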

Under certain circumstances, it is much more useful to know what the "median" income is. The median is the middle number when all the numbers are lined up in order. So, in my example above (which is conveniently laid out in order already), the median is $55,000*. This means that for every income above that number, there is one below it. That tells us something about the distribution of income that the "average" (mean) does not.
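The same idea as a short Python sketch, assuming an odd number of values like our five people (the even case is covered in the footnote). The helper name `median_odd` is hypothetical; the sketch also counts how many incomes sit on each side of the median:

```python
incomes = [25_000, 35_000, 55_000, 100_000, 1_000_000]

def median_odd(values):
    """Return the middle value of an odd-length list of numbers."""
    ordered = sorted(values)  # line the numbers up in order
    if len(ordered) % 2 == 0:
        raise ValueError("this sketch only handles an odd number of values")
    return ordered[len(ordered) // 2]  # the one in the middle

mid = median_odd(incomes)
below = sum(1 for x in incomes if x < mid)  # incomes under the median
above = sum(1 for x in incomes if x > mid)  # incomes over the median

print(mid, below, above)  # 55000 2 2
```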

There is also another term, "mode." This refers to the most frequent data point. In the set 1,2,3,3,3,4,5,5,6,7, the mode is 3. It isn't terribly useful in everyday thinking about stats, I have found.
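If you want to check the mode of that set, Python's standard `collections.Counter` will tally the values for you. A minimal sketch:

```python
from collections import Counter

data = [1, 2, 3, 3, 3, 4, 5, 5, 6, 7]

# Tally how often each value appears, then take the most frequent one.
counts = Counter(data)
mode_value, frequency = counts.most_common(1)[0]

print(mode_value, frequency)  # 3 3
```

If two values tie for most frequent, `most_common` returns whichever it encountered first, so this sketch reports only one of them; the standard library's `statistics.multimode` returns all of them.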

To highlight the significance of the difference between median and mean, let's add two new people to our hypothetical. One makes $50,000, the other $500,000. The new mean income is $252,142.86 (rounded to the nearest penny). The new median? Trick question: it isn't new; it is still $55,000. So now answer this question: did incomes rise? Well, "average" income did, based on the mean. But we still have as many people making less than $55,000 as making more. Try adding a multi-millionaire to the list, but also add 15 new people making minimum wage ($14,500 a year, if you use the Federal minimum of $7.25/hr for 8-hour days, 5 days a week, 50 weeks a year). See what happens to the mean and median, and you'll begin to see that when people talk about the average income in this country, mean versus median makes a world of difference.
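Here is that experiment as a Python sketch using the standard `statistics` module. The post doesn't name the multi-millionaire's income, so the $5,000,000 below is my own hypothetical figure; swap in whatever you like:

```python
from statistics import mean, median

# The seven hypothetical incomes from the example above.
incomes = [25_000, 35_000, 55_000, 100_000, 1_000_000, 50_000, 500_000]
print(f"7 people:  mean ${mean(incomes):,.2f}, median ${median(incomes):,.2f}")

# Federal minimum wage: $7.25/hr * 8 hr/day * 5 days/week * 50 weeks/year.
min_wage_income = 7.25 * 8 * 5 * 50  # 14500.0

# Hypothetical: one multi-millionaire at $5,000,000 (an assumed figure)
# plus 15 people earning the minimum wage.
incomes += [5_000_000] + [min_wage_income] * 15
print(f"23 people: mean ${mean(incomes):,.2f}, median ${median(incomes):,.2f}")
```

With those assumptions, the mean climbs to about $303,586.96 while the median drops to $14,500, because the minimum-wage earners now fill the middle of the line.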

Bonus fact: 2009 median household income in the United States was $49,777. I couldn't find the mean, because it is widely acknowledged to be practically useless. Given that there were 403 billionaires in the United States in 2009, I can assure you the mean would be higher than $49,777.

* To calculate the median of a set with an even number of values, just take the mean of the two middle numbers, like this:
The middle two in the set 1,2,3,4,5,6 are 3 and 4. Add 3 + 4, then divide by 2, and you get a median of 3.5.
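A minimal Python sketch of the full rule, covering both odd and even counts (the function name `median` is my own; Python's `statistics.median` follows the same rule):

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: the single middle value
    # Even count: take the mean of the two middle numbers.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([1, 2, 3, 4, 5, 6]))  # 3.5
print(median([25_000, 35_000, 55_000, 100_000, 1_000_000]))  # 55000
```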

1 comment:

  1. This article made me think of The Black Swan by Nassim Nicholas Taleb. It's not one-hundred percent related, but it does address the egregious misuse of statistics in finance, economics, and life in general. He makes an interesting case against the incorrect application of the bell curve and other such statistical tools, and then takes it way past a logical conclusion. I still think it was interesting and worth reading...