Friday, June 19, 2015

Why are medians so hard? Tableau made it easy.

I have this salary data -- title, gender, length of service and the annual salary -- and I need to do some comparisons. Seems like some easy stuff. Well, it's not as easy as you might think using our typical data workhorses of Excel and SQL.

Salaries are one of those data sets where extreme ranges can skew distributions, making the mean (or average) a poor representation of the data. A small number of really high or low salaries can move the average out of whack from the rest of the data.

Excel does not have median as sum
option for pivot tables.
So, Viva La Median! Line all those salaries up in order and pick the one in the middle (or average the two in the middle) and you get a better sample to explain your data set. I have more than 10,000 rows of data with hundreds of job titles to compare, so I can't just use MEDIAN() at the bottom of an Excel column with so many titles.

So I whip out Excel's Pivot table, put my job titles in row and annual salary in the values and then ... wait. What? No median? I can average, sum, min, max ... but no median? There might be some solutions that I haven't really checked out, because I figured I'd just do this in MySQL.

Strike two. MySQL does not have a median aggregate function. Nor does PostgreSQL. There are some threads on Stack Overflow that might get me there, but ... my head hurts.

But I was able to solve this in about three minutes using Tableau. After importing my data, I set the Title on the Rows shelf and Annual Salary on the Text mark. I used the contextual menu for SUM(AnnualSalary) and change the Measure to Median, as show in the screenshot.

Once I had the data on the screen the way I wanted it, I went to the menus Worksheet > Export > Crosstab to Excel, which saved out the files as an Excel spreadsheet.

Of course, I can and did analyze and visualize the data within Tableau, but in this case I had a need to get the data out into another program as well.

  © Blogger template 'A Click Apart' by 2008

Back to TOP