Wednesday, November 23, 2011

Sometimes the numbers are just behind the story

My contribution to journalism in the last 20+ years has mostly been technical or behind the scenes. I started as a copy editor and page designer, but eventually moved into the web side of things around 2001. Since then, my life has been HTML, workflows and other online tools. It's been about getting the content online, not getting the content.

That's changed some recently as I've spent more time working with data, helping the reporters and editors of the Statesman's investigative team with their projects. Often that means I end up publishing the data online, which I've outlined some in this blog.

But sometimes the numbers just allow you to find the story. It isn't about the numbers, it's about the people and events behind the numbers.

A good example of this is the story I recently helped Statesman reporter Tony Plohetski and KVUE reporter Keli Rabon on the cost of settling lawsuits for the City of Austin. Through a public information request, we got a list of all the checks written for settlements in the last five years.

Sure, the data showed us that the police department was responsible for most of the cases that lead to settlements, but that wasn't all that surprising, really. What makes Tony and Keli's story compelling is the human faces that they were able to match to spending categories and check amounts ... that's far more interesting that searching through a database of checks. Heck, those are already online if you know where to look for them.

So I feel my work was an important part of the story, but don't ever forget we are writing about people's lives, not about numbers. That is the real story.

Friday, November 18, 2011

Tableizer ... from tab chart to html table

A not-so-quick post about converting a tab chart in Excel into an HTML table in a jiffy (toward the end), and more detailed info about how and why I made the table to begin with. ...

Yesterday I wrote a blog post comparing the success of Big 12 NCAA football teams against the betting line. I was interested in the data because this weekend the University of Texas (my alma mater, FWIW) is favored by 8 points over Kansas State, which seems absurd. Texas -- a team that has really only seen success when running the ball this year -- will probably be down to their fourth-string running back. Their quarterback hasn't done that well when taking to the air and will most likely be missing his best receiver. Kansas State, on the other hand, is clearly a running team that knows how to pound the ball and find a win.

I guess Vegas has to set the line to counter a bunch of Texas homers who bet on UT even though they will most likely lose? I dunno ... I don't follow betting much, but I did find a comment on talk radio interesting that Kansas State "doesn't get respect" in Vegas. I decided to see how they fare against the line, along with other Big 12 teams.

I found the data on and put it into a spreadsheet, using text-to-columns to clean up the data, including separating the Line column to have the W and L away from the point spread, which I didn't think I wanted to show.

At first I thought I would do a fever chart with teams in different color lines but couldn't figure out a good way to do it, so I put in a call to Matt Stiles to talk out what might be the best way to visualize this. He, of course, went immediately to the best answer of a simple chart with the teams as columns, the weeks as rows and the data points as the wins & losses.

Once I saw that in my head (and in an Excel example) I knew I could pull that off myself with a pivot table. Maybe I did it the hard way, but I knew I needed values for the wins and losses vs. the line, so I created new rows and used  the IF string command to convert them to numbers. I use the formula =IF(G2="W",1,0) which said if the cell is a W then make write the number 1, if not, write 0. This means only wins would be counted when I created my pivot table.

I discovered during the writing of this post that the IF statement wasn't a good one. There was one game that didn't have a line, which had "N" as a value. That ended up being counted as a loss with a zero. A better formula is =IF(G3="W",1,(IF(G3="N","N",0))) which says if the value is W, make it a 1. If not, check if the value is N and make it N, if not, make it 0 (which is all the L's)

I did the table with my columns as the teams, the rows as the weeks and my data points as the new 1/0 column. This allowed me to add up the wins vs. the line per team, then I manually ranked them from left to right.

To make it pretty and readable, I copied the values of my pivot table into a new sheet, then searched/replaced the 1 for Ws, the 0s for Ls and the blanks for Ns. I also color coded it in Excel, but really had to do that again later. It's about here I'm thinking, how am I going to show this online? Take a screenshot of my pretty Excel spreadsheet? That could work, but probably not the best way.

So, I pulled out an tool by Danny Sanchez called Tableizer, which allows you to copy and paste the tab-delimeted text from an Excel file and then poof! A table! This is something I used to do easily using BBEdit on a Mac, but alas the Macs have been phased out of our newsroom ;-(. Too bad Notepad++ doesn't have that handy-dandy function.

I did have to do some modification of the table after it was tableized ... searched and replaced the W with W so I could use a style to color the background of that cell. I did the same with the L and N cells.

So, enough talking about it ... here is the table:

Big 12 Teams vs betting line this season

WeekK-StateOK StateMissouOUKansasTexasTexas TechIowa StateBaylorTexas A&MTotal Big 12 Wins
Team Wins vs. Line8866555442

Betting line history via | Created with help of Tableizer | Hat tip to Matt Stiles for ideas on the visualization.

"W" is a win vs. the line, "L" is a loss vs. the line and "N" means there was no game or no line for that game.

Wednesday, November 02, 2011

Adding a Google Chart image to a Fusion Tables map

This past weekend, we ran a story in the Austin American-Statesman looking at five years of crime reports for borders along the Texas-Mexico border (Statesman analysis shows that statistics don't back up claims of rampant drug cartel-related crime along border). We felt it important to look at the actual numbers and to compare them with the anecdotal evidence offered in the Texas Border Security report that says the border is a "war zone."

You can read the story for the details of the report ... what I want to talk about here is this interactive graphic Crime rates along the Texas-Mexico border that used numbers from the analysis to put charts in the info windows of a Google map. I figure this might be a useful skill for others to learn, and if I can do it, others can do, and I'm sure it can be improved on.

The map is a combination of two tables in Google Fusion Tables, one for counties (the polygons) and one for cities (the points.) I created and uploaded those tables specifically for this display using data from the Texas Department of Public Safety. We gathered Texas Crime Reports from 2006 to 2010, and then plotted the yearly change for the 14 border counties, for the Sheriff's departments in those counties and for some of the larger city police departments. We looked at these cities and the unincorporated areas so we could compare rural and urban areas to see if there was much difference. You can look at the many, many worksheets created here

For these Fusion Table records, we just wanted the violent, property and total crime rates and percentage of change, so I didn't need all the more detailed numbers. I did add some columns to aid in coloring the polygons and points the way that I wanted to.

But what I really liked was the adding of the chart of the violent crime rate into the pop-up window. I basically used the Google Charts Wizard to make a graphic as an image, then changed the values in the image call to use tin the "Configure info window" in Google Fusion Tables.

The img tag looked something like this:

src="|2006|2007|2008|2009|2010&chxr=0,2006,2010&chxt=x&chs=300x150&cht=lc&chco=76A4FB&chds=0,1000&chd=t:{2006_Viole},{2007_Viole},{2008_Viole},{2009_Viole},{2010_Viole}&chls=1&chma=40,20,20,30&chtt=Violent+Crime" width="300" height="150" alt="Violent Crime"

The parts in curly braces were the variables that would pull in the data for that record.

I'm sure there are better, prettier and more complicated ways of adding charts to info windows, but this was an easy and successful first attempt!

  © Blogger template 'A Click Apart' by 2008

Back to TOP