Wednesday, November 23, 2011

Sometimes the numbers are just behind the story

My contribution to journalism in the last 20+ years has mostly been technical or behind the scenes. I started as a copy editor and page designer, but eventually moved into the web side of things around 2001. Since then, my life has been HTML, workflows and other online tools. It's been about getting the content online, not getting the content.

That's changed some recently as I've spent more time working with data, helping the reporters and editors of the Statesman's investigative team with their projects. Often that means I end up publishing the data online, which I've outlined some in this blog.

But sometimes the numbers just allow you to find the story. It isn't about the numbers, it's about the people and events behind the numbers.

A good example of this is the story I recently helped Statesman reporter Tony Plohetski and KVUE reporter Keli Rabon on the cost of settling lawsuits for the City of Austin. Through a public information request, we got a list of all the checks written for settlements in the last five years.

Sure, the data showed us that the police department was responsible for most of the cases that lead to settlements, but that wasn't all that surprising, really. What makes Tony and Keli's story compelling is the human faces that they were able to match to spending categories and check amounts ... that's far more interesting that searching through a database of checks. Heck, those are already online if you know where to look for them.

So I feel my work was an important part of the story, but don't ever forget we are writing about people's lives, not about numbers. That is the real story.

Friday, November 18, 2011

Tableizer ... from tab chart to html table

A not-so-quick post about converting a tab chart in Excel into an HTML table in a jiffy (toward the end), and more detailed info about how and why I made the table to begin with. ...

Yesterday I wrote a blog post comparing the success of Big 12 NCAA football teams against the betting line. I was interested in the data because this weekend the University of Texas (my alma mater, FWIW) is favored by 8 points over Kansas State, which seems absurd. Texas -- a team that has really only seen success when running the ball this year -- will probably be down to their fourth-string running back. Their quarterback hasn't done that well when taking to the air and will most likely be missing his best receiver. Kansas State, on the other hand, is clearly a running team that knows how to pound the ball and find a win.

I guess Vegas has to set the line to counter a bunch of Texas homers who bet on UT even though they will most likely lose? I dunno ... I don't follow betting much, but I did find a comment on talk radio interesting that Kansas State "doesn't get respect" in Vegas. I decided to see how they fare against the line, along with other Big 12 teams.

I found the data on and put it into a spreadsheet, using text-to-columns to clean up the data, including separating the Line column to have the W and L away from the point spread, which I didn't think I wanted to show.

At first I thought I would do a fever chart with teams in different color lines but couldn't figure out a good way to do it, so I put in a call to Matt Stiles to talk out what might be the best way to visualize this. He, of course, went immediately to the best answer of a simple chart with the teams as columns, the weeks as rows and the data points as the wins & losses.

Once I saw that in my head (and in an Excel example) I knew I could pull that off myself with a pivot table. Maybe I did it the hard way, but I knew I needed values for the wins and losses vs. the line, so I created new rows and used  the IF string command to convert them to numbers. I use the formula =IF(G2="W",1,0) which said if the cell is a W then make write the number 1, if not, write 0. This means only wins would be counted when I created my pivot table.

I discovered during the writing of this post that the IF statement wasn't a good one. There was one game that didn't have a line, which had "N" as a value. That ended up being counted as a loss with a zero. A better formula is =IF(G3="W",1,(IF(G3="N","N",0))) which says if the value is W, make it a 1. If not, check if the value is N and make it N, if not, make it 0 (which is all the L's)

I did the table with my columns as the teams, the rows as the weeks and my data points as the new 1/0 column. This allowed me to add up the wins vs. the line per team, then I manually ranked them from left to right.

To make it pretty and readable, I copied the values of my pivot table into a new sheet, then searched/replaced the 1 for Ws, the 0s for Ls and the blanks for Ns. I also color coded it in Excel, but really had to do that again later. It's about here I'm thinking, how am I going to show this online? Take a screenshot of my pretty Excel spreadsheet? That could work, but probably not the best way.

So, I pulled out an tool by Danny Sanchez called Tableizer, which allows you to copy and paste the tab-delimeted text from an Excel file and then poof! A table! This is something I used to do easily using BBEdit on a Mac, but alas the Macs have been phased out of our newsroom ;-(. Too bad Notepad++ doesn't have that handy-dandy function.

I did have to do some modification of the table after it was tableized ... searched and replaced the W with W so I could use a style to color the background of that cell. I did the same with the L and N cells.

So, enough talking about it ... here is the table:

Big 12 Teams vs betting line this season

WeekK-StateOK StateMissouOUKansasTexasTexas TechIowa StateBaylorTexas A&MTotal Big 12 Wins
Team Wins vs. Line8866555442

Betting line history via | Created with help of Tableizer | Hat tip to Matt Stiles for ideas on the visualization.

"W" is a win vs. the line, "L" is a loss vs. the line and "N" means there was no game or no line for that game.

Wednesday, November 02, 2011

Adding a Google Chart image to a Fusion Tables map

This past weekend, we ran a story in the Austin American-Statesman looking at five years of crime reports for borders along the Texas-Mexico border (Statesman analysis shows that statistics don't back up claims of rampant drug cartel-related crime along border). We felt it important to look at the actual numbers and to compare them with the anecdotal evidence offered in the Texas Border Security report that says the border is a "war zone."

You can read the story for the details of the report ... what I want to talk about here is this interactive graphic Crime rates along the Texas-Mexico border that used numbers from the analysis to put charts in the info windows of a Google map. I figure this might be a useful skill for others to learn, and if I can do it, others can do, and I'm sure it can be improved on.

The map is a combination of two tables in Google Fusion Tables, one for counties (the polygons) and one for cities (the points.) I created and uploaded those tables specifically for this display using data from the Texas Department of Public Safety. We gathered Texas Crime Reports from 2006 to 2010, and then plotted the yearly change for the 14 border counties, for the Sheriff's departments in those counties and for some of the larger city police departments. We looked at these cities and the unincorporated areas so we could compare rural and urban areas to see if there was much difference. You can look at the many, many worksheets created here

For these Fusion Table records, we just wanted the violent, property and total crime rates and percentage of change, so I didn't need all the more detailed numbers. I did add some columns to aid in coloring the polygons and points the way that I wanted to.

But what I really liked was the adding of the chart of the violent crime rate into the pop-up window. I basically used the Google Charts Wizard to make a graphic as an image, then changed the values in the image call to use tin the "Configure info window" in Google Fusion Tables.

The img tag looked something like this:

src="|2006|2007|2008|2009|2010&chxr=0,2006,2010&chxt=x&chs=300x150&cht=lc&chco=76A4FB&chds=0,1000&chd=t:{2006_Viole},{2007_Viole},{2008_Viole},{2009_Viole},{2010_Viole}&chls=1&chma=40,20,20,30&chtt=Violent+Crime" width="300" height="150" alt="Violent Crime"

The parts in curly braces were the variables that would pull in the data for that record.

I'm sure there are better, prettier and more complicated ways of adding charts to info windows, but this was an easy and successful first attempt!

Wednesday, September 21, 2011

5-layer limit for Google Fusion Tables

My "learning the hard way" lesson yesterday was discovering that you can only have five Google Fusion Table layers on a map at one time.

I was working on a map for an upcoming story that used both polygon areas and points. I needed to break the polygons into five different groups, and planned to do the same with the points. I figured I could create different layers in ArcMap using Select by Attributes, import each layer into Fusion Tables and then color them appropriately.

All that was fine as I worked on my five polygon layers first, but as I tried to add my first point layer, it wouldn't show up and I was at wit's end. I checked to make sure the layer was public, and that I hadn't copied and pasted the two sections of code incorrectly in my HTML file. I then backed down to my original five layers and swapped a point layer table ID in and it worked, so I knew it wasn't the table layer, it was something else.

Fearing there was some kind of limit, I searched around and found a couple of blog posts and this reference in the Gooble Maps Javascript API (under the headline Limits) that says "Up to five Fusion Tables layers can be added to a map."

What was my solution? For my polygon layers, luckily I had all the data in an single layer in ArcMap to begin with ... I had been trying to separate them so I could color them separately in Fusion Tables. What I did was add a new column to the data and made up my own numbering system to designate the different layers. Then, once in Fusion Tables, I set the fill color to use Buckets and then picked that column and set the colors manually based on the numbering system I had come up with.

I did the points more or less the same, adding a "marker" column based on this reference.

I'll publish a link to the map once the story runs.

Monday, September 12, 2011

Exporting to KML, putting layers together

I'm still learning quite a bit about using ArcMap and Google Fusion Tables together to publish  maps online of the homes destroyed in the Bastrop County Complex fire, and two things came up late last week that are worth sharing.

First, I usually use a wonderful tool called Shape To Fusion (or SHaPeEscape as we call it, after the domain name to upload my ArcMap shapefiles (my map data with points, polygons and other such goodies) to Google Fusion Tables. It's a freaking awesome tool by Josh Livni that is free to to use. But on this particular day, there were several files in the queue and/or it was backed up, so my files were not processing quickly enough for me.

I figured there had to be a way to export KML (the map file format that Fusion Tables/Google Earth needs) directly from ArcMap. I searched around and found this Export to KML 2.5.4 script for ArcGIS Desktop. It's a public domain plugin that you can download and install, allowing you to export our ArcMap layers into KML. I found it a little clunky, and it didn't seem to export all my other data -- just the columns with the shapes -- but it got me where I needed to be in a time of need. I still prefer ShpEscape, but the Export to KML plugin is worth playing with. Are there other, better plugins out there for this?

My next challenge was to layer together a shapefile of the fire boundary behind the dots of all the houses. For this, I used the Fusion Table Layer Builder tool. This allows me to show a single map with two (or more) different Fusion Tables. I thought at first I would just merge the tables, as I've done in the past with Census shapefiles data, but in this case there wasn't a foreign key to merge on. (In other words, I didn't have something in the boundary table that was the same in the list of houses.) The layer builder allowed me to start with the fire boundary, and then add the house list on top of it, giving me resulting javascript code that I could publish and pull into my site as an iframe. You can also set the size of the map, the centerpoint, the zoom level and what other Google Maps controls you want to display. It's a sharp little tool.

Wednesday, September 07, 2011

Two more days of fire mapping

Another day (or two) and another fire map (or three).

Tuesday I spent the day searching official data on where fires have been burning in Central Texas. I wanted outlines or addresses, something better than the dots the Texas Forest Services provides, or than the InciWeb data I wrote about the other day.

I spent the most time working with the USDA Forest Services's Remote Sensing Applications Center site, working with the Current Large Incident data. What was really nice about this was I could download the KML directly from them and load it into Fusion Tables. The result was this map of fire centerpoints which gives the scope of the large fires, but misses some others that were important to our area, like the Steiner Ranch fire. Even though our fire officials said it was a 162-acre fire, it didn't show up even though they are supposed to show fires over 100 acres. The other issue is the map doesn't update on it's own. Pity.

Of course, about the time I did that, someone pointed us to this Google Crisis Center map, which does update as more data becomes available, so that is nice. I like the Red Cross shelters, too, though I could've figured that out.

But today, we finally got some official releases of damaged houses from Bastrop County. I had hopes we could just upload the addresses into Fusion Tables, but as usually some of the addresses didn't map well. Mockingbird Ln addresses ended up on Mockingbird Hill several miles away, and some county roads didn't do well, either.

I put the address list into ArcGIS and geocoded them there, and got better results on all but three addresses. I ended up just hand-plotting those based on the maps from Bastrop County and publishing a map here.

Monday, September 05, 2011

Mapping the Central Texas Wildfires

There are a series of wildfires burning around Central Texas right now, and first and foremost I offer up prayers and condolences to those who have lost property in the fires. It's been a catastrophic weekend.

In response to the fires, I created two news applications involving maps ... one that allows readers to file reports about fires and damage that we can map, and the other that takes Texas Forest Service fire data and maps it.

The reader reports app (Submit fire reports | Mobile friendly form | Map of reports) was really a reworking of a Caspio database I had done last winter to map rolling blackouts by ERCOT. They ended by the time I built it, and I thought we would use it this summer during our record heat, but it appears we've escaped the summer without blackouts (let's hope!) Anyway, it took me about an hour to edit the forms and map to collect fire data instead of blackout data.

Caspio does a good job with their map mashup directions, making user-generated content like this pretty easy.

The second app of Texas Wildfire Incidents was created using a GeoRSS feed from InciWeb, fed into MapChannels and embeded on our page. Unfortunately, the data from the Texas Forest Service is not very comprehensive (descriptions are cut off; geolocations not always there) and the MapChannels map is buggy as a result. I just found that MapChannels site today and that bears some more investigation.

There is definitely more to be done with this story. Will have to dive into it some more tomorrow.

Saturday, September 03, 2011

Using ArcGIS, Fusion tables for Census story

I was recently able to attend to workshops sponsored by Investigative Reporters & Editors at the University of Missouri ... one on computer assisted reporting (really Excel and Access training) and one on mapping (using ArcGIS.) Both courses were great, but I was quite brain dead after seven solid days of training.

The the first fruits of that labor was published Saturday in the Austin American-Statesman and in a story about same sex households in Texas by Juan Castillo. In addition to helping Juan with deciphering and ranking the U.S. Census data, I created two online maps showing the change of same sex households in Texas counties and Central Texas communities.

Other than the program ArcGIS, I used several online tools to make the maps. I used IRE's project to download the data and shape files I needed. If you know what you are looking for, it's much easier than the Census Bureau's American Factfinder2. After doing some clipping and joining in ArcGIS, I created new shapefiles, which I then pushed to Google Fusion Tables using and awesome tool by Josh Livni called Shape to Fusion (or Shape Escape).

The display design on was lifted from my colleague Gerald Fullam from the Dayton Daily News, who modified code originally by the Chicago Tribune.

I'd done work with Fusion Tables before but this was a little more complicated because I had to snip the "places" shapefiles to include just communities in our five-county MSA, a task that ArcGIS makes much easier than trying to go through a massive file in Excel. It was cool.

So, I get a byline of sorts on and justify my existence and the training my paper paid for me to attend (thank you!)

(After looking at my colleague's print graphics, I wish I had done same sex household percentages for tracts in our 5-county MSA, and if I can get some other stuff out of the way this weekend or Tuesday, I might try to whip that up. )


Friday, April 15, 2011

A foray into Google Fusion Tables

I've really been admiring the work of Matt Stiles, Ryan Murphy and others at the Texas Tribune. The crew there has been leader in data journalism, an area I have a lot of interest in. One of the main tools on their data journalism workbench is Google Fusion Tables, which makes visualizing data over a geographical area pretty easy.

So, when an opportunity came up to work on a story here at the Statesman that merged data with geography, I asked Stiles for a 10 minute tour of Fusion Tables, and then dove right in. The result was this four-page set of graphics that map Fitnessgram results (as analyzed by the Michael and Susan Dell Center for Healthy Living) across Central Texas public school districts and across Texas House districts.

The public nature of Google Fusion Tables made this easier, in that geography for both Texas school districts and the Texas Legislature was available already on Fusion tables (again, thanks to Stiles.)

Mary Ann Roser did a fantastic job on the story, explaining why analysts found these statistics so alarming.

While I'm stoked about this first foray into Fusion Tables, there is much still to learn. I wanted to combine some of the maps on a single page, but when I brought in the new visualization using Javascript, it layered it on the current view instead of replacing it. I also think there is more to do pulling in table information from Fusion Tables using jQuery instead of the Caspio route I took out of familiarity.

There is lots to learn. At least I won't be bored.

Thursday, March 31, 2011

Transitioning Google Apps to new infrastructure

I've been waiting for this for years now ... to be able to use Google services like Picasa, Reader and Blogger from my personal Google Apps account where I run my family domain,

Finally, Google has made this possible. I got notice last night and immediately began the transition to the new infrastructure.

The biggest issue I've come across is I can no longer log into my Google Apps personal email from the same browser as my regular Gmail account, where I happen to manage my work email. (I was warned, so this was not a surprise.) It was nice to be logged into both accounts on different tabs in Firefox. I can't do that any longer, but I can switch back and forth using multiple sign-in. This hasn't been the end of the world ... at least not today. It's probably helps my work output as I don't have personal email pinging me all day.

(Update: I later found that I could log into two Google accounts at the same time for many applications, especially GMail. You can edit your Google Account settings to do that.)

Calendar and Docs are the two services I use the most beyond Gmail. I like having Docs on separate accounts, since I've been using my traditional Google account for work-related stuff. I can always share documents between the two accounts if needed. Calendar is a no-brainer ... all of those are shared to both accounts as well so it doesn't matter which account I'm in.

The service I've wanted this most in Apps is Picasa Web. I'm in the process of downloading my 1G+ worth of photos in order to re-upload into my version of Picasa. All of that content is personal so I've wanted to move it out of my now-work-related Google account into my personal Apps account. I may reconsider moving them if I lose all the dates, maps and such. It just may not be worth it.

Blogger is another service I've wanted to transition. Again ... personal content, personal account ... I just want to manage that from the "right" place. It's pretty simple to move: just use the traditional account to invite the Apps account as a contributor, and then make that Apps account an admin.

I'm waiting to deal with Reader, as I'm not sure what to do. I'm not a heavy RSS consumer, to be honest. I have lots of feeds set up, but I don't check them regularly. My Reader account is a mixture of industry stuff (news and technology) and personal stuff (mostly Scouting) so I'm not sure how to split the blend or even if I want to.

I don't have enough YouTube content to worry about moving it. We'll see. I don't use Voice much, either, but that might be one to consider.

Here are some other things others might have to deal with in transitioning to the new Google Apps infrastructure:

Saturday, March 19, 2011

Beluga is great fit for group events and breaking news

Last week during the the South by Southwest Interactive festival, I knew where all my friends and colleagues were and if their panel was a gem or a dud through a free group chat service call Beluga. The service was started by three former Google employees, and I hadn't heard of it before even though it had been in the news recently when it was bought by Facebook.

The group chat service (and there are others, like Yobongo and Groupme) is a great tool to use at a festival or other group event like SXSW, if for nothing else than to save all your Twitter followers from being flooded by massive amounts of tweets about a place where they wish they were but aren't, or never wanted to hear about in the first place.

The groups are called Pods, and it is very easy to set one up. When I created and Beluga account using Facebook Connect, the app found many of my friends that are on Beluga (probably my Facebook friends, but I didn't really think about it.) I could send a message to one of my friends, which creates a Pod, then add my other friends to the Pod.

When someone posts to a Pod you are in, you get a regular iPhone alert like a new text message, with the choice to view or close. (Beluga is available for Android as well.) When you are in the app, you can scroll back to see the chat log. You can mute a particular Pod or all alerts for an hour, until 8 a.m. or just turn them off. I found these mute options to be very handy as my SXSW buddies partied all night when I was at home with the kids.

A large pod can get pretty unwieldy, especially in the beginning as more and more people join or leave (you get a message each time.) I could see myself bailing quickly from a pod that was too large or active.

Our features editors at the Statesman used Beluga during the music festival when crowds dismantled a fence and rushed the stage at a free Stokes concert at Auditorium Shores. Editors and reporters were able to all quickly communicate with each other and deploy resources quickly and efficiently.

Monday, March 07, 2011

Looping Powerpoint presentation

It might seem easy to some of you, but I had never had to create a self-running PowerPoint presentation, so I thought I would share. (And be able to look it up myself again later :-)

If you have a PowerPoint presentation that you want to just loop over and over, it's pretty easy. I'm using {gasp} Office 2003.

  • Go the Slideshow menu to "Slide Transition." You get the right-pane window where you can set the "Advance slide" automatically at a set time.
  • If you want all the slides to move at the same speed, click "Apply to All Slides"
  • Now, under the Slide Show menu, go to "Set Up Show." Under "Show Type," choose "Browsed at a Kiosk" which will loop the show.
There's lots more about this here: Create a self-running presentation.

  © Blogger template 'A Click Apart' by 2008

Back to TOP