Friday, November 13, 2015

Statesman seeks news apps developer

June 2016 update: Both of these posts are now filled.

While the official application isn't up and running quite yet, I'm looking for a news apps developer (two, actually) to join our team at the Statesman. We've done some incredible work lately, and now it's time to build upon that with more, better and different. If you're interested, please fill out the official application or email me, and include the normal stuff: resume, examples of your work, etc. Here is the job description:

The Statesman is looking for a news applications specialist for the interactives team. This developer journalist will work with reporters, editors and other team members to design and build interactive graphics, data visualizations and news applications to support journalism ventures.

Our projects vary in scope, platform and content:


  • Work as part of a newsroom team to bring our most important stories to our readers in a compelling way across all our various digital platforms.
  • Develop, code, test and debug news apps for mobile and wired platforms.
  • Share and expand knowledge with other team members, and learn from the experience of others.
  • Research new technology, best practices and tools, and analyze them for best fit, usage, stability and performance.
  • Communicate with both technical and non-technical colleagues to serve as a bridge between content and digital design of applications and visualizations.
  • Some reporting and contacting sources for data and information.


  • Understanding of data structures and database management
  • Familiarity with web APIs and common data visualization libraries
  • Demonstrated ability to turn concepts into user-focused apps using HTML5/CSS3/JavaScript. We use Node.js for package management, PHP/WordPress, Python and Django. While experience in these areas is preferred, we recognize developers are adaptable, and so are we.

Friday, June 19, 2015

Why are medians so hard? Tableau made it easy.

I have this salary data -- title, gender, length of service and the annual salary -- and I need to do some comparisons. Seems like some easy stuff. Well, it's not as easy as you might think using our typical data workhorses of Excel and SQL.

Salaries are one of those data sets where a few extreme values can skew the distribution, making the mean (or average) a poor representation of the data. A small number of really high or low salaries can pull the average out of whack with the rest of the data.

Screenshot caption: Excel does not have median as a summary option for pivot tables.

So, Viva La Median! Line all those salaries up in order and pick the one in the middle (or average the two in the middle) and you get a better single number to represent your data set. I have more than 10,000 rows of data with hundreds of job titles to compare, so I can't just use MEDIAN() at the bottom of an Excel column with so many titles.

So I whip out Excel's pivot table, put my job titles in the rows and annual salary in the values and then ... wait. What? No median? I can average, sum, min, max ... but no median? There might be some solutions that I haven't really checked out, because I figured I'd just do this in MySQL.

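Strike two. MySQL does not have a median aggregate function. Nor does PostgreSQL, at least not under that name (versions 9.4 and later can get there with percentile_cont(0.5) WITHIN GROUP (ORDER BY ...)). There are some threads on Stack Overflow that might get me there, but ... my head hurts.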

But I was able to solve this in about three minutes using Tableau. After importing my data, I put Title on the Rows shelf and Annual Salary on the Text mark. I used the contextual menu for SUM(AnnualSalary) and changed the Measure to Median, as shown in the screenshot.

Once I had the data on the screen the way I wanted it, I went to the menus Worksheet > Export > Crosstab to Excel, which saved the results as an Excel spreadsheet.

Of course, I can and did analyze and visualize the data within Tableau, but in this case I had a need to get the data out into another program as well.
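For anyone who would rather script it, here is a minimal sketch of the same per-title median in Python, using only the standard library. The sample rows and the summarize_by_title name are made up for illustration; they are not the real salary data, and this is not what I did in Tableau. It also shows how a single outlier pulls the mean away from the median.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical sample rows; the real data had 10,000+ rows and hundreds of titles.
rows = [
    {"title": "Clerk", "salary": 30000},
    {"title": "Clerk", "salary": 32000},
    {"title": "Clerk", "salary": 34000},
    {"title": "Clerk", "salary": 150000},  # one outlier drags the mean up
    {"title": "Analyst", "salary": 50000},
    {"title": "Analyst", "salary": 54000},
]


def summarize_by_title(rows):
    """Return {title: (mean, median)} of salary for each job title."""
    salaries = defaultdict(list)
    for row in rows:
        salaries[row["title"]].append(row["salary"])
    return {
        title: (mean(values), median(values))
        for title, values in salaries.items()
    }


summary = summarize_by_title(rows)
for title, (avg, mid) in sorted(summary.items()):
    print(f"{title}: mean={avg:,.0f} median={mid:,.0f}")
```

With real data, the rows would probably come from csv.DictReader, with the salary column converted to a number first.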

Sunday, March 08, 2015

NICAR15: Today I Learned ... I Want To

I'm writing this on the last day of #NICAR15, but I should've started this before #tapestryconf five days ago. Let this be a lesson to you on the way to Denver for #NICAR16. Take my advice:

Every night before you go to bed, take a couple of minutes to write down what you learned that day, and what you want to do with it. TiL and IWT for each and every day. Better yet, work on it all day. Do it on your phone if you have to. I'm doing this in the airport as I wait for my flight back to Austin, but I bet I miss stuff.

Getting to Tapestry 2015


  • That fog can bring an airport to its knees.

Tapestry 2015


  • The NYT is still awesome, but that's nothing new.
  • Meredith Broussard is full of energy and vigor, and it is contagious. Thank you for your enthusiasm.
  • Where is the YOU in my visualization? Chad Skelton gave a great talk at Tapestry about keeping the reader in mind when it comes to your presentations. Showing income? Add a comparison calculator.
  • Ben Jones' 7 Data Story Types was thought-provoking.
  • All the other talks were good, but this is the downside of not keeping the list as I go along.
  • Keep in mind the reasons stories interest people. It boils down to the basic needs of food, safety and companionship.


  • Make a "you" based visualization this month. Maybe our property tax project might be the avenue.

NICAR day 1: Programming in Python


  • Lots. Python is a lot easier to learn than I imagined.
  • I can make a scraper.
  • There is a drop down in Sublime to change your tab stops to spaces, and you can set how many.
  • Add this to user preferences in Sublime to show all whitespace: "draw_white_space": "all",
  • I already knew Tom Meagher was a kindred spirit, but he reconfirmed it. Bravo.


  • Create a script in Python where I can feed it a csv from Socrata and convert the Address field to six new fields.
  • Create a Python script using csvkit to combine multiple sheets within an Excel file. (Tom says I need to look at  for this.)

NICAR day 2


  • I can teach Regular Expressions on a bus, in a bar, and in a way people can understand. There is some secret sauce:
    • Explain the goal
    • Use
    • Use groups
    • Make it useful
  • IMPORTHTML("url", "table", #) pulls tables into a Google Spreadsheet (# is the table's position on the page).
    • I can do this in python
    • Or I can use Google Spreadsheets if I'm lucky
    • Or I can use (might be best of those). (I didn't see this demoed, but I Want To explore it)
  • The Data Analysis add-in for Excel (the Analysis ToolPak) is awesome for:
    • Making histogram of data
    • Simple regressions: but do it in SPSS for realz
    • Simple correlations
  • Mapbox meh
  • Augie Armendariz is awesome. A kindred spirit. I want to know a 1/4 of what he does.
  • ogr2ogr might come in handy some day to convert shapefiles
  • The csvkit csvsql tool is freaking awesome. It sniffs the CSV and generates a CREATE TABLE statement (and it can load the data, too).
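On the "I can do this in python" note for pulling tables off a page: here is a rough sketch of the idea using only the standard library's html.parser. The TableParser class, the import_html_table function and the sample page are all made up for illustration, and a real page would first need to be downloaded (for example with urllib.request). It does not handle nested tables.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the rows of every <table> on a page as lists of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table>
        self._row = None   # cells of the <tr> currently being read
        self._cell = None  # text pieces of the <td>/<th> currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def import_html_table(html, index):
    """Return table number `index` (1-based, like the spreadsheet function)."""
    parser = TableParser()
    parser.feed(html)
    return parser.tables[index - 1]


# Made-up page standing in for a real download.
SAMPLE = """
<table>
  <tr><th>Name</th><th>Salary</th></tr>
  <tr><td>Ann</td><td>50,000</td></tr>
  <tr><td>Bob</td><td>42,000</td></tr>
</table>
"""

for row in import_html_table(SAMPLE, 1):
    print(row)
```

From there, csv.writer can dump the rows to a file for Excel or csvkit.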


  • Show the Data Analysis Add-in to my classes, and use it more.
  • Teach the Regex class at NICAR in Denver.
  • Use csvkit to wrangle property tax data
  • Explore
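To make the regex "secret sauce" from the day 2 list concrete (explain the goal, use groups, make it useful), here is one small sketch in Python. The goal: flip "Last, First" names into "First Last". The pattern and the flip_name function are my own illustration, not something shown at the conference.

```python
import re

# Goal: turn "Last, First" into "First Last".
# Group 1 captures the last name, group 2 captures the first name.
NAME_PATTERN = re.compile(r"^\s*([^,]+?)\s*,\s*(.+?)\s*$")


def flip_name(name):
    """Return 'First Last' for 'Last, First'; leave other strings alone."""
    match = NAME_PATTERN.match(name)
    if not match:
        return name
    last, first = match.groups()
    return f"{first} {last}"


for raw in ["Smith, Jane", "  Doe ,  John  ", "Cher"]:
    print(flip_name(raw))
```

The groups do the real work: once the two pieces are captured, putting them back in a different order is trivial.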



  • There is some command-line MySQL that is cool, but I think csvkit is more awesome
  • You can refer to a selected field by its position (1, 2, 3 ...) in the GROUP BY and ORDER BY clauses instead of repeating its name:
SELECT name, job, year, SUM(salary) AS total_salary
FROM myTable
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3;
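Here is a quick way to convince yourself that grouping by position matches grouping by name. This sketch runs the query from Python against an in-memory SQLite database, which stands in for MySQL here (an assumption on my part; SQLite also accepts positions in GROUP BY and ORDER BY). The table contents are made up.

```python
import sqlite3

# SQLite (bundled with Python) stands in for MySQL; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (name TEXT, job TEXT, year INTEGER, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?, ?)",
    [
        ("Ann", "Clerk", 2014, 1000),
        ("Ann", "Clerk", 2014, 1500),
        ("Ann", "Clerk", 2015, 2000),
        ("Bob", "Analyst", 2014, 3000),
    ],
)

# Group and sort by the position of each field in the SELECT list.
by_position = conn.execute(
    "SELECT name, job, year, SUM(salary) FROM myTable "
    "GROUP BY 1, 2, 3 ORDER BY 1, 2, 3"
).fetchall()

# The same query with the field names spelled out.
by_name = conn.execute(
    "SELECT name, job, year, SUM(salary) FROM myTable "
    "GROUP BY name, job, year ORDER BY name, job, year"
).fetchall()

print(by_position == by_name)
for row in by_position:
    print(row)
conn.close()
```

The positional form saves typing, but it breaks quietly if someone reorders the SELECT list, so I would keep it for ad hoc queries.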


  • Review notes from command-line MySQL for goodies.
  • Look up using CASE in select statements. Liz did something interesting there I don't fully remember.
  • Look more into as I didn't get to see the demos, but it looks interesting.



  • Just-enough-Django was some kind of awesome.
  • We can easily publish our admins to our local network. It is in the f'n docs.
  • Ben Welsh is also a kindred spirit.


  • Rebuild homicides in Django?
  • Do Restaurants back end in Django?
  • The wine guide comes first, I think.
  • Bring a gaggle of UT-Austin Journalism students to #NICAR16 in Denver, so they can be inspired early in their careers.
CORRECTION: I had an embarrassing spell correction error when describing Meredith Broussard's enthusiasm. I've fixed it now.

Thursday, January 15, 2015

The CPS "Missed Signs" project

I haven't written in the ol' blog here in some time, and there has been LOTS going on. It all culminated this past week with the publication of Missed Signs, Fatal Consequences, an immersive story-telling, data-driven project by the Austin American-Statesman.

The project started with obtaining abuse- and neglect-related child fatality reports from Child Protective Services through public records requests. The documents, required since 2009, are available only as PDFs, and the content is not saved as data that can be analyzed in any manner. Great law, but of little use if no one is using the reports.

Before I go any further, let it be known unto all the world that Andrew Chavez is the genius behind all the online development and design for this project, including that awesome data explorer. He's taken our online immersive template, rebuilt it with bionics and really made it sing. It's something we can build with and on into the future (including this weekend with another immersive project.)

The first thing we did (my contribution) was to create a Caspio database to collect certain fields from the 780 documents. Caspio is derided by many for good reason, but it is actually pretty good for this purpose ... to collect hand-entered information in a structured manner. (I would perform morally-suspect tasks for a JSON output direct from Caspio, though. Might even pay for it.) It would've been great if we could scrape the PDFs for our data, but the reporters were doing on-the-read analysis that couldn't be done programmatically. We also put all the documents in DocumentCloud to help with reporting (and for later use in our online data explorer and links within stories).

I made lots of changes to the forms as we went along, responding to requests and storylines found as the reporters read through the reports. We picked at it for a year, adding reports as they came out.

And then around summertime, the investigative team got serious. Stories were reported out by Andrea Ball and Eric Dexheimer, and they eventually went near full-time on the project. As stories gelled and sources were found, our visual folks Kelly West (video) and Laura Skelding (photography) started working their magic.

And Andrew. Wow. He created a template system that helped us wrangle all this mass of content (15 stories and about 100 images) into a clean, integrated, responsive, online masterpiece. I learned so much during this project just dabbling my little bit in the code ... I'm giddy with excitement about all the tech. Just love it.

What was that tech? Oh, gawd. Andrew should be listing all of it out, but I'll give it a go:
  • We used Node locally with all kinds of added helpers, starting with Bootstrap as framework. Handlebars for templates. Leaflet for maps.
  • Highcharts and underscore for visualizations in the story. DC.js for the data explorer.
  • Backbone to bring our data across all pages, like with the child pop-ups where we reveal basic data and point to source documents when a child is mentioned.
  • Lots of grunt, including grunt-generator to bake out all our files into flat files. The finished project is entirely self-contained html/images/javascript with no server-side processing. It could run anywhere.
