Friday, November 13, 2015

Statesman seeks news apps developer

June 2016 update: Both of these posts are now filled.

While the official application isn't up and running quite yet, I'm looking for a news apps developer (two, actually) to join our team at the Statesman. We've done some incredible work lately, and now it's time to build upon that with more, better and different. If you have interest, please email me at cmcdonald@statesman.com and include the normal stuff: resume, examples of your work, etc. Here is the job description:


The Statesman is looking for a news applications specialist for the interactives team. This developer journalist will work with reporters, editors and other team members to design and build interactive graphics, data visualizations and news applications to support journalism ventures.

Our projects vary in scope, platform and content.

JOB DUTIES & TASKS:

  • Work as part of a newsroom team to bring our most important stories to our readers in a compelling way across all our various digital platforms.
  • Develop, code, test and debug news apps for mobile and wired platforms.
  • Share and expand knowledge with other team members, and learn from the experience of others.
  • Research new technology and best practices and tools and analyze for best fit, usage, stability and performance.
  • Communicate with both technical and non-technical colleagues to serve as a bridge between content and digital design of applications and visualizations.
  • Some reporting and contacting sources for data and information.

SKILLS & EXPERIENCE:

  • Understanding of data structures and database management
  • Familiarity with web APIs and common data visualization libraries
  • Demonstrated ability to turn concepts into user-focused apps using HTML5/CSS3/JavaScript. We use Node.js for package management, PHP/WordPress, Python and Django. While experience in these areas is preferred, we recognize developers are adaptable, and so are we.

Friday, June 19, 2015

Why are medians so hard? Tableau made it easy.

I have this salary data -- title, gender, length of service and the annual salary -- and I need to do some comparisons. Seems like some easy stuff. Well, it's not as easy as you might think using our typical data workhorses of Excel and SQL.

Salaries are one of those data sets where extreme ranges can skew distributions, making the mean (or average) a poor representation of the data. A small number of really high or low salaries can move the average out of whack from the rest of the data.

Excel does not have median as a summarize option for pivot tables.
So, Viva La Median! Line all those salaries up in order and pick the one in the middle (or average the two in the middle) and you get a better representation of your data set. But I have more than 10,000 rows of data with hundreds of job titles to compare, so I can't just run MEDIAN() at the bottom of an Excel column for each title.

So I whip out Excel's Pivot table, put my job titles in the rows and annual salary in the values and then ... wait. What? No median? I can average, sum, min, max ... but no median? There might be some workarounds I haven't checked out, but I figured I'd just do this in MySQL.

Strike two. MySQL does not have a median aggregate function. Nor does PostgreSQL. There are some threads on Stack Overflow that might get me there, but ... my head hurts.
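(For the record, the median-by-group problem is only a few lines of Python, which would have been another way out of the Excel/MySQL hole. A minimal sketch, assuming the spreadsheet is exported as a CSV with "Title" and "Annual Salary" columns — those column names are my guess:)

```python
import csv
from collections import defaultdict
from statistics import median

def median_by_title(path):
    """Return the median salary for each job title in a CSV file."""
    groups = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            groups[row["Title"]].append(float(row["Annual Salary"]))
    return {title: median(salaries) for title, salaries in groups.items()}
```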

But I was able to solve this in about three minutes using Tableau. After importing my data, I set Title on the Rows shelf and Annual Salary on the Text mark, then used the contextual menu for SUM(Annual Salary) to change the Measure to Median, as shown in the screenshot.

Once I had the data on the screen the way I wanted it, I went to the menus Worksheet > Export > Crosstab to Excel, which saved the data out as an Excel spreadsheet.

Of course, I can and did analyze and visualize the data within Tableau, but in this case I had a need to get the data out into another program as well.

Sunday, March 08, 2015

NICAR15: Today I Learned ... I Want To

I'm writing this on the last day of #NICAR15, but I should've started this before #tapestryconf five days ago. Let this be a lesson to you on the way to Denver for #NICAR16. Take my advice:

Every night before you go to bed, take a couple of minutes to write down what you learned that day, and what you want to do with it. TiL and IWT for each and every day. Better yet, work on it all day. Do it on your phone if you have to. I'm doing this in the airport as I wait for my flight back to Austin, but I bet I miss stuff.

Getting to Tapestry 2015

TiL

  • That fog can bring an airport to its knees.

Tapestry 2015

TiL

  • The NYT is still awesome, but that's nothing new.
  • Meredith Broussard is full of energy and vigor, and it is contagious. Thank you for your enthusiasm.
  • Where is the YOU in my visualization? Chad Skelton gave a great talk at Tapestry about keeping the reader in mind when it comes to your presentations. Showing income? Add a comparison calculator.
  • Ben Jones' 7 Data Story Types was thought provoking.
  • All the other talks were good, but this is the downside of not keeping the list as I go along.
  • Keep in mind the reasons stories interest people. It boils down to the basic needs of food, safety and companionship.

IWT

  • Make a "you" based visualization this month. Maybe our property tax project might be the avenue.

NICAR day 1: Programming in Python

TiL

  • Lots. Python is a lot easier to learn than I imagined.
  • I can make a scraper.
  • There is a drop down in Sublime to change your tab stops to spaces, and you can set how many.
  • Add this to user preferences in Sublime to show all whitespace: "draw_white_space": "all",
  • I already knew Tom Meagher was a kindred spirit, but he reconfirmed it. Bravo.

IWT

  • Create a script in Python where I can feed it a csv from Socrata and convert the Address field to six new fields.
  • Create a Python script using csvkit to combine multiple sheets within an Excel file. (Tom says I need to look at … for this.)
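If I ever write that address-splitter, the core of it is one regular expression. A sketch, assuming a Socrata-style combined cell that looks like "1100 CONGRESS AVE", then "AUSTIN, TX 78701", then "(30.274, -97.740)" on three lines — the format and the six field names are my assumptions, not anything Socrata promises:

```python
import re

# Assumed one-cell format: "STREET\nCITY, ST ZIP\n(LAT, LON)" -- the last
# line is optional. The pattern and field names are guesses, not a spec.
ADDRESS = re.compile(
    r"(?P<street>[^\n]+)\n"
    r"(?P<city>[^,]+), (?P<state>[A-Z]{2}) (?P<zip>\d{5})"
    r"(?:\n\((?P<lat>[-\d.]+), (?P<lon>[-\d.]+)\))?"
)

def split_address(value):
    """Break a combined address cell into six fields; empty dict on no match."""
    match = ADDRESS.match(value)
    return match.groupdict() if match else {}
```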

NICAR day 2

TiL

  • I can teach Regular Expressions on a bus, in a bar, and in a way people can understand. There is some secret sauce:
    • Explain the goal
    • Use regex101.com
    • Use groups
    • Make it useful
  • importHTML("url", "table", #) to pull tables into a Google Spreadsheet.
    • I can do this in python
    • Or I can use Google Spreadsheets if I'm lucky
    • Or I can use import.io (might be best of those). (I didn't see this demoed, but I Want To explore it)
  • Analytics Add-in for Excel is awesome for:
    • Making histogram of data
    • Simple regressions: but do it in SPSS for realz
    • Simple correlations
  • Mapbox meh
  • Augie Armendariz is awesome. A kindred spirit. I want to know a 1/4 of what he does.
  • ogr2ogr might come in handy some day to convert shapefiles
  • The csvkit csvsql is freaking awesome. Sniffs the csv and creates an import statement.
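On the regex point above: groups are what make the lesson click, because they turn a match into usable fields. A tiny Python example — the phone-number pattern is just a teaching illustration, not something from the class:

```python
import re

# Goal first: pull the pieces out of a phone number.
# Each (...) group captures one usable field from the match.
pattern = re.compile(r"\((\d{3})\) (\d{3})-(\d{4})")

match = pattern.search("Call the newsroom at (512) 445-3600.")
area, exchange, line = match.groups()
# area is "512", exchange is "445", line is "3600"
```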
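And on the "I can do this in python" bullet: pulling a table the way importHTML does can be done with nothing but the standard library. A sketch that extracts the rows from an HTML table string (a real page would need fetching first, e.g. with urllib):

```python
from html.parser import HTMLParser

class TableGrabber(HTMLParser):
    """Collect the text of each td/th cell, row by row."""
    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []          # start a fresh row
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row:
            self.rows.append(self._row)

    def handle_data(self, data):
        if self._in_cell and data.strip():
            self._row.append(data.strip())

def table_rows(html):
    parser = TableGrabber()
    parser.feed(html)
    return parser.rows
```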

IWT

  • Show the Data Analysis Add-in to my classes, and use it more.
  • Teach the Regex class at NICAR in Denver.
  • Use csvkit to wrangle property tax data
  • Explore import.io

NICAR DAY 3

TiL

  • There is some command-line MySQL that is cool, but I think csvkit is more awesome
  • You can reference fields by their index in a SELECT statement's GROUP BY and ORDER BY clauses
SELECT name, job, year, SUM(salary)
FROM myTable
GROUP BY 1, 2, 3
ORDER BY 4 DESC

IWT

  • Review notes from command-line MySQL for goodies.
  • Look up using CASE in select statements. Liz did something interesting there I don't fully remember.
  • Look more into Silk.co, as I didn't get to see the demos, but it looks interesting.
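On that CASE note: a CASE expression in a SELECT builds a computed column out of conditions. A minimal sketch using Python's built-in sqlite3 — the names and salary brackets are invented for illustration, not what Liz showed:

```python
import sqlite3

# In-memory table standing in for real data; the brackets are made up.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE salaries (name TEXT, salary REAL)")
con.executemany("INSERT INTO salaries VALUES (?, ?)",
                [("Andrea", 30000), ("Eric", 75000), ("Kelly", 120000)])

# CASE evaluates top to bottom and returns the first branch that matches.
rows = con.execute("""
    SELECT name,
           CASE WHEN salary < 50000 THEN 'low'
                WHEN salary < 100000 THEN 'mid'
                ELSE 'high'
           END AS bracket
    FROM salaries
    ORDER BY name
""").fetchall()
# rows is [('Andrea', 'low'), ('Eric', 'mid'), ('Kelly', 'high')]
```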

NICAR DAY 4

TiL

  • Just-enough-Django was some kind of awesome.
  • We can easily publish our admins to our local network. It is in the f'n docs.
  • Ben Welsh is also a kindred spirit.

IWT

  • Rebuild homicides in Django?
  • Do Restaurants back end in Django?
  • The wine guide comes first, I think.
  • Bring a gaggle of UT-Austin Journalism students to #NICAR16 in Denver, so they can be inspired early in their careers.
CORRECTION: I had an embarrassing spell correction error when describing Meredith Broussard's enthusiasm. I've fixed it now.

Thursday, January 15, 2015

The CPS "Missed Signs" project

I haven't written in the ol' blog here in some time, and there has been LOTS going on. It all culminated this past week with the publication of Missed Signs, Fatal Consequences, an immersive story-telling, data-driven project by the Austin American-Statesman.

The project started with obtaining abuse- and neglect-related child fatality reports from Child Protective Services through public records requests. The reports, required since 2009, are only available as PDFs, and the content is not saved as data that can be analyzed in any manner. Great law, but of little use if no one is using the reports.

Before I go any further, let it be known unto all the world that Andrew Chavez is the genius behind all the online development and design for this project, including that awesome data explorer. He's taken our online immersive template, rebuilt it with bionics and really made it sing. It's something we can build with and on into the future (including this weekend with another immersive project.)

The first thing we did (my contribution) was to create a Caspio database to collect certain fields from the 780 documents. Caspio is derided by many for good reason, but it is actually pretty good for this purpose ... to collect hand-entered information in a structured manner. (I would perform morally-suspect tasks for a JSON output direct from Caspio, though. Might even pay for it.) It would've been great if we could scrape the PDFs for our data, but the reporters were doing on-the-read analysis that couldn't be done programmatically. We also put all the documents in DocumentCloud to help with reporting (and for later use in our online data explorer and links within stories).

I made lots of changes to the forms as we went along, responding to requests and storylines found as the reporters read through the reports. We picked at it for a year, adding reports as they came out.

And then around summertime, the investigative team got serious. Stories were reported out by Andrea Ball and Eric Dexheimer, and they eventually went near full-time on the project. As stories gelled and sources were found, our visual folks Kelly West (video) and Laura Skelding (photography) started working their magic.

And Andrew. Wow. He created a template system that helped us wrangle all this mass of content (15 stories and about 100 images) into a clean, integrated, responsive, online masterpiece. I learned so much during this project just dabbling my little bit in the code ... I'm giddy with excitement about all the tech. Just love it.

What was that tech? Oh, gawd. Andrew should be listing all of it out, but I'll give it a go:
  • We used Node locally with all kinds of added helpers, starting with Bootstrap as framework. Handlebars for templates. Leaflet for maps.
  • Highcharts and underscore for visualizations in the story. DC.js for the data explorer.
  • Backbone to bring our data across all the pages, like with the child pop-ups, where we reveal basic data and point to source documents when a child is mentioned.
  • Lots of grunt, including grunt-generator to bake out all our files into flat files. The finished project is entirely self-contained html/images/javascript with no server-side processing. It could run anywhere.

Friday, July 11, 2014

Statesman looking for a news applications developer

Edit: This is filled. @adchavez is rockin' it.

We're formalizing our news and data interactives work at the Statesman, and we are looking for a developer to join Rob Villalpando and me to form a new News Interactives Team (or some other snazzy name we think up).

If you read this story, and say, “Yep, that’s me” or “I want to make a difference like that,” then you are the kind of person I’m looking for. If you can buy into this philosophy and help us do some of the same type of work they do, then you are a good candidate. If you understand 80% of the posts on that blog, then you are probably a really good candidate. If you understand it AND were already a regular reader of that blog, then OMG CALL ME!

I’m looking for someone who loves to create and to share what they know and what they learned today. We’ll soak it up. We also have a lot to offer, so you’ll grow and learn, too.

Here is the full job description: Web Applications Developer. The whole application process runs through that.

If you just want to learn more about the job and see if you are a good fit, you can reach out to me:

Christian McDonald
Data Editor
Austin American-Statesman
cmcdonald@statesman.com

Monday, June 23, 2014

The immersive story presentation


While the New York Times' Snowfall story may not have been the first immersive story presentation -- as I like to call them -- it was the first that caught all our attention, and that of the Pulitzer committee. Since then, there has been an avalanche of such projects (pun intended), and this past weekend the Statesman added our own: Drugs follow Eagle Ford energy boom.

Eagle Ford drugs did not break any new ground. In fact, it was a baby step for us built upon the work of others, inspired by the Globe and Mail's Magnetic North feature. We had a desire to get into this space, a good story by one of my UT-Austin students, Michael Marks, and some excellent photography by Jay Janner.

It was a method for us to discover the challenges we might face doing more of this kind of storytelling. It's really just the first step. My aim as the editor for our interactive team is to tell stories in completely different ways than a "long story with a bunch of pictures." I personally hold NPR's T-shirt project as seminal inspiration. We'll get there. (I could really use some help ... come help me build something bad ass.)

So, here is what we used:

  • Foundation: This is the responsive-design HTML/CSS framework we used to build Eagle Ford drugs (and our XGames package, the Austin Homicide Project and some others). It gives us flexibility to focus more on the content than on how to present it. Our goal is to be mobile-first, and this code base allows us to do that quickly. I started with nothing on Tuesday and had everything finished by Friday, including some new development I hadn't done before.
  • Slick: A JavaScript plugin that allowed us to embed a swipeable gallery within the story. I had a BUNCH of great photos that I wanted to include. Slick was one way to do this, but we'll keep experimenting with this and other JS plugins. This brought in quantity, but I didn't do that great a job of giving this multitude of great photos the play they deserved. We need BIGGER. Some kind of lightbox treatment or something. Also, we have some issues with Firefox on the desktop where the usability is a bit buggy.
  • The inset concept I lifted liberally from the aforementioned Globe and Mail presentation. I didn't see anything like this native in Foundation, so I looked at how they accomplished it and tried to build that functionality into our Foundation framework. It worked for the most part, but there are some issues to work out with IE, where our insets don't break the right margin of the text.
I used this project to build a template for future projects, but as you can see it still has some issues. I'm not sophisticated enough as a coder to share all this on GitHub, but we'll get there eventually. I'm sure there are plenty of other better places to start out there anyway.


Saturday, May 17, 2014

Dude, your KML is too long

My Google Fusion Tables challenge this week was a single shapefile/KML that had about 25 different polygons in the same layer, which, when I converted it to KML, put them all within the same Placemark tag. Fusion Tables would only show some of the shapes on the map, even though they all showed in the KML preview of the data row.

What the Fusion Tables map showed

What the KML preview from the data showed (which is correct)

So I put a question about it into the Fusion Tables API Users Google Group. Folks there explained there is a limitation with Fusion Tables (or maybe the Maps API) that shows only the 10 largest polygons if there are multiple in a Placemark in the KML file. They suggested I use the “Singleparts to multipart” tool in QGIS to split the shapes.


Well, I tried that, but I got the following error and couldn't make it work no matter what I tried in that dialog.



So, I peeked at the KML file in a text editor. After all, it's really just an XML file, which is by nature structured, so I hoped I could figure out how it worked and split the file myself. I could see the Placemark and MultiGeometry tags, with very long, multiple Polygon tags inside.

Word wrap is off, but the white line is really, really long.


So I duplicated the Placemark tag and then pulled out enough Polygon tags to make sure I had no more than 10 within a specific Placemark. Luckily there wasn’t too much data to this shape, or that would’ve been a nightmare. The final KML looked like this:


Now there are three Placemark tags, each with no more than 10 polygons.
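The hand edit worked, but the same split can be scripted for next time. A sketch with Python's xml.etree that breaks any Placemark holding more than 10 Polygons into several Placemarks; namespaces are stripped for simplicity, and a real KML file carries the kml 2.2 namespace that findall would have to account for:

```python
import xml.etree.ElementTree as ET

def split_placemarks(kml_text, max_polys=10):
    """Split any Placemark holding more than max_polys Polygons into
    several Placemarks of at most max_polys each (namespace-free KML)."""
    root = ET.fromstring(kml_text)
    doc = root.find("Document")
    for placemark in list(doc.findall("Placemark")):
        geometry = placemark.find("MultiGeometry")
        if geometry is None:
            continue
        polygons = geometry.findall("Polygon")
        if len(polygons) <= max_polys:
            continue
        # Replace the oversized Placemark with chunks of max_polys polygons.
        doc.remove(placemark)
        for i in range(0, len(polygons), max_polys):
            new_pm = ET.SubElement(doc, "Placemark")
            new_geo = ET.SubElement(new_pm, "MultiGeometry")
            new_geo.extend(polygons[i:i + max_polys])
    return ET.tostring(root, encoding="unicode")
```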



