Tuesday, July 02, 2013

Getting more U.S. Census cities into Tableau

I've been working on a visualization about commuting populations and came across an issue getting cities to map in Tableau. My data included 381 "places" (cities and other similar municipal entities) in Texas, and the "City" geographic role that is baked into Tableau would only match with certainty about 150 of them by their name alone.

In the past, I've used Census "places" shapefiles in Google Fusion Table maps, and that would have served me well for a map alone, but I really wanted to pair this map with some bar charts that would work together.

So I had to find my own latitude and longitude for my Texas places, and while this worked, it took sufficient enough effort that it's worth explaining here. This isn't the same a having geographic boundaries for each city, but since I wanted to do a bubble map, this wasn't a problem.

The first challenge was to find a list of cities and their lat/longs, preferrably from a U.S. Census source so I could match with my data. The Census delivered with the 2010 Census Gazetteer Files. I found I could get every Census "place" in Texas, along with the 2010 population and lat/long for each one, all wrapped in a cute tab-delimited bow. Booya!

Now, I just had to merge that lat/long from my Gazetteer file into my commuting data file. Might seem easy enough, but there were 1752 records in the Gazetteer file compared to my 381 in my commuting data. A manual sort and copy/paste wasn't going to do it.

Microsoft Access to the rescue. I used a common field in both tables and join them, and then created a query with the fields I needed from both tables. This is the subtle power of Access that you can't get with Excel.

First, let take a look the important parts of my original commuter data:


I know from working with Census data that the "State" field -- the two two-digit value 48 -- and the "Place" field is part of a FIPS code designation used in almost all Census data. This is a good thing.

Now let's look at the important parts of the Gazetteer data:


The "GEOID" is indeed analogous to the "State" and "Place" fields in my commuter data. (I also have here the "INTPTLAT" and "INTPTLONG" fields, which are latitude and longitude values.) I used a simple string formula in a new column to combine my "State" and "Place" fields:

=B2&C2

That concatenates the two fields. "48" and "01000" become "4801000" and match my GEOID from the Gazeteer data. I also called it GEOID.

I did some further cleanup to my original file, getting rid of "city", "town" and similar suffixes from the "Place Name" field. I imported the two tables into Access, naming them CommuteList and PlaceList.

I write my Access queries in the SQL View because that's how I learned. (Someone needs to teach me the Design View method.) A simplified version is this:

SELECT CommuteList.PlaceName, CommuteList.PopEst, CommuteList.WorkEst, PlaceList.Latitude, PlaceList.Longitude
FROM CommuteList, PlaceList
WHERE CommuteList.GEOID = PlaceList.GEOID;

Looking at the WHERE line first, I'm joining the two tables by their GEOID fields. Using this methods, I'm only going to get results where the same GEOID exists in both tables. The SELECT line pulls the specific fields I need from each table.


Now each place has it's own latitude and longitude, so I can export this query as a new Excel spreadsheet and then use this to build my Tableau map and I'm assured all my cities will be included.

I'm still working on the visualization and it will publish later this week, and I'll update this entry when it does.

Post updated with link: Interactive: Commuter-adjusted populations in Texas.

1 comments:

Ben Jones 10:03 PM  

This is a super helpful tip, especially for local and hyperlocal data stories. Thanks for sharing.

  © Blogger template 'A Click Apart' by Ourblogtemplates.com 2008

Back to TOP