Data source to map Zip codes to Latitude and Longitude

[Update: For R users, I have since bundled this database into an R package, ‘zipcode’, now available on CRAN.]

When I need latitude and longitude for ZIP codes, I use the “CivicSpace US ZIP Code Database by Schuyler Erle, August 2004”. I first found it thanks to Tom Boutell’s site (http://www.boutell.com/zipcodes/).

According to the README, it contains “over 98% of the ZIP Codes in current use in the United States” as of 2004. The ZIP file includes the data in CSV and a PostGIS-friendly SQL definition file. Schuyler Erle co-authored O’Reilly’s excellent Mapping Hacks, so the zipcode.zip file is also now mirrored on the Mapping Hacks website (http://mappinghacks.com/data/).

In addition to latitude and longitude, the data include city and state name and time zone:

"zip","city","state","latitude","longitude","timezone","dst"
"00210","Portsmouth","NH","43.005895","-71.013202","-5","1"
"00211","Portsmouth","NH","43.005895","-71.013202","-5","1"
"00212","Portsmouth","NH","43.005895","-71.013202","-5","1"
[...]
"99928","Ward Cove","AK","55.395359","-131.67537","-9","1"
"99929","Wrangell","AK","56.409507","-132.33822","-9","1"
"99950","Ketchikan","AK","55.875767","-131.46633","-9","1"

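As a rough illustration of using the CSV, here is a minimal Python sketch that builds a ZIP-to-position lookup. The function name load_zip_positions and the two-row inline sample are my own stand-ins; with the real file you would pass an open handle on zipcode.csv instead. ZIP codes are kept as strings so that leading zeros (as in "00210") survive.

```python
import csv
import io

# Two rows copied from the sample above, standing in for the full zipcode.csv.
SAMPLE = '''"zip","city","state","latitude","longitude","timezone","dst"
"00210","Portsmouth","NH","43.005895","-71.013202","-5","1"
"99950","Ketchikan","AK","55.875767","-131.46633","-9","1"
'''

def load_zip_positions(lines):
    """Map each ZIP code (as a string) to a (latitude, longitude) tuple."""
    positions = {}
    for row in csv.DictReader(lines):
        # Defensive guard: skip rows with blank coordinates. I have not
        # checked whether the real file contains any such rows.
        if not row["latitude"] or not row["longitude"]:
            continue
        positions[row["zip"]] = (float(row["latitude"]), float(row["longitude"]))
    return positions

positions = load_zip_positions(io.StringIO(SAMPLE))
print(positions["00210"])
```

For the real data, something like `with open("zipcode.csv", newline="") as f: positions = load_zip_positions(f)` should work the same way.
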
The database is based on the 1999-2000 U.S. Census Gazetteer files. While the ZIP Code Tabulation Areas fixed-width ASCII file lacks niceties like place names and time zone info, it does contain some basic population and geographic statistics:

  • Columns 1-2: United States Postal Service State Abbreviation
  • Columns 3-66: Name (e.g. 35004 5-Digit ZCTA – there are no post office names)
  • Columns 67-75: Total Population (2000)
  • Columns 76-84: Total Housing Units (2000)
  • Columns 85-98: Land Area (square meters) – Created for statistical purposes only.
  • Columns 99-112: Water Area (square meters) – Created for statistical purposes only.
  • Columns 113-124: Land Area (square miles) – Created for statistical purposes only.
  • Columns 125-136: Water Area (square miles) – Created for statistical purposes only.
  • Columns 137-146: Latitude (decimal degrees). First character is blank or “-”, denoting North or South latitude respectively.
  • Columns 147-157: Longitude (decimal degrees). First character is blank or “-”, denoting East or West longitude respectively.
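
The column layout above translates fairly directly into string slices. The following Python sketch assumes the columns are 1-based and inclusive, as listed, so columns a-b become the slice [a-1:b]. The field names, the parse_zcta_line helper, and every numeric value in the sample record are invented for illustration and are not taken from the Census file.

```python
# (start, end) column positions, 1-based and inclusive, from the list above.
FIELDS = {
    "state": (1, 2),
    "name": (3, 66),
    "population": (67, 75),
    "housing_units": (76, 84),
    "land_area_m2": (85, 98),
    "water_area_m2": (99, 112),
    "land_area_sqmi": (113, 124),
    "water_area_sqmi": (125, 136),
    "latitude": (137, 146),
    "longitude": (147, 157),
}

def parse_zcta_line(line):
    """Slice one fixed-width record into a dict with typed values."""
    raw = {key: line[start - 1:end].strip() for key, (start, end) in FIELDS.items()}
    return {
        "state": raw["state"],
        # The name field looks like "35004 5-Digit ZCTA"; keep the code as a string.
        "zcta": raw["name"].split()[0],
        "population": int(raw["population"]),
        "housing_units": int(raw["housing_units"]),
        # float() tolerates the leading blank or "-" sign character.
        "latitude": float(raw["latitude"]),
        "longitude": float(raw["longitude"]),
    }

# A made-up record padded to the documented field widths (157 characters).
sample = (
    "AL"
    + "35004 5-Digit ZCTA".ljust(64)
    + "1000".rjust(9)
    + "400".rjust(9)
    + "50000000".rjust(14)
    + "100000".rjust(14)
    + "19.305".rjust(12)
    + "0.039".rjust(12)
    + "33.5".rjust(10)
    + "-86.5".rjust(11)
)

record = parse_zcta_line(sample)
print(record)
```

The area fields are sliced but left out of the result to keep the sketch short; they could be converted with int() and float() in the same way.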

The clincher for me is that the CivicSpace database contains nearly 10,000 more entries than the base Census file:

$ wc -l zipcode.csv
43205 zipcode.csv
$ wc -l zcta5.txt
33233 zcta5.txt
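
The raw line counts differ by 43205 - 33233 = 9972, and the CSV count includes its header row. To see which codes account for the gap rather than just how many, a set difference is enough. This is only a sketch: the only_in_first helper is hypothetical, and the two short lists are stand-ins with invented membership, not real extracts from either file.

```python
def only_in_first(first_zips, second_zips):
    """Return, sorted, the ZIP codes present in the first collection but not the second."""
    return sorted(set(first_zips) - set(second_zips))

# Stand-in lists; in practice these would be read from zipcode.csv and zcta5.txt.
civicspace = ["00210", "00211", "35004", "99950"]
zcta = ["35004", "99950"]

extra = only_in_first(civicspace, zcta)
print(len(extra), extra)
```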