My recent post Nightlights: cool data, bad geocoding highlighted some of the geocoding challenges Steve Mosher has been finding as he works with this interesting “light pollution” data set.
It was also my first article reposted on Tal Galili’s fantastic R-Bloggers site which I have been following for a while. But even better than the surge of new visitors were the great comments and suggestions posted by members of the community. In this post, I’m going to walk through each suggestion to illustrate just how generous and helpful this community can be.
Our starting point is where we ended up in my first post, using ggplot2 to display the raster nightlights data and map overlay:
library(RCurl) library(R.utils) library(rgdal) library(raster) url_radianceCalibrated = "ftp://ftp.ngdc.noaa.gov/DMSP/web_data/x_rad_cal/rad_cal.tar" calibratedLights = "rad_cal.tar" hiResTif = "world_avg.tif" download.file(url_radianceCalibrated,calibratedLights,mode="wb") untar(calibratedLights) gunzip( paste(hiResTif, '.gz', sep='') ) hiResLights = raster( hiResTif ) # Eastern Mass., Cape Cod & Islands: e = extent(-71.5, -69.5, 41, 43) r = crop(hiResLights,e) p = rasterToPoints(r) df = data.frame(p) colnames(df) = c("lon", "lat", "dn") g = ggplot( data=df) + geom_point(aes(x=lon, y=lat, color=dn) ) g = g + scale_colour_gradient(low="black", high="white", trans="sqrt") g = g + theme_bw() + xlim(c(-71.5,-69.5)) + ylim(c(41,43)) g = g + opts( axis.text.x = theme_blank(), axis.text.y = theme_blank() ) g = g + borders("state", colour="yellow", alpha=0.5)
Note the mismatch between the data and map overlay and the weirdness in the map where points are missing on the North Shore:
Ben Bolker suggested a way to eliminate the artifacts which led me to this discussion on R-sig-Geo between Hadley Wickham and Paul Hiemstra which tipped me off to the existence of geom_path
layer in addition to the geom_polygon
layer which borders()
usually produces. Polygons are closed but paths need not be, so that helps. And ggplot2’s map_data()
function seems to grab the same data as borders()
:
b = map_data("state") g = ggplot( data=df) + geom_point(aes(x=lon, y=lat, color=dn) ) g = g + scale_colour_gradient(low="black", high="white", trans="sqrt") g = g + theme_bw() + xlim(c(-71.5,-69.5)) + ylim(c(41,43)) g = g + opts( axis.text.x = theme_blank(), axis.text.y = theme_blank() ) g = g + geom_path(data=b, aes(x=long,y=lat,group=group), colour="yellow", alpha=0.5)
Bonus: geom_path()
obeys the “alpha=0.5
” directive to set the transparency:
But Robert Hijmans really hit it out the park with two great suggestions. First, he pointed me towards a much, much better source of coastline data by using raster’s getData()
function to grab data from the GADM database of Global Administrative Areas:
usa = getData('GADM', country="USA", level=0)
Level 0 will get you country boundaries, Level 1 for state/province, and so on. So we’ll lose state boundaries, but these files are pretty big to start with and can take a lot longer to plot.
Also, be warned: apparently somebody sinned against The Church of GNU, so you may need to run gpclibPermit()
manually before running fortify()
on the SpatialPolygonsDataFrame
:
> f_usa = fortify(usa) Using GADMID to define regions. Note: polygon geometry computations in maptools depend on the package gpclib, which has a restricted licence. It is disabled by default; to enable gpclib, type gpclibPermit() Checking rgeos availability as gpclib substitute: FALSE Error: isTRUE(gpclibPermitStatus()) is not TRUE > gpclibPermit() [1] TRUE
With that hoop cleared, we can fortify()
and plot this new layer:
f_usa = fortify(usa) g = ggplot( data=df) + geom_point(aes(x=lon, y=lat, color=dn) ) g = g + scale_colour_gradient(low="black", high="white", trans="sqrt") g = g + theme_bw() + xlim(c(-71.5,-69.5)) + ylim(c(41,43)) g = g + opts( axis.text.x = theme_blank(), axis.text.y = theme_blank() ) g = g + geom_path(data=f_usa, aes(x=long,y=lat,group=group), colour="yellow", alpha=0.5)
The coordinates are still shifted, but—wow—what a beautiful coast line. Cape Ann on the North Shore is really there now:
Robert also points out an important mismatch in that GDAL returns the top left corner and resolution, so we could be off by a pixel or so. A quick call to xmax()
and ymax()
will fix this in our original raster:
xmax(hiResLights) = 180 ymin(hiResLights) = -90 r = crop(hiResLights,e) p = rasterToPoints(r) df = data.frame(p) colnames(df) = c("lon", "lat", "dn") g = ggplot( data=df) + geom_point(aes(x=lon, y=lat, color=dn) ) g = g + scale_colour_gradient(low="black", high="white", trans="sqrt") g = g + theme_bw() + xlim(c(-71.5,-69.5)) + ylim(c(41,43)) g = g + opts( axis.text.x = theme_blank(), axis.text.y = theme_blank() ) g = g + geom_path(data=f_usa, aes(x=long,y=lat,group=group), colour="yellow", alpha=0.5)
Hey, that’s not bad at all:
Looking at the final version leads me to wonder how much of the geocoding problem is position, and how much is resolution/blurring/smearing. The lights of Provincetown, for instance, look pretty good. Maybe the blob is too north by a few pixels, but at least it’s well contained by land now. On Nantucket, the blur is half in the harbor. Then again, on Nantucket, most of the lights are right on the harbor, from the ferry terminal and running east to main street. So the lights are just about where they should be. Perhaps they’re just blurred and therefore spill into the harbor?
But the real point of the post is to highlight the generosity of this community. For that, thanks. And again: welcome R-Bloggers readers!
October 23, 2010 at 3:29 AM
What a kind (and cool) post you wrote, you just made my day! 🙂
Good luck in your future posts.
Best,
Tal
October 26, 2010 at 9:11 AM
Hi Tal:
Thanks for the kind words.
Jeffrey
October 23, 2010 at 5:39 PM
Wow,
Robert is a life saver. That shift in nightlights has had me and a couple other guys stumped for a while. After looking at your stuff I wrote him and tried to explain the problem and one simple cal to GDALinfo explained it all.
I’ve contacted the agency responsible for the data, we will see if they fix the metadata in the file. Anyway’s I’m starting to look at ggplot2 .
October 26, 2010 at 9:11 AM
Hi Steven:
Robert Hijmans is the man, indeed. (He and Jacob van Etten wrote the raster package, so I guess we shouldn’t be surprised… 🙂
I am glad I could help push things forward.
Jeffrey
October 23, 2010 at 5:49 PM
Oh, on the position/intensity of the lights,
1. the date of the image is circa 1995 ish
2. the true resolution is 2.7km which was splatted to 1km
3. U might want to transform it into an actual radiance as opposed to the DN number. The file conatins the DN number which squashes the dynamic range a bit.
October 27, 2010 at 10:29 AM
Very nice post!
However after trying the code here on R 2.12 on Mac OSx 10.6.4 and installing all the required packages. I am getting the following error after running the first ggplot statement
> g = ggplot( data=df) + geom_point(aes(x=lon, y=lat, color=dn) )
> g = g + scale_colour_gradient(low=”black”, high=”white”, trans=”sqrt”)
> g = g + theme_bw() + xlim(c(-71.5,-69.5)) + ylim(c(41,43))
> g = g + opts( axis.text.x = theme_blank(), axis.text.y = theme_blank() )
> g = g + borders(“state”, colour=”yellow”, alpha=0.5)
Error in fortify(map(map, region, plot = FALSE, fill = TRUE)) :
could not find function “map”
Are there other packages required? Or is there a problem with the installation of spatstat on Mac (I did obtain a lot of warnings when installing it).
The rgdal package I have successfully installed via http://www.compmath.com/blog/2010/07/installing-package-on-mac-os-x/
Thanks for your excellent post,Jan
October 29, 2010 at 4:57 AM
Owkay! I solved it…it took a while to install the packages but now:
After adding
require(maps)
I all started working. 🙂 very nicely.
Only this link did not work via R
download.file(url_radianceCalibrated,calibratedLights,mode=”wb”)
so I downloaded it myself.
Any advice is welcome.
Cheers,
Jan
November 12, 2010 at 5:43 PM
Hi Jan:
Sorry for the delay, and thanks for the kind words.
download.file() should be in the ‘utils’ package — sorry if I left out the library import.
Jeffrey