Rasterizing Mapnik OpenStreetMap into ArcGIS

Written on 01 May 2012

In the struggle to get Mapnik styled OpenStreetMap data in to ArcGIS, I ended up settling on a fairly convoluted process that leaves us with raster datasets instead of nicely symbolized OSM vector data.

Long story short, the ESRI styling/symbology engines are vastly inferior when stacked up next to Mapnik; attempting to get OSM vector data styled in ArcMap, in a way that users will recognise it, is nigh on impossible.

My solution? Pre rendering Mapnik tiles, georeferencing them and loaded them in to an ArcGIS Raster Dataset.

Steps to rasterizing OSM into ArcGIS

Setup a PostGIS Mapnik & Open Street Map environment:

Mapnik Installation

Mapnik and PostGIS
Download a Planet-sized OSM data file or a specific country file from the likes of Geofrabrik or Cloudmade; I personally opted for the compressed OSMs and not the protobuf PBFs, as I feel the tools for parsing raw OSM XML are a bit more mature
Load your OSM data files into PostGIS using osm2pgsql (or Osmosis if that’s more your cup of tea)
Use the generate_tiles.py script (from OpenStreetMap) to generate 256x256 tiles at the relevant zoom levels; this will take some time depending on your hardware
Write a small utility script (probably in Python) to georeference the tiles from the above step. You will need to create a PGW file for each PNG by calculating the coordinates based on the tile name and folder structure. There is plenty of sample code over here. You may need to play around with the PNG output formats from Mapnik. It is worth running a few samples to double check the file sizes of empty tiles so you can elect to ignore them.

I chose to copy georeferenced tiles to another directory tree, ignoring the empty ones. This helps reduce number the number of files.

Loading tiles into ArcGIS

Now you are faced with a few options. You could load all your georeferenced tiles in to an ArcGIS Raster Catalog or a Raster Dataset. However, you will be loading tens of thousands of tiles (especially given they are 256s).

The other option is to use the gdal_merge.py utility from the GDAL tool chain to merge tiles together. This is also a very time consuming process once you get down to zoom level 15 and below.

I opted to create very tall tile images with GDAL by stacking tiles based on the Y coordinate. You can easily end up with hundreds of enormous tiles to load into ArcGIS, but ultimately gives you far fewer image files to move around (if you need to do that).

I have yet to benchmark which is a fastest/most efficient route, but I am starting to think that a direct load into a Raster Dataset would have been better a better choice. I’ll report back with any observations.

Once you’ve loaded the tiles, you’ll want to build some Billinear pyramids in the dataset to get any kind of drawing performance.

Et Voila!

You should now have some nicely rendered Mapnik datasets in ArcMap/GIS; something the ESRI styling engine could not achieve. Bear in mind that with any significant GIS processing task, the more hardware you can throw at it, the better. I would also recommened doing this on a per territory basis (unless you’re doing the world), as generate_tiles.py will take a long time generating empty tiles for nodata areas.

There is also a nice workflow opportunity to programatically create the target raster datasets (if you are doing multiple zoom levels) and loading the data in to the relevant dataset. It would also be worth creating a generic layer file which references standardized data sources for each zoom level so you can easily create new layers in your mapping document.