Loading Raster Data with ArcGIS

Written on 04 Dec 2011

For those stuck in the land of ESRI ArcGIS 9.3.1 and required to support a closed source platform, management of raster data can sometimes be a little frustrating. There is a plethora of tools that make loading vector-based GML a lot easier, but ultimately (if you do not license Image Server or use MapServer) your hands are quite tied when you need to load and re-project a large volume of raster or orthophotographic data.

ArcGIS 9.3.1 seems to have a fundamental limitation in that it cannot re-project unmanaged raster data. This means that creating a simple unmanaged file geodatabase (where the entries in the database are merely links to the file system where your TIFFs/ECWs sit) is effectively useless unless you want to use the files in their native projection.

Reprojecting Raster Data in ArcGIS 9.3.1

Recently I have found that the best way of storing and projecting raster data is to load it into a managed raster dataset in a file geodatabase and then apply on-the-fly reprojection.

Unfortunately this then presents a new problem, as loading (or mosaicking) a large tile into a raster dataset via ArcCatalog is slow and unreliable at the best of times. You also need plenty of free disk space on hand to make this work.

Loading Raster Data in Parallel

My best solution to this problem so far is to parallelize the loading of the rasters: rather than pushing 100 image tiles into one very slow queue, you push them into 10 different queues of 10 tiles each. In the ideal case your overall load time should then be roughly 10% of what it would be with a straightforward load into a single raster dataset, although in practice disk and CPU contention will eat into that.
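As a minimal sketch of the queueing idea, the tiles can be dealt round-robin into 10 lists so each queue ends up with a near-equal share. The function name and the tile file names here are hypothetical and only serve the illustration.

```python
def split_into_queues(tiles, queue_count=10):
    """Deal tiles round-robin into queue_count lists of near-equal size."""
    queues = [[] for _ in range(queue_count)]
    for index, tile in enumerate(tiles):
        queues[index % queue_count].append(tile)
    return queues


# Hypothetical tile names, used only for illustration.
tiles = ["tile_%03d.tif" % n for n in range(100)]
queues = split_into_queues(tiles)
```

Dealing the tiles out round-robin, rather than cutting the list into consecutive blocks, spreads neighbouring tiles across queues, which may help even out the load if tile sizes vary by area.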

Fortunately my rambling here is not just theoretical, as I have recently put this into practice. Whilst I am fully aware that 9.3.1 is well behind the cutting edge of ESRI's products, I believe that the Python scripting support is a gift from the GIS gods.

Parallel Loading Process

You can follow these steps to set up and run a parallel process flow. I am using the conveniently round number of 100 total raster tiles:

Using the Python scripting interface, write a loop to create 10 raster datasets in your file geodatabase. Be sure to pass in your interpolation method (nearest neighbour for rasters, bilinear or cubic for orthos).
Generate a list of all the tiles you wish to load and split this into 10 smaller lists. I have stored these lists in 10 simple text files named accordingly.
Write a script which takes in the name of the raster dataset, the location of the file GDB and a reference to the text file listing 10 of the rasters. It would be wise to put in some form of logging so you can skip tiles that have already been loaded into this specific dataset.
Using Python’s subprocess module, loop over and run 10 separate instances of the above script, sequentially moving through the 10 file lists.
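The logging suggested in the third step can be sketched roughly as below. This is not the script I actually ran: the function names and the one-tile-per-line log format are assumptions, and load_one stands in for whatever call loads a single tile into the raster dataset (on 9.3.1, presumably something wrapping the geoprocessor's Mosaic tool). It is written for a modern Python; on the Python 2.5 that ships with 9.3.1 the with statement needs "from __future__ import with_statement".

```python
import os


def read_lines(path):
    """Return the non-empty lines of a text file, or [] if it does not exist."""
    if not os.path.exists(path):
        return []
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def load_tiles(tile_list_path, log_path, load_one):
    """Load every tile listed in tile_list_path that is not yet in log_path.

    load_one is a caller-supplied function (hypothetical here) that loads
    a single tile into the target raster dataset.
    """
    done = set(read_lines(log_path))
    loaded = []
    for tile in read_lines(tile_list_path):
        if tile in done:
            continue  # already loaded on a previous run
        load_one(tile)
        # Record success straight away so a crash loses at most one tile.
        with open(log_path, "a") as log:
            log.write(tile + "\n")
        done.add(tile)
        loaded.append(tile)
    return loaded
```

Keeping one log file per raster dataset means a failed worker can simply be restarted with the same arguments and will carry on where it left off.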

After this is done you should have 10 Python processes churning away in parallel, and they should each take roughly the same time to load their tiles. Once these processes have finished, you can either mosaic the 10 rasters into 1 very large raster dataset or, more simply, create a layer which references all 10 raster datasets.
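Launching the workers and waiting for them to finish might look something like the sketch below. The script name, the dataset and list-file naming patterns and the argument order are all hypothetical; only subprocess.Popen and its wait method come from the standard library.

```python
import subprocess
import sys


def build_commands(worker_script, gdb_path, queue_count=10):
    """Build one command line per worker.

    The ortho_NN dataset names and tiles_NN.txt list names are
    hypothetical naming conventions chosen for this example.
    """
    commands = []
    for n in range(queue_count):
        commands.append([
            sys.executable,      # the interpreter running this driver
            worker_script,
            gdb_path,
            "ortho_%02d" % n,    # target raster dataset
            "tiles_%02d.txt" % n,  # list of tiles for this worker
        ])
    return commands


def run_in_parallel(commands):
    """Start every command at once, then wait for all of them.

    Returns the exit codes in the same order as the commands.
    """
    processes = [subprocess.Popen(command) for command in commands]
    return [process.wait() for process in processes]
```

Calling run_in_parallel(build_commands("load_worker.py", "rasters.gdb")) would start all 10 workers together, and any non-zero exit code in the returned list points at a worker that needs re-running.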

Whilst there is obviously a bit of overhead in setting these scripts up, they are very much reusable and you will save a lot of time compared to trying to load 100 tiles into a single dataset.