Using non-uniformity to tile a large elevation dataset.

The Mapzen dataset (which is a big inspiration for this project) is split up into 1° squares, each with the same resolution and projection. Pretty much all elevation datasets take this approach.

Tiling is great:

  • It makes it easy to download part of the dataset to cover a particular area of interest.
  • It’s good for read performance: I find that very large rasters take longer for GDAL to read.
  • It generally makes the dataset easier to work with: it can be split over storage devices, downloads can be resumed, and single files can be loaded into memory.

Tiling especially makes sense for applications like Mapzen and Google Maps, which also use or sell a map tile product where tiles are produced for different zoom levels.

But I’m intending gpxz to be focussed on analytical use cases as opposed to map tiles, so it’s worth considering whether there’s a better option. I think tiling is still important for the reasons above, but I’m considering a design without a fixed resolution and projection:

  • The dataset is still split into tiles, each covering a 0.25° WGS-84 square area.
  • Each tile could have a different resolution. This helps keep size down while preserving high-resolution datasets: you could have 1m resolution for tiles over the UK but 2km resolution over oceans.
  • Each tile could have a different coordinate system (that of the highest-quality dataset in that tile). This means you don’t need to interpolate the source datasets just to get them into a common CRS, and a coordinate system can be chosen which best represents the local area.
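
As a rough sketch of the layout described above, a point can be mapped to its 0.25° tile with integer arithmetic, and each tile can carry its own resolution and CRS as metadata. The `TileInfo` class, the `tile_index` and `tile_bounds` helpers, and the example EPSG codes and resolutions are all illustrative assumptions, not part of gpxz.

```python
import math
from dataclasses import dataclass
from typing import Tuple

TILE_SIZE_DEG = 0.25  # tile edge length in degrees (WGS-84)


@dataclass(frozen=True)
class TileInfo:
    """Hypothetical per-tile metadata: each tile keeps its own CRS and resolution."""
    crs: str             # e.g. "EPSG:27700" for a tile over the UK
    resolution_m: float  # approximate pixel size in metres


def tile_index(lat: float, lon: float) -> Tuple[int, int]:
    """Return the (row, col) of the 0.25° tile containing a point.

    Rows count north from the equator and columns east from the prime
    meridian, so tiles south or west of the origin get negative indices.
    """
    return math.floor(lat / TILE_SIZE_DEG), math.floor(lon / TILE_SIZE_DEG)


def tile_bounds(row: int, col: int) -> Tuple[float, float, float, float]:
    """Return (west, south, east, north) of a tile in degrees."""
    west = col * TILE_SIZE_DEG
    south = row * TILE_SIZE_DEG
    return west, south, west + TILE_SIZE_DEG, south + TILE_SIZE_DEG


# Illustrative catalogue: a fine tile over London, a coarse one mid-Atlantic.
catalogue = {
    tile_index(51.5, -0.1): TileInfo(crs="EPSG:27700", resolution_m=1.0),
    tile_index(0.1, -30.1): TileInfo(crs="EPSG:4326", resolution_m=2000.0),
}
```

Looking up a point is then a dictionary access, such as `catalogue[tile_index(51.6, -0.2)]`, and the caller reads the CRS and resolution from the returned metadata before opening the raster.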

Tools like GDAL abstract away most of the complexity of projection transforms, so this system would work transparently for many use cases, and the data can always be reprojected by the user. There are some drawbacks though:

  • GDAL’s VRT (virtual raster) files don’t have full support for projection and resolution differences, and VRTs are a really useful tool I reach for a lot.
  • It could introduce artefacts at the seams between tiles in different projections/resolutions, or at least make it harder to evaluate merge seams.
  • It’s important to include a buffer around elevation tiles, but when the tiles use different transforms this creates multiple different elevation values for the same location in different tiles (only in the buffer area though).
  • Tiles might have to be further buffered to cover a square in EPSG:4326, adding more discrepancies and redundant space usage.
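
To illustrate the buffer drawback, the sketch below lists every tile whose buffered extent contains a given point. A point near a tile edge falls into more than one tile, and if those tiles use different transforms, each may hold a slightly different elevation for it. The 0.01° buffer width and the `tiles_covering` name are assumptions chosen for illustration.

```python
import math
from typing import List, Tuple

TILE_SIZE_DEG = 0.25  # tile edge length in degrees (WGS-84)
BUFFER_DEG = 0.01     # assumed buffer width; the real value is a design choice


def tiles_covering(lat: float, lon: float,
                   buffer_deg: float = BUFFER_DEG) -> List[Tuple[int, int]]:
    """Return (row, col) indices of all tiles whose buffered extent contains the point.

    A tile's buffered extent reaches the point exactly when the tile overlaps
    the box of half-width buffer_deg centred on the point, so it is enough to
    enumerate the tiles that box touches.
    """
    first_row = math.floor((lat - buffer_deg) / TILE_SIZE_DEG)
    last_row = math.floor((lat + buffer_deg) / TILE_SIZE_DEG)
    first_col = math.floor((lon - buffer_deg) / TILE_SIZE_DEG)
    last_col = math.floor((lon + buffer_deg) / TILE_SIZE_DEG)
    return [
        (row, col)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


interior = tiles_covering(51.6, -0.1)     # well inside one tile
near_edge = tiles_covering(51.6, -0.005)  # within the buffer of a neighbour
near_corner = tiles_covering(51.745, -0.005)
```

Here `interior` holds a single tile, `near_edge` holds two, and `near_corner` holds four, so a point near a corner could have up to four stored elevation values.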

Non-uniform tiles are going to be good at least while I’m building and investigating the dataset. I think they could work well for the final distribution too, but worst case I could always reproject.