How LERC raster compression works

3 November 2021

What is LERC compression?

LERC (limited error raster compression) is a format for compressing raster images.

If you’re using GDAL version 2.4 or later, you can LERC compress a raster just like you would with Deflate:

gdal_translate -co COMPRESS=LERC -co MAX_Z_ERROR=0.01 raster.tif raster.compressed.tif

LERC is a lossy compression algorithm: it is able to achieve very high compression ratios, but the data is modified slightly in the process. Unlike subjective lossy compression methods like JPEG, LERC provides a guarantee: the absolute error in any pixel after compression won’t exceed MAX_Z_ERROR.

As we’ll see later, LERC-compressed data has a number of additional properties that make it great for scientific analysis.

How LERC compression works

I was surprised to see that the LERC compression algorithm is fairly grokkable, so I’m including a section here on how it works. If you’re not interested, skip straight to Properties of LERC-compressed data.

tldr: data is rounded to the nearest multiple of 2 * MAX_Z_ERROR (measured from the minimum value), then losslessly compressed.

Imagine we have this 3x4 raster that we want to compress with a maximum error of 0.1:

35.254	35.254	41.039	50.369
35.254	35.254	40.837	48.253
29.836	35.254	39.598	46.181

Original data \(x\).

The first step in the LERC algorithm is to shift each pixel \(x_i\) so the data starts at 0, then divide it by \(2 \cdot \mathrm{maxError}\).

\[s_i = \frac{x_i - \mathrm{min}(x)}{2 \cdot \mathrm{maxError}} = \frac{x_i - 29.836}{0.2}\]

27.090	27.090	56.015	102.665
27.090	27.090	55.005	92.085
0.000	27.090	48.810	81.725

Scaled data \(s\).
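To make this step concrete, here is a minimal pure-Python sketch of the shift-and-scale step applied to the example raster. The function name `scale` and the list-of-rows layout are my own choices for illustration; the real LERC library is written in C++ and works on binary blocks, so treat this as a sketch of the arithmetic only.

```python
# Original 3x4 example raster, stored as a list of rows.
x = [
    [35.254, 35.254, 41.039, 50.369],
    [35.254, 35.254, 40.837, 48.253],
    [29.836, 35.254, 39.598, 46.181],
]
max_error = 0.1


def scale(raster, max_error):
    """Shift values so the minimum is 0, then divide by 2 * max_error."""
    lowest = min(min(row) for row in raster)
    step = 2 * max_error
    return [[(value - lowest) / step for value in row] for row in raster]


s = scale(x, max_error)
for row in s:
    # Print with three decimals, like the tables in this post.
    print("\t".join(f"{value:.3f}" for value in row))
```

The smallest pixel always maps to exactly 0, and every other pixel maps to its distance from the minimum, measured in units of `2 * max_error`.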

Next, each scaled pixel \(s_i\) is rounded to the nearest integer \(r_i\). This is the only non-reversible step of the algorithm, and it’s where loss/error is introduced.

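Because the original data was divided by \(2 \cdot \mathrm{maxError}\), one unit in the scaled data corresponds to \(2 \cdot \mathrm{maxError}\) in the original units. Rounding moves each scaled value by at most 0.5, so it modifies the data by a maximum of \(\mathrm{maxError}\) in either direction (in the unscaled units), preserving the bounded loss guarantee!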

\[r_i = \lfloor s_i + 0.5 \rfloor\]

27	27	56	103
27	27	55	92
0	27	49	82

Rounded data \(r\).
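The rounding step is small enough to sketch directly. The snippet below applies \(\lfloor s_i + 0.5 \rfloor\) to the scaled table and reports the largest rounding error in scaled units. The name `quantize` is hypothetical, chosen just for this example.

```python
import math

# Scaled data s from the table above.
s = [
    [27.090, 27.090, 56.015, 102.665],
    [27.090, 27.090, 55.005, 92.085],
    [0.000, 27.090, 48.810, 81.725],
]


def quantize(scaled):
    """Round every scaled value to the nearest integer via floor(v + 0.5)."""
    return [[math.floor(value + 0.5) for value in row] for row in scaled]


r = quantize(s)

# Largest distance between a scaled value and its rounded integer.
worst = max(
    abs(value - rounded)
    for s_row, r_row in zip(s, r)
    for value, rounded in zip(s_row, r_row)
)
print(r)
print(f"largest rounding error in scaled units: {worst:.3f}")
```

The rounding error in scaled units can never exceed 0.5, which is what gives the bound of \(\mathrm{maxError}\) once the data is scaled back.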

So far all we’ve done is convert our data from floating point numbers to integers. The last step is to actually compress those integers.

The integers produced by the lossy scaling process are usually much smaller than the maximum unsigned 32-bit integer (4,294,967,295), so compression is done by using the fewest possible bits to represent the largest scaled integer:

\[\mathrm{nBits} = \lceil \log_2 (\max(r) + 1) \rceil = \lceil \log_2 (104) \rceil = 7\]

Our example data can be represented as 7-bit integers instead of 32-bit floats, for a compression ratio of \(\frac{7}{32} \approx 0.22\), which is much better than standard lossless algorithms typically achieve!

0011011	0011011	0111000	1100111
0011011	0011011	0110111	1011100
0000000	0011011	0110001	1010010

7-bit representation of rounded data \(r\).
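Here is a rough sketch of the bit-width calculation and a toy fixed-width packing of the rounded values. It uses Python's built-in `int.bit_length()`, which returns the number of bits needed to represent a non-negative integer. The helper names and the string-of-bits output are assumptions made for readability; the actual LERC bit-stuffing layout is a binary format and differs in detail.

```python
# Rounded data r from the table above, flattened row by row.
r = [27, 27, 56, 103, 27, 27, 55, 92, 0, 27, 49, 82]


def bits_needed(max_value):
    """Bits required to store integers in the range 0..max_value.

    Clamped to at least 1 so this toy packer always emits something;
    that clamp is a simplification of this sketch.
    """
    return max(1, max_value.bit_length())


def pack(values, n_bits):
    """Concatenate each value as a fixed-width binary string."""
    return "".join(format(value, f"0{n_bits}b") for value in values)


def unpack(bits, n_bits):
    """Split the bit string back into fixed-width integers."""
    return [int(bits[i:i + n_bits], 2) for i in range(0, len(bits), n_bits)]


n_bits = bits_needed(max(r))
packed = pack(r, n_bits)

print(n_bits, "bits per value,", len(packed), "bits in total")
assert unpack(packed, n_bits) == r  # packing is lossless
```

For comparison, storing the same twelve pixels as 32-bit floats takes 12 × 32 = 384 bits.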

To decompress the data, you need to know \(\min(x)\), \(\mathrm{maxError}\) and \(\mathrm{nBits}\) (all of which are stored in a header by LERC) and invert the scaling:

\[\bar{x}_i = 2 \cdot \mathrm{maxError} \cdot r_i + \min(x)\]

35.236	35.236	41.036	50.436
35.236	35.236	40.836	48.236
29.836	35.236	39.636	46.236

Data after a compress-decompress cycle \(\bar{x}\).

You can see that the new data values differ slightly from the original ones, but never by more than 0.1. Here are the errors:

0.018	0.018	0.003	-0.067
0.018	0.018	0.001	0.017
0.000	0.018	-0.038	-0.055

Compression error \((x - \bar{x})\).
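As a sanity check, the following sketch recomputes the reconstructed values from the decompression formula and confirms that no pixel moves by more than \(\mathrm{maxError}\). The function name `decompress` is hypothetical; the inputs are the rounded integers, minimum value, and maximum error from the worked example.

```python
# Original data and rounded integers from the worked example.
x = [
    [35.254, 35.254, 41.039, 50.369],
    [35.254, 35.254, 40.837, 48.253],
    [29.836, 35.254, 39.598, 46.181],
]
r = [
    [27, 27, 56, 103],
    [27, 27, 55, 92],
    [0, 27, 49, 82],
]
x_min = 29.836
max_error = 0.1


def decompress(rounded, x_min, max_error):
    """Invert the scaling: x_bar = 2 * max_error * r + min(x)."""
    step = 2 * max_error
    return [[step * value + x_min for value in row] for row in rounded]


x_bar = decompress(r, x_min, max_error)

# Per-pixel compression error (x - x_bar).
errors = [
    [orig - new for orig, new in zip(x_row, bar_row)]
    for x_row, bar_row in zip(x, x_bar)
]
worst = max(abs(e) for row in errors for e in row)

for row in errors:
    print("\t".join(f"{e:.3f}" for e in row))
assert worst <= max_error  # the bounded-loss guarantee holds
```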

How LERC compression actually works

I made a few simplifications in outlining the algorithm. Here are the additional details:

If \(\mathrm{nBits}\) is more than the number of bits used to store the original data, the data is stored uncompressed.
If \(\mathrm{maxError}\) is 0, the data is stored uncompressed.
Null values are removed before compression. A binary mask is built indicating which pixels are null, then this mask is compressed using run-length encoding.
Raster file formats like GeoTIFF split data into blocks/tiles/strips. LERC compresses these blocks independently: each gets its own header, null mask, and compressed data section.
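To illustrate the null-mask idea, here is a generic run-length encoder for a boolean mask. This is a textbook version of the technique, not the byte-level encoding that LERC actually writes; the function names and the `(value, run_length)` pair representation are assumptions for this example.

```python
from itertools import groupby


def rle_encode(mask):
    """Collapse a flat sequence of booleans into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(mask)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the flat mask."""
    return [value for value, count in runs for _ in range(count)]


# True marks a valid pixel, False marks a null pixel.
mask = [True] * 5 + [False] * 3 + [True] * 4

runs = rle_encode(mask)
print(runs)
assert rle_decode(runs) == mask  # the mask survives the round trip
```

Masks with long runs of identical values (for example, a raster that is mostly NODATA) collapse to just a few pairs, which is why this kind of encoding suits sparse data.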

Properties of LERC-compressed data

LERC errors are deterministic: all pixels with a value \(z\) will have the same value \(\bar{z} = z + \mathrm{err}(z)\) after compression. This means that areas of constant value (like lakes in an elevation dataset) will still have constant value after compression.
- This only applies within a TIFF block. You may want to make the block size larger than the scale of constant-value features.
When compressing random noise, errors are uniformly distributed between \(-\mathrm{maxError}\) and \(\mathrm{maxError}\).
- Large areas of constant value may skew this distribution.
Errors are introduced by rounding, so two values in the same block that differ by less than \(2 \cdot \mathrm{maxError}\) will, after compression, either have the same value or be exactly \(2 \cdot \mathrm{maxError}\) apart.
LERC skips compression if it would increase size for that tile, so it’s fine to use opportunistically with a very small \(\mathrm{maxError}\).
LERC builds a compressed mask for NODATA values before compressing the valid values, so LERC works well with sparse data.
LERC’s compression ratio depends on the range (max - min) of values. When compressing data that has large macro variations compared to local noise, a tiled raster will compress better.
LERC compression is stable: repeated compression with the same \(\mathrm{maxError}\) value won’t change the values after the first compression.
There is no error on the smallest value in the data. So if you have a land-only DEM with a min value of 0 denoting water, it will still be 0 after LERC compression.
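Several of these properties can be checked with a simplified model of a single block. The sketch below implements only the scale, round, and rescale steps described earlier (no bit packing, masks, or headers). `lerc_round_trip` is a hypothetical helper, not a function from the LERC library.

```python
import math


def lerc_round_trip(values, max_error):
    """Simulate compress + decompress for one block of valid pixels."""
    lowest = min(values)
    step = 2 * max_error
    return [
        step * math.floor((value - lowest) / step + 0.5) + lowest
        for value in values
    ]


block = [
    35.254, 35.254, 41.039, 50.369,
    35.254, 35.254, 40.837, 48.253,
    29.836, 35.254, 39.598, 46.181,
]
max_error = 0.1

once = lerc_round_trip(block, max_error)
twice = lerc_round_trip(once, max_error)

# Deterministic: pixels that were equal before are equal afterwards.
assert once[0] == once[1] == once[4]

# Stable: a second pass leaves the values unchanged.
assert all(abs(a - b) < 1e-9 for a, b in zip(once, twice))

# The smallest value in the block carries no error.
assert min(once) == min(block)

# Every output sits on a grid of spacing 2 * max_error above the minimum.
steps = [(value - min(once)) / (2 * max_error) for value in once]
assert all(abs(k - round(k)) < 1e-9 for k in steps)

print("all property checks passed")
```

Because the minimum and the grid are computed per block, the same checks would need to be run tile by tile on a real raster.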

Thanks to Ákos Halmai for pointing out a mathematical mistake in this post (now fixed)!