Sampling geotiffs with zarr and tifffile

To sample multiple points from an array in Open Topo Data, I use rasterio something like this:

z = []
with rasterio.open(raster_path) as f:
    for row, col in points:
        window = rasterio.windows.Window(col, row, 1, 1)
        z_array = fh.read(
            indexes=1,
            window=window,
        )
        z.append(z_array[0][0])

Rasterio (via gdal) is well tested and supports a huge range of raster formats and several interpolation schemes. But the reading is faster with zarr and tifffile:

f = tifffile.imread(path, aszarr=True)
za = zarr.open(f, mode="r")
z = za.get_coordinate_selection((rows, cols))
f.close()

For non-integer rows and cols, you’ll need to do interpolation yourself. Nearest-neighbour interpolation can be done by rounding the rows and cols:

f = tifffile.imread(path, aszarr=True)
za = zarr.open(f, mode="r")
rows = np.round(rows).astype(int)
cols = np.round(cols).astype(int)
z = za.get_coordinate_selection((rows, cols))
f.close()

while bilinear interpolation needs a bit more numpy:

def bilinear_interpolation(squares, rows, cols):
    # Check input shapes.
    n_points = len(rows)
    assert squares.shape[0] == n_points
    assert squares.shape[1] == 2
    assert squares.shape[2] == 2

    # Use fractional rows/cols to simplify formula.
    r = rows % 1
    c = cols % 1

    # Do interpolation.
    f = np.asarray(squares)
    bl = np.zeros(n_points, dtype=squares.dtype)
    bl += f[:, 0, 0] * (1 - r) * (1 - c)
    bl += f[:, 1, 0] * r * (1 - c)
    bl += f[:, 0, 1] * (1 - r) * c
    bl += f[:, 1, 1] * r * c
    return bl


def buffer_indices_to_squares(rows, cols):
    r_squares = []
    c_squares = []
    for row, col in zip(rows, cols):
        r = int(row)
        c = int(col)
        r_squares.append(np.array([[r, r], [r + 1, r + 1]]))
        c_squares.append(np.array([[c, c + 1], [c, c + 1]]))
    r_squares = np.array(r_squares)
    c_squares = np.array(c_squares)
    return r_squares, c_squares

r_squares, c_squares = buffer_indices_to_squares(rows, cols)
r_flat = r_squares.flatten()
c_flat = c_squares.flatten()
squares_flat = za.get_coordinate_selection((r_flat, c_flat))
squares = squares_flat.reshape(r_squares.shape)
z = bilinear_interpolation(squares, rows, cols)
f.close()

zarr adds a little latency for single-point samples, but for multiple points performance is much better: I suspect it caches tiff blocks better.

Points timings