Utilities

`io`, `dataarray`, `plot` — the connective tissue

UNEP
IMEO
MARS

Modules:

  • georeader/io.py (113 LOC) — NetCDF safe-open

  • georeader/dataarray.py (145 LOC) — xarray bridge

  • georeader/plot.py (336 LOC) — matplotlib helpers

Role: the connective tissue. Each module is small, focused, and invisible until you need it.

Diagrams: none. These files don’t have ASCII art; they’re pure utility code.


1. io.py — NetCDF backend roulette

The whole module is one helper plus one URL-detector:

Why “safe” open exists

NetCDF files come in more than one on-disk format (the classic NetCDF3 family and the HDF5-based NetCDF4), and xarray has three backends (scipy, h5netcdf, netcdf4) with overlapping but not identical support: scipy reads only NetCDF3, h5netcdf reads only HDF5-based files, and netcdf4 reads both. Different sensor providers ship different formats, and a backend that handles one will fail on another with cryptic errors.

safe_open_netcdf cycles through engines in a sensible order until one works:

| Input type | Engine order |
|---|---|
| Remote URL (OPeNDAP) | netcdf4 only (the others don’t support remote) |
| Local file or file-like object | h5netcdf → scipy → netcdf4 |

h5netcdf is tried first for local files because it’s typically faster than netcdf4 and handles HDF5 well; scipy is second because it’s small and self-contained; netcdf4 is last because it’s the most comprehensive but also the heaviest dependency.
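The fallback described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual body of safe_open_netcdf: the names is_remote, open_with_fallback and the opener parameter are hypothetical, and URL detection is assumed to mean "a string with an http(s) scheme". Only xr.open_dataset(source, engine=...) is a real xarray call.

```python
from urllib.parse import urlparse

# Engine orders taken from the table above.
LOCAL_ENGINES = ("h5netcdf", "scipy", "netcdf4")
REMOTE_ENGINES = ("netcdf4",)


def is_remote(source):
    """Return True when `source` is a string with an http(s) scheme (assumed rule)."""
    return isinstance(source, str) and urlparse(source).scheme in ("http", "https")


def _xarray_opener(source, engine):
    # Imported lazily so the fallback logic can be exercised without xarray.
    import xarray as xr
    return xr.open_dataset(source, engine=engine)


def open_with_fallback(source, opener=_xarray_opener):
    """Try each engine in order and return the first dataset that opens."""
    engines = REMOTE_ENGINES if is_remote(source) else LOCAL_ENGINES
    errors = {}
    for engine in engines:
        try:
            return opener(source, engine)
        except Exception as exc:  # backends raise many unrelated exception types
            errors[engine] = exc
    details = "; ".join(f"{name}: {err!r}" for name, err in errors.items())
    raise OSError(f"no engine could open {source!r} ({details})")
```

Collecting the per-engine errors and reporting them together is a design choice of this sketch: when every engine fails, the combined message shows which backend rejected the file and why, rather than only the last cryptic error.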

Sharp edges

This is the function that makes the EMIT / PRISMA / EnMAP readers work cleanly across providers without users worrying about which library wraps which format.


2. dataarray.py — the xarray bridge

Four functions that translate between GeoTensor and xr.DataArray. This is the substrate seam between georeader and xr_toolz (the climate-side sibling library — see geotoolz.md §10).

The four functions

Why this matters

The whole point of having both geotoolz (RS substrate) and xr_toolz (climate substrate) as separate libraries is that georeader is the bridge. A workflow that reads with georeader, runs a climate analysis in xr_toolz, and writes back via georeader looks like:

gt = georeader.read_from_bounds(reader, bounds=AOI)
da = georeader.dataarray.toDataArray(gt)            # GeoTensor → DataArray
result_da = xr_toolz.detrend.RemoveClimatology(clim)(da)
result_gt = georeader.dataarray.fromDataArray(result_da)  # back to GeoTensor
georeader.save_cog(result_gt, "/out/result.tif")

Five lines, no metadata loss. The conversion is cheap (it’s just rebuilding coord arrays, no data copy if you’re careful with .values).
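To make "rebuilding coord arrays" concrete, here is a rough sketch of how 1-D x/y coordinates can be derived from an affine transform. It assumes a north-up raster with no rotation and the pixel-centre convention; the function name pixel_centre_coords is hypothetical, and the real toDataArray may compute its coordinates differently.

```python
def pixel_centre_coords(transform, height, width):
    """Return (xs, ys) pixel-centre coordinates for an unrotated raster.

    `transform` holds the six affine coefficients (a, b, c, d, e, f), where
    x = a*col + b*row + c and y = d*col + e*row + f.
    """
    a, b, c, d, e, f = transform
    if b != 0 or d != 0:
        raise ValueError("rotated rasters need 2-D coordinate arrays")
    # Offset by half a pixel so each coordinate refers to the pixel centre.
    xs = [c + (col + 0.5) * a for col in range(width)]
    ys = [f + (row + 0.5) * e for row in range(height)]
    return xs, ys


# Hypothetical 10 m grid with its top-left corner at (500000, 4000000).
xs, ys = pixel_centre_coords((10, 0, 500000, 0, -10, 4000000), height=2, width=3)
```

Only the two coordinate vectors are computed here; the pixel array itself is untouched, which is why the conversion can stay cheap.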

Sharp edges


3. plot.py — matplotlib helpers

Four functions for visualising geospatial data with matplotlib:

| Function | What it does |
|---|---|
| show(data, add_colorbar_next_to=False, ...) | Display a GeoTensor / RasterioReader on an axis with proper extent + georeferencing |
| add_shape_to_plot(shape, ax=None, ...) | Overlay Shapely geometries / GeoDataFrames |
| plot_segmentation_mask(mask, color_array=None, ...) | Discrete-class raster with categorical colormap and legend |
| colorbar_next_to(im, ax, fig=None, ...) | Attach a colorbar that doesn’t squeeze the main axis |

show(data, ...)

The workhorse. It reads the GeoData’s transform and bounds, calls ax.imshow(data.values.transpose(1, 2, 0)) for RGB (the transpose moves the band axis last, from (bands, H, W) to (H, W, bands)) or ax.imshow(data.values) for single-band, sets the extent so axis ticks display in geographic coordinates, and optionally adds a colorbar via colorbar_next_to.
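The extent step can be illustrated with a small sketch. matplotlib’s imshow takes extent=(left, right, bottom, top); the helper below derives that tuple from an affine transform and the raster shape. The name imshow_extent is hypothetical, and the sketch assumes an unrotated, north-up raster (positive x pixel size, negative y pixel size); it is not taken from plot.py.

```python
def imshow_extent(transform, height, width):
    """Return (left, right, bottom, top) in map units for imshow(extent=...).

    `transform` holds the affine coefficients (a, b, c, d, e, f); (c, f) is the
    outer corner of the top-left pixel.
    """
    a, b, c, d, e, f = transform
    if b != 0 or d != 0 or a <= 0 or e >= 0:
        raise ValueError("sketch only handles unrotated, north-up rasters")
    left, top = c, f
    right = c + a * width     # move `width` pixels east
    bottom = f + e * height   # e is negative, so this moves south
    return (left, right, bottom, top)


# Hypothetical 10 m grid, 2 rows by 3 columns.
extent = imshow_extent((10, 0, 500000, 0, -10, 4000000), height=2, width=3)
```

Passing a tuple like this as ax.imshow(..., extent=extent) is what makes the tick labels read as map coordinates instead of pixel indices.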

Used in the README quickstart:

plot.show((gt_rgb / 3500).clip(0, 1))

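The .clip(0, 1) is there because a viewable RGB needs floats in [0, 1]. S2 reflectance values are stored as 16-bit integers with a scale factor of 10000, so dividing by 3500 rather than 10000 maps a reflectance of 0.35 to full brightness; this stretch makes darker surfaces easier to see, at the cost of saturating anything brighter. Note this still uses GeoTensor arithmetic: the clipped result has the same transform/CRS, so the axis ticks are correct geo coords.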
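The arithmetic behind that one-liner, worked per pixel value. This is a plain-Python illustration only; stretch_for_display is a hypothetical name, and the sample digital numbers are made up for the example.

```python
def stretch_for_display(dn, divisor=3500.0):
    """Scale a digital number, then clamp it to the [0, 1] range used for float RGB."""
    return min(max(dn / divisor, 0.0), 1.0)


# Each row: digital number, reflectance (DN / 10000), displayed brightness.
rows = [(dn, dn / 10000, stretch_for_display(dn)) for dn in (0, 1750, 3500, 5000)]
for dn, reflectance, brightness in rows:
    print(dn, reflectance, brightness)
```

A digital number of 3500 (reflectance 0.35) already reaches full brightness, and anything above it is clamped to 1.0 rather than wrapping or being rejected by imshow.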

add_shape_to_plot(shape, ax=None, ...)

Accepts a Polygon, a MultiPolygon, a list of geometries, or a GeoDataFrame. Draws on the supplied axis or plt.gca(). Reprojects the geometry to the axis’s CRS if needed (via the crs_geometry= arg). The standard “draw the AOI on top of the imagery” pattern.

plot_segmentation_mask(mask, color_array=None, ...)

Class-label rasters need a categorical colormap and a legend, not a continuous colorbar. This function:

  1. Converts the integer label raster to RGB using color_array (or a sensible default palette).

  2. Adds a legend with one entry per class.

  3. Handles nodata transparency.

Useful for showing CNN segmentation outputs in notebooks.
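Steps 1 and 3 amount to a lookup from class index to colour, with nodata mapped to a transparent pixel. The sketch below shows that idea on nested lists so it runs without dependencies; labels_to_rgba and its parameters are hypothetical names, not the signature used in plot.py, and step 2 (the legend) is left out.

```python
def labels_to_rgba(mask, colors, nodata=None):
    """Map a 2-D grid of integer labels to RGBA tuples.

    colors[k] is the (r, g, b) colour of class k, with components in [0, 1].
    Cells equal to `nodata` become fully transparent.
    """
    transparent = (0.0, 0.0, 0.0, 0.0)
    return [
        [transparent if label == nodata else (*colors[label], 1.0) for label in row]
        for row in mask
    ]


# Hypothetical two-class mask where 255 marks nodata.
palette = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]  # class 0 black, class 1 red
rgba = labels_to_rgba([[0, 1], [255, 1]], palette, nodata=255)
```

With NumPy arrays the same lookup is usually written as fancy indexing, color_array[mask], which handles the whole raster in one step.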

colorbar_next_to(im, ax, fig=None, ...)

The “don’t squeeze the main axis” colorbar. Uses mpl_toolkits.axes_grid1.make_axes_locatable to allocate a thin axis to the right of the main one, sized in the same proportion as the figure. Standard matplotlib idiom that never quite works on the first try; this packages it.
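For reference, the underlying matplotlib idiom looks roughly like this. It is a generic sketch of make_axes_locatable usage, not the body of colorbar_next_to; the helper name add_side_colorbar and the size and pad defaults are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable


def add_side_colorbar(im, ax, size="5%", pad=0.05):
    """Attach a colorbar in a thin axis carved out to the right of `ax`."""
    divider = make_axes_locatable(ax)
    # The new axis takes 5% of the main axis width and matches its height.
    cax = divider.append_axes("right", size=size, pad=pad)
    return ax.figure.colorbar(im, cax=cax)


fig, ax = plt.subplots()
im = ax.imshow([[0.0, 0.5], [0.75, 1.0]], cmap="viridis")
cbar = add_side_colorbar(im, ax)
```

Because the colorbar gets its own axis via cax=, the figure does not steal space from the image in an unpredictable way, which is the "squeeze" the helper is meant to avoid.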

Sharp edges


4. Why these three are grouped here

None of io, dataarray, or plot carries the architectural weight of the modules in Parts I–IV. They’re small, single-purpose, and you reach for them when you need them rather than designing pipelines around them. Grouping them into one chapter avoids three thin chapters with little to say.

The pattern they share: delegate to a well-maintained external library, paper over the sharp edges that come up in real RS workflows. safe_open_netcdf papers over the engine-format mismatch; to/fromDataArray papers over the xarray-vs-rasterio metadata convention gap; show papers over the matplotlib boilerplate for georeferenced display.


5. Connection to geotoolz

These three modules don’t have direct geotoolz operators wrapping them, but each shows up indirectly:

Next chapter: Sentinel-2 — the Sentinel-2 SAFE reader (1845 LOC, the largest single file in the package).