Abstract reader

The type protocols (`GeoData`, `GeoDataBase`)

UNEP
IMEO
MARS

Module: georeader/abstract_reader.py (257 LOC)
Role: the duck-typing contract that lets GeoTensor (in-memory) and RasterioReader (lazy on-disk) be passed interchangeably to every function in the package.


1. The one-line idea

Anything with transform, crs, and shape is a GeoDataBase. Anything that additionally knows how to materialise its data (values, load(), read_from_window()) is a GeoData (alias AbstractGeoData). Most functions in georeader.read, georeader.window_utils, georeader.mosaic, and related modules are type-annotated against these protocols, so the same function body works on either substrate.
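To make the structural-typing idea concrete, here is a minimal stdlib-only sketch. `GridLike`, `InMemoryRaster`, `LazyRaster`, and `n_pixels` are hypothetical stand-ins for GeoDataBase, GeoTensor, RasterioReader, and a package function; none of them is georeader's actual code, and the transform is a plain tuple of affine coefficients rather than a real Affine object.

```python
from typing import Any, Protocol, Tuple, runtime_checkable


@runtime_checkable
class GridLike(Protocol):
    """Stand-in for GeoDataBase: three structural attributes, nothing else."""

    transform: Any
    crs: Any
    shape: Tuple[int, ...]


class InMemoryRaster:
    """Plays the role of GeoTensor: it already holds its pixels."""

    def __init__(self, values, transform, crs):
        self.values = values
        self.transform = transform
        self.crs = crs
        self.shape = (len(values), len(values[0]))


class LazyRaster:
    """Plays the role of RasterioReader: it knows its grid, not its pixels."""

    def __init__(self, path, transform, crs, shape):
        self.path = path
        self.transform = transform
        self.crs = crs
        self.shape = shape


def n_pixels(data: GridLike) -> int:
    """Works on either class because it only touches the shared attributes."""
    return data.shape[-2] * data.shape[-1]


# Affine coefficients (a, b, c, d, e, f) as a plain tuple, for illustration only.
transform = (10.0, 0.0, 500000.0, 0.0, -10.0, 4100000.0)
mem = InMemoryRaster([[1, 2, 3], [4, 5, 6]], transform, "EPSG:32630")
lazy = LazyRaster("scene.tif", transform, "EPSG:32630", (4, 100, 200))

print(n_pixels(mem), n_pixels(lazy))
print(isinstance(mem, GridLike), isinstance(lazy, GridLike))
```

Neither class inherits from the protocol; both satisfy it simply by having the three attributes, which is the property the package relies on.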

This is the seam that the Reader reconciliation design wants to widen: make RasterioReader and AsyncGeoTIFFReader (and any future sensor-specific or raw-byte reader) all honour the same protocol, so that user code can pick a backend through reader_class=... strategy injection.
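As a rough illustration of what reader_class=... injection could look like, the sketch below uses entirely hypothetical names (`Reader`, `LocalFileReader`, `RangeRequestReader`, `read_scene`). It is not the reconciliation design itself, only the general pattern of passing a reader class that honours a shared protocol.

```python
from typing import Any, Callable, Dict, Protocol, Tuple


class Reader(Protocol):
    """Hypothetical shared contract: grid attributes plus load()."""

    transform: Any
    crs: Any
    shape: Tuple[int, ...]

    def load(self) -> Dict[str, Any]: ...


class LocalFileReader:
    """Hypothetical backend that would read from local disk."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.transform = (1.0, 0.0, 0.0, 0.0, -1.0, 0.0)
        self.crs = "EPSG:4326"
        self.shape = (1, 2, 2)

    def load(self) -> Dict[str, Any]:
        return {"backend": "local", "path": self.path}


class RangeRequestReader:
    """Hypothetical backend that would fetch byte ranges over the network."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.transform = (1.0, 0.0, 0.0, 0.0, -1.0, 0.0)
        self.crs = "EPSG:4326"
        self.shape = (1, 2, 2)

    def load(self) -> Dict[str, Any]:
        return {"backend": "range-request", "path": self.path}


def read_scene(
    path: str, reader_class: Callable[[str], Reader] = LocalFileReader
) -> Dict[str, Any]:
    """The function body is identical whichever backend is injected."""
    reader = reader_class(path)
    return reader.load()


print(read_scene("scene.tif"))
print(read_scene("scene.tif", reader_class=RangeRequestReader))
```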


2. The type hierarchy

┌─────────────────────────────────────────────────────────────────────────┐
│                    GEOREADER TYPE HIERARCHY                              │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  GeoDataBase (Protocol)           Minimal interface for geospatial data │
│  ├── transform: Affine            Pixel → coordinate mapping            │
│  ├── crs: Any                     Coordinate reference system           │
│  └── shape: Tuple                 (C, H, W) or (H, W) dimensions        │
│       │                                                                  │
│       ▼                                                                  │
│  AbstractGeoData (Protocol)       Adds read capabilities                │
│  ├── values: ndarray              Array data                            │
│  ├── fill_value_default           Nodata value                          │
│  └── load(): GeoTensor            Read all data                         │
│       │                                                                  │
│       ├──────────────────────┬──────────────────────┐                   │
│       ▼                      ▼                      ▼                   │
│  RasterioReader         GeoTensor              Custom Readers           │
│  (Lazy file access)     (In-memory)           (User-defined)           │
│                                                                          │
│  GeoData = Union[AbstractGeoData, GeoTensor]  ← Common type alias       │
└─────────────────────────────────────────────────────────────────────────┘

Two splits to notice:

In the file, AbstractGeoData = GeoData is set as a back-compat alias — older code refers to AbstractGeoData; new code should prefer GeoData.


3. The protocol contract

What you must supply to be a GeoData:

┌─────────────────────────────────────────────────────────────────────────┐
│                    REQUIRED PROPERTIES                                   │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Property              Type                  Description                 │
│  ──────────           ──────                ───────────                 │
│  transform            rasterio.Affine       6-element affine matrix     │
│  crs                  Any (CRS-like)        EPSG code, WKT, or CRS obj  │
│  shape                Tuple[int, ...]       (C, H, W) or (H, W)         │
│  values               ndarray               Raster data array           │
│  fill_value_default   number                Nodata/fill value           │
│                                                                          │
│  Required Methods:                                                       │
│  ─────────────────                                                       │
│  load() → GeoTensor   Read all data into memory                         │
│                                                                          │
│  Derived Properties (computed from above):                              │
│  ──────────────────────────────────────────                             │
│  width                shape[-1]             Number of columns           │
│  height               shape[-2]             Number of rows              │
│  bounds               From transform+shape  (minx, miny, maxx, maxy)    │
│  res                  From transform        (xres, yres) pixel size     │
│  footprint            Polygon               Bounding polygon in CRS     │
└─────────────────────────────────────────────────────────────────────────┘

GeoData provides default implementations for the derived properties on top of the required five — so a custom reader only has to wire up the required ones, and gets bounds / res / footprint for free.

The base class deliberately raises NotImplementedError for dtype, dims, and fill_value_default rather than inventing a default — these are sensor-specific and any subclass that ignores them is going to bite users elsewhere.
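The split between required and derived members can be sketched as follows. This is an assumption-laden illustration, not the file's code: `GridMixin` and `TinyReader` are hypothetical, the transform is a plain (a, b, c, d, e, f) tuple in the affine convention, and `res` is taken as absolute pixel sizes for an unrotated grid.

```python
from typing import Tuple

Coeffs = Tuple[float, float, float, float, float, float]  # (a, b, c, d, e, f)


class GridMixin:
    """Hypothetical base: subclasses supply transform and shape, the rest is derived."""

    transform: Coeffs
    shape: Tuple[int, ...]

    @property
    def width(self) -> int:
        return self.shape[-1]

    @property
    def height(self) -> int:
        return self.shape[-2]

    @property
    def res(self) -> Tuple[float, float]:
        a, _, _, _, e, _ = self.transform
        return (abs(a), abs(e))  # assumes an unrotated transform

    @property
    def bounds(self) -> Tuple[float, float, float, float]:
        a, b, c, d, e, f = self.transform
        # Map the four grid corners (col, row) through x = a*col + b*row + c,
        # y = d*col + e*row + f, then take the envelope.
        corners = [(0, 0), (self.width, 0), (0, self.height), (self.width, self.height)]
        xs = [a * col + b * row + c for col, row in corners]
        ys = [d * col + e * row + f for col, row in corners]
        return (min(xs), min(ys), max(xs), max(ys))

    @property
    def dtype(self):
        # No invented default: the concrete reader has to say what it stores.
        raise NotImplementedError("dtype is reader-specific")


class TinyReader(GridMixin):
    """Only wires up the required attributes."""

    def __init__(self, transform: Coeffs, shape: Tuple[int, ...]) -> None:
        self.transform = transform
        self.shape = shape


reader = TinyReader((0.25, 0.0, -122.5, 0.0, -0.25, 37.5), (3, 4, 8))
print(reader.width, reader.height, reader.res)
print(reader.bounds)
```

`TinyReader` never mentions bounds or res, yet both are available; accessing `reader.dtype` fails loudly instead of returning a guess.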


4. Reading the source

The whole file is short enough to read in one sitting. Key landmarks:


5. FakeGeoData — describing a grid without owning one

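from affine import Affine
from georeader.abstract_reader import FakeGeoData

# A 3-band, 1000 x 1000 grid at 0.001 degree resolution; no pixel data is allocated.
fake = FakeGeoData(
    crs="EPSG:4326",
    transform=Affine.translation(-122.5, 37.5) * Affine.scale(0.001, -0.001),
    shape=(3, 1000, 1000),
)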

Why this exists:

  1. Pre-flight calculations. window_from_bounds(fake, bounds, crs_bounds) lets you ask “if I had a raster with this transform and shape, what window would my AOI be?” — before opening any file.

  2. Designing output grids. When mosaicking or reprojecting, you build a FakeGeoData describing the target grid, then pass it to read.read_reproject_like(src, dst=fake).

  3. Testing. Cheap fixtures for window-math tests.
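The pre-flight use case boils down to inverting the transform on the AOI corners. The sketch below shows that arithmetic with stdlib only; `GridSpec` and `window_for_bounds` are hypothetical stand-ins (not FakeGeoData or window_from_bounds), and it assumes a north-up transform with bounds already expressed in the grid's CRS.

```python
import math
from typing import NamedTuple, Tuple


class GridSpec(NamedTuple):
    """Hypothetical grid description: no pixels, just geometry."""

    crs: str
    transform: Tuple[float, float, float, float, float, float]  # (a, b, c, d, e, f)
    shape: Tuple[int, ...]


def window_for_bounds(
    grid: GridSpec, bounds: Tuple[float, float, float, float]
) -> Tuple[int, int, int, int]:
    """Return (row_off, col_off, height, width) covering bounds, clipped to the grid.

    Assumes a north-up transform (b == d == 0, e < 0).
    """
    a, _, c, _, e, f = grid.transform
    minx, miny, maxx, maxy = bounds
    col_start = math.floor((minx - c) / a)
    col_stop = math.ceil((maxx - c) / a)
    row_start = math.floor((maxy - f) / e)  # top edge gives the first row
    row_stop = math.ceil((miny - f) / e)
    n_rows, n_cols = grid.shape[-2:]
    col_start, col_stop = max(col_start, 0), min(col_stop, n_cols)
    row_start, row_stop = max(row_start, 0), min(row_stop, n_rows)
    return (row_start, col_start, row_stop - row_start, col_stop - col_start)


grid = GridSpec("EPSG:4326", (0.25, 0.0, -122.5, 0.0, -0.25, 37.5), (3, 1000, 1000))
aoi = (-122.0, 37.0, -121.5, 37.25)  # (minx, miny, maxx, maxy)
print(window_for_bounds(grid, aoi))
```

No file is opened at any point: the window comes purely from the transform and shape, which is why a pixel-free description is enough.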

shape is Optional[Tuple[int, ...]] — if you only need transform + crs for a coordinate calc, you can leave it as None. width and height then raise ValueError("Shape is not defined") if accessed, which is correct: you said you didn’t have one.
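The optional-shape behaviour can be mimicked in a few lines. `GridDescription` is a hypothetical stand-in written for this illustration; only the error message is taken from the text above.

```python
from typing import Any, Optional, Tuple


class GridDescription:
    """Hypothetical stand-in: transform and crs are mandatory, shape is optional."""

    def __init__(
        self, crs: Any, transform: Any, shape: Optional[Tuple[int, ...]] = None
    ) -> None:
        self.crs = crs
        self.transform = transform
        self.shape = shape

    @property
    def width(self) -> int:
        if self.shape is None:
            raise ValueError("Shape is not defined")
        return self.shape[-1]

    @property
    def height(self) -> int:
        if self.shape is None:
            raise ValueError("Shape is not defined")
        return self.shape[-2]


coords_only = GridDescription("EPSG:4326", (0.001, 0.0, -122.5, 0.0, -0.001, 37.5))
try:
    coords_only.width
except ValueError as err:
    message = str(err)

full = GridDescription("EPSG:4326", coords_only.transform, shape=(3, 1000, 500))
print(message, full.height, full.width)
```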


6. same_extent — the equality predicate

def same_extent(geo1: GeoData, geo2: GeoData, precision: float = 1e-3) -> bool:
    return (
        geo1.transform.almost_equals(geo2.transform, precision=precision)
        and window_utils.compare_crs(geo1.crs, geo2.crs)
        and (geo1.shape[-2:] == geo2.shape[-2:])
    )

Three observations on the design:

same_extent is used by GeoTensor.__add__ etc. to refuse gt1 + gt2 when extents disagree (see Chapter 1 §12).
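A stdlib-only imitation of the predicate makes its three checks easy to experiment with. `Grid` and `same_extent_sketch` are hypothetical; the coefficient-wise tolerance test and the case-insensitive string comparison are simplified stand-ins for Affine.almost_equals and window_utils.compare_crs, whose real behaviour may differ.

```python
from typing import NamedTuple, Tuple


class Grid(NamedTuple):
    """Hypothetical grid description used only for this sketch."""

    crs: str
    transform: Tuple[float, float, float, float, float, float]
    shape: Tuple[int, ...]


def same_extent_sketch(g1: Grid, g2: Grid, precision: float = 1e-3) -> bool:
    # Stand-in for Affine.almost_equals: every coefficient within the tolerance.
    transforms_match = all(
        abs(x - y) < precision for x, y in zip(g1.transform, g2.transform)
    )
    # Crude stand-in for compare_crs: real CRS comparison is more forgiving.
    crs_match = g1.crs.upper() == g2.crs.upper()
    # Only the spatial dimensions are compared, so band counts may differ.
    shapes_match = g1.shape[-2:] == g2.shape[-2:]
    return transforms_match and crs_match and shapes_match


t = (10.0, 0.0, 500000.0, 0.0, -10.0, 4100000.0)
rgb = Grid("EPSG:32630", t, (3, 100, 100))
mask = Grid("epsg:32630", t, (100, 100))
shifted = Grid("EPSG:32630", (10.0, 0.0, 500005.0, 0.0, -10.0, 4100000.0), (3, 100, 100))

print(same_extent_sketch(rgb, mask))
print(same_extent_sketch(rgb, shifted))
```

Because the shape check uses shape[-2:], a 3-band raster and a single-band mask on the same grid count as having the same extent, while a half-pixel shift of the origin does not.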


7. How this protocol shows up downstream

Almost every public function in georeader.read and georeader.window_utils is annotated data: GeoData or data: GeoDataBase. Concretely:

| Caller-side type | What you typically pass |
|---|---|
| GeoDataBase | FakeGeoData, GeoTensor, RasterioReader: anything with the three structural attrs |
| GeoData / AbstractGeoData | GeoTensor, RasterioReader, custom readers: anything that can materialise |
| GeoTensor (concrete) | only when an in-memory ndarray is genuinely required |

This means most of your geotoolz operators should be typed data: GeoData (not GeoTensor) so they accept lazy readers transparently — the load() happens once, inside the operator, and is cheap when the operator is the leaf.
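The "load once, inside the operator" pattern might look like the sketch below. All names (`Loadable`, `InMemory`, `Lazy`, `normalise`) are hypothetical, and it assumes that calling load() on data already in memory simply returns that same object.

```python
from typing import List, Protocol


class InMemory:
    """Plays the role of GeoTensor; load() is assumed to be a no-op."""

    def __init__(self, values: List[List[float]]) -> None:
        self.values = values

    def load(self) -> "InMemory":
        return self


class Lazy:
    """Plays the role of a lazy reader; counts how often it is materialised."""

    def __init__(self, values: List[List[float]]) -> None:
        self._values = values
        self.load_calls = 0

    def load(self) -> InMemory:
        self.load_calls += 1
        return InMemory(self._values)


class Loadable(Protocol):
    def load(self) -> InMemory: ...


def normalise(data: Loadable) -> InMemory:
    """Operator typed against the protocol, not against the concrete tensor."""
    tensor = data.load()  # the single materialisation point
    peak = max(max(row) for row in tensor.values)
    return InMemory([[v / peak for v in row] for row in tensor.values])


lazy = Lazy([[1.0, 2.0], [4.0, 8.0]])
eager = InMemory([[1.0, 2.0], [4.0, 8.0]])

print(normalise(lazy).values, lazy.load_calls)
print(normalise(eager).values)
```

The caller never has to know which kind of object it holds; the operator pays for exactly one load() on the lazy path and nothing extra on the in-memory path.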


8. Sharp edges


9. The shape of a custom reader (what’s in the next chapter)

The next chapter unpacks RasterioReader — the canonical GeoData subclass. Reading it is the fastest way to understand what a real implementation of this protocol looks like and what conveniences (.isel, .read_from_window, multi-file stacks, overviews, VSI cloud paths) sit on top of the bare contract above.

Next chapter: RasterioReader.