Geostack — composable operators on real RS imagery

Geostack worked examples

A self-contained notebook walkthrough of the pipekit + geotoolz + geopatcher stack. Each notebook covers a single slice and builds on the previous one. The arc:

| # | Notebook | Substrate | What it shows |
|---|---|---|---|
| 01 | Composition core walkthrough | scalars | `Operator`, `Sequential`, `Graph`, `Fanout`, `Branch`, `Switch`, `Tap`, `Snapshot`, `ShapeTrace`, `Identity`, `Const`, `Lambda`, `Sink`, `ModelOp`, pickling — the entire composition algebra against plain Python ints. No GIS setup; read this first if `Operator` is new. |
| 02 | Pipeline idioms | scalars + small numpy | The pipekit idiom gallery: `Profile`, `Histogram`, `Try`, `Coalesce`, `Retry`, `Cache`, `AssertShape`/`AssertDType`/`AssertHasAttribute`, `Quarantine`, plus build-your-own recipes for the few primitives not yet in pipekit (Spy, Diff, Provenance, Subsample, ApplyToBands). |
| 03 | Operators on Sentinel-2 — Lake Tahoe | real S2 L2A (MPC) | First real-data notebook. STAC search → `GeoTensor` → named ops from `gz.radiometry` / `gz.indices` / `gz.cloud` / `gz.mask`. Ends with a fully instrumented pipeline (`AssertShape |
| 04 | Image processing — Caldor fire | pre/post Sentinel-2 (MPC) | Full `gz.radiometry` display chain (`ToFloat32 |
| 05 | Patching — grid → process → stitch | real S2 | `geopatcher.SpatialPatcher` + `gz.patch_ops.{GridSampler, ApplyToChips, Stitch}` — the canonical three-op tiled-inference pipeline. Compares `SpatialHann` vs `SpatialBoxcar` windows against the full-scene reference. |
| 06 | ML patches — augmentations + inference | real S2 | Same patcher machinery, but the per-chip op is `gz.ModelOp(model, method="predict")` and the stitch is `SpatialHardVote(n_classes=3)`. Demonstrates `gz.augment.Compose([RandomFlip, RandomRotate90, BrightnessJitter, GaussianNoise])` for training-chip augmentation, plus `SpatialJitteredStride` for jittered training-chip sampling. |
| 07 | Deployment shapes | mixed | The capstone: thirteen deployment patterns (notebook exploration, ETL, FastAPI, tile server, orchestrator, regulatory artifact, benchmark, audit, hot-reload, streaming, …) showing where the same operator algebra fits across production contexts. |

The order is pedagogical, not strictly historical: 01–02 establish the algebra, 03–04 ground it in real imagery and the named-op surface, 05–06 add the patcher / ML layer, 07 sketches deployment. Notebooks 01–02 run against plain scalars and need no MPC access; 03–06 fetch Sentinel-2 from Microsoft Planetary Computer (anonymous read — no auth); 07 mixes both.
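To make the "algebra against plain Python ints" idea concrete, here is a minimal sketch of operator composition. The classes below are toy stand-ins that borrow the names `Operator`, `Sequential`, `Fanout`, `Identity`, and `Lambda` from the table above; they are assumptions for illustration and not pipekit's actual implementation or signatures.

```python
from typing import Any, Callable, Iterable


class Operator:
    """Toy base class (hypothetical): an operator is just a callable."""

    def __call__(self, x: Any) -> Any:
        raise NotImplementedError


class Identity(Operator):
    """Return the input unchanged."""

    def __call__(self, x: Any) -> Any:
        return x


class Lambda(Operator):
    """Wrap an arbitrary function as an operator."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def __call__(self, x: Any) -> Any:
        return self.fn(x)


class Sequential(Operator):
    """Feed the output of each operator into the next one."""

    def __init__(self, ops: Iterable[Operator]):
        self.ops = list(ops)

    def __call__(self, x: Any) -> Any:
        for op in self.ops:
            x = op(x)
        return x


class Fanout(Operator):
    """Apply every operator to the same input and return a tuple of results."""

    def __init__(self, ops: Iterable[Operator]):
        self.ops = list(ops)

    def __call__(self, x: Any) -> tuple:
        return tuple(op(x) for op in self.ops)


# add one, fan out into (x, 10 * x), then sum the two branches
pipeline = Sequential([
    Lambda(lambda x: x + 1),
    Fanout([Identity(), Lambda(lambda x: x * 10)]),
    Lambda(sum),
])

result = pipeline(2)  # 2 -> 3 -> (3, 30) -> 33
```

Because every piece is itself an operator, the whole pipeline can be nested inside another `Sequential` or `Fanout` without any special casing, which is the property notebook 01 exercises on scalars.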

Deep dives

Once the applied walkthrough is comfortable, two deep-dive families go underneath the surface of the stack:

`notebooks/patching/` — `geopatcher`

| # | Notebook | What it shows |
|---|---|---|
| 01 | Intro — sliding-window inference | The four `SpatialPatcher` axes (`Geometry`, `Sampler`, `Window`, `Aggregation`) end-to-end on a single raster. |
| 02 | Geometries gallery | All five geometry types: `Rectangular`, `SphericalCap`, `KNNGraph`, `RadiusGraph`, `PolygonIntersection`. |
| 03 | Samplers gallery | Where anchors go: `RegularStride`, `JitteredStride`, `Random`, `PoissonDisk`, `Explicit`. |
| 04 | Field backends | One Patcher, six `Field` adapters: `RasterField`, `XarrayField`, `RioXarrayField`, `XvecField`, `GeoPandasField`, `DaskField`. |
| 05 | Temporal + spatiotemporal | `TemporalPatcher` along the time axis, then `SpatioTemporalPatcher` composing space × time. |
| 06 | Streaming reconstruction | Zarr accumulator → real GeoTIFFs without materialising the full grid in memory. |
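The grid → process → stitch pattern behind sliding-window inference can be sketched in plain NumPy. This is an illustrative assumption of how windowed stitching works in general, not geopatcher's code: `grid_anchors`, `window_2d`, and `tiled_apply` are hypothetical helpers, and the Hann window drops its zero endpoints so that border pixels keep a positive weight.

```python
import numpy as np


def grid_anchors(size: int, patch: int, stride: int) -> range:
    """Top-left offsets of a regular grid; assumes (size - patch) % stride == 0."""
    return range(0, size - patch + 1, stride)


def window_2d(patch: int, kind: str) -> np.ndarray:
    """Separable 2-D blending window built from a 1-D profile."""
    if kind == "boxcar":
        w = np.ones(patch)
    elif kind == "hann":
        # Drop the zero endpoints so every pixel keeps a positive weight.
        w = np.hanning(patch + 2)[1:-1]
    else:
        raise ValueError(f"unknown window: {kind}")
    return np.outer(w, w)


def tiled_apply(image, fn, patch=32, stride=16, kind="hann"):
    """Apply fn chip by chip, then stitch with a weighted average."""
    acc = np.zeros(image.shape, dtype=np.float64)     # weighted predictions
    weight = np.zeros(image.shape, dtype=np.float64)  # accumulated weights
    win = window_2d(patch, kind)
    for r in grid_anchors(image.shape[0], patch, stride):
        for c in grid_anchors(image.shape[1], patch, stride):
            rows, cols = slice(r, r + patch), slice(c, c + patch)
            acc[rows, cols] += win * fn(image[rows, cols])
            weight[rows, cols] += win
    return acc / weight


rng = np.random.default_rng(0)
scene = rng.random((64, 64))      # synthetic stand-in for a single band
reference = np.sqrt(scene)        # full-scene result of the per-pixel op

hann = tiled_apply(scene, np.sqrt, kind="hann")
boxcar = tiled_apply(scene, np.sqrt, kind="boxcar")
```

For a purely per-pixel operation like `np.sqrt`, both windows reproduce the full-scene reference. The choice of window starts to matter when the per-chip operation depends on chip context (a convolutional model, for example), because a tapered window down-weights the less reliable chip borders.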

Framework recipes (`notebooks/patching/recipes/`)

| Recipe | Bridge |
|---|---|
| Grain MapDataset | `SpatialPatcher` → JAX `grain.MapDataset` |
| JAX vmap | `SpatialPatcher` → `jax.vmap` batched inference |
| torch Dataset | `SpatialPatcher` → `torch.utils.data.Dataset` |

`notebooks/catalog/` — `geocatalog`

| # | Notebook | What it shows |
|---|---|---|
| 01 | Intro — build → query → load | Build a catalog from a real Sentinel-2 L2A archive on MPC (eight scenes over Lake Tahoe), query it, mosaic and time-stack the matches. |
| 02 | Backends | Raster, xarray, and vector catalog backends. |
| 03 | Set algebra | `query`, intersect, union — composable catalog operations. |
| 04 | DuckDB at scale | DuckDB-backed catalogs over millions of items. |
| 05 | Catalog ↔ Patch bridge | `CatalogDomain` plugs the multi-file archive into the same `SpatialPatcher` pipeline. |
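The set-algebra idea (query, intersect, union) can be illustrated with a toy in-memory catalog. Everything below is a hypothetical sketch: `Item`, `Catalog`, the item ids, and the bounding boxes are invented for illustration and do not reflect geocatalog's API or the real scene footprints.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


@dataclass(frozen=True)
class Item:
    item_id: str
    bbox: BBox
    date: str  # ISO "YYYY-MM-DD", so string comparison orders dates correctly


def overlaps(a: BBox, b: BBox) -> bool:
    """True when two lon/lat boxes share any area or edge."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class Catalog:
    """Toy catalog: an immutable set of items with composable operations."""

    def __init__(self, items: Iterable[Item]):
        self.items = frozenset(items)

    def query(self, bbox: Optional[BBox] = None,
              start: Optional[str] = None,
              end: Optional[str] = None) -> "Catalog":
        keep = [
            it for it in self.items
            if (bbox is None or overlaps(it.bbox, bbox))
            and (start is None or it.date >= start)
            and (end is None or it.date <= end)
        ]
        return Catalog(keep)

    def intersect(self, other: "Catalog") -> "Catalog":
        return Catalog(self.items & other.items)

    def union(self, other: "Catalog") -> "Catalog":
        return Catalog(self.items | other.items)

    def ids(self) -> list:
        return sorted(it.item_id for it in self.items)


TAHOE = (-120.2, 38.9, -119.9, 39.3)  # rough, illustrative bounding box

catalog = Catalog([
    Item("a", (-120.5, 38.5, -119.5, 39.5), "2024-06-14"),
    Item("b", (-120.5, 38.5, -119.5, 39.5), "2024-07-04"),
    Item("c", (-118.0, 36.0, -117.0, 37.0), "2024-06-20"),
])

over_tahoe = catalog.query(bbox=TAHOE)
in_june = catalog.query(start="2024-06-01", end="2024-06-30")
both = over_tahoe.intersect(in_june)
either = over_tahoe.union(in_june)
```

Each operation returns another `Catalog`, so queries chain and combine freely; that closure property is what makes the operations composable.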

The applied walkthrough (01–07 at the top) shows what the stack does on real data; the deep dives show every knob on each axis with worked examples.

Layout

```text
projects/geostack/
├── pyproject.toml          # standalone "geostack" package
├── README.md               # this file
├── src/geostack/           # shared real-data loaders (data.py)
├── tests/                  # smoke tests for the loaders
└── notebooks/
    ├── 01_composition_core.ipynb
    ├── 02_pipeline_idioms.ipynb
    ├── 03_operators_lake_tahoe.ipynb
    ├── 04_image_processing_caldor.ipynb
    ├── 05_patching_grids.ipynb
    ├── 06_ml_patches_augment.ipynb
    ├── 07_deployment_shapes.ipynb
    ├── patching/           # 6 deep dives + recipes/ (3 framework bridges)
    └── catalog/            # 5 deep dives
```

Each *.ipynb ships with an executed copy (figures inline). To re-execute a single notebook against fresh MPC data:

```bash
pixi run -e geostack jupyter nbconvert --to notebook --execute --inplace \
    projects/geostack/notebooks/03_operators_lake_tahoe.ipynb
```

Reproducing

The parent research_notebook pixi file defines a geostack feature / environment that bundles all the deps (geotoolz, geopatcher, geocatalog, planetary-computer, pystac-client, rioxarray, matplotlib, ipykernel, nbconvert, scipy, duckdb, pyogrio, netcdf4, xvec, …).

```bash
# One-time install
pixi install -e geostack

# Re-execute scoped subsets (each task targets one notebook tier).
pixi run -e geostack execute-geostack            # applied walkthrough (01–07)
pixi run -e geostack execute-geostack-patching   # patching/ deep dives + recipes
pixi run -e geostack execute-geostack-catalog    # catalog/ deep dives

# Convenience: applied + patching + catalog in one shot.
pixi run -e geostack execute-geostack-all

# Smoke-test the geostack.data loaders against MPC / GBIF / Natural Earth.
pixi run -e geostack test-geostack

# Or run a single notebook
pixi run -e geostack jupyter nbconvert --to notebook --execute --inplace \
    projects/geostack/notebooks/03_operators_lake_tahoe.ipynb
```

For non-pixi users, the standalone pyproject.toml here pins the same deps; uv pip install -e projects/geostack (or pip install -e .) into an activated venv works equivalently.

Why these scenes?

Lake Tahoe (10SGJ), June 14 2024 (0.01 % cloud): peak-green Sierra window. NDVI and NDWI light up cleanly, the lake provides a sharp open-water signal, and the surrounding granite + forest gives contrast. Used by notebooks 03, 05, 06.
Caldor fire core (10SGH), Aug 9 vs Nov 27 2021: the Caldor fire was the second wildfire on record to cross the Sierra Nevada crest. Pre / post bracketing shows a dramatic dNBR severity gradient. Used by notebook 04.
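The indices named above are all normalized differences of two bands. The sketch below uses the standard definitions (NDVI from NIR and red, McFeeters NDWI from green and NIR, NBR from NIR and SWIR2, and dNBR as pre-fire NBR minus post-fire NBR) in plain NumPy. The reflectance values are made up for illustration and are not taken from the scenes, and `normalized_difference` is a hypothetical helper, not a geotoolz function.

```python
import numpy as np


def normalized_difference(a, b):
    """Compute (a - b) / (a + b), returning NaN where a + b == 0."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    denom = a + b
    out = np.full(denom.shape, np.nan)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out


# Made-up surface reflectances for three illustrative pixels.
# Sentinel-2 band roles: B03 = green, B04 = red, B08 = NIR, B12 = SWIR2.
veg_red, veg_nir = 0.05, 0.45            # healthy vegetation
water_green, water_nir = 0.06, 0.02      # open water
pre_nir, pre_swir2 = 0.40, 0.10          # forest before the fire
post_nir, post_swir2 = 0.15, 0.25        # same pixel after the fire

ndvi = normalized_difference(veg_nir, veg_red)          # (0.45 - 0.05) / 0.50
ndwi = normalized_difference(water_green, water_nir)    # (0.06 - 0.02) / 0.08

nbr_pre = normalized_difference(pre_nir, pre_swir2)     # (0.40 - 0.10) / 0.50
nbr_post = normalized_difference(post_nir, post_swir2)  # (0.15 - 0.25) / 0.40
dnbr = nbr_pre - nbr_post                               # higher = more severe burn
```

With these values NDVI is 0.8, NDWI is 0.5, and dNBR is 0.85. The same function works unchanged on full 2-D band arrays.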

Both AOIs are on Microsoft Planetary Computer’s anonymous Sentinel-2 L2A read path — no API key, no signed URLs to manage — so the reproduction is one pixi run from a clean clone.
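As a rough sketch of what a scene lookup might look like outside the geostack loaders, the snippet below builds a STAC search for one MGRS tile and date using `pystac-client`. The helper names are hypothetical, and the property filters (`s2:mgrs_tile`, `eo:cloud_cover`) and the 5 % cloud threshold are assumptions about the MPC `sentinel-2-l2a` collection rather than values taken from the notebooks. Only the metadata search is shown; loading pixel data is left to the project's own loaders.

```python
MPC_STAC_URL = "https://planetarycomputer.microsoft.com/api/stac/v1"


def build_search_kwargs(mgrs_tile: str, date: str, max_cloud: float = 5.0) -> dict:
    """Assemble keyword arguments for a pystac-client search (no network access)."""
    return {
        "collections": ["sentinel-2-l2a"],
        "datetime": date,  # a single day or a "start/end" interval
        "query": {
            # Assumed property names on MPC Sentinel-2 items.
            "s2:mgrs_tile": {"eq": mgrs_tile},
            "eo:cloud_cover": {"lt": max_cloud},
        },
    }


def search_scenes(mgrs_tile: str, date: str, max_cloud: float = 5.0) -> list:
    """Run the search against MPC; requires pystac-client and network access."""
    # Imported lazily so build_search_kwargs stays dependency-free.
    from pystac_client import Client

    client = Client.open(MPC_STAC_URL)
    search = client.search(**build_search_kwargs(mgrs_tile, date, max_cloud))
    return list(search.items())


# Parameters for the Lake Tahoe scene described above.
tahoe_kwargs = build_search_kwargs("10SGJ", "2024-06-14")
```

Calling `search_scenes("10SGJ", "2024-06-14")` would then return the matching STAC items, if any, for that tile and day.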

Cross-references back to geotoolz

The notebooks link out to the geotoolz docs for concept pages and API reference (Concepts, Define an operator, Branching pipelines, Integration with geocatalog & geopatcher, Core API). Those links point at the canonical source on github.com/jejjohnson/geotoolz. This project is the applied companion; geotoolz/docs is the library reference.