Geostack — composable operators on real RS imagery

Geostack worked examples

A sequentially ordered notebook walkthrough of the pipekit + geotoolz + geopatcher stack. Each notebook covers a single slice of the stack and builds on the previous one. The arc:

| # | Notebook | Substrate | What it shows |
|---|---|---|---|
| 01 | Composition core walkthrough | scalars | Operator, Sequential, Graph, Fanout, Branch, Switch, Tap, Snapshot, ShapeTrace, Identity, Const, Lambda, Sink, ModelOp, pickling: the entire composition algebra against plain Python ints. No GIS setup; read this first if Operator is new. |
| 02 | Pipeline idioms | scalars + small numpy | The pipekit idiom gallery: Profile, Histogram, Try, Coalesce, Retry, Cache, AssertShape/AssertDType/AssertHasAttribute, Quarantine, plus build-your-own recipes for the few primitives not yet in pipekit (Spy, Diff, Provenance, Subsample, ApplyToBands). |
| 03 | Operators on Sentinel-2: Lake Tahoe | real S2 L2A (MPC) | First real-data notebook. STAC search → GeoTensor → named ops from gz.radiometry / gz.indices / gz.cloud / gz.mask. Ends with a fully instrumented pipeline (`AssertShape` …). |
| 04 | Image processing: Caldor fire | pre/post Sentinel-2 (MPC) | Full gz.radiometry display chain (`ToFloat32` …). |
| 05 | Patching: grid → process → stitch | real S2 | geopatcher.SpatialPatcher + gz.patch_ops.{GridSampler, ApplyToChips, Stitch}: the canonical three-op tiled-inference pipeline. Compares SpatialHann vs SpatialBoxcar windows against the full-scene reference. |
| 06 | ML patches: augmentations + inference | real S2 | Same patcher machinery, but the per-chip op is gz.ModelOp(model, method="predict") and the stitch is SpatialHardVote(n_classes=3). Demonstrates gz.augment.Compose([RandomFlip, RandomRotate90, BrightnessJitter, GaussianNoise]) for training-chip augmentation, plus SpatialJitteredStride for jittered training-chip sampling. |
| 07 | Deployment shapes | mixed | The capstone: thirteen deployment patterns (notebook exploration, ETL, FastAPI, tile server, orchestrator, regulatory artifact, benchmark, audit, hot-reload, streaming, …) showing where the same operator algebra fits across production contexts. |

The order is pedagogical, not strictly historical: 01–02 establish the algebra, 03–04 ground it in real imagery and the named-op surface, 05–06 add the patcher / ML layer, 07 sketches deployment. Notebooks 01–02 run against plain scalars and need no MPC access; 03–06 fetch Sentinel-2 from Microsoft Planetary Computer (anonymous read — no auth); 07 mixes both.
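
To give a feel for what "composition algebra against plain Python ints" means before opening notebook 01, here is a minimal sketch. The `Operator`, `Sequential`, `Lambda`, `Fanout`, and `Tap` classes below are hypothetical stand-ins written for this example: they borrow names from the notebook list, but their signatures, the `|` chaining, and their semantics are assumptions, not pipekit's actual API.

```python
class Operator:
    """Hypothetical base class: a callable that can be chained with `|`."""

    def __call__(self, x):
        raise NotImplementedError

    def __or__(self, other):
        # In this sketch, `a | b` means "apply a, then b".
        return Sequential(self, other)


class Sequential(Operator):
    """Apply each operator in order, feeding outputs forward."""

    def __init__(self, *ops):
        self.ops = list(ops)

    def __call__(self, x):
        for op in self.ops:
            x = op(x)
        return x


class Lambda(Operator):
    """Wrap a plain function as an operator."""

    def __init__(self, fn):
        self.fn = fn

    def __call__(self, x):
        return self.fn(x)


class Fanout(Operator):
    """Send the same input to several branches; return a tuple of results."""

    def __init__(self, *branches):
        self.branches = branches

    def __call__(self, x):
        return tuple(branch(x) for branch in self.branches)


class Tap(Operator):
    """Pass the value through unchanged while handing it to a callback."""

    def __init__(self, callback):
        self.callback = callback

    def __call__(self, x):
        self.callback(x)
        return x


seen = []  # values observed by the Tap
pipeline = (
    Lambda(lambda x: x + 1)
    | Tap(seen.append)
    | Fanout(Lambda(lambda x: x * 2), Lambda(lambda x: x ** 2))
)
result = pipeline(3)
print(result, seen)
```

Because every stage is just a callable over ints, the wiring can be checked without any GIS setup, which is the reason the walkthrough starts on scalars.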

Deep dives

Once you are comfortable with the applied walkthrough, two deep-dive families go beneath the surface of the stack:

notebooks/patching/geopatcher

| # | Notebook | What it shows |
|---|---|---|
| 01 | Intro: sliding-window inference | The four SpatialPatcher axes (Geometry, Sampler, Window, Aggregation) end-to-end on a single raster. |
| 02 | Geometries gallery | All five geometry types: Rectangular, SphericalCap, KNNGraph, RadiusGraph, PolygonIntersection. |
| 03 | Samplers gallery | Where anchors go: RegularStride, JitteredStride, Random, PoissonDisk, Explicit. |
| 04 | Field backends | One Patcher, six Field adapters: RasterField, XarrayField, RioXarrayField, XvecField, GeoPandasField, DaskField. |
| 05 | Temporal + spatiotemporal | TemporalPatcher along the time axis, then SpatioTemporalPatcher composing space × time. |
| 06 | Streaming reconstruction | Zarr accumulator → real GeoTIFFs without materialising the full grid in memory. |

Framework recipes (notebooks/patching/recipes/)

| Recipe | Bridge |
|---|---|
| Grain MapDataset | SpatialPatcher → JAX grain.MapDataset |
| JAX vmap | SpatialPatcher → jax.vmap batched inference |
| torch Dataset | SpatialPatcher → torch.utils.data.Dataset |
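
The grid → process → stitch idea behind these notebooks can be sketched in one dimension with the standard library alone. This is an illustrative toy under stated assumptions, not geopatcher's implementation: `grid_anchors`, `hann`, `boxcar`, and `stitch` are hypothetical helpers, the "raster" is a 16-element list, and the Hann weights are sampled at half-integer offsets so that no weight is exactly zero at the scene border.

```python
import math


def grid_anchors(length, size, stride):
    """Start indices of chips covering [0, length); assumes length >= size."""
    anchors = list(range(0, length - size + 1, stride))
    if anchors[-1] + size < length:
        anchors.append(length - size)  # extra chip so the tail is covered
    return anchors


def hann(size):
    """Hann-shaped weights, offset by half a sample so all are positive."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * (k + 0.5) / size)
            for k in range(size)]


def boxcar(size):
    """Uniform weights: every chip sample counts equally."""
    return [1.0] * size


def stitch(chips, anchors, window, length):
    """Weighted overlap-add: sum(w * value) / sum(w) at every position."""
    num = [0.0] * length
    den = [0.0] * length
    for chip, start in zip(chips, anchors):
        for k, (value, w) in enumerate(zip(chip, window)):
            num[start + k] += w * value
            den[start + k] += w
    return [n / d for n, d in zip(num, den)]


signal = [float(i) for i in range(16)]                       # stand-in raster
anchors = grid_anchors(len(signal), size=8, stride=4)        # sampler
chips = [signal[a:a + 8] for a in anchors]                   # grid
processed = [[2.0 * v for v in chip] for chip in chips]      # per-chip op
out_hann = stitch(processed, anchors, hann(8), len(signal))  # stitch
out_box = stitch(processed, anchors, boxcar(8), len(signal))
print(anchors)
```

With a purely pointwise per-chip op like the doubling used here, both windows reproduce the full-scene result. The choice of window only starts to matter when the per-chip op behaves differently near chip edges, which is the comparison the applied notebook 05 makes on real imagery.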

notebooks/catalog/geocatalog

| # | Notebook | What it shows |
|---|---|---|
| 01 | Intro: build → query → load | Build a catalog from a real Sentinel-2 L2A archive on MPC (eight scenes over Lake Tahoe), query it, mosaic and time-stack the matches. |
| 02 | Backends | Raster, xarray, and vector catalog backends. |
| 03 | Set algebra | query, intersect, union: composable catalog operations. |
| 04 | DuckDB at scale | DuckDB-backed catalogs over millions of items. |
| 05 | Catalog ↔ Patch bridge | CatalogDomain plugs the multi-file archive into the same SpatialPatcher pipeline. |
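
As a rough picture of what "set algebra" over a catalog looks like, the sketch below keeps items in a plain dict and composes `query`, `intersect`, and `union`. Everything here is an assumption made for illustration: the `Catalog` class, its method signatures, and the three toy records (ids, bounding boxes, dates) are invented for this example and do not reflect geocatalog's real API or the Lake Tahoe archive.

```python
from datetime import date


def overlaps(a, b):
    """True if two (minx, miny, maxx, maxy) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class Catalog:
    """Hypothetical in-memory catalog: item id -> (bbox, acquisition date)."""

    def __init__(self, items):
        self.items = dict(items)

    def query(self, bbox=None, start=None, end=None):
        """Return a new Catalog with items matching the given filters."""
        def keep(record):
            item_bbox, when = record
            if bbox is not None and not overlaps(item_bbox, bbox):
                return False
            if start is not None and when < start:
                return False
            if end is not None and when > end:
                return False
            return True
        return Catalog({k: v for k, v in self.items.items() if keep(v)})

    def intersect(self, other):
        """Items present in both catalogs."""
        return Catalog({k: v for k, v in self.items.items()
                        if k in other.items})

    def union(self, other):
        """Items present in either catalog."""
        return Catalog({**self.items, **other.items})

    def ids(self):
        return sorted(self.items)


# Toy records, made up for this example.
catalog = Catalog({
    "a": ((-120.5, 38.5, -119.5, 39.5), date(2021, 7, 1)),
    "b": ((-120.5, 38.5, -119.5, 39.5), date(2021, 9, 15)),
    "c": ((-110.0, 30.0, -109.0, 31.0), date(2021, 7, 10)),
})
aoi = (-120.3, 38.8, -119.8, 39.3)

near = catalog.query(bbox=aoi)
summer = catalog.query(start=date(2021, 6, 1), end=date(2021, 8, 31))
both = near.intersect(summer)
either = near.union(summer)
print(both.ids(), either.ids())
```

Each operation returns another catalog, so queries can be chained or combined in any order before anything is loaded.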

The applied walkthrough (01–07 at the top) shows what the stack does on real data; the deep dives show every knob on each axis with worked examples.

Layout

projects/geostack/
├── pyproject.toml          # standalone "geostack" package
├── README.md               # this file
├── src/geostack/           # shared real-data loaders (data.py)
├── tests/                  # smoke tests for the loaders
└── notebooks/
    ├── 01_composition_core.ipynb
    ├── 02_pipeline_idioms.ipynb
    ├── 03_operators_lake_tahoe.ipynb
    ├── 04_image_processing_caldor.ipynb
    ├── 05_patching_grids.ipynb
    ├── 06_ml_patches_augment.ipynb
    ├── 07_deployment_shapes.ipynb
    ├── patching/           # 6 deep dives + recipes/ (3 framework bridges)
    └── catalog/            # 5 deep dives

Each *.ipynb ships with an executed copy (figures inline). To re-execute a single notebook against fresh MPC data:

pixi run -e geostack jupyter nbconvert --to notebook --execute --inplace \
    projects/geostack/notebooks/03_operators_lake_tahoe.ipynb

Reproducing

The parent research_notebook pixi file defines a geostack feature / environment that bundles all the deps (geotoolz, geopatcher, geocatalog, planetary-computer, pystac-client, rioxarray, matplotlib, ipykernel, nbconvert, scipy, duckdb, pyogrio, netcdf4, xvec, …).

# One-time install
pixi install -e geostack

# Re-execute scoped subsets (each task targets one notebook tier).
pixi run -e geostack execute-geostack            # applied walkthrough (01–07)
pixi run -e geostack execute-geostack-patching   # patching/ deep dives + recipes
pixi run -e geostack execute-geostack-catalog    # catalog/ deep dives

# Convenience: applied + patching + catalog in one shot.
pixi run -e geostack execute-geostack-all

# Smoke-test the geostack.data loaders against MPC / GBIF / Natural Earth.
pixi run -e geostack test-geostack

# Or run a single notebook
pixi run -e geostack jupyter nbconvert --to notebook --execute --inplace \
    projects/geostack/notebooks/03_operators_lake_tahoe.ipynb

For non-pixi users, the standalone pyproject.toml here pins the same deps. Running uv pip install -e projects/geostack from the repository root (or pip install -e . from inside projects/geostack) in an activated venv works equivalently.

Why these scenes?

Both AOIs (Lake Tahoe and the Caldor fire) are on Microsoft Planetary Computer's anonymous Sentinel-2 L2A read path, with no API key and no signed URLs to manage, so the reproduction is one pixi run from a clean clone.

Cross-references back to geotoolz

The notebooks link out to the geotoolz docs for concept pages and API reference (Concepts, Define an operator, Branching pipelines, Integration with geocatalog & geopatcher, Core API). Those links point at the canonical source on github.com/jejjohnson/geotoolz. This project is the applied companion; geotoolz/docs is the library reference.