
RS cloud-native stack

obstore, georeader, geotoolz, titiler, lonboard — and how they fit

UNEP · IMEO · MARS

Scope: a rundown of obstore, RasterioReader, async-geotiff, georeader, geotoolz, titiler, and lonboard — what each is for, how they depend on one another, and which combinations match common workflows. (The lazycogs library ships an xarray.DataArray carrier and is part of the dense / cube stack; see ../geostack_notes.md Stack B for that discussion.) Status: reference document, not a design proposal. Companion to the per-design documents in georeader/, geotoolz/, geodatabase/, readers/, and types/ — read this first to orient on the ecosystem, then dive into a specific design when you want the full spec.

Where each topic is fully specified

This file owns the visual ecosystem map (the layered diagram, the strategy-comparison tables, the bytes-paths triage diagram, the end-to-end flows). For the deep design of any one tool, follow the pointer:

| Topic | Full design |
| --- | --- |
| Reader Protocol surface (AsyncGeoData added; RasterioReader widening) | georeader/reader_protocol.md |
| Cloud byte transport — defer to obspec (no Protocol of our own) | types/bytestore.md |
| AsyncGeoTIFFReader — thin adapter over developmentseed/async-geotiff | georeader/reader_async_geotiff.md |
| geotoolz operator library | geotoolz/geotoolz.md |
| GeoCatalog + builders + DuckDB backend | geodatabase/ |
| GeoSlice + samplers + stitch | types/geoslice.md |
| Per-sensor readers (geostationary, MODIS, …) | readers/ |

The layered diagram

There’s a clean six-layer stack across this set, much of which is shipped by DevSeed / Kyle Barron.

                ┌──────────────────┐    ┌────────────────┐
                │    lonboard      │    │    titiler     │
   viz / UX  →  │  Jupyter/deck.gl │    │  HTTP tile API │
                └────────┬─────────┘    └────────┬───────┘
                         │                       │
                         │ GeoTensor / Arrow     │ XYZ tiles (PNG/JPEG)
                         │                       │
                ┌────────▼───────────────────────┴───────┐
   compute   →  │              geotoolz                  │
                │   Operator · Sequential · Graph        │
                └────────────────┬───────────────────────┘
                                 │ GeoTensor in / GeoTensor out
                                 │
                ┌────────────────▼───────────────────────┐
   substrate →  │     georeader (carriers + index)       │
                │   GeoTensor · GeoSlice · GeoData       │
                │   GeoCatalog                           │
                └─────┬───────────────────────┬──────────┘
                      │                       │
                      ▼                       ▼
              ┌─────────────┐         ┌──────────────────┐
   readers →  │ Rasterio-   │         │ async-geotiff    │
              │ Reader      │         │ (async, external,│
              │ (sync,      │         │  COG-only)       │
              │ in georead.)│         │                  │
              └──────┬──────┘         └────────┬─────────┘
                     │                         │
                     ▼                         ▼
              ┌──────────┐         ┌──────────────────────────┐
   transport →│  GDAL /  │         │   obstore  /  fsspec     │
              │  VSI     │         │  (Rust async / Py hybrid)│
              └─────┬────┘         └────────────┬─────────────┘
                    │                           │
                    └──────────────┬────────────┘
                                   ▼
                         ┌─────────────────┐
   storage   →           │  S3/GCS/Azure   │
                         └─────────────────┘

Bottom-up dependency direction. Anything above can call anything below; nothing reaches up. The substrate box (georeader) carries only the types — GeoTensor, GeoSlice, GeoData, GeoCatalog. The reader implementations sit one layer down. RasterioReader is in georeader’s own source tree but is functionally a peer of async-geotiff. (georeader also ships sensor-specific readers — S2_SAFE_reader, EMIT, EnMAP, etc. — those live alongside RasterioReader and produce GeoTensor the same way.)

Note on RasterioReader’s bytes path. The diagram shows RasterioReader going through GDAL VSI by default. That’s the common path, but RasterioReader can also delegate to obstore or fsspec via rasterio’s opener= parameter — so it isn’t strictly bound to the GDAL/VSI lane. See § “What’s actually inside RasterioReader” below.


Per-tool rundown

1. obstore (transport)

What it is. Python bindings to the Rust object_store crate. A unified async API for S3, GCS, Azure Blob, and local filesystems — plus HTTP. Made by DevSeed.

Why it matters. It’s roughly 10× faster than fsspec + aiobotocore for parallel cloud reads, because the Rust runtime handles HTTP/2, connection pooling, and range coalescing properly. Drop-in for anywhere you’d reach for s3fs or fsspec.

Deps. pyo3 runtime; nothing in Python land.

Use cases. Parallel byte-range reads against cloud COGs; the byte transport under async-geotiff and lazycogs; zarr 3 stores; GeoCatalog’s remote parquet round-trip.

Talks to. Everything above it that does cloud I/O. async-geotiff builds on it directly. georeader.RasterioReader historically uses GDAL’s VSI for cloud, but moving its remote path onto obstore is on the table.
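
The object-store API shape — keyed `get` / `get_range` / `list` calls rather than filesystem `open`/`seek`/`read` — is what distinguishes this layer from fsspec. A minimal in-memory stand-in illustrating that call shape (the `MemoryStore` class here is hypothetical, not obstore’s actual API):

```python
# Hypothetical in-memory stand-in for the object-store call shape
# (get / get_range / list) that obstore exposes; not the real obstore API.
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """Keys map to byte blobs — no open/seek, just whole-object and range gets."""
    blobs: dict[str, bytes] = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> None:
        self.blobs[key] = data

    def get(self, key: str) -> bytes:
        return self.blobs[key]

    def get_range(self, key: str, start: int, length: int) -> bytes:
        # One HTTP range request in the real thing; a slice here.
        return self.blobs[key][start:start + length]

    def list(self, prefix: str = "") -> list[str]:
        return sorted(k for k in self.blobs if k.startswith(prefix))


store = MemoryStore()
store.put("tiles/0-0.bin", bytes(range(16)))
header = store.get_range("tiles/0-0.bin", start=0, length=4)  # first 4 bytes
```

The key design point: every read is a stateless keyed range, which is what lets the Rust runtime coalesce and parallelize requests freely.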

2. RasterioReader (file reading — sync, in georeader)

What it is. The default sync reader in georeader. Wraps rasterio.open with lazy-but-windowed semantics. Universal driver coverage (every GDAL format), GDAL/VSI for cloud paths by default, fresh-per-read for process-safety. The right answer ~90% of the time.

See: georeader/reader_protocol.md for the full design (Protocol surface, opener=/fs= knobs, three-bytes-paths triage); Tutorial Ch. 3 for the current implementation.

3. async-geotiff (file reading — async, external)

What it is. An async GeoTIFF / COG reader. Tiles fetched via HTTP range requests through obstore; TIFF IFDs parsed concurrently; no GDAL. By DevSeed. The basis for georeader’s planned AsyncGeoTIFFReader.

Why it matters. Lights up workloads where many tile reads happen simultaneously (tile servers, hyper-parallel batch inference) — asyncio.gather(*[read_window(w) for w in windows]) across a thousand windows avoids thread-per-request overhead and exploits HTTP/2 multiplexing.

See: georeader/reader_async_geotiff.md for the full design.
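
The fan-out pattern that makes this reader worthwhile can be sketched with stdlib asyncio alone — `read_window` below is a stub standing in for an async tile fetch, not async-geotiff’s real API:

```python
# N window reads in flight at once on one event loop, no thread per request.
import asyncio


async def read_window(w: int) -> bytes:
    await asyncio.sleep(0)          # real code: await an HTTP range request
    return b"tile-%d" % w


async def read_all(windows: list[int]) -> list[bytes]:
    # One gather launches every read concurrently and preserves input order.
    return await asyncio.gather(*[read_window(w) for w in windows])


tiles = asyncio.run(read_all(list(range(1000))))
```

With real network latency behind each coroutine, total wall time approaches the slowest single request rather than the sum of all of them.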

4. georeader (substrate)

What it is. The Python library that owns the geospatial substrate types and I/O orchestration.

| Component | Role |
| --- | --- |
| GeoTensor | np.ndarray subclass carrying transform, crs, dims, fill_value_default. The numpy-shaped, geo-aware, ufunc-protocol-friendly substrate. |
| GeoSlice | A spatiotemporal descriptor — bounds, interval, resolution, crs. Unit of work passed between samplers and loaders. |
| GeoData | Higher-level container / abstract reader interface. Likely the base type for things like S2_SAFE scenes, EMIT scenes, and other multi-product readers. |
| RasterioReader | Sync, rasterio-backed reader. Lazy-but-windowed: open a file once, read sub-windows on demand. The default I/O path. |
| GeoCatalog | Catalog of files / scenes. Wraps a GeoDataFrame with IntervalIndex + geometry; query / intersect / union live here. |
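
To make the GeoSlice row concrete, here is a minimal sketch of that shape (bounds, interval, resolution, crs) — field names are illustrative, not georeader’s actual API:

```python
# Hypothetical GeoSlice sketch: a spatiotemporal work descriptor, nothing more.
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class GeoSlice:
    bounds: tuple[float, float, float, float]  # (minx, miny, maxx, maxy), crs units
    interval: tuple[datetime, datetime]        # time span the slice covers
    resolution: float                          # ground sampling distance, crs units/px
    crs: str                                   # e.g. "EPSG:32630"

    def shape(self) -> tuple[int, int]:
        """Pixel grid implied by bounds + resolution (rows, cols)."""
        minx, miny, maxx, maxy = self.bounds
        return (round((maxy - miny) / self.resolution),
                round((maxx - minx) / self.resolution))


chip = GeoSlice((0.0, 0.0, 2560.0, 2560.0),
                (datetime(2024, 6, 1), datetime(2024, 6, 30)),
                resolution=10.0, crs="EPSG:32630")
```

A sampler emits GeoSlices; a reader resolves each one into a GeoTensor — the descriptor itself holds no pixel data.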

Why it matters. This is the API surface RS users hold. Everything above (geotoolz, titiler indirectly, lonboard indirectly) consumes GeoTensor. Everything below (async-geotiff, obstore) is plumbing that produces them.

Deps. Hard: numpy, rasterio, shapely, geopandas, pyproj. Optional: obstore / async-geotiff if/when remote-async paths are wired in.

Use cases. All RS workflows. Loading a Sentinel-2 scene, reading a bbox from a Landsat archive, building a catalog over 1M files, sampling chips for ML, saving COGs.

Talks to. Below: rasterio (sync), obstore / async-geotiff (async paths). Above: geotoolz for transformations, titiler and lonboard for serving / viz.

5. geotoolz (computation)

What it is. The composable Operator library on top of GeoTensor. Operator, Sequential, Graph, plus the curated RS modules (indices, radiometry, cloud, compositing, pansharpen, sar, hyperspectral, sampling, inference, catalog_ops, presets).

Why it matters. Without it, every RS pipeline is bespoke glue code. With it, Sequential([MaskClouds(...), TOAToBOA(...), NDVI(...)]) — declared in YAML or Python.

See: geotoolz/README.md for the navigation entry; geotoolz/geotoolz.md for the full design report (architecture, 12-module surface, end-to-end examples, xr_toolz coexistence).
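
The Operator / Sequential composition pattern can be sketched in a few lines — the class names mirror the doc, but this toy version operates on plain floats rather than GeoTensor, and the Scale/Clip operators are invented for illustration:

```python
# Toy sketch of the Operator / Sequential composition pattern.
class Operator:
    def __call__(self, x):
        raise NotImplementedError


class Scale(Operator):
    def __init__(self, k): self.k = k
    def __call__(self, x): return x * self.k


class Clip(Operator):
    def __init__(self, lo, hi): self.lo, self.hi = lo, hi
    def __call__(self, x): return max(self.lo, min(self.hi, x))


class Sequential(Operator):
    """Run operators in order, the output of one feeding the next."""
    def __init__(self, ops): self.ops = list(ops)
    def __call__(self, x):
        for op in self.ops:
            x = op(x)
        return x


pipeline = Sequential([Scale(0.0001), Clip(0.0, 1.0)])  # e.g. DN -> reflectance
out = pipeline(12000)                                    # scaled past 1.0, clipped
```

Because Sequential is itself an Operator, pipelines nest — which is what makes YAML-declared graphs of operators composable.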

6. titiler (serving)

What it is. A dynamic tile server built on FastAPI + rio-tiler. Serves XYZ / WMTS / OGC tiles from COGs, STAC items, or MosaicJSON. Comes with a viewer UI and OGC-compliant endpoints. By DevSeed.

Why it matters. Once you’ve produced a COG (NDVI, classification map, segmentation result), you want to look at it on a web map. titiler does on-the-fly tiling: a request for tile z/x/y triggers a rio-tiler read of just that COG window, color-mapped, returned as PNG/JPEG. No pre-rendering, no tile cache management.

Deps. fastapi, uvicorn, rio-tiler, morecantile. Doesn’t depend on georeader or geotoolz directly — it consumes COGs (or STAC), which the lower stack produced.

Use cases. Putting processed COGs (NDVI outputs, classification maps, segmentation results) on a web map straight from object storage; serving STAC items or MosaicJSON mosaics without pre-rendering or tile-cache management.

Talks to. Below: reads COGs from object storage (potentially via obstore / async-geotiff if you wire rio-tiler that way; default is GDAL/VSI). Above: HTTP clients, lonboard, web maps, leafmap.

7. lonboard (visualization)

What it is. Geospatial visualization in Jupyter using deck.gl. Renders huge vector data (millions of features) and raster tiles efficiently by binary-streaming GeoArrow to the browser. Recently added raster (XYZ tile-layer) support. Also by DevSeed.

Why it matters. Folium / ipyleaflet choke on 100k+ features. lonboard ships GeoArrow over the kernel-frontend boundary as a typed buffer and lets deck.gl’s WebGL render it; the result is hundreds of millions of points / lines / polygons interactive in a notebook.

Deps. pyarrow, geopandas, anywidget, deck.gl-py-bindings. Doesn’t depend on georeader or geotoolz directly — accepts geopandas / GeoArrow / image arrays.

Use cases. Inspecting large vector outputs (millions of features) interactively in a notebook; overlaying titiler-served raster tiles alongside them.

Talks to. Below: takes geopandas / arrays directly, or pulls raster tiles from a titiler URL.


The two readers compared

The choice of reader is usually the first decision in any pipeline.

| Property | RasterioReader | async-geotiff |
| --- | --- | --- |
| Lives in | georeader | external (DevSeed) |
| Sync / async | sync | async |
| Transport | GDAL / VSI | obstore (Rust async) |
| Driver support | every GDAL driver (TIFF, JP2, NetCDF, HDF5, GRIB, ENVI, …) | TIFF / COG only |
| Format-spec coverage | full, including non-tiled rasters | COG-shaped (tiled) |
| CRS / warping | GDAL warping, full PROJ stack | minimal — bytes → numpy |
| Open cost | low (header + metadata) | low |
| Read cost (small bbox) | one VSI range request × N tiles, sequential | one async batch of tile reads, parallel |
| Read cost (whole file) | streaming sequential read | parallel tiles |
| Concurrency | needs threadpool; GDAL not fully thread-safe | native asyncio |
| Memory footprint | bounded by window size | bounded by tile fan-out |
| Best for | single scenes, non-TIFF data, CRS-heavy work, batch jobs | tile servers, parallel batch reads of many COGs, random sampling across thousands of COGs |
| Worst for | cloud-heavy 1000-files-at-a-time | non-TIFF rasters, GDAL-only quirks |

A practical rule:

        Is the file a COG in cloud storage,
        and do I need many concurrent reads?
                       │
              ┌────────┴────────┐
              ▼                 ▼
             YES                NO
              │                 │
              ▼                 ▼
       async-geotiff      RasterioReader
                          (the safe default —
                           all formats, sync)

Both are interchangeable behind one Protocol surface — pipelines accept a reader_class= argument and the rest of the code is unchanged. See georeader/README.md for the protocol surface that makes them swappable, and georeader/reader_protocol.md for the refactor that locks it in.
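
The swappability claim can be sketched with `typing.Protocol` — the `GeoReader` Protocol and the two fake reader classes here are illustrative stand-ins, not georeader’s actual surface:

```python
# Sketch of the "one Protocol surface" idea: any class exposing load() is
# accepted via reader_class=, so differently-backed readers swap freely.
from typing import Protocol, runtime_checkable


@runtime_checkable
class GeoReader(Protocol):
    def load(self) -> list: ...


class FakeRasterioReader:
    def __init__(self, path: str): self.path = path
    def load(self) -> list: return [0, 1, 2]   # stand-in for a GeoTensor


class FakeAsyncReader:
    def __init__(self, path: str): self.path = path
    def load(self) -> list: return [0, 1, 2]   # e.g. a sync facade over async reads


def run_pipeline(path: str, reader_class=FakeRasterioReader) -> list:
    reader = reader_class(path)    # the rest of the code never changes
    return reader.load()


a = run_pipeline("s3://bucket/scene.tif")
b = run_pipeline("s3://bucket/scene.tif", reader_class=FakeAsyncReader)
```

Structural typing means neither reader has to inherit from anything — matching the method surface is enough.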

What about lazy-cogs? The developmentseed/lazycogs library is a STAC-driven lazy loader that returns xarray.DataArray, not GeoTensor. It’s a peer of stackstac / odc-stac for the dense-cube / xarray stack rather than a georeader reader option. See ../geostack_notes.md Stack B for that discussion.


obstore vs fsspec compared

Once you’re below the reader layer, you’re choosing how the bytes themselves move. The two real options are obstore and fsspec. They overlap in scope but differ in shape, language backbone, and ecosystem fit.

| Property | obstore | fsspec |
| --- | --- | --- |
| Language backbone | Rust (object_store crate via pyo3) | Pure Python with per-backend extensions |
| API style | Object store (get(key), get_range(key, off, len), put, list) | Filesystem (open(path), seek, read, cat, glob) |
| Sync / async | Async-native; sync helpers ride on top | Sync-native; async bolt-on (asynchronous=True) |
| HTTP backend | Rust hyper — HTTP/2, multiplexing, range coalescing | Per-backend lib (varies; often HTTP/1.1) |
| Backends | S3, GCS, Azure, HTTP, local, in-memory | All of the above + FTP, SFTP, HDFS, ADLS, OCI, GitHub, Dropbox, Google Drive, … |
| Throughput on 1k parallel ranges | ~10× over s3fs+aiobotocore | Baseline |
| Ecosystem integration | New (zarr 3, async-geotiff, lazycogs, obstore-rs consumers) | Wide and mature: pandas, xarray, zarr ≤ 2, dask, geopandas, parquet readers, anything that wraps fs.open() |
| Auth | Native credential chains (AWS / GCS / Azure SDKs) compiled in | Per-backend; quality varies |
| Install footprint | One Rust binary | Tiny core; per-backend extras (s3fs, gcsfs, adlfs) |
| Maturity | New (2024+); fast-moving | Mature (2017+); stable, ubiquitous |

A practical rule:

        Is the workload "read many byte ranges from
        S3/GCS/Azure as fast as possible"?
                       │
              ┌────────┴────────┐
              ▼                 ▼
             YES                NO
              │                 │
              │                 ├─► Niche backend (FTP, SFTP, GitHub, …)?
              │                 │     └─► fsspec (only option)
              │                 │
              │                 └─► Need to plug into pandas/xarray/zarr/dask?
              │                       └─► fsspec (the universal adapter)
              ▼
            obstore
            (5–10× faster on parallel COG reads)

The two coexist comfortably. New code paths in async-geotiff and lazycogs default to obstore; older code paths in geopandas, pandas, xarray, and zarr ≤ 2 go through fsspec. georeader.GeoCatalog uses obstore for its parquet round-trip when reading remote catalogs because that’s the hot path; but it can fall back to fsspec for niche storage.

For our async path: AsyncGeoTIFFReader accepts any obspec.AsyncStore (obstore.S3Store / GCSStore / AzureStore, etc.) via store=. We don’t ship a ByteStore Protocol of our own — obspec is the upstream Protocol and obstore is the reference implementation. Sync reads with niche backends still go through RasterioReader(fs=fsspec_fs). See types/bytestore.md for the (one-page) rationale and the small geotoolz.io.open_store(url) factory.
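
The scheme-based dispatch idea behind an open_store(url) factory can be sketched with stdlib urlparse — this hypothetical version returns a backend *name* for clarity, where the real factory would return a store or filesystem object:

```python
# Hypothetical open_store sketch: prefer the fast path (obstore) for the big-3
# object stores, fall back to fsspec for everything else.
from urllib.parse import urlparse

FAST_SCHEMES = {"s3", "gs", "az", "abfs"}  # backends obstore covers


def open_store(url: str) -> tuple[str, str, str]:
    """Return (backend, bucket, key) for a storage URL."""
    parsed = urlparse(url)
    backend = "obstore" if parsed.scheme in FAST_SCHEMES else "fsspec"
    return backend, parsed.netloc, parsed.path.lstrip("/")


assert open_store("s3://bucket/scene.tif")[0] == "obstore"
assert open_store("sftp://host/data.tif")[0] == "fsspec"   # niche backend
```

Keeping the dispatch in one factory means callers never hard-code a transport choice, which is exactly the coexistence story described above.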


What’s actually inside RasterioReader

The main diagram shows RasterioReader going through GDAL / VSI. That’s the default, but it isn’t the only option. rasterio.open(...) accepts an opener= callable (added in rasterio 1.4) that GDAL uses for byte-level reads, which means RasterioReader can route bytes through fsspec or obstore instead of GDAL’s built-in HTTP client. Three paths, all sync, all sitting under the same Python class:

              ┌──────────────────────┐
              │   RasterioReader     │
              └──────────┬───────────┘
                         │
                         ▼
              ┌──────────────────────┐
              │  rasterio.open(...)  │
              └──────────┬───────────┘
                         │
              ┌──────────┼──────────────────┐
              ▼          ▼                  ▼
       ┌──────────┐ ┌───────────────┐ ┌──────────────────┐
       │ GDAL VSI │ │ opener=fs.open│ │ opener=<custom>  │
       │ (libcurl)│ │  (fsspec)     │ │ (obstore-aware)  │
       └────┬─────┘ └──────┬────────┘ └────────┬─────────┘
            │              │                   │
            └──────────────┴───────────────────┘
                           │
                           ▼
                  S3 / GCS / Azure / …
| Path | Who fetches the bytes | When you’d use it |
| --- | --- | --- |
| GDAL VSI (default) | libcurl inside the GDAL binary — /vsis3/, /vsigs/, /vsiaz/, /vsicurl/. Pure C; no Python in the byte-fetching loop. | Anything S3/GCS/Azure/HTTPS. Just works; the fastest non-async option for general use. |
| opener=fs.open (fsspec) | Python file-like via fsspec; GDAL calls back into Python for each byte range. Slower than VSI because of the Python ↔ C trip per range. | Niche backends GDAL doesn’t natively support (FTP, SFTP, GitHub, custom auth flows), or when the rest of the pipeline already holds an fsspec filesystem and reusing it simplifies the auth story. |
| opener=<custom obstore callback> | Python adapter wrapping obstore.ObjectStore.get_range, handed to GDAL via opener=. | Possible in principle. In practice you’d bypass RasterioReader entirely and use async-geotiff, which is already that path without GDAL in between. |

Two takeaways from the diagram:

- RasterioReader isn’t hard-wired to GDAL VSI: the opener= hook lets the same sync class route bytes through fsspec (or, in principle, an obstore adapter).
- All three paths stay synchronous. If the workload needs true async byte fan-out, the answer is async-geotiff, not a custom opener.

For the opener= / fs= constructor knobs and the refactor that wires them in, see georeader/reader_protocol.md §“RasterioReader refactor”.


Three concrete combined flows

Flow A — single-scene inference, all sync, simplest

import georeader
import geotoolz

# Read → process → save
reader = georeader.RasterioReader("s3://bucket/scene.tif")
gt = reader.load()                                     # GeoTensor
ndvi = geotoolz.indices.NDVI(red_idx=2, nir_idx=3)(gt) # GeoTensor
georeader.save_cog(ndvi, "s3://out/ndvi.tif")          # COG written

# Then serve it:
#   titiler --src s3://out/ndvi.tif
#
# Then look at it:
import lonboard
lonboard.Map(layers=[lonboard.BitmapTileLayer(
    data="http://localhost:8000/cog/tiles/{z}/{x}/{y}.png?url=s3://out/ndvi.tif"
)])

Stack used: rasterio → georeader.RasterioReader → georeader.GeoTensor → geotoolz → COG → titiler → lonboard. No async needed, no catalog. RasterioReader is the right reader here — single scene, GDAL-friendly, no concurrency need.

Flow B — catalog-driven async batch processing

# Build / open a catalog of 50k COGs in S3
catalog = georeader.catalog.open_catalog("s3://bucket/s2_eu.parquet")     # uses obstore

# Define a per-tile pipeline — same as Flow A but as an Operator
per_tile = geotoolz.Sequential([
    geotoolz.cloud.MaskClouds(qa_band_idx=-1, bits=[10, 11]),
    geotoolz.indices.NDVI(red_idx=2, nir_idx=3),
    geotoolz.catalog_ops.WriteCOG(path_template="s3://out/{tile_id}.tif"),
])

# Run across the catalog
geotoolz.catalog_ops.CatalogPipeline(
    catalog,
    per_tile,
    reader_class=AsyncGeoTIFFReader,                   # async-geotiff under obstore
    n_concurrent=64,
).run()

Stack used: obstore → async-geotiff → georeader.GeoTensor → geotoolz.Sequential → georeader.save_cog → S3. The async path lights up because the workload is I/O-bound across thousands of small reads. RasterioReader would also work here — just sequentially over a threadpool — but async-geotiff will be 5–10× faster for cloud-COG-heavy workloads.

Flow C — ML dataloader with async COG fan-out + chip sampler

catalog = georeader.catalog.open_catalog("s3://bucket/s2_eu.parquet")
sampler = geotoolz.sampling.RandomSampler(catalog, chip_size=(256, 256), length=100_000)

# Each chip is one async COG window read — no full file fetched.
# An AsyncGeoChipDataset wraps AsyncGeoTIFFReader.read_window calls inside
# the dataloader's worker loop (sync facade or per-worker event loop).
loader = torch.utils.data.DataLoader(
    AsyncGeoChipDataset(sampler, reader_class=AsyncGeoTIFFReader),
    batch_size=32, num_workers=8,
)

for batch in loader:
    preds = model(batch)
    # ...

Stack used: obstore → async-geotiff → georeader.GeoSlice → geotoolz.sampling → torch. The win is that 100k random chips across 50k COGs costs only the bytes in 100k tiny range requests, not 50k full files. RasterioReader would technically work but you’d pay GDAL’s VSI overhead on every chip — async-geotiff’s pure-Rust HTTP path is a meaningful speedup at this fan-out.
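
The “sync facade or per-worker event loop” note in the Flow C comment can be sketched with stdlib asyncio — the `SyncFacadeReader` class and its stubbed `_read_window` are hypothetical, standing in for real async COG reads:

```python
# Each dataloader worker owns one event loop and drives async window reads
# synchronously from __getitem__-style blocking calls.
import asyncio


class SyncFacadeReader:
    """Wrap an async read-window coroutine behind a blocking call."""

    def __init__(self):
        self.loop = asyncio.new_event_loop()   # one loop per worker process

    async def _read_window(self, idx: int) -> bytes:
        await asyncio.sleep(0)                 # real code: async COG range read
        return b"chip-%d" % idx

    def read_window(self, idx: int) -> bytes:
        # Blocking entry point a torch Dataset.__getitem__ can call.
        return self.loop.run_until_complete(self._read_window(idx))

    def close(self):
        self.loop.close()


reader = SyncFacadeReader()
chips = [reader.read_window(i) for i in range(4)]
reader.close()
```

Inside one worker the reads are still serialized; the parallelism comes from running many workers, or from batching several window coroutines into one `gather` per `read_window` call.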


How they overlap, where the seams are

A few places where two tools could do the same job and you have to pick:

- RasterioReader vs async-geotiff — both can read a cloud COG window; pick by the COG-and-concurrency rule in “The two readers compared”.
- GDAL VSI vs obstore / fsspec — both move bytes from object storage; pick by backend coverage and ecosystem fit (fsspec) versus parallel throughput (obstore).
- titiler vs lonboard for looking at a raster — titiler serves tiles to any web map; lonboard renders in-notebook and can consume titiler’s tiles, so they compose more often than they compete.


In one sentence

obstore moves bytes; RasterioReader (sync, GDAL-backed, in georeader) and async-geotiff (async, cloud-COG-heavy) turn those bytes into numpy slices via two complementary strategies; georeader wraps those slices as GeoTensor and indexes them as GeoCatalog; geotoolz composes operators over GeoTensor; titiler serves the resulting COGs as web tiles; lonboard shows them in Jupyter.