v2.5 — Per-AOI pixel-level cloud climatology
Given an AOI (point with radius, polygon, or bbox up to ~50 km on a side), use the actual QA bands of every intersecting scene to compute the true pixel-level clear-fraction time series, on demand, in seconds.
This is the single-AOI, on-demand version of the pixel-level problem.
It shares the AOI flow with satellite_viewer (the same input shape —
draw a polygon, pick a sensor, pick a date range) but the output is the
clear-fraction climatology for that AOI, not a thumbnail preview.
Goal
For one user-supplied AOI and time window:
- Discover intersecting scenes (reuse `satellite_viewer.search`).
- For each scene, lazily read the QA band, windowed to the AOI, via `georeader`.
- Decode the QA band to a per-pixel clear mask using a per-sensor decoder.
- Reduce the mask to a single number: the clear-pixel fraction inside the AOI for this acquisition.
- Return the `(datetime, clear_fraction, scene_id)` time series plus aggregates (`clear_obs_count`, `mean_clear_fraction`, `longest_clear_gap_days`).
Question answered
“Across all scenes that touched this AOI in the requested time window, what fraction of the AOI’s pixels were actually clear in each scene, and what does the resulting clear-fraction time series look like?”
This validates v3’s algorithm at user-facing scale before committing to the global compute. It is also a standalone product: ecologists, agronomists, and plume hunters all care about “how often can I see this specific field / hotspot / region clearly?”, which is exactly the v2.5 deliverable.
Scope
- AOI: single shapely geometry, size-capped (default 50 km bbox side; user knob). Above the cap the dashboard refuses the request with a “use v3 / batch mode” hint.
- Sensors: start with `sentinel-2-l2a` (SCL band). Add `landsat-c2-l2` (`QA_PIXEL`) next. MODIS / VIIRS deferred.
- Time window: user-supplied. Multi-year is fine: the work scales with the number of scenes, not with area.
- Compute target: 1–30 s per request on a laptop / single notebook server.
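The size cap in the AOI bullet can be enforced before any STAC search runs. A minimal sketch, assuming the AOI arrives as EPSG:4326 bounds (for a shapely geometry, `aoi.bounds` gives `(minx, miny, maxx, maxy)`), that it does not cross the antimeridian, and that a spherical ~111.32 km-per-degree approximation is good enough for a cap. `bbox_side_km`, `check_aoi_size`, and `AOITooLargeError` are hypothetical names, not part of `satellite_viewer`.

```python
from __future__ import annotations

import math

# Spherical-Earth approximation: ~111.32 km per degree of latitude.
# Adequate for a size cap, not for precise geodesy.
KM_PER_DEG = 111.32


class AOITooLargeError(ValueError):
    """Hypothetical error type; the dashboard would turn it into a 'use v3 / batch mode' hint."""


def bbox_side_km(bounds: tuple[float, float, float, float]) -> tuple[float, float]:
    """Approximate (width_km, height_km) of a lon/lat bbox (minx, miny, maxx, maxy)."""
    minx, miny, maxx, maxy = bounds
    mid_lat = math.radians((miny + maxy) / 2.0)
    # East-west degrees shrink with cos(latitude); measure at the bbox's middle latitude.
    width_km = (maxx - minx) * KM_PER_DEG * math.cos(mid_lat)
    height_km = (maxy - miny) * KM_PER_DEG
    return width_km, height_km


def check_aoi_size(bounds, max_side_km: float = 50.0) -> tuple[float, float]:
    """Return the bbox side lengths in km, or raise if either side exceeds the cap."""
    width_km, height_km = bbox_side_km(bounds)
    if max(width_km, height_km) > max_side_km:
        raise AOITooLargeError(
            f"AOI bbox is {width_km:.1f} x {height_km:.1f} km, above the "
            f"{max_side_km:g} km cap; use v3 / batch mode instead."
        )
    return width_km, height_km
```

Measuring the east-west side at the middle latitude keeps the cap meaningful away from the equator: at 60°N a degree of longitude is only about half as wide as at the equator.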
Algorithm
```
1. items = satellite_viewer.search(sensor, aoi, t0, t1, cloud_lt=None, max_items=1000)
2. for item in items:                                   # → parallelisable
       qa = georeader.read_window(item.assets[qa_band_key], aoi)
       clear_mask = decoder(qa, clear_classes=cfg.clear_classes)
       valid = aoi_mask & (qa != nodata)                # inside the AOI polygon, not fill
       clear_frac = clear_mask[valid].mean()            # float in [0, 1]
       record (item.datetime, item.id, clear_frac)
3. emit:
   - per-scene time series: [(datetime, scene_id, clear_frac, eo:cloud_cover)]
   - aggregates: clear_obs_count        = sum(frac > clear_threshold)
                 mean_clear_fraction    = mean(frac)
                 longest_clear_gap_days = max(gap between consecutive scenes with frac > clear_threshold)
```

Here `aoi_mask` is the AOI geometry rasterised onto the QA window's grid and `nodata` is the sensor's fill value; without them, bbox corners outside a polygon AOI, and parts of the AOI that a scene does not cover, would count as not clear. The per-item work is the same code that v3 will run in its inner loop; the abstraction split is “v2.5 = one AOI, run interactively” vs. “v3 = every cell of the global grid, run as a batch.”
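Step 2's `# → parallelisable` loop is I/O-bound, and the Risks section notes that `georeader` is synchronous, so a thread pool is the natural fan-out. A minimal sketch, assuming a per-scene callable (for example a wrapper around the `per_scene` graph) that returns one record per item; `SceneResult`, `run_per_scene`, and the `max_workers=16` default are hypothetical.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Callable, Iterable


@dataclass(frozen=True)
class SceneResult:
    """One row of the per-scene time series (hypothetical record type)."""
    datetime: datetime
    scene_id: str
    clear_fraction: float


def run_per_scene(
    items: Iterable[Any],
    per_scene: Callable[[Any], SceneResult],
    max_workers: int = 16,
) -> list[SceneResult]:
    """Run the synchronous, I/O-bound per-scene work on a thread pool.

    Threads overlap the network waits of independent windowed reads; results
    come back in input order and are then sorted by acquisition time.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(per_scene, items))
    return sorted(results, key=lambda r: r.datetime)
```

`pool.map` re-raises the exception from the first failing item (in input order) when results are collected, so one unreadable asset fails the whole request; catching errors inside the per-scene callable and recording NaN would be one alternative.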
Architecture
```
projects/satellite_climatology/
└── src/satellite_climatology/
    ├── grid.py            # (shared with v1/v2/v3); not used by v2.5
    ├── sensors.py         # (shared); adds qa_band_key, clear_classes per sensor
    ├── qa_decoders/       # (shared with v3)
    │   ├── s2_scl.py
    │   ├── landsat_qapixel.py
    │   └── modis_state1km.py
    ├── operators.py       # (shared with v3): ReadQA, DecodeQA, CellClearFrac
    └── aoi_pipeline.py    # the v2.5 entry point: runs the per-scene op over an AOI
```

Repo wiring:
- `satellite_viewer.search`: reused unchanged for discovery. The same `(sensor, aoi, t0, t1) -> GeoDataFrame[id, datetime, geometry, …]` surface is what v2.5 consumes.
- `geotoolz` holds the per-scene pipeline as a serialisable `Operator` graph:

  ```python
  per_scene = (
      ResolveAsset(asset_key=cfg.qa_band)                                   # item -> signed COG URL
      | ReadQA(reader="georeader", window=aoi)                               # URL -> ndarray
      | DecodeQA(decoder=cfg.qa_decoder, clear_classes=cfg.clear_classes)    # ndarray -> clear mask
      | CellClearFrac()                                                      # ndarray -> float in [0, 1]
  )
  ```

  The same graph is reused in v3. `get_config()` serialises the whole thing, so the v3 batch job and the v2.5 interactive call produce bit-identical numbers.
- `georeader` does the heavy lifting: lazy windowed reads of the COG / HDF / Zarr asset, clipped to the AOI bbox in the asset's CRS, with no full-scene download.
- `geocatalog` is not used at v2.5 scale: a single AOI's worth of items does not need a persistent index. v3 brings it back when the catalog scan goes global.
- `geopatcher` is not used at v2.5 (single AOI, no tiling). v3 brings it back.
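To make the "serialisable graph, bit-identical numbers" argument concrete, here is a toy version of the pattern. It is not `geotoolz`'s API: `Op`, `Pipeline`, `from_config`, `run`, and the name-to-function registry are all hypothetical, and the chain is linear rather than a general graph. What it shows is that if `get_config()` emits canonical JSON and both the v2.5 and v3 entry points rebuild the pipeline from that string, they run the same steps with the same parameters.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from typing import Any, Callable


@dataclass
class Op:
    """Toy stand-in for one operator step: a registry name plus JSON-serialisable params."""
    name: str
    params: dict[str, Any] = field(default_factory=dict)

    def __or__(self, other: Op | Pipeline) -> Pipeline:
        return Pipeline([self]) | other


@dataclass
class Pipeline:
    """Toy stand-in for a composed operator chain."""
    steps: list[Op]

    def __or__(self, other: Op | Pipeline) -> Pipeline:
        extra = other.steps if isinstance(other, Pipeline) else [other]
        return Pipeline(self.steps + extra)

    def get_config(self) -> str:
        # Canonical JSON: same steps + same params -> byte-identical config string.
        return json.dumps([asdict(s) for s in self.steps], sort_keys=True)

    @classmethod
    def from_config(cls, config: str) -> Pipeline:
        return cls([Op(**d) for d in json.loads(config)])

    def run(self, x: Any, registry: dict[str, Callable[..., Any]]) -> Any:
        # Each step looks up its implementation by name and applies it with its params.
        for step in self.steps:
            x = registry[step.name](x, **step.params)
        return x
```

Parameters must be JSON-serialisable for this to work, so `clear_classes` would travel as a list such as `[4, 5, 6, 11]` rather than a Python set.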
Output
A pandas DataFrame, not a Zarr band:
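```
columns: ["datetime", "scene_id", "sensor", "scene_cloud_pct", "clear_fraction"]
shape:   (n_scenes, 5)
index:   reset (default RangeIndex)
crs:     none; no geometry column (the AOI is shared across rows)
```

Plus the three scalar aggregates (`clear_obs_count`, `mean_clear_fraction`, `longest_clear_gap_days`), returned alongside the table.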
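The aggregates reduce the per-scene table to three scalars. A minimal pandas sketch, assuming the column names from the schema above, a `datetime64` `datetime` column, and a hypothetical default `clear_threshold = 0.8` (the doc leaves the threshold open). It also assumes `longest_clear_gap_days` means the longest interval between consecutive clear acquisitions, returning NaN when there are fewer than two; gaps to the edges of the time window are not counted.

```python
from __future__ import annotations

import pandas as pd


def climatology_aggregates(df: pd.DataFrame, clear_threshold: float = 0.8) -> dict:
    """Scalar aggregates over the per-scene table (hypothetical helper)."""
    # Acquisitions that count as "clear" for this AOI, in time order.
    clear_times = df.loc[df["clear_fraction"] > clear_threshold, "datetime"].sort_values()
    # Gaps between consecutive clear acquisitions; undefined with fewer than two.
    gaps = clear_times.diff().dropna()
    longest_gap_days = gaps.max().total_seconds() / 86400.0 if len(gaps) else float("nan")
    return {
        "clear_obs_count": int(len(clear_times)),
        "mean_clear_fraction": float(df["clear_fraction"].mean()),
        "longest_clear_gap_days": longest_gap_days,
    }
```

Rows with `clear_fraction = NaN` (no valid pixels in the AOI) drop out of both the clear count and the mean, since `NaN > threshold` is false and `Series.mean()` skips NaN by default.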
UI integration
A new tab in the satellite_viewer dashboard, or a sibling subapp.
Same AOI input shape as the preview tool. Output panes:
- Clear-fraction time series: scatter or line of `clear_fraction` vs. `datetime`, colour-coded by sensor.
- Climatology stats card: `clear_obs_count`, `mean_clear_fraction`, `longest_clear_gap_days`, `n_scenes`.
- Side-by-side scenes: for each acquisition, the RGB thumbnail (from the STAC `rendered_preview` asset, reused from `satellite_viewer`) next to the QA-derived clear-mask preview at AOI scale.
- Comparison with scene-level cloud cover: overlay `eo:cloud_cover / 100` (the v2 answer, rescaled from percent) vs. `1 − clear_fraction` (the v2.5 answer). Where they diverge is the entire point of v2.5.
Compute budget
Per AOI, default 50 km bbox cap:
- 1 year of S2 over a 50 km AOI ≈ 50–150 items.
- Per item: one windowed COG read of the SCL band, clipped to the AOI: ~50 KB to ~5 MB of raw bytes, with latency dominated by TCP / TLS / signing rather than transfer size.
- ~100 items × 100–300 ms ≈ 10–30 s of serial request time; parallelised with `asyncio` / a thread pool, ≈ 3–10 s wall-clock.
- Memory: peak ~50 MB for the largest single SCL window.
This is the dashboard-tractable target. Above the size cap or beyond ~5-year windows the user gets routed to v3.
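The budget arithmetic, spelled out. This assumes an axis-aligned 50 km × 50 km window at SCL's native 20 m resolution (one byte per pixel) and reads the wall-clock figure as serial request time divided by an effective concurrency $k$:

$$
\begin{aligned}
\text{pixels per side} &= \frac{50\ \text{km}}{20\ \text{m}} = 2500 \\
\text{window size} &= 2500^2\ \text{px} \times 1\ \text{B/px} = 6.25\ \text{MB (uncompressed)} \\
t_{\text{serial}} &= n\, t_{\text{req}} = 100 \times (0.1\text{–}0.3\ \text{s}) = 10\text{–}30\ \text{s} \\
t_{\text{wall}} &\approx \frac{t_{\text{serial}}}{k} \approx 3\text{–}10\ \text{s} \quad \text{for } k \approx 3
\end{aligned}
$$

So the 3–10 s estimate corresponds to roughly three requests in flight at once, and the uncompressed window (6.25 MB) sits comfortably inside the ~50 MB peak-memory estimate.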
Risks & open questions
- Per-sensor QA decoder is real work — S2 SCL is the easiest (single class enum, well-documented). Landsat QA_PIXEL is bit-packed. MODIS state_1km is bit-packed and version-dependent. Ship S2 only for M5.
- “Clear” definition is a user knob. Default for S2 SCL: classes {4, 5, 6, 11} (veg, bare, water, snow) treated as clear; {3, 8, 9, 10} (shadow, cloud_med, cloud_high, thin_cirrus) treated as not clear; {7} (unclassified) treated as not clear by default. Expose `clear_classes` in the dashboard so users can flip, e.g., snow on/off.
- CRS handling: S2 SCL is in per-MGRS-tile UTM; the AOI is in EPSG:4326. Reprojection on read (or reprojecting the AOI to the item’s CRS) is `georeader`’s job. Verify it does the right thing for a multi-tile AOI on M5’s first day.
- AOI spanning more than one MGRS tile: the request is split into per-tile reads and merged at AOI scale. `georeader` already handles this for S2; verify and bench.
- Out-of-MGRS AOIs (e.g., polar, > 80°) need explicit handling, since S2 doesn’t tile there. Just exclude them with a clear error.
- Async vs. thread pool: `georeader`’s API is sync (last I checked). Wrap it with `concurrent.futures.ThreadPoolExecutor` for per-scene parallelism. Re-evaluate if `georeader` adds an async API.
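The decoder and “Clear”-definition bullets can be sketched together for the two sensors planned first. A minimal NumPy sketch under these assumptions: SCL class 0 is no data and is excluded from the denominator, while the classes the default leaves unlisted (1 saturated or defective, 2 dark area pixels) count as valid but not clear; Landsat Collection 2 `QA_PIXEL` uses bit 0 = fill, 1 = dilated cloud, 2 = cirrus, 3 = cloud, 4 = cloud shadow, 5 = snow (verify against the USGS product guide before relying on it). All function names are hypothetical, and `aoi_mask` stands for the AOI polygon rasterised onto the QA window's grid.

```python
from __future__ import annotations

import numpy as np

# Sentinel-2 L2A SCL: one class code per pixel.
# Default clear set from this doc: 4 vegetation, 5 bare soil, 6 water, 11 snow/ice.
S2_DEFAULT_CLEAR = (4, 5, 6, 11)
S2_NODATA = 0  # assumption: SCL 0 (no data) is excluded from the denominator


def decode_s2_scl(scl: np.ndarray, clear_classes=S2_DEFAULT_CLEAR) -> tuple[np.ndarray, np.ndarray]:
    """Return (clear, valid) boolean masks for an SCL window."""
    valid = scl != S2_NODATA
    clear = np.isin(scl, np.asarray(clear_classes)) & valid
    return clear, valid


# Landsat Collection 2 QA_PIXEL bit positions (assumed; verify against USGS docs).
FILL_BIT, DILATED_CLOUD_BIT, CIRRUS_BIT, CLOUD_BIT, SHADOW_BIT, SNOW_BIT = 0, 1, 2, 3, 4, 5


def _bit(qa: np.ndarray, n: int) -> np.ndarray:
    """True where bit n of the QA word is set."""
    return ((qa >> n) & 1).astype(bool)


def decode_landsat_qa_pixel(qa: np.ndarray, treat_snow_as_clear: bool = True) -> tuple[np.ndarray, np.ndarray]:
    """Return (clear, valid) masks: valid = not fill; clear = valid with no cloud-like flag set."""
    qa = qa.astype(np.uint16, copy=False)
    valid = ~_bit(qa, FILL_BIT)
    not_clear = _bit(qa, DILATED_CLOUD_BIT) | _bit(qa, CIRRUS_BIT) | _bit(qa, CLOUD_BIT) | _bit(qa, SHADOW_BIT)
    if not treat_snow_as_clear:
        not_clear |= _bit(qa, SNOW_BIT)
    return valid & ~not_clear, valid


def clear_fraction(clear: np.ndarray, valid: np.ndarray, aoi_mask: np.ndarray | None = None) -> float:
    """Clear pixels / valid pixels inside the AOI; NaN if no valid pixel is left."""
    if aoi_mask is not None:
        valid = valid & aoi_mask
    n_valid = int(valid.sum())
    if n_valid == 0:
        return float("nan")
    return float((clear & valid).sum()) / n_valid
```

The snow knob lives in different places for the two sensors (a class in `clear_classes` for S2, a bit flag for Landsat), so a single “snow is clear” toggle in the dashboard would need to map onto both.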
Acceptance
- Single notebook + Panel/Streamlit subapp answering “draw an AOI, pick a date range, get a clear-fraction time series” in < 15 s for the default 50 km × 1 year case.
- Sanity:
  - `0 ≤ clear_fraction ≤ 1` in every row.
  - For AOIs fully inside a single scene with low `eo:cloud_cover`: `clear_fraction ≈ 1 − eo:cloud_cover / 100` (within ~10%).
  - For AOIs inside the cloudy half of a heterogeneous scene: `clear_fraction ≪ 1 − eo:cloud_cover / 100` (the point of v2.5).
- The per-scene operator graph runs unchanged in a v3 prototype over a 1° × 1° patch (same numbers as the dashboard).
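The first two sanity bullets can be checked directly on the output table. A minimal pandas sketch, assuming the Output schema, reading “within ~10%” as an absolute tolerance of 0.10 in clear fraction, and using a hypothetical `low_cloud_pct = 10` cutoff for “low `eo:cloud_cover`”. It cannot tell whether an AOI lies fully inside one scene; the caller is assumed to pass only such rows when the agreement number matters.

```python
import pandas as pd


def sanity_report(df: pd.DataFrame, low_cloud_pct: float = 10.0, tol: float = 0.10) -> dict:
    """Check the acceptance sanity bullets on a per-scene table (hypothetical helper)."""
    # Range: every defined clear_fraction lies in [0, 1]; NaN rows (no valid pixels) are skipped.
    in_range = bool(df["clear_fraction"].dropna().between(0.0, 1.0).all())

    # Low scene-level cloud: pixel-level and scene-level answers should roughly agree.
    expected = 1.0 - df["scene_cloud_pct"] / 100.0
    low = df["scene_cloud_pct"] < low_cloud_pct
    close = (df.loc[low, "clear_fraction"] - expected[low]).abs() <= tol
    return {
        "in_range": in_range,
        "low_cloud_rows": int(low.sum()),
        "low_cloud_agreement": float(close.mean()) if low.any() else float("nan"),
    }
```

A `low_cloud_agreement` well below 1.0 on AOIs that really are fully inside one low-cloud scene would point at a decoder or masking bug rather than at genuine pixel-level heterogeneity.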
Out of scope
- Global compute (that’s v3).
- Multi-sensor fusion (different sensors have different “clear” semantics; defer until each sensor’s decoder ships).
- Atmospheric correction quality flags (only “clear or not”).
- Sub-pixel cloud masking — we accept the QA band’s spatial resolution.
- Real-time / incremental updates — every request re-fetches.