v4 — Coverage planner (available / acquired / gap)

A heatmap dashboard that overlays three views of the same global grid — what imagery is available, what we have acquired, and where the gap is (and, eventually, what to task) — sliceable by sensor, time window, and AOI, viewable globally or zoomed to a single location.

This is the planning/operations layer on top of the satellite_climatology substrate. v1–v3 answer “how observable is each pixel?”; v4 reframes those numbers as a coverage ledger: availability minus our holdings equals the gap we might act on.

Questions answered

The three layers

The dashboard is organised as three toggleable layer-groups over the same (sensor, time, lat, lon) grid. The user picks layer → metric → sensor → time window, then global or AOI.

| Layer | Meaning | Built from | Status |
| --- | --- | --- | --- |
| 🟦 Available | What could / did exist to be observed | v1 overpasses + v2 scene scan + v3 clear-fraction | reuses v1–v3 |
| 🟩 Acquired | What we hold, as two sub-layers (see below) | catalog scans | new in v4 |
| 🟥 Gap | Available − Acquired, weighted into a priority | derived | new in v4 |

Available (reuses v1–v3)

No new compute — these are the bands v1/v2/v3 already write: overpasses, scenes_count, mean_scene_cloud_pct, cloud_free_scene_count, clear_fraction, … The Available layer is a re-labelling of the climatology product as “supply”.

Acquired — two sub-layers

Per the design decision, “what we’ve seen” is tracked as two distinct, independently toggleable sub-layers:

  1. Public clear observations: the cloud-free usable supply from the public catalog (i.e. v2’s cloud_free_scene_count / v3’s clear_obs_count). This is “what anyone could have gotten.”
  2. Our holdings: what this project has actually ingested, sourced from an external PostGIS holdings table (see Holdings source below). The table is an already-populated catalog of owned tiles with footprint, acquisition date, sensor, cloud %, and (optionally) processing status, so we query it rather than rebuild it. The tiles are gridded into the same cells to produce held_count, held_clear_count, days_since_last_held, held_max_gap_days.

Keeping them separate lets the map distinguish “the data exists and is clear, we just don’t have it” (a pure download decision) from “no clear data exists at all” (a tasking decision).

If the holdings table carries a processing-status column, “held” can be refined into held / processed / validated — a third toggle showing not just “do we have the tile” but “have we finished processing it”.

Gap — derived, future-flagged tasking

Computed cell-wise from the layers above, no new I/O:

deficit            = max(0, desired_clear_obs − held_clear_count)
staleness          = days_since_last_held                     # recency pressure
unmet_supply       = cloud_free_scene_count − held_clear_count  # exists-but-unowned
priority_score     = w_d·norm(deficit)
                   + w_s·norm(staleness)
                   + w_u·norm(unmet_supply)
                   + w_r·revisit_need(aoi)                    # user-supplied weight
taskable           = SENSOR_TASKABILITY[sensor]               # bool, mostly False
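
The formulas above can be sketched in plain Python, one list entry per grid cell. This is a minimal illustration under stated assumptions, not the project's coverage.py: norm is taken to be min-max scaling, the weight values are placeholders, and revisit_need is passed in as a single number for the AOI. None of these choices is specified in the text.

```python
# Placeholder weights; the real w_d, w_s, w_u, w_r values are not given in the text.
WEIGHTS = {"deficit": 0.4, "staleness": 0.2, "unmet_supply": 0.3, "revisit": 0.1}


def norm(values):
    """Min-max scale a list to [0, 1] (assumed); a constant list maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def gap_metrics(desired_clear_obs, held_clear_count, cloud_free_scene_count,
                days_since_last_held, revisit_need=0.0, weights=WEIGHTS):
    """Return (deficit, unmet_supply, priority_score) lists, one entry per cell."""
    deficit = [max(0, desired_clear_obs - h) for h in held_clear_count]
    staleness = list(days_since_last_held)  # recency pressure
    unmet = [c - h for c, h in zip(cloud_free_scene_count, held_clear_count)]
    nd, ns, nu = norm(deficit), norm(staleness), norm(unmet)
    priority = [
        weights["deficit"] * d
        + weights["staleness"] * s
        + weights["unmet_supply"] * u
        + weights["revisit"] * revisit_need
        for d, s, u in zip(nd, ns, nu)
    ]
    return deficit, unmet, priority


# Three cells, with a target of 4 clear observations in the bin.
deficit, unmet, priority = gap_metrics(
    desired_clear_obs=4,
    held_clear_count=[0, 2, 4],
    cloud_free_scene_count=[3, 2, 5],
    days_since_last_held=[90, 30, 0],
)
```

Because the normalisation is relative to the cells passed in, scores from this sketch are only comparable within one call (for example one sensor and one time window).
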
Desired cadence — per sensor

The deficit needs a target to subtract from, and a useful cadence differs by an order of magnitude across sensors (MODIS is ~daily, Landsat is ~biweekly). So the target is per sensor, expressed as desired clear observations per month:

Why two apps (the 2 km question)

A single location of interest is ~2 km; a global heatmap cell is ~11 km (0.1°). These are different scales, not a single tunable:

So v4 ships two surfaces, mirroring the v2 (global) / v2.5 (per-AOI) split that already exists:

App A — Global Coverage Heatmap (the overview)

App B — AOI Coverage Drill-down (the detail)

A nice-to-have bridge: clicking a cell in App A pre-fills App B’s AOI.
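
The scale mismatch behind the two-app split can be checked with a little arithmetic. The sketch below assumes roughly 111.32 km per degree of latitude and a square 2 km AOI; both are illustrative simplifications rather than values taken from the project.

```python
import math

KM_PER_DEGREE = 111.32  # approximate length of one degree of latitude (assumed)


def cell_size_km(cell_deg, lat_deg):
    """Return the (north-south, east-west) extent in km of a cell at a latitude."""
    north_south = cell_deg * KM_PER_DEGREE
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


ns, ew = cell_size_km(0.1, 0.0)   # a 0.1 degree cell at the equator
aoi_km = 2.0
# Fraction of that cell's area covered by one 2 km x 2 km AOI.
aoi_fraction = (aoi_km * aoi_km) / (ns * ew)
```

At the equator the cell is about 11 km on a side, so a 2 km AOI covers roughly 3% of it; a per-cell average says little about any single location inside the cell.
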

Heatmap metric menu

Per cell × sensor × time-bin. The dropdown is grouped by layer:

| Layer | Metrics |
| --- | --- |
| Available | overpasses (theoretical), scenes_count, mean_scene_cloud_pct, cloud_free_scene_count, clear_fraction (pixel), pixel_max_gap_days |
| Acquired | held_count, held_clear_count, days_since_last_held, held_max_gap_days |
| Gap | deficit, unmet_supply, priority_score, taskable (mask) |

Diverging colormaps for difference/gap metrics; sequential for counts; “days since” gets a perceptually-reversed ramp (fresh = cool, stale = hot).
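
One way to drive the grouped dropdown and the colormap choice from a single table is sketched below. The function names and the "mask" colormap kind for taskable are hypothetical, and the assignment of each metric to diverging or sequential is one reading of the rule above.

```python
METRICS_BY_LAYER = {
    "Available": ["overpasses", "scenes_count", "mean_scene_cloud_pct",
                  "cloud_free_scene_count", "clear_fraction", "pixel_max_gap_days"],
    "Acquired": ["held_count", "held_clear_count", "days_since_last_held",
                 "held_max_gap_days"],
    "Gap": ["deficit", "unmet_supply", "priority_score", "taskable"],
}


def build_menu(present_bands):
    """Group the bands that actually exist by layer, dropping empty layers."""
    menu = {}
    for layer, metrics in METRICS_BY_LAYER.items():
        found = [m for m in metrics if m in present_bands]
        if found:
            menu[layer] = found
    return menu


def colormap_kind(metric):
    """Pick a colormap family for a metric (assumed mapping)."""
    if metric == "days_since_last_held":
        return "reversed"    # fresh = cool, stale = hot
    if metric == "taskable":
        return "mask"        # boolean overlay rather than a ramp
    if metric in METRICS_BY_LAYER["Gap"]:
        return "diverging"   # difference / gap metrics
    return "sequential"      # counts and fractions


menu = build_menu({"scenes_count", "held_count", "deficit"})
```

Filtering by the bands that are present means the menu simply shrinks when a band-group has not been written yet.
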

Time-window aggregation

When the time slider spans several monthly bins, the cell value must be reduced over the window. This is an explicit aggregation selector, not a hidden choice:

| Mode | Cell value | Best for | Caveat |
| --- | --- | --- | --- |
| Rate (mean/month) | average per-month over the window | comparing coverage cadence; comparable across window sizes | hides within-window variation |
| Total (sum) | summed count over the window | “how much did we get in total” | a longer window always looks bigger |
| Most-recent | value of the latest bin (or days since last) | freshness / “is it stale” | ignores history |
| Worst month (min/max) | the limiting bin (fewest clear obs / longest gap) | gap and tasking: the worst month drives the decision | pessimistic by design |
| Scrub / animate | no reduction; step through bins | seeing seasonality | not a single map |

Smart per-metric defaults so the selector rarely needs touching: availability counts → Rate; days_since_last_held → Most-recent; deficit / priority_score → Worst month. The user can override.
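
The reduction modes and their defaults can be sketched for a single cell as follows. The mode strings and function names are hypothetical, and falling back to Rate for any metric not listed is an assumption; Scrub / animate is omitted because it performs no reduction.

```python
def aggregate(bins, mode, worst="min"):
    """Reduce one cell's per-month values (oldest first) to a single number."""
    if not bins:
        raise ValueError("the time window contains no bins")
    if mode == "rate":
        return sum(bins) / len(bins)
    if mode == "total":
        return sum(bins)
    if mode == "most_recent":
        return bins[-1]
    if mode == "worst_month":
        # "worst" is the minimum for clear-observation counts and the
        # maximum for gap-like metrics such as deficit.
        return min(bins) if worst == "min" else max(bins)
    raise ValueError(f"unknown mode: {mode}")


DEFAULT_MODE = {
    "days_since_last_held": "most_recent",
    "deficit": "worst_month",
    "priority_score": "worst_month",
}


def default_mode(metric):
    """Per-metric default; anything unlisted falls back to rate (assumed)."""
    return DEFAULT_MODE.get(metric, "rate")


clear_obs = [2, 0, 4]  # three monthly bins for one cell
```

With these three bins, Rate and Worst month tell different stories: the average looks acceptable while the middle month had no clear observation at all, which is why the gap metrics default to the limiting bin.
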

Data model

Extends the shared Zarr (README) with two new band-groups. Same dims (sensor, time, lat, lon), same partial-population contract (the UI reads whatever bands exist):

# v4 — acquired (our holdings, gridded from the holdings catalog)
held_count             (sensor, time, lat, lon) int16
held_clear_count       (sensor, time, lat, lon) int16
days_since_last_held   (sensor, time, lat, lon) float32
held_max_gap_days      (sensor, time, lat, lon) float32

# v4 — gap (derived; can be recomputed on read, or materialised)
deficit                (sensor, time, lat, lon) int16
unmet_supply           (sensor, time, lat, lon) int16
priority_score         (sensor, time, lat, lon) float32

Holdings source — an external PostGIS table

The holdings layer is not a new catalog and not part of this repo — it is an existing, externally-maintained PostGIS table of tiles we own, kept in a separate private repo/service. v4 just reads it over a standard PostgreSQL/PostGIS connection. The only columns it needs (mapped from whatever the table actually calls them):

holdings (PostgreSQL + PostGIS): one row per owned tile
  datetime   : timestamp        # acquisition time   -> time binning
  sensor     : text             # platform/sensor    -> per-sensor held bands
  geometry   : geometry(4326)   # footprint          -> grid cells
  cloud      : numeric|null     # cloud %            -> clear/cloudy split
  (optional)                    # a processing-status column -> held/processed/validated

Access — credentials and the column mapping live in a gitignored .env (never committed), read by a generic reader. Nothing project-specific is imported; it’s a plain SQLAlchemy + GeoPandas read:

# .env (gitignored): COVERAGE_DB_* for the connection, COVERAGE_TILES_* to map
# the generic column names above to the table's real columns.
from satellite_climatology.holdings import fetch_holdings   # generic PostGIS reader
gdf = fetch_holdings(aoi=aoi, start=t0, end=t1)
# -> GeoDataFrame[datetime, sensor, cloud, geometry]

Gridding the holdings into held_* bands reuses v2’s exact footprint_to_cells reducer (or the precomputed cell/H3 index if the table already carries one). No ingest to write — the source of truth already exists and is maintained outside this project.
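
To make the gridding step concrete, here is a simplified stand-in that uses bounding boxes instead of real footprints. bbox_to_cells and held_stats are hypothetical helpers, not the project's footprint_to_cells; the 10% clear threshold, treating a null cloud value as not clear, and defining held_max_gap_days as the longest interval between consecutive held acquisitions are all assumptions.

```python
import math
from collections import defaultdict
from datetime import date

CELL_DEG = 0.1
CLEAR_MAX_CLOUD_PCT = 10.0  # assumed threshold for a "clear" tile


def bbox_to_cells(west, south, east, north, cell_deg=CELL_DEG):
    """Return (row, col) indices of every grid cell a bounding box touches."""
    c0 = math.floor((west + 180.0) / cell_deg)
    c1 = math.floor((east + 180.0) / cell_deg)
    r0 = math.floor((south + 90.0) / cell_deg)
    r1 = math.floor((north + 90.0) / cell_deg)
    return [(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]


def held_stats(tiles, as_of):
    """tiles: iterable of (acquired: date, cloud_pct or None, bbox tuple)."""
    dates = defaultdict(list)
    clear = defaultdict(int)
    for acquired, cloud, bbox in tiles:
        for cell in bbox_to_cells(*bbox):
            dates[cell].append(acquired)
            if cloud is not None and cloud <= CLEAR_MAX_CLOUD_PCT:
                clear[cell] += 1
    stats = {}
    for cell, ds in dates.items():
        ds.sort()
        gaps = [(b - a).days for a, b in zip(ds, ds[1:])]
        stats[cell] = {
            "held_count": len(ds),
            "held_clear_count": clear[cell],
            "days_since_last_held": (as_of - ds[-1]).days,
            "held_max_gap_days": max(gaps, default=0),
        }
    return stats


tiles = [
    (date(2024, 1, 1), 5.0, (10.05, 45.02, 10.15, 45.08)),   # spans two cells
    (date(2024, 1, 21), 80.0, (10.05, 45.02, 10.08, 45.08)),
    (date(2024, 3, 1), None, (10.05, 45.02, 10.08, 45.08)),
]
stats = held_stats(tiles, as_of=date(2024, 3, 31))
```

A production reducer would intersect the actual footprint polygon with each cell and would also split the result by sensor and time bin; the sketch only shows the per-cell bookkeeping.
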

Note: where a held sensor also has a public Available source (e.g. EMIT via NASA CMR, EnMAP/Landsat via STAC), v4 shows both layers and can compute the gap; sensors present on only one side show only that layer.

Precompute pipeline (geocatalog → DuckDB → Zarr)

The dashboard never hits a STAC API live — App A reads only the precomputed Zarr. The build pipeline maps onto geocatalog’s Source → Bundle → Catalog → GeoSlice flow using its real API:

  1. Scan → GeoParquet. geocatalog.from_stac_search(...) queries each sensor’s STAC endpoint over the build window into a GeoCatalog (one CatalogRow per item: footprint + interval + eo:cloud_cover + asset hrefs), persisted with to_geoparquet(...) partitioned by (sensor, year, month) so the DuckDB backend prunes.
  2. Aggregate in DuckDB. Open the GeoParquet as a DuckDBGeoCatalog and run the gridding + stats as DuckDB SQL (spatial extension does the footprint→cell join; GROUP BY (sensor, time_bin, cell)) → scenes_count, mean_scene_cloud_pct, cloud_free_scene_count. The held_* bands come from the external holdings table instead: holdings.fetch_holdings(...) returns the owned tiles (footprint + datetime + sensor + cloud), fed through the same footprint_to_cells reducer. DuckDB is out-of-core, so a sensor-year stays laptop-tractable.
  3. Write Zarr. Pivot the aggregated table into the dense (sensor, time, lat, lon) arrays (via geocatalog’s xarray_backend → xr.Dataset) and write the shared Zarr.
  4. Gap is cheap. deficit / unmet_supply / priority_score derive from the bands above — materialise in the same Zarr or compute on read.
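
The aggregation in step 2 is a GROUP BY over (sensor, time_bin, cell). As a rough pure-Python stand-in for that DuckDB query, assuming monthly bins, scene rows already joined to cells, and a 10% cloud-free threshold (the threshold and the aggregate_scenes name are illustrative assumptions):

```python
from collections import defaultdict

CLOUD_FREE_MAX_PCT = 10.0  # assumed cut-off for a "cloud-free" scene


def aggregate_scenes(rows):
    """rows: iterable of (sensor, 'YYYY-MM-DD', cell, cloud_pct or None)."""
    groups = defaultdict(list)
    for sensor, day, cell, cloud in rows:
        groups[(sensor, day[:7], cell)].append(cloud)  # key on the YYYY-MM bin
    out = {}
    for key, clouds in groups.items():
        known = [c for c in clouds if c is not None]
        out[key] = {
            "scenes_count": len(clouds),
            "mean_scene_cloud_pct": sum(known) / len(known) if known else None,
            "cloud_free_scene_count": sum(
                1 for c in known if c <= CLOUD_FREE_MAX_PCT
            ),
        }
    return out


rows = [
    ("landsat", "2024-01-03", (1350, 1900), 5.0),
    ("landsat", "2024-01-19", (1350, 1900), 55.0),
    ("landsat", "2024-02-04", (1350, 1900), None),
]
bands = aggregate_scenes(rows)
```

In the real pipeline this grouping runs out-of-core in DuckDB over the partitioned GeoParquet; the dictionary here only mirrors the shape of the result before it is pivoted into dense arrays.
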

App B’s per-AOI discovery is the same library queried live instead of precomputed: query(catalog, bounds=aoi.bounds, time=(t0, t1)) (equivalently satellite_viewer.search, the v0→v2.5 contract) feeding the v2.5 windowed-QA path. See v4_coverage_planner_api.md for concrete signatures + demos.

Heavy lifting stays in columnar/SQL land (DuckDB, out-of-core); Zarr holds only the final dense product the UI streams. v1 overpass and v3 pixel bands write into the same Zarr by their own paths — v4 owns the v2 scene-level + holdings bands.

projects/satellite_climatology/
└── src/satellite_climatology/
    ├── grid.py            # (shared v1+) global grid + footprint_to_cells
    ├── sensors.py         # (shared)
    ├── catalog.py         # (v2) availability DuckDBGeoCatalog scan
    ├── holdings.py        # (v4 NEW) generic external PostGIS reader (.env creds) + grid
    ├── coverage.py        # (v4 NEW) available/acquired/gap band assembly
    ├── tasking.py         # (v4 NEW, stub) taskability table + request hook
    └── apps/
        ├── global_heatmap_{panel,streamlit}.py   # App A
        └── aoi_drilldown_{panel,streamlit}.py    # App B (reuses satellite_viewer.search)

Rationale for living here rather than a standalone project or a satellite_viewer tab:

Library + thin UIs (Panel / Streamlit / notebook) matches how satellite_viewer is already structured.

External dependencies: geocatalog (availability scan, GeoParquet, DuckDB, Zarr write) and a standard PostgreSQL/PostGIS client (SQLAlchemy + GeoPandas) for the holdings table. The holdings DB is external and private — its credentials and column mapping come from a gitignored .env (never committed). holdings.py isolates it behind a small interface (fetch_holdings(bbox|aoi, start, end) -> GeoDataFrame), so the apps stay generic and the holdings layer degrades gracefully to “off” when the DB is unconfigured or unreachable.
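
The “degrades gracefully to off” behaviour could look something like the wrapper below. It is a sketch only: holdings_configured and load_holdings_or_off are hypothetical names, the check relies solely on the COVERAGE_DB_ prefix mentioned above, and the fetch argument stands in for whatever reader holdings.py exposes.

```python
import os

ENV_PREFIX = "COVERAGE_DB_"


def holdings_configured(env=None):
    """True when at least one non-empty COVERAGE_DB_* variable is set."""
    env = os.environ if env is None else env
    return any(k.startswith(ENV_PREFIX) and v for k, v in env.items())


def load_holdings_or_off(fetch, env=None, **query):
    """Return fetch(**query), or None ("off") if unconfigured or the call fails."""
    if not holdings_configured(env):
        return None
    try:
        return fetch(**query)
    except Exception:
        # Deliberately broad for the sketch; a real reader would catch the
        # specific connection errors and log them before switching the layer off.
        return None
```

A caller can then treat None as “hide the holdings sub-layer” without knowing anything about the database behind it.
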

Tasking (future-flagged)

First, a distinction the Gap layer depends on: “request” means two different actions. If a scene already exists over a target but we don’t hold it (unmet_supply > 0), the action is ingest — download + process it (the available-but-unheld backlog, which applies to every sensor with an Available source, Landsat and Sentinel-2 included). Only when no scene exists does tasking — commanding a new acquisition — come into play, and only for pointable instruments. See v4_coverage_planner_api.md §“Request semantics”.

Public systematic sensors — Sentinel-2, Landsat, MODIS, VIIRS — cannot be tasked; they image on a fixed schedule (so their gap is closed by ingest, never tasking). Tasking only applies to:

So in v4 the tasking layer is designed but inert:

This keeps the “what to request” story coherent end-to-end without pretending we can task Sentinel-2.
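
The ingest-versus-tasking distinction and the inert hook can be sketched as below. The sensor keys, the hypothetical_pointable entry, the request_kind helper, and the shape of the flagged record are illustrative assumptions; only the four systematic sensors being non-taskable comes from the text.

```python
SENSOR_TASKABILITY = {
    "sentinel-2": False,
    "landsat": False,
    "modis": False,
    "viirs": False,
    "hypothetical_pointable": True,  # placeholder for any pointable instrument
}

FLAGGED = []  # stand-in for the flag-for-request table


def request_kind(sensor, unmet_supply, deficit):
    """Classify a cell's action: 'ingest', 'task', or 'none'."""
    if unmet_supply > 0:
        return "ingest"  # a scene exists but is not held: download + process
    if deficit > 0 and SENSOR_TASKABILITY.get(sensor, False):
        return "task"    # nothing exists and the instrument can be pointed
    return "none"


def submit_request(sensor, cell):
    """Inert stub: record the flag locally and never contact a provider."""
    record = {"sensor": sensor, "cell": cell, "status": "flagged"}
    FLAGGED.append(record)
    return record
```

Checking unmet_supply first keeps the cheaper action (ingest) ahead of tasking, and unknown sensors default to non-taskable.
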

Compute & scale

Milestones (suggested)

  1. M4.0 — Available-only global heatmap. App A over the v2 Zarr (single sensor, 1 year, 0.1°). Just the supply layer. Validates the tile-rendering + time-slider + AOI-clip UI.
  2. M4.1 — Holdings from the external DB. holdings.py reads the external PostGIS table (creds via gitignored .env); grid owned tiles into held_* bands. Add the Acquired layer (both sub-layers + the held/processed/validated toggle) to App A.
  3. M4.2 — Gap layer. coverage.py deficit/priority; the desired knob and weight sliders; diverging colormaps.
  4. M4.3 — App B (AOI drill-down). Reuse satellite_viewer.search + v2.5 path; per-AOI time series of all three layers; cell→AOI handoff from App A.
  5. M4.4 — Tasking hook (inert). SENSOR_TASKABILITY, the taskable mask, the flag-for-request table, the stubbed submit_request.
  6. M4.5 — v3 bands wired in as they complete (no new UI work).

Risks & open questions

Out of scope (v4)