v4 — Coverage planner (available / acquired / gap)
A heatmap dashboard that overlays three views of the same global grid — what imagery is available, what we have acquired, and where the gap is (and, eventually, what to task) — sliceable by sensor, time window, and AOI, viewable globally or zoomed to a single location.
This is the planning/operations layer on top of the
satellite_climatology substrate. v1–v3 answer “how
observable is each pixel?”; v4 reframes those numbers as a coverage
ledger: availability minus our holdings equals the gap we might act on.
Questions answered
- “Across the globe, where is imagery available for this sensor and time window?” (and how cloud-free / how frequent)
- “Where have we actually acquired data — and where are our holdings thin or stale?”
- “Where is the gap largest, and (future) where could we task a satellite to close it?”
- All three, clipped to an AOI and constrained to a date range, with a global view as the default.
The three layers
The dashboard is organised as three toggleable layer-groups over the same
(sensor, time, lat, lon) grid. The user picks layer → metric →
sensor → time window, then global or AOI.
| Layer | Meaning | Built from | Status |
|---|---|---|---|
| 🟦 Available | What could / did exist to be observed | v1 overpasses + v2 scene scan + v3 clear-fraction | reuses v1–v3 |
| 🟩 Acquired | What we hold, as two sub-layers (see below) | catalog scans | new in v4 |
| 🟥 Gap | Available − Acquired, weighted into a priority | derived | new in v4 |
Available (reuses v1–v3)
No new compute — these are the bands v1/v2/v3 already write:
overpasses, scenes_count, mean_scene_cloud_pct,
cloud_free_scene_count, clear_fraction, … The Available layer is a
re-labelling of the climatology product as “supply”.
Acquired: two sub-layers
Per the design decision, “what we’ve seen” is tracked as two distinct, independently toggleable sub-layers:
- Public clear observations: the cloud-free usable supply from the public catalog (i.e. v2’s cloud_free_scene_count / v3’s clear_obs_count). This is “what anyone could have gotten.”
- Our holdings: what this project has actually ingested, sourced from an external PostGIS holdings table (see Holdings source below). It is an already-populated catalog of owned tiles with footprint, acquisition date, sensor, cloud %, and (optionally) processing status, so we query it rather than rebuild it. The tiles are gridded into the same cells to produce held_count, held_clear_count, days_since_last_held, held_max_gap_days.
Keeping them separate lets the map distinguish “the data exists and is clear, we just don’t have it” (a pure download decision) from “no clear data exists at all” (a tasking decision).
If the holdings table carries a processing-status column, “held” can be refined into held / processed / validated — a third toggle showing not just “do we have the tile” but “have we finished processing it”.
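As a rough illustration of how the four held_* bands could be derived for a single cell and time bin, here is a minimal sketch. The function name `held_stats`, the 10 % clear threshold, and the choice to measure gaps against the bin edges are all assumptions made for the example, not decisions recorded in this design.

```python
from datetime import date

CLEAR_MAX_CLOUD_PCT = 10.0  # assumed threshold; the design does not fix one


def held_stats(acquisitions, bin_start, bin_end):
    """Summarise one cell's holdings for one time bin.

    acquisitions: list of (acquisition_date, cloud_pct or None) tuples.
    Gaps are measured between consecutive held dates and the bin edges.
    """
    in_bin = [(d, c) for d, c in acquisitions if bin_start <= d <= bin_end]
    dates = sorted(d for d, _ in in_bin)
    # A tile with unknown cloud cover is held, but not counted as clear.
    clear = sum(1 for _, c in in_bin if c is not None and c <= CLEAR_MAX_CLOUD_PCT)
    edges = [bin_start, *dates, bin_end]
    gaps = [(b - a).days for a, b in zip(edges, edges[1:])]
    return {
        "held_count": len(dates),
        "held_clear_count": clear,
        "days_since_last_held": (bin_end - dates[-1]).days if dates else None,
        "held_max_gap_days": max(gaps),
    }


stats = held_stats(
    [(date(2024, 1, 5), 3.0), (date(2024, 1, 20), 60.0), (date(2024, 1, 25), None)],
    bin_start=date(2024, 1, 1),
    bin_end=date(2024, 1, 31),
)
print(stats)
```

In this example three tiles are held, one of them clear, the last one is six days old at the end of the bin, and the longest stretch without a held tile is 15 days.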
Gap: derived, future-flagged tasking
Computed cell-wise from the layers above, no new I/O:
deficit = max(0, desired_clear_obs − held_clear_count)
staleness = days_since_last_held # recency pressure
unmet_supply = cloud_free_scene_count − held_clear_count # exists-but-unowned
priority_score = w_d·norm(deficit)
+ w_s·norm(staleness)
+ w_u·norm(unmet_supply)
+ w_r·revisit_need(aoi) # user-supplied weight
taskable = SENSOR_TASKABILITY[sensor] # bool, mostly False

- desired_clear_obs is a per-sensor target (clear looks / month), adjustable per selected sensor; see Desired cadence below. The deficit is relative to it.
- Tasking is future-flagged, not built. The schema carries a taskable flag and the UI shows a “request” affordance on high-priority cells, but it is a no-op placeholder. See Tasking below.
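The cell-wise gap formula above can be sketched in plain Python over a handful of cells. The design does not define `norm`, so min-max scaling is assumed here. The weights and the scalar `revisit_need` are likewise placeholder values chosen only for illustration.

```python
def norm(values):
    """Min-max scale to [0, 1]; a constant input maps to all zeros (assumed definition)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def gap_bands(held_clear, cloud_free, days_since, desired_clear_obs,
              weights=(0.4, 0.3, 0.2, 0.1), revisit_need=0.0):
    """Return (deficit, unmet_supply, priority_score) lists, one entry per cell."""
    w_d, w_s, w_u, w_r = weights  # illustrative weights, not project defaults
    deficit = [max(0, desired_clear_obs - h) for h in held_clear]
    unmet_supply = [c - h for c, h in zip(cloud_free, held_clear)]
    n_d, n_s, n_u = norm(deficit), norm(days_since), norm(unmet_supply)
    priority = [
        w_d * d + w_s * s + w_u * u + w_r * revisit_need
        for d, s, u in zip(n_d, n_s, n_u)
    ]
    return deficit, unmet_supply, priority


# Three cells: nothing held, partly held, fully held.
deficit, unmet_supply, priority = gap_bands(
    held_clear=[0, 2, 6],
    cloud_free=[4, 2, 6],
    days_since=[90, 30, 0],
    desired_clear_obs=6,
)
print(deficit, unmet_supply, priority)
```

The empty cell ranks highest and the fully held cell scores zero, which is the ordering the priority map is meant to convey.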
Desired cadence: per sensor
The deficit needs a target to subtract from, and a useful cadence differs by an order of magnitude across sensors (MODIS is ~daily, Landsat is ~biweekly). So the target is per sensor, expressed as desired clear observations per month:
- Each selected sensor gets its own compact target input — you only see a control for a sensor you’ve chosen. Pick 2 sensors → 2 inputs. (This is the right read of the “won’t that be a lot of sliders?” worry: the count is bounded by your sensor selection, not the full registry.)
- Each is pre-filled from the sensor’s nominal revisit (e.g. S2 ≈ 6/mo, Landsat ≈ 2/mo, MODIS ≈ 30/mo), so the defaults already yield a meaningful gap and you rarely touch them.
- The gap is computed per sensor against its own target. A combined “any clear look” view (sum of held vs. sum of targets, or a single cross-sensor target) is an optional toggle.
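A minimal sketch of the per-sensor target logic described above, using the nominal values quoted in the list. The dictionary keys and both function names are hypothetical. The point is only that the number of targets follows the sensor selection and that each deficit is computed against its own target.

```python
# Nominal clear observations per month, taken from the examples above.
NOMINAL_CLEAR_OBS_PER_MONTH = {"S2": 6, "Landsat": 2, "MODIS": 30}


def targets_for(selected_sensors, overrides=None):
    """One target per selected sensor, pre-filled from the nominal revisit."""
    overrides = overrides or {}
    return {
        sensor: overrides.get(sensor, NOMINAL_CLEAR_OBS_PER_MONTH[sensor])
        for sensor in selected_sensors
    }


def per_sensor_deficit(held_clear_by_sensor, targets):
    """Deficit of each sensor against its own target, floored at zero."""
    return {
        sensor: max(0, target - held_clear_by_sensor.get(sensor, 0))
        for sensor, target in targets.items()
    }


targets = targets_for(["S2", "MODIS"])  # two sensors selected, two targets
deficits = per_sensor_deficit({"S2": 2, "MODIS": 31}, targets)
print(targets, deficits)
```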
Why two apps (the 2 km question)
A single location of interest is ~2 km; a global heatmap cell is ~11 km (0.1°). These are different scales, not a single tunable:
- 2 km globally = ~162 M cells — heavy to compute and invisible when zoomed to the whole Earth.
- 0.1° globally = 6.48 M cells — the existing shared grid; fast; the right granularity for a world map.
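The two cell counts can be checked with a few lines of arithmetic. The 6.48 M figure is exact for a 0.1° lat/lon grid. The ~162 M figure is reproduced by a 0.02° cell, which is an assumption about how the 2 km estimate was derived (it treats 1° as roughly 100 km).

```python
def global_cell_count(cell_deg):
    """Number of cells in a regular lat/lon grid covering the whole globe."""
    n_lon = round(360 / cell_deg)
    n_lat = round(180 / cell_deg)
    return n_lon * n_lat


overview_cells = global_cell_count(0.1)   # shared grid
detail_cells = global_cell_count(0.02)    # assumed stand-in for ~2 km
print(overview_cells, detail_cells, detail_cells // overview_cells)
```

The finer grid holds 25 times as many cells as the shared grid, which is why it is reserved for a single AOI.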
So v4 ships two surfaces, mirroring the v2 (global) / v2.5 (per-AOI) split that already exists:
App A: Global Coverage Heatmap (the overview)
- Reads the precomputed Zarr at the shared 0.1° grid (H3 res-5 as an equal-area alternative — see v2).
- Controls: layer-group, metric, sensor(s), time-window slider, global vs. AOI clip.
- Renders as raster/choropleth tile layers on the basemap (the satellite_viewer basemap switcher carries straight over).
- Compute: v2 scene-level scan for the MVP (laptop-tractable), with v1 overpasses as the cheap theoretical-available underlay. v3 pixel-level bands appear in the same dropdowns as they get computed.
App B: AOI Coverage Drill-down (the detail)
- On-demand for a single AOI (up to the ~50 km cap from v2.5), at native / pixel resolution (down to ~2 km cells or true per-pixel).
- Reuses satellite_viewer.search() verbatim for discovery (the v0→v2.5 contract) and the v2.5 windowed-QA path for true clear-fraction.
- Renders the per-AOI time series of available / acquired / gap, plus a fine-grained local heatmap and the per-scene table.
- This is where the 2 km scale lives and makes sense.
A nice-to-have bridge: clicking a cell in App A pre-fills App B’s AOI.
Heatmap metric menu
Per cell × sensor × time-bin. The dropdown is grouped by layer:
| Layer | Metrics |
|---|---|
| Available | overpasses (theoretical), scenes_count, mean_scene_cloud_pct, cloud_free_scene_count, clear_fraction (pixel), pixel_max_gap_days |
| Acquired | held_count, held_clear_count, days_since_last_held, held_max_gap_days |
| Gap | deficit, unmet_supply, priority_score, taskable (mask) |
Diverging colormaps for difference/gap metrics; sequential for counts; “days since” gets a perceptually-reversed ramp (fresh = cool, stale = hot).
Time-window aggregation
When the time slider spans several monthly bins, the cell value must be reduced over the window. This is an explicit aggregation selector, not a hidden choice:
| Mode | Cell value | Best for | Caveat |
|---|---|---|---|
| Rate (mean/month) ⭐ | average per-month over the window | comparing coverage cadence; comparable across window sizes | hides within-window variation |
| Total (sum) | summed count over the window | “how much did we get in total” | a longer window always looks bigger |
| Most-recent | value of the latest bin (or days since last) | freshness / “is it stale” | ignores history |
| Worst month (min/max) | the limiting bin (fewest clear obs / longest gap) | gap & tasking — the worst month drives the decision | pessimistic by design |
| Scrub / animate | no reduction — step through bins | seeing seasonality | not a single map |
Smart per-metric defaults so the selector rarely needs touching:
availability counts → Rate; days_since_last_held → Most-recent;
deficit / priority_score → Worst month. The user can override.
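The aggregation modes in the table reduce a cell's monthly bins to one value. The sketch below shows one way to express them. The mode strings, the `worst` argument, and the `DEFAULT_MODE` mapping are hypothetical names, and the scrub/animate mode is left out because it performs no reduction.

```python
def aggregate(bins, mode, worst="min"):
    """Reduce one cell's monthly values (oldest first) to a single number."""
    if not bins:
        raise ValueError("the time window contains no bins")
    if mode == "rate":
        return sum(bins) / len(bins)   # mean per month
    if mode == "total":
        return sum(bins)               # grows with window length
    if mode == "most_recent":
        return bins[-1]                # latest bin only
    if mode == "worst":
        # Fewest clear obs is the minimum; longest gap or largest deficit is the maximum.
        return min(bins) if worst == "min" else max(bins)
    raise ValueError(f"unknown aggregation mode: {mode}")


# Per-metric defaults as listed above; the user can override them.
DEFAULT_MODE = {
    "scenes_count": "rate",
    "cloud_free_scene_count": "rate",
    "days_since_last_held": "most_recent",
    "deficit": "worst",
    "priority_score": "worst",
}

clear_obs = [4, 0, 2]  # three monthly bins for one cell
print(aggregate(clear_obs, "rate"), aggregate(clear_obs, "worst"))
```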
Data model
Extends the shared Zarr (README) with two new band-groups.
Same dims (sensor, time, lat, lon), same partial-population contract (the
UI reads whatever bands exist):
# v4 — acquired (our holdings, gridded from the holdings catalog)
held_count (sensor, time, lat, lon) int16
held_clear_count (sensor, time, lat, lon) int16
days_since_last_held (sensor, time, lat, lon) float32
held_max_gap_days (sensor, time, lat, lon) float32
# v4 — gap (derived; can be recomputed on read, or materialised)
deficit (sensor, time, lat, lon) int16
unmet_supply (sensor, time, lat, lon) int16
priority_score (sensor, time, lat, lon) float32

Holdings source: an external PostGIS table
The holdings layer is not a new catalog and not part of this repo — it is an existing, externally-maintained PostGIS table of tiles we own, kept in a separate private repo/service. v4 just reads it over a standard PostgreSQL/PostGIS connection. The only columns it needs (mapped from whatever the table actually calls them):
holdings (PostgreSQL + PostGIS): one row per owned tile
datetime : timestamp # acquisition time -> time binning
sensor : text # platform/sensor -> per-sensor held bands
geometry : geometry(4326) # footprint -> grid cells
cloud : numeric|null # cloud % -> clear/cloudy split
(optional) # a processing-status column -> held/processed/validated

Access: credentials and the column mapping live in a gitignored .env (never committed), read by a generic reader. Nothing project-specific is imported; it’s a plain SQLAlchemy + GeoPandas read:
# .env (gitignored): COVERAGE_DB_* for the connection, COVERAGE_TILES_* to map
# the generic column names above to the table's real columns.
from satellite_climatology.holdings import fetch_holdings # generic PostGIS reader
gdf = fetch_holdings(aoi=aoi, start=t0, end=t1)
# -> GeoDataFrame[datetime, sensor, cloud, geometry]

Gridding the holdings into held_* bands reuses v2’s exact footprint_to_cells reducer (or the precomputed cell/H3 index if the table already carries one). There is no ingest to write: the source of truth already exists and is maintained outside this project.
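To make the footprint-to-cell step concrete, here is a simplified stand-in that maps a footprint's bounding box onto 0.1° cell indices and counts held tiles per cell. It is not the project's footprint_to_cells reducer: the name `bbox_to_cells`, the (row, col) convention, and the use of a bounding box instead of the true polygon are assumptions, and footprints crossing the antimeridian are not handled.

```python
import math
from collections import Counter

CELLS_PER_DEG = 10  # 0.1 degree grid


def bbox_to_cells(min_lon, min_lat, max_lon, max_lat):
    """Return the (row, col) cells touched by a bounding box.

    Row 0 starts at latitude -90 and column 0 at longitude -180 (assumed layout).
    """
    col0 = math.floor((min_lon + 180) * CELLS_PER_DEG)
    col1 = math.floor((max_lon + 180) * CELLS_PER_DEG)
    row0 = math.floor((min_lat + 90) * CELLS_PER_DEG)
    row1 = math.floor((max_lat + 90) * CELLS_PER_DEG)
    return {(r, c) for r in range(row0, row1 + 1) for c in range(col0, col1 + 1)}


# Two owned tiles with overlapping bounding boxes (lon/lat in degrees).
tiles = [
    (10.05, 45.05, 10.25, 45.15),
    (10.15, 45.05, 10.35, 45.15),
]
held_count = Counter()
for bbox in tiles:
    held_count.update(bbox_to_cells(*bbox))  # each tile counts once per cell

cells = bbox_to_cells(*tiles[0])
print(len(cells), held_count[(1350, 1901)])
```

Cells covered by both tiles end up with a count of two, which matches the convention that overlapping scenes are independent observations.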
Note: where a held sensor also has a public Available source (e.g. EMIT via NASA CMR, EnMAP/Landsat via STAC), v4 shows both layers and can compute the gap; sensors present on only one side show only that layer.
Precompute pipeline (geocatalog → DuckDB → Zarr)
The dashboard never hits a STAC API live — App A reads only the
precomputed Zarr. The build pipeline maps onto geocatalog’s
Source → Bundle → Catalog → GeoSlice flow using its real API:
- Scan → GeoParquet. geocatalog.from_stac_search(...) queries each sensor’s STAC endpoint over the build window into a GeoCatalog (one CatalogRow per item: footprint + interval + eo:cloud_cover + asset hrefs), persisted with to_geoparquet(...) partitioned by (sensor, year, month) so the DuckDB backend can prune partitions.
- Aggregate in DuckDB. Open the GeoParquet as a DuckDBGeoCatalog and run the gridding + stats as DuckDB SQL (the spatial extension does the footprint→cell join; GROUP BY (sensor, time_bin, cell)) → scenes_count, mean_scene_cloud_pct, cloud_free_scene_count. The held_* bands come from the external holdings table instead: holdings.fetch_holdings(...) returns the owned tiles (footprint + datetime + sensor + cloud), fed through the same footprint_to_cells reducer. DuckDB is out-of-core, so a sensor-year stays laptop-tractable.
- Write Zarr. Pivot the aggregated table into the dense (sensor, time, lat, lon) arrays (via geocatalog’s xarray_backend → xr.Dataset) and write the shared Zarr.
- Gap is cheap. deficit / unmet_supply / priority_score derive from the bands above; materialise them in the same Zarr or compute on read.
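The aggregation step is an ordinary GROUP BY. The sketch below uses Python's built-in sqlite3 purely as a stand-in for DuckDB so that it runs without extra dependencies. It assumes the footprint→cell join has already produced one row per (scene, cell) pair, and the table name, column names, and 10 % clear threshold are hypothetical.

```python
import sqlite3

CLEAR_MAX_CLOUD_PCT = 10.0  # assumed threshold for "cloud-free"

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE scene_cells (sensor TEXT, time_bin TEXT, cell INTEGER, cloud_pct REAL)"
)
con.executemany(
    "INSERT INTO scene_cells VALUES (?, ?, ?, ?)",
    [
        ("S2", "2024-01", 42, 5.0),
        ("S2", "2024-01", 42, 80.0),
        ("S2", "2024-01", 43, 2.0),
        ("S2", "2024-02", 42, None),  # cloud cover missing in the catalog
    ],
)

stats = con.execute(
    """
    SELECT sensor, time_bin, cell,
           COUNT(*)       AS scenes_count,
           AVG(cloud_pct) AS mean_scene_cloud_pct,
           SUM(CASE WHEN cloud_pct <= ? THEN 1 ELSE 0 END) AS cloud_free_scene_count
    FROM scene_cells
    GROUP BY sensor, time_bin, cell
    ORDER BY sensor, time_bin, cell
    """,
    (CLEAR_MAX_CLOUD_PCT,),
).fetchall()

for row in stats:
    print(row)
```

A scene with missing cloud cover still counts toward scenes_count but not toward cloud_free_scene_count, which is one reasonable convention among several.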
App B’s per-AOI discovery is the same library queried live instead of
precomputed: query(catalog, bounds=aoi.bounds, time=(t0, t1))
(equivalently satellite_viewer.search, the v0→v2.5 contract) feeding the
v2.5 windowed-QA path. See v4_coverage_planner_api.md
for concrete signatures + demos.
Heavy lifting stays in columnar/SQL land (DuckDB, out-of-core); Zarr holds only the final dense product the UI streams. v1 overpass and v3 pixel bands write into the same Zarr by their own paths — v4 owns the v2 scene-level + holdings bands.
Architecture (recommended: inside satellite_climatology)
projects/satellite_climatology/
└── src/satellite_climatology/
    ├── grid.py # (shared v1+) global grid + footprint_to_cells
    ├── sensors.py # (shared)
    ├── catalog.py # (v2) availability DuckDBGeoCatalog scan
    ├── holdings.py # (v4 NEW) generic external PostGIS reader (.env creds) + grid
    ├── coverage.py # (v4 NEW) available/acquired/gap band assembly
    ├── tasking.py # (v4 NEW, stub) taskability table + request hook
    └── apps/
        ├── global_heatmap_{panel,streamlit}.py # App A
        └── aoi_drilldown_{panel,streamlit}.py # App B (reuses satellite_viewer.search)

Rationale for living here rather than a standalone project or a satellite_viewer tab:
- The grid, Zarr schema, sensor list, time-binning, and geocatalog scan are already designed for this project; v4 adds bands and a holdings reader, not a new substrate.
- The v1/v2/v3 dashboard is already milestone M8; App A is that dashboard with the Acquired/Gap layers added.
- App B reuses satellite_viewer.search (the v0→v2.5 contract the docs already commit to), so it doesn’t fork discovery logic.
Library + thin UIs (Panel / Streamlit / notebook) matches how
satellite_viewer is already structured.
External dependencies: geocatalog (availability scan, GeoParquet,
DuckDB, Zarr write) and a standard PostgreSQL/PostGIS client (SQLAlchemy +
GeoPandas) for the holdings table. The holdings DB is external and
private — its credentials and column mapping come from a gitignored .env
(never committed). holdings.py isolates it behind a small interface
(fetch_holdings(bbox|aoi, start, end) -> GeoDataFrame), so the apps stay
generic and the holdings layer degrades gracefully to “off” when the DB is
unconfigured or unreachable.
Tasking (future-flagged)
First, a distinction the Gap layer depends on: “request” means two
different actions. If a scene already exists over a target but we don’t
hold it (unmet_supply > 0), the action is ingest — download + process
it (the available-but-unheld backlog, which applies to every sensor with
an Available source, Landsat and Sentinel-2 included). Only when no scene
exists does tasking — commanding a new
acquisition — come into play, and only for pointable instruments. See
v4_coverage_planner_api.md §“Request
semantics”.
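The ingest-versus-tasking distinction can be written as a small decision function. This is a sketch only: the name `request_action`, the string labels, and the choice to return "none" when there is no deficit are assumptions made for illustration, and the authoritative rules live in the API document referenced above.

```python
def request_action(deficit, unmet_supply, taskable):
    """Classify what a "request" on a cell would mean.

    deficit      -- clear observations still missing against the target
    unmet_supply -- clear scenes that exist publicly but are not held
    taskable     -- whether the sensor can be pointed on request
    """
    if deficit <= 0:
        return "none"    # target already met, nothing to request
    if unmet_supply > 0:
        return "ingest"  # the scene exists: download and process it
    if taskable:
        return "task"    # nothing exists and the instrument is pointable
    return "none"        # systematic sensor: wait for its fixed schedule


print(request_action(deficit=3, unmet_supply=2, taskable=False))
print(request_action(deficit=3, unmet_supply=0, taskable=True))
print(request_action(deficit=3, unmet_supply=0, taskable=False))
```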
Public systematic sensors — Sentinel-2, Landsat, MODIS, VIIRS — cannot be tasked; they image on a fixed schedule (so their gap is closed by ingest, never tasking). Tasking only applies to:
- Commercial constellations (Planet, Maxar/SecureWatch, Capella SAR, ICEYE, Airbus) via their ordering/tasking APIs.
- A few targeted instruments (e.g. EMIT’s target-request list).
So in v4 the tasking layer is designed but inert:
- SENSOR_TASKABILITY maps each sensor → bool; the Gap layer’s taskable mask greys out cells where tasking is impossible (most of them).
- The UI exposes a “flag for request” button on high-priority_score cells/AOIs that currently just records the intent to a local table.
- tasking.py defines the submit_request(provider, geometry, window) signature with a NotImplementedError body, so wiring a real provider later is a localised change, not a redesign.
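A minimal sketch of what the inert tasking module might contain. SENSOR_TASKABILITY and the submit_request signature come from the description above. The sensor keys, the in-memory list standing in for the local intent table, and the `flag_for_request` helper are hypothetical.

```python
# Sensor keys are illustrative; systematic public sensors are never taskable.
SENSOR_TASKABILITY = {
    "sentinel-2": False,
    "landsat": False,
    "modis": False,
    "viirs": False,
    "planet": True,  # commercial constellation
    "emit": True,    # target-request list
}

flagged_requests = []  # stand-in for the local intent table


def flag_for_request(sensor, cell, priority_score):
    """Record the intent to request a cell; nothing is sent anywhere."""
    record = {
        "sensor": sensor,
        "cell": cell,
        "priority_score": priority_score,
        "taskable": SENSOR_TASKABILITY.get(sensor, False),
    }
    flagged_requests.append(record)
    return record


def submit_request(provider, geometry, window):
    """Placeholder for a real provider integration."""
    raise NotImplementedError("tasking is future-flagged in v4")


record = flag_for_request("emit", cell=(1350, 1900), priority_score=0.9)
print(record, len(flagged_requests))
```

Because only submit_request is unimplemented, a later provider integration would replace one function body and leave the flagging flow untouched.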
This keeps the “what to request” story coherent end-to-end without pretending we can task Sentinel-2.
Compute & scale
- App A MVP = v2 scene-level scan. Per v2’s budget this is a catalog query + grid reduce — minutes-to-hours for a year of one sensor, laptop-tractable; no pixel reads.
- v1 overpasses = ~100-line skyfield script, negligible cost; gives the theoretical-available underlay immediately.
- Holdings gridding = trivial (we own ≪ the public catalog).
- v3 pixel-level global = the cluster job from v3; appears in App A’s dropdowns as those bands fill in. Not on the v4 critical path.
- App B = on-demand v2.5 windowed reads for one AOI; seconds.
Milestones (suggested)
- M4.0: Available-only global heatmap. App A over the v2 Zarr (single sensor, 1 year, 0.1°). Just the supply layer. Validates the tile-rendering + time-slider + AOI-clip UI.
- M4.1: Holdings from the external DB. holdings.py reads the external PostGIS table (creds via gitignored .env); grid owned tiles into held_* bands. Add the Acquired layer (both sub-layers + the held/processed/validated toggle) to App A.
- M4.2: Gap layer. coverage.py deficit/priority; the desired knob and weight sliders; diverging colormaps.
- M4.3: App B (AOI drill-down). Reuse satellite_viewer.search + v2.5 path; per-AOI time series of all three layers; cell→AOI handoff from App A.
- M4.4: Tasking hook (inert). SENSOR_TASKABILITY, the taskable mask, the flag-for-request table, the stubbed submit_request.
- M4.5: v3 bands wired in as they complete (no new UI work).
Risks & open questions
- Defining “desired” — resolved: per sensor. Per-sensor target (clear obs / month), pre-filled from nominal revisit, one control per selected sensor (see Desired cadence). Per-AOI override is a later refinement.
- Holdings provenance — the external DB is the source of truth. No manual ingest to keep in sync — the Acquired layer reads the externally maintained PostGIS table directly. The residual risk is the opposite: the v4 dashboard is read-only against a DB it doesn’t own, so its freshness tracks that external pipeline.
- Private-data hygiene + DB access. Credentials and the table/column mapping live only in a gitignored .env (never committed); the code is a generic PostGIS reader with no private imports. The DB needs network reachability, so isolate it behind holdings.fetch_holdings(...) so the Acquired layer degrades to “unavailable” (not crash) when it’s unconfigured or unreachable.
- Sensor-set mismatch. The holdings sensors may differ from the v1–v3 Available sensors (which lean optical: S2/Landsat/MODIS/VIIRS). The Acquired and Available layers only overlap where a sensor is in both (e.g. EMIT). Document per-sensor which layers exist so the Gap isn’t computed across a mismatched pair.
- Double-counting at tile edges. Inherits v3’s MGRS-overlap semantics; document that overlapping scenes are independent observations.
- Equal-area vs. lat/lon. Coverage counts are biased by cell area in EPSG:4326 (polar cells are tiny). Offer the H3 option for any area-normalised metric.
- Colormap honesty. “Gap” maps can imply urgency where a sensor simply doesn’t overpass (poles for S2). Always show taskable / availability alongside gap so a gap isn’t mistaken for an actionable one.
- Time-window semantics, resolved: an aggregation selector. See Time-window aggregation: the default is rate (mean/month), with most-recent for freshness and worst-month for gap/priority.
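The equal-area risk in the list above can be quantified with the standard spherical formula for the area of a lat/lon cell, A = R² · Δλ · (sin φ₂ − sin φ₁). The sketch assumes a spherical Earth with the mean radius given below, and `cell_area_km2` is a hypothetical helper name.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean radius; spherical approximation


def cell_area_km2(lat_south_deg, cell_deg=0.1):
    """Area of a lat/lon cell whose southern edge is at lat_south_deg."""
    phi1 = math.radians(lat_south_deg)
    phi2 = math.radians(lat_south_deg + cell_deg)
    dlon = math.radians(cell_deg)
    return EARTH_RADIUS_KM ** 2 * dlon * (math.sin(phi2) - math.sin(phi1))


equator = cell_area_km2(0.0)
high_lat = cell_area_km2(80.0)
print(round(equator, 1), round(high_lat, 1), round(high_lat / equator, 3))
```

A 0.1° cell at 80° latitude covers well under a fifth of the area of an equatorial cell, so a per-cell count there is not comparable without dividing by area or switching to the H3 grid.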
Out of scope (v4)
- Real tasking-API integration (future; hook only).
- Automated download orchestration (v4 recommends; it doesn’t fetch).
- Pixel-level global compute (that’s v3; v4 just surfaces it).
- Cost/quota modelling for commercial tasking.
- ML gap-prediction — v4 reports observed/derived gaps, not forecasts.