Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

03 — Operators on Sentinel-2 (Lake Tahoe NDVI)

Composing a Sentinel-2 NDVI pipeline

This notebook is the operator-composition slice of the canonical Lake Tahoe scenario: cloud-free Sentinel-2 NDVI over Lake Tahoe, summer 2024. The end-to-end multi-repo flow (catalog discovery → patching → operators) lives in geocatalog/docs/notebooks/end_to_end_lake_tahoe.ipynb. Here we focus on the geotoolz part: composing named operators from gz.radiometry, gz.indices, gz.cloud, and gz.mask into linear and DAG-shaped pipelines.

Scenario.

AOI bbox (EPSG:4326)(-120.10, 38.92, -119.93, 39.27) (Lake Tahoe main body)
Date range2024-06-01 to 2024-07-15 (peak-green Sierra window)
STACMicrosoft Planetary Computer (planetarycomputer.microsoft.com/api/stac/v1)
Collectionsentinel-2-l2a
MGRS tile10SGJ (fully covers the AOI; the other candidate 11SKD only clips a sliver)
Cloud cover< 5 %

The named operators we use:

StepOperatorFrom
DN → reflectanceDNToReflectance(scale=1e-4)gz.radiometry
Percentile stretch for the RGB previewPercentileClip(p_min=2, p_max=98)gz.radiometry
NDVINDVI(red_idx=2, nir_idx=3)gz.indices
NDWI (water/vegetation complement)NDWI(green_idx=1, nir_idx=3)gz.indices
Cloud-class mask from SCLMaskFromSCL(classes=SCL_CLOUDS_AND_INVALID)gz.cloud
Drop masked pixelsApplyMask(mask=..., fill_value=np.nan)gz.mask

When you need to write a new operator that does not yet live in geotoolz, the Define an operator recipe walks through the contract.

1. Load one Sentinel-2 scene via Microsoft Planetary Computer

We use pystac-client to find one low-cloud scene in the AOI / date range, then rioxarray to load three bands: B04 (Red, 10 m), B08 (NIR, 10 m), and the SCL scene-classification band (20 m, resampled to the 10 m grid with nearest-neighbour). Red + NIR go into one carrier rn_gt (the reflectance pipeline input); SCL goes into a separate carrier scl_gt (the cloud-mask source).

import geotoolz as gz
import matplotlib.pyplot as plt
import numpy as np
import planetary_computer
import pystac_client
import rioxarray
import xarray as xr
from georeader.geotensor import GeoTensor
from geotoolz.cloud import SCL_CLOUDS_AND_INVALID
from rasterio.enums import Resampling
BBOX = (-120.10, 38.92, -119.93, 39.27)  # Lake Tahoe main body
# Peak-green window: post-snowmelt, pre-summer-drought. September Sierra
# scenes are technically cloud-free too but the vegetation is heat-
# stressed and NDVI plots wash out into orange.
DATE_RANGE = "2024-06-01/2024-07-15"
TARGET_TILE = "10SGJ"  # fully covers the AOI

catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)
search = catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=BBOX,
    datetime=DATE_RANGE,
    query={
        "eo:cloud_cover": {"lt": 5},
        "s2:mgrs_tile": {"eq": TARGET_TILE},
    },
)
items = sorted(search.items(), key=lambda x: x.properties["eo:cloud_cover"])
print(f"found {len(items)} candidate items on tile {TARGET_TILE}")
item = items[0]
print(f"using {item.id}  cloud_cover={item.properties['eo:cloud_cover']:.1f}%")
found 17 candidate items on tile 10SGJ
using S2B_MSIL2A_20240614T183919_R070_T10SGJ_20240615T003207  cloud_cover=0.0%
def _load(
    asset_key: str,
    *,
    resample_to: xr.DataArray | None = None,
    resampling: Resampling = Resampling.bilinear,
) -> xr.DataArray:
    href = item.assets[asset_key].href
    da = rioxarray.open_rasterio(href, masked=False).squeeze("band", drop=True)
    da = da.rio.clip_box(*BBOX, crs="EPSG:4326")
    if resample_to is not None:
        da = da.rio.reproject_match(resample_to, resampling=resampling)
    return da


red = _load("B04")
green = _load("B03", resample_to=red)
blue = _load("B02", resample_to=red)
nir = _load("B08", resample_to=red)
scl = _load("SCL", resample_to=red, resampling=Resampling.nearest)

# Four-band reflectance carrier — B (idx 0), G (1), R (2), NIR (3) —
# uint16 DN values. Standard BGRN stacking; B04 → idx 2, B08 → idx 3.
bgrn_arr = np.stack([blue.values, green.values, red.values, nir.values], axis=0).astype(
    "uint16"
)
bgrn_gt = GeoTensor(
    values=bgrn_arr,
    transform=red.rio.transform(),
    crs=red.rio.crs,
    fill_value_default=0,
)

# Separate SCL carrier (1, H, W).
scl_gt = GeoTensor(
    values=scl.values[None, ...].astype("uint8"),
    transform=scl.rio.transform(),
    crs=scl.rio.crs,
    fill_value_default=0,
)

print(f"bgrn_gt: shape={bgrn_gt.shape}  dtype={bgrn_gt.dtype}")
print(f"scl_gt:  shape={scl_gt.shape}  dtype={scl_gt.dtype}")
bgrn_gt: shape=(4, 3935, 1599)  dtype=uint16
scl_gt:  shape=(1, 3935, 1599)  dtype=uint8

2. True-color reference

Before running the index, plot a percentile-stretched true-color composite so we know what the scene looks like. gz.radiometry.PercentileClip is the named op for the stretch; np.transpose reorders (C, H, W) → (H, W, C) for matplotlib.

rgb_pipe = gz.radiometry.DNToReflectance(scale=1e-4) | gz.radiometry.PercentileClip(
    p_min=2.0, p_max=98.0
)
rgb_full = rgb_pipe(bgrn_gt)
rgb_arr = np.asarray(rgb_full)[[2, 1, 0]]  # R, G, B from B-G-R-NIR

fig, ax = plt.subplots(figsize=(7, 8))
ax.imshow(np.transpose(rgb_arr, (1, 2, 0)))
ax.set_title(f"Lake Tahoe true-color (B04, B03, B02) — {item.id[:32]}…")
ax.axis("off")
plt.show()
<Figure size 700x800 with 1 Axes>

3. Compose NDVI with Sequential

The simplest possible shape — DN → reflectance, then NDVI. Two named ops, one |. BGRN stack → red is at idx 2, NIR at idx 3.

pipe = gz.radiometry.DNToReflectance(scale=1e-4) | gz.indices.NDVI(red_idx=2, nir_idx=3)
print(pipe)
ndvi_gt = pipe(bgrn_gt)
print("ndvi shape:", ndvi_gt.shape)
print(
    "ndvi range:",
    float(np.nanmin(np.asarray(ndvi_gt))),
    float(np.nanmax(np.asarray(ndvi_gt))),
)
Sequential([DNToReflectance(scale=0.0001, offset=0.0, axis=0), NDVI(nir_idx=3, red_idx=2, axis=0, eps=1e-10)])
ndvi shape: (3935, 1599)
ndvi range: -0.9999999992647058 0.999999999057493

4. Visualise the raw NDVI

Standard NDVI ramp. We use RdYlGn clipped at vmin=-0.4, vmax=0.9 so water shows red, conifer forest shows green, and the dry granite / scrub middle ground reads as a transition rather than alarming orange.

fig, ax = plt.subplots(figsize=(7, 8))
im = ax.imshow(np.asarray(ndvi_gt), cmap="RdYlGn", vmin=-0.4, vmax=0.9)
ax.set_title("NDVI (raw, no cloud mask)")
ax.axis("off")
fig.colorbar(im, ax=ax, shrink=0.7, label="NDVI")
plt.show()
<Figure size 700x800 with 2 Axes>

5. Cloud-aware variant

We extract a “drop these pixels” mask from SCL using gz.cloud.MaskFromSCL, then attach gz.mask.ApplyMask at the end of the same Sequential. The mask is True on cloud + invalid pixels; ApplyMask follows the library convention (“True = drop”) so the call is just ApplyMask(mask=cloud_mask, fill_value=np.nan).

cloud_op = gz.cloud.MaskFromSCL(
    band_idx=0,
    classes=SCL_CLOUDS_AND_INVALID,
)
cloud_mask = cloud_op(scl_gt)
print("cloud_mask shape:", cloud_mask.shape, "dtype:", cloud_mask.dtype)
print(f"fraction masked: {float(np.mean(np.asarray(cloud_mask))):.3f}")

clean_pipe = (
    gz.radiometry.DNToReflectance(scale=1e-4)
    | gz.indices.NDVI(red_idx=2, nir_idx=3)
    | gz.mask.ApplyMask(mask=cloud_mask, fill_value=float("nan"))
)
print(clean_pipe)
ndvi_clean = clean_pipe(bgrn_gt)
print(
    "clean ndvi range:",
    float(np.nanmin(np.asarray(ndvi_clean))),
    float(np.nanmax(np.asarray(ndvi_clean))),
)
cloud_mask shape: (3935, 1599) dtype: bool
fraction masked: 0.000
Sequential([DNToReflectance(scale=0.0001, offset=0.0, axis=0), NDVI(nir_idx=3, red_idx=2, axis=0, eps=1e-10), ApplyMask(mask={'type': 'ndarray', 'shape': [3935, 1599], 'dtype': 'bool'}, fill_value=nan, invert=False)])
clean ndvi range: -0.9999999992647058 0.999999999057493
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(13, 8))
ax1.imshow(np.transpose(rgb_arr, (1, 2, 0)))
ax1.set_title("True-color reference")
ax1.axis("off")
im2 = ax2.imshow(np.asarray(ndvi_clean), cmap="RdYlGn", vmin=-0.4, vmax=0.9)
ax2.set_title("Cloud-masked NDVI")
ax2.axis("off")
fig.colorbar(im2, ax=ax2, shrink=0.7, label="NDVI")
plt.show()
<Figure size 1300x800 with 3 Axes>

6. Fan-out — multiple indices from one scaled scene

gz.Fanout is sugar over a single-input Graph: one input, many named outputs. Here we compute NDVI and NDWI (the water-detection complement; high over open water, low over vegetation) off the same reflectance carrier — water and vegetation light up in opposite panels.

scaled = gz.radiometry.DNToReflectance(scale=1e-4)(bgrn_gt)
products = gz.Fanout(
    {
        "ndvi": gz.indices.NDVI(red_idx=2, nir_idx=3),
        "ndwi": gz.indices.NDWI(green_idx=1, nir_idx=3),
    }
)(scaled)
print("product keys:", list(products.keys()))
print("ndvi shape:", products["ndvi"].shape)
print("ndwi shape:", products["ndwi"].shape)
product keys: ['ndvi', 'ndwi']
ndvi shape: (3935, 1599)
ndwi shape: (3935, 1599)
fig, axes = plt.subplots(1, 2, figsize=(13, 8))
cfg = [("ndvi", "RdYlGn", (-0.4, 0.9)), ("ndwi", "Blues", (-0.4, 0.6))]
for ax, (name, cmap, (vmin, vmax)) in zip(axes, cfg, strict=True):
    im = ax.imshow(np.asarray(products[name]), cmap=cmap, vmin=vmin, vmax=vmax)
    ax.set_title(name.upper())
    ax.axis("off")
    fig.colorbar(im, ax=ax, shrink=0.7, label=name)
plt.show()
<Figure size 1300x800 with 4 Axes>

7. Observability — inspect, time, assert

Operator is general enough that side-effects round-trip the same as transforms. pipekit ships a small library of identity-with-side- effect ops you compose inline:

PrimitiveRoleRe-exported as
Tap(fn, name=...)callback hook (logging / breakpoints)gz.Tap
ShapeTrace(mode="diff_only")log shape/dtype on changegz.ShapeTrace
Snapshot() + snap.at(key)capture intermediates by keygz.Snapshot
Profile() + prof.wrap(op)per-step wall-clockfrom pipekit import Profile
AssertShape(expected)runtime contract; raises QCErrorfrom pipekit import AssertShape
AssertDType(expected)runtime dtype contractfrom pipekit import AssertDType
Quarantine(check, sentinel)non-raising QC (route bad inputs)from pipekit import Quarantine

None of these change the pipeline output — they just observe or enforce. Drop them in with |, take them out for production.

from pipekit import AssertDType, AssertShape, Profile


snap = gz.Snapshot()
prof = Profile()

# Same NDVI pipeline as §3, but instrumented:
#   - AssertShape pins the input contract (4 bands, any H × W)
#   - Tap logs the dtype on the first call
#   - Snapshot.at captures the reflectance intermediate by key
#   - ShapeTrace prints when shape/dtype changes between steps
#   - Profile.wrap times each operator
#   - AssertDType pins the output contract
trace = gz.ShapeTrace(mode="diff_only", label="ndvi")

observed = (
    AssertShape((4, None, None))
    | gz.Tap(lambda gt: print(f"input dtype: {np.asarray(gt).dtype}"), name="log_dtype")
    | prof.wrap(gz.radiometry.DNToReflectance(scale=1e-4))
    | snap.at("reflectance")
    | trace
    | prof.wrap(gz.indices.NDVI(red_idx=2, nir_idx=3))
    | trace
    | AssertDType("float64")
)
_ = observed(bgrn_gt)

print("\nSnapshot keys:", list(snap.captures))
print(
    "reflectance carrier:",
    snap.captures["reflectance"].shape,
    snap.captures["reflectance"].dtype,
)
print("\nProfile report:")
for name, stats in prof.report().items():
    print(
        f"  {name:20s}  calls={int(stats['calls'])}  mean={stats['mean'] * 1000:6.1f} ms"
    )
input dtype: uint16
[ndvi] shape=(4, 3935, 1599) dtype=float64
[ndvi] shape=(3935, 1599) dtype=float64

Snapshot keys: ['reflectance']
reflectance carrier: (4, 3935, 1599) float64

Profile report:
  DNToReflectance       calls=1  mean=  32.9 ms
  NDVI                  calls=1  mean=  17.2 ms

What you’d reach for next:

  • Quarantine — route bad-cloud-fraction scenes to a sentinel instead of aborting the batch.
  • Cache — memoize by config + input hash for expensive ops (Ledoit-Wolf covariance, learned models).
  • Try / Coalesce / Retry — production control flow for flaky external loaders (STAC, S3).

Each is itself an Operator so the whole stack composes uniformly.

8. When to use which composition primitive

The cells above used Sequential (linear) and Fanout (one input, many outputs). The choice tree for the rest:

You want…Reach forWhy
A linear chain of 2–6 opsSequential (or op_a | op_b)Smallest API surface; flattens nested pipes.
Named intermediates / branching / fan-inGraphSame scene, multiple sub-pipelines, fused at the end.
Runtime two-way fork (e.g. reproject if geographic)BranchPredicate evaluates the whole carrier.
Sensor / product dispatch ('S2' vs 'L8')SwitchFinite named cases; default is Identity.
One input → many named derived productsFanoutSugar over a single-input Graph.

Each of these is itself an Operator, so they nest freely: a Graph inside a Sequential, a Switch whose cases are Sequentials, a Fanout whose values are Graphs.

8. Where next