Geopatcher scaling
Lazy reads, streaming writes, and hierarchical composition under memory pressure
Scaling: Lazy Reads, Streaming Writes, and Hierarchical Composition¶
Overview¶
Global-scale reconstruction blows up memory in five predictable places along the split → operate → merge pipeline. Most have one-line fixes once the framework is wired correctly; the rest need a small set of well-targeted extensions. This section maps the bottlenecks, describes the fixes, identifies which combinations of inputs and outputs are actually streamable, and shows how the framework composes recursively when data doesn’t fit in RAM at any single level.
The five memory pressure points¶
Memory pressure in the split → operate → merge pipeline
┌──────────────────────────────────────────────────────────────────────┐
│ │
│ GlobalField ──▶ anchors ──▶ patches ──▶ outputs ──▶ merge │
│ [1] [2] [3] [4] [5] │
│ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️⚠️⚠️ │
│ │
│ full source anchor patches operator rec + │
│ array in RAM list in list in output list count + │
│ RAM RAM in RAM div (3!) │
│ │
└──────────────────────────────────────────────────────────────────────┘
[1] Lazy backend [2] Sampler is [3,4] Iterator split
(rasterio, dask) Iterable already + streaming flag
✓ free if reader ✓ free 📦 extension
picked correctly
[5] Disk-backed Aggregation
(framework-owned Zarr)
📦 extension

What goes where¶
| Pressure point | Solution | Where it lives | Status |
|---|---|---|---|
| [1] Full field in RAM | Lazy backend readers | Field adapter | Already supported |
| [2] Anchor list in RAM | Sampler.anchors() is Iterable | Sampler protocol | Already supported |
| [3] Patch list in RAM | Patcher.split() returns Iterator | Patcher API | Small extension |
| [4] Output list in RAM | Consume + accumulate one at a time | Pattern, not code | Documented |
| [5] Three full-size arrays | Disk-backed Aggregation (flag) | Aggregation impl | Extension |
| Single field too big | Hierarchical Patcher-of-Patchers | Composition pattern | Documented |
Three workflows result: a trivial small-data path (in-RAM, eager), a streaming inference path (lazy reads + iterator split + disk-backed aggregation), and a hierarchical path for problems that don’t fit at any single level.
What the framework gives you for free¶
Two pressure points are already addressed by the design — pick the right backend and respect the iterator nature of Sampler.
Lazy reads from the backend¶
```python
field = RasterioField(open_cog("global_methane.tif"))  # nothing read yet
patch = field.select(window)                           # still nothing read
arr = patch.data.values                                # ONLY now we read from disk
```

| Backend | Lazy by default? | How to make it lazy |
|---|---|---|
| `RasterioField` (COG) | ✓ | windowed I/O is the API |
| `RioXarrayField` + dask | ✓ | `open_rasterio(..., chunks={...})` |
| `GeoTensorField` | ✗ (materialized) | use `RasterioField` instead |
| `XarrayField` + Zarr | ✓ | `open_zarr(...)` |
| `XarrayField` + netCDF | ✗ | `open_dataset(..., chunks={...})` to dask-back |
| `GeoPandasField` | ✗ (loads full GeoDataFrame) | GeoParquet + row-group filtering |
| `XvecField` | ✓ if underlying xarray is dask-backed | inherits xarray laziness |
Rule of thumb: rasters and gridded data are lazy for free if you pick the right reader. Vectors and points require explicit work.
Sampler anchors are already an iterator¶
```python
class Sampler(Protocol):
    def anchors(self, domain, geometry) -> Iterable[Anchor]: ...  # not list!
```

`RegularStride`, `Random`, and `PoissonDisk` are all generator-friendly. The protocol already commits to laziness; no extra work needed.
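To make the laziness concrete, here is a minimal sketch of a generator-backed regular-stride sampler. The function name `regular_stride_anchors` and its signature are illustrative stand-ins, not the framework's actual `RegularStride` API:

```python
from collections.abc import Iterator


def regular_stride_anchors(
    shape: tuple[int, int], stride: tuple[int, int]
) -> Iterator[tuple[int, int]]:
    """Yield anchor coordinates one at a time; no anchor list is ever built."""
    for i in range(0, shape[0], stride[0]):
        for j in range(0, shape[1], stride[1]):
            yield (i, j)


anchors = regular_stride_anchors((1024, 1024), (256, 256))
first = next(anchors)  # only one anchor exists in memory at this point
```

Consuming the generator with a `for` loop keeps memory constant regardless of domain size; `list(...)` materializes it only when explicitly asked.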
Four extensions¶
1. Patcher.split returns an Iterator¶
A single-line change with system-wide consequences:
```python
@dataclass
class Patcher:
    def split(self, field: Field) -> Iterator[Patch]:  # was list[Patch]
        for a in self.sampler.anchors(field.domain, self.geometry):
            idx = self.geometry.neighborhood(field.domain, a)
            w = self.window.weights(self.geometry)
            yield Patch(data=field.select(idx), anchor=a, indices=idx, weights=w)
```

Materializing remains a one-liner when you want it: `list(patcher.split(field))`. The default API forces streaming consumption.
2. Disk-backed Aggregation (flag on the existing class)¶
OverlapAdd accumulates in RAM by default. With streaming=True and a target_path, it accumulates into a framework-managed Zarr store on disk. The math is identical; only the storage moves.
```python
@dataclass
class OverlapAdd(Aggregation):
    streaming: bool = False
    target_path: str | None = None          # where the Zarr store goes
    chunks: tuple[int, ...] | None = None
    streaming_safe: ClassVar[bool] = True

    def merge(self, patches: Iterable[Patch], domain: Domain) -> Field:
        if self.streaming:
            return self._merge_streaming(patches, domain)
        return self._merge_in_memory(patches, domain)

    def _merge_streaming(self, patches, domain):
        rec = zarr.open(f"{self.target_path}/rec.zarr", mode="w",
                        shape=domain.shape, chunks=self.chunks, dtype="float32")
        count = zarr.open(f"{self.target_path}/count.zarr", mode="w",
                          shape=domain.shape, chunks=self.chunks, dtype="float32")
        for p in patches:                              # one patch at a time
            target = _resolve_indexer(p.indices)       # Window → slice, etc.
            rec[target] += p.data * p.weights
            count[target] += p.weights
        for ck in iter_chunks(rec):                    # chunked, in-place division
            rec[ck] = np.divide(rec[ck], count[ck],
                                out=rec[ck], where=count[ck] > 0)
        return ZarrField(rec)
```

The flag plus the framework-managed store address bottlenecks [3], [4], and [5] in one mechanism. The user passes a target path and gets back a Field they can read; they don't open, close, or clean up anything. Memmap is a drop-in zero-dependency alternative (`zarr.open` → `np.memmap`); the interface is identical.
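The memmap variant mentioned above can be sketched with nothing but NumPy. The helper name `merge_streaming_memmap` and the `(data, window, weights)` patch tuples are simplifications for illustration, not the framework's `Patch` objects:

```python
import os
import tempfile

import numpy as np


def merge_streaming_memmap(patches, shape, path):
    """Overlap-add accumulation into on-disk memmaps, one patch at a time."""
    rec = np.memmap(os.path.join(path, "rec.dat"),
                    dtype="float32", mode="w+", shape=shape)
    count = np.memmap(os.path.join(path, "count.dat"),
                      dtype="float32", mode="w+", shape=shape)
    for data, window, weight in patches:         # constant memory per patch
        rec[window] += data * weight
        count[window] += weight
    # in-place normalization; uncovered cells (count == 0) stay zero
    np.divide(rec, count, out=rec, where=count > 0)
    rec.flush()
    return rec


tmp = tempfile.mkdtemp()
patches = [
    (np.ones((4, 4), dtype="float32"), (slice(0, 4), slice(0, 4)), 1.0),
    (np.ones((4, 4), dtype="float32"), (slice(2, 6), slice(2, 6)), 1.0),
]
out = merge_streaming_memmap(patches, (8, 8), tmp)
```

The double-covered cells accumulate weight 2 and divide back to 1.0, matching the in-RAM result; only the backing storage differs.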
3. Hierarchical reconstruction (Patcher-of-Patchers)¶
When even one super-tile is larger than convenient, the framework composes recursively. No new class — just two Patcher instances at different scales:
GlobalField (petabyte-scale, lazy on disk)
│
▼
┌─────────────────────────┐
│ Outer Patcher │ coarse super-tiles (e.g. 4×4 of globe)
│ geometry = (4096²) │
│ sampler = RegStride │
│ aggregation = None │ (write per super-tile; no global merge)
└────────────┬────────────┘
│ for each super-tile (streaming):
▼
┌──────────┐ ◄── one super-tile loaded lazily, fits in RAM
│ Super- │
│ Tile │
└────┬─────┘
│
▼
┌─────────────────────────┐
│ Inner Patcher │ fine patches within super-tile
│ geometry = (256²) │
│ sampler = RegStride │
│ aggregation = Overlap │ in-RAM accumulator, super-tile-sized
│ Add() │
└────────────┬────────────┘
│ for each patch (streaming):
▼
┌──────────┐
│ Patch │ ──▶ operator ──▶ result
└──────────┘
│
┌──────────────────────────────────┘
▼
┌──────────────────────┐
│ Inner OverlapAdd │ in-RAM, super-tile shaped
└─────────┬────────────┘
│
▼
write to global Zarr at super-tile offset, free RAM, next tile
│
▼
┌──────────────────────┐
│ Zarr store │ full global field, on disk
└──────────────────────┘

User code is ~20 lines:
```python
outer = Patcher(geometry=Rectangular((4096, 4096)),
                sampler=RegularStride((4096, 4096)),
                window=Boxcar(), aggregation=None)
inner = Patcher(geometry=Rectangular((256, 256)),
                sampler=RegularStride((192, 192)),
                window=Hann(), aggregation=OverlapAdd())  # in-RAM

global_store = zarr.open("global.zarr", mode="w", shape=field.shape, ...)

for super_tile in outer.split(field):
    super_acc = inner.merge(
        (operator(p) for p in inner.split(super_tile.data)),
        super_tile.indices,
    )  # super-tile-sized
    global_store[_resolve_indexer(super_tile.indices)] = super_acc.values
    del super_acc  # free for next tile
```

The recursion uses only existing pieces: Patcher, split, merge, Field. The pattern is documented; no `HierarchicalPatcher` class is introduced. If real workflows show a consistent shape across 2–3 use cases, abstract later.
Coarse-to-fine reconstruction (e.g. Multi-Stage Progressive Image Restoration, CVPR multi-scale reconstruction) is a related but distinct idea: a single field reconstructed at multiple resolutions, with each stage’s output feeding the next. That’s an operator pattern, not a Patcher pattern. The framework supports it (different Rectangular sizes per stage), but the staging is orchestrated by the operator, not by Patcher composition.
4. streaming_safe property on Aggregation¶
Not every aggregation can be done one patch at a time. The next subsection explains why — but the property itself is simple:
```python
class Aggregation:
    streaming_safe: ClassVar[bool] = False  # conservative default

class OverlapAdd(Aggregation):
    streaming_safe = True
```

When a user sets `streaming=True` on an aggregation with `streaming_safe = False`, the framework emits a warning rather than raising. The warning explains the issue and points to two-pass or approximation alternatives. Hard-erroring is too aggressive for a property that can depend on use case; silent failure is exactly what this property is designed to prevent.
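A minimal sketch of the warn-don't-raise policy. The `check_streaming` method name and the instance attributes are assumptions for illustration, not the framework's actual API:

```python
import warnings


class Aggregation:
    streaming_safe = False   # conservative default
    streaming = False

    def check_streaming(self) -> None:
        # Warn rather than raise: streaming-safety can depend on the use case,
        # but silently wrong results must never happen.
        if self.streaming and not self.streaming_safe:
            warnings.warn(
                f"{type(self).__name__} is not marked streaming-safe; "
                "consider a two-pass workflow or a sketch-based substitute.",
                UserWarning,
            )


class Learned(Aggregation):
    streaming_safe = False


agg = Learned()
agg.streaming = True
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    agg.check_streaming()    # emits one UserWarning, does not raise
```

A global strict mode (as raised in the open questions below) would only change the `warnings.warn` call to a `raise`.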
The streaming asymmetry: it’s about the output, not the input¶
The previous extensions raise a natural question — what makes a workflow streamable in the first place? The answer is a single key insight that’s worth pulling out explicitly:
Streaming is easy when the output address space is preallocatable, and hard when it isn’t.
The input domain doesn’t matter — raster, grid, points, polygons all stream fine, because each patch’s input is independently readable through Field.select. The asymmetry is on the output side: the Aggregation has to write each patch’s contribution somewhere, and “somewhere” needs to exist before the patch arrives.
ANY input Patcher OUTPUT structure
──────── ─────── ────────────────
raster ┌────────────────┐
grid ──▶ split ──▶ ──▶ │ PREALLOCATABLE │ ─▶ STREAMS ✓
points + op │ • regular │ (write to
polygons │ grid │ known
│ • fixed key │ address)
(lazy reads │ set │
are free) │ • known │
│ query pts │
└────────────────┘
┌────────────────┐
│ NON-PREALLOC. │ ─▶ DOES NOT ✗
│ • dynamic │ (no address
│ points │ to write
│ • growing │ to)
│ graph │
│ • locations │
│ self- │
│ determined │
└────────────────┘
The bottleneck lives on the OUTPUT side.

The matrix of common cases¶
| Input domain | Output domain | Streaming? | Aggregation |
|---|---|---|---|
| Raster | Raster | ✓ | OverlapAdd |
| Raster | Polygons (fixed) | ✓ | ByIndex |
| Raster | Points (fixed query set) | ✓ | ByIndex |
| **Points (radius/knn)** | **Raster** | **✓** | **OverlapAdd** |
| Points | Points (fixed set) | ✓ | ByIndex |
| Points | Points (dynamic / growing) | ✗ | — |
| Anything | Anything (dynamic) | ✗ | — |
The Points (radius/knn) → Raster row is worth flagging: predicting from scattered observations (Argo, in-situ stations, swath samples) onto a regular grid is one of the most common geoscience workflows, and it streams cleanly. The radius/knn part is about how the input is gathered per patch; it doesn't constrain the output.
This is exactly what the streaming_safe flag captures. It’s really asking “is this aggregation’s output address space preallocatable?” OverlapAdd-into-Zarr is safe; ByIndex is safe; a hypothetical Scatter writing to dynamically-determined locations is not.
One subtlety: points → grid at scale¶
Even when the output streams cleanly, the input may not fit in RAM. With 100M Argo profiles you can’t hold a single KDTree over the full point cloud. The pattern is the same hierarchical Patcher-of-Patchers from above, but applied for a different reason: the outer Patcher partitions the output grid into super-tiles, and for each super-tile you load only the points whose neighborhoods could touch that region (via a disk-backed spatial index, or by spatially partitioning the points to begin with).
1. Outer Patcher : super-tile the output grid
2. Per super-tile : load only the points whose neighborhoods could touch it
3. Inner Patcher : build radius/knn patches around grid cells in this super-tile
4. Operator : predict per cell (GP, GNN, neural process, …)
5. OverlapAdd : stream into Zarr for the super-tile's grid region

Output streams via OverlapAdd; input streams via the outer Patcher selecting which points to load. Two streaming mechanisms, working at different levels of the recursion.
The genuinely hard case¶
Streaming breaks when the output structure is determined by the predictions themselves: adaptive-mesh refinement, active learning, on-the-fly point insertion. You can’t preallocate, and each prediction may need to spatially-index against all previous predictions. That’s not really streamable in our sense; it needs different abstractions (dynamic spatial structures, incremental indexing).
The pragmatic workaround is two-pass: a first pass with a static (possibly coarse) output set using a streaming-safe aggregation, then a second pass that refines using the first-pass result. The framework supports this naturally — two Patchers, two passes — but it’s not automatic. The user opts in.
The same two-pass pattern covers global-context aggregations (global normalization, attention across all patches, learned aggregations needing the full batch): pass 1 computes the global statistic with a streaming-safe aggregation, pass 2 uses it. Approximation alternatives (t-digest for quantiles, sketch-based attention) are useful when two passes are too expensive.
Two workflows, not one “streaming mode”¶
The strategies above support two distinct use cases that should be documented separately.
Training-time streaming (DataLoader pattern)¶
One patch at a time, no reconstruction. Patches are consumed by a model and thrown away. The adapter is small:
```python
class PatchDataset:
    def __init__(self, patcher: Patcher, field: Field):
        self.patcher = patcher
        self.field = field
        # Anchor list materialized once (small); patch data never is
        self.anchors = list(patcher.sampler.anchors(field.domain, patcher.geometry))

    def __len__(self):
        return len(self.anchors)

    def __getitem__(self, i: int) -> Patch:
        a = self.anchors[i]
        idx = self.patcher.geometry.neighborhood(self.field.domain, a)
        w = self.patcher.window.weights(self.patcher.geometry)
        return Patch(data=self.field.select(idx), anchor=a, indices=idx, weights=w)
```

Drop-in for `torch.utils.data.Dataset` or any indexable dataset. The DataLoader handles batching, shuffling, and parallel I/O; the Patcher provides the patch definition. Memory footprint: one anchor list (small) + one patch per worker (small).
Inference-time streaming (reconstruct the global field)¶
Lazy reads + iterator split + streaming-flagged disk-backed aggregation, optionally wrapped in hierarchical composition.
```python
inference = Patcher(
    geometry    = Rectangular((256, 256)),
    sampler     = RegularStride((192, 192)),
    window      = Hann(),
    aggregation = OverlapAdd(streaming=True, target_path="global.zarr",
                             chunks=(256, 256)),
)

result_field = inference.merge(
    (operator(p) for p in inference.split(field)),
    field.domain,
)
# result_field is a ZarrField — read it back with xarray if you want
```

If a single super-tile is too big, wrap this block in the outer Patcher of section 3.
Ragged geometries: a streaming non-issue¶
One more thing worth flagging up front: ragged geometries (SphericalCap, RadiusGraph, PolygonIntersection) stream fine at the Patcher level — each patch is independently computed and accumulated. The complication is downstream batching: if your operator is a GNN expecting padded batches, you need a batching layer between split and operator (jraph, torch_geometric handle this natively).
So ragged input ≠ non-streamable. Ragged input + GNN-style batching is a separate engineering problem, not a Patcher problem.
Quick wins (small fixes that just become defaults)¶
- In-place division in `OverlapAdd._merge_in_memory`. There is no reason to allocate the third full-size array; `np.divide(rec, count, out=rec, where=count > 0)` is the right primitive.
- Lazy coordinate generation is implicit once `Sampler.anchors()` is treated as `Iterable`; nothing to do beyond removing any old code that materialized them eagerly.
Open questions¶
A few choices left deliberately open, to revisit once real workflows force the shape:
- Two-pass operator support. Common enough (any normalization, any cross-patch attention, refinement loops) that an explicit `TwoPassPatcher` pattern might earn its keep. Defer until a real use case lands.
- Coarse-to-fine reconstruction as a workflow. Operator-led, but interacts with the Patcher (different geometries per stage). Worth its own short section once a concrete use case (multi-scale methane retrieval, for example) is in flight.
- Streaming-safety: warning vs hard error vs configurable. Currently a warning. A global `strict` flag would let CI-like contexts fail fast without per-call API noise.
- Distributed reconstruction. Zarr writes parallelize across workers via dask; the framework doesn't need to know. The hierarchical pattern is the natural unit of distribution (one worker per super-tile). Worth documenting as a recipe rather than building in.
Summary¶
The framework handles scaling through three orthogonal mechanisms: backend laziness (free — just pick the right reader), iterator-based traversal (small protocol change to Patcher.split), and disk-backed aggregation (flag on Aggregation, framework-managed Zarr). When the global field doesn’t fit at any single level, the same Patcher abstraction composes recursively as a Patcher-of-Patchers, with no new class introduced.
The key conceptual frame is the streaming asymmetry: input domains stream freely, but the output structure has to be preallocatable for streaming to work. Regular grids, fixed key sets, and known query points are preallocatable; dynamic point clouds and growing graphs are not. The streaming_safe property on Aggregation encodes this distinction, and the two-pass pattern is the escape hatch for the hard cases.
Two distinct workflows — training-time streaming via a PatchDataset adapter, and inference-time streaming via flagged disk-backed aggregation — cover the common cases.
Streaming-Compatible Aggregations¶
OverlapAdd is the canonical example, but there’s a much richer family of aggregations that stream. The structural property they share: each can be expressed as a fold over one or more parallel accumulators, where every per-patch update is independent of every other.
Two paths to streaming follow from this — exact (associative reductions) and approximate (bounded-error sketches). The split matters: it tells you when streaming preserves the answer and when it preserves it only up to a guaranteed error bound.
STREAMING-COMPATIBLE AGGREGATIONS
═════════════════════════════════
EXACT APPROXIMATE
──────────────── ─────────────────────────
Associative reductions Bounded-error sketches
(one or more parallel (substitute for stats that
accumulators, no error) can't reduce associatively)
┌─────────────────────┐ ┌──────────────────────────┐
│ Sum, WeightedSum │ │ ApproxQuantile │
│ Max, Min, Range │ │ (t-digest, KLL, GK) │
│ Mean, Variance │ │ ApproxCardinality │
│ OverlapAdd │ │ (HyperLogLog) │
│ HardVote, SoftVote │ │ ApproxMode │
│ InvVarWeightedMean │ │ (Misra-Gries) │
└─────────────────────┘ │ StreamingHistogram │
│ Reservoir(k) │
no information loss └──────────────────────────┘
ε-bounded error
│
▼
substitute for stats
that DON'T stream exactly:
median, percentiles,
exact mode, k-th order stat

Exact streaming aggregations¶
Associative + commutative reductions. The per-patch update commutes with every other patch, so streaming gives bit-identical results to a batch computation (modulo floating-point order).
Monoidal (one accumulator)¶
| Aggregation | Accumulator | Per-patch update | Final |
|---|---|---|---|
| `Sum` | `acc` | `acc[idx] += data` | `acc` |
| `WeightedSum` | `acc` | `acc[idx] += data * w` | `acc` |
| `Max` | `acc` (init −∞) | `acc[idx] = maximum(acc[idx], data)` | `acc` |
| `Min` | `acc` (init +∞) | `acc[idx] = minimum(acc[idx], data)` | `acc` |
| `Any` / `All` (bool) | `acc` (init 0/1) | `acc[idx] \|= data` / `&=` | `acc` |
| `Count` | `acc` (int) | `acc[idx] += 1` | `acc` |
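As a tiny concrete check of the monoidal pattern, here is streaming `Max` over overlapping patches with a single accumulator. The 1-D layout and patch tuples are toy illustrations:

```python
import numpy as np

# Max accumulator, initialized to the monoid identity (-inf)
acc = np.full(4, -np.inf)

# (indices, data) pairs standing in for overlapping patches
patches = [([0, 1], [3.0, 1.0]),
           ([1, 2], [2.0, 5.0]),
           ([2, 3], [4.0, 0.5])]

for idx, data in patches:
    # acc[idx] = maximum(acc[idx], data), safe under repeated indices
    np.maximum.at(acc, idx, data)
```

Because `max` is associative and commutative, any patch order (or any partition across workers) yields the same accumulator.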
Compound monoidal (parallel accumulators, combine at the end)¶
Two or more monoidal accumulators run in parallel; the final step is a chunked map.
| Aggregation | Accumulators | Final op |
|---|---|---|
| `Mean` | sum, count | `sum / count` |
| `OverlapAdd` | weighted_sum, weight | `weighted_sum / weight` |
| `Range` | min, max | `max - min` |
| `Variance` (Welford) | mean, M2, count | `M2 / (count - 1)` |
| `InvVarWeightedMean` | Σ(μ/σ²), Σ(1/σ²) | `acc1 / acc2`, `1 / acc2` |
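The Welford row deserves a sketch, since it is the least obvious fold in the table. This scalar version (the function name is illustrative) applies elementwise per grid cell in the real aggregation, with `count`, `mean`, and `m2` as parallel accumulator arrays:

```python
def streaming_variance(values):
    """Welford's algorithm as a fold: three accumulators, one value at a time."""
    count, mean, m2 = 0, 0.0, 0.0
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count            # running mean
        m2 += delta * (x - mean)         # running sum of squared deviations
    return m2 / (count - 1)              # sample variance, no history kept
```

No per-cell history is ever stored, yet the result matches the batch sample variance exactly (up to floating-point order).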
Categorical / voting¶
| Aggregation | Accumulator | Per-patch update | Final |
|---|---|---|---|
| `HardVote` | `acc[K, ...]` | `acc[argmax(data), idx] += 1` | `argmax(acc, axis=0)` |
| `SoftVote` | `acc[K, ...]` | `acc[:, idx] += data_probs` | `argmax(acc, axis=0)` |
Both stream cleanly; just K channels instead of one. Useful for ensembling classification models patch-by-patch.
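A minimal `SoftVote` sketch shows the K-channel accumulator in action. The helper name `soft_vote` and the `(probs, idx)` patch tuples are illustrative simplifications:

```python
import numpy as np


def soft_vote(prob_patches, shape, K):
    """Accumulate class probabilities into K channels; argmax once at the end."""
    acc = np.zeros((K, *shape))
    for probs, idx in prob_patches:          # probs: (K, h, w); idx: spatial slices
        acc[(slice(None), *idx)] += probs
    return acc.argmax(axis=0)


patches = [
    # full-grid patch leaning toward class 0 everywhere
    (np.array([[[0.6, 0.6], [0.6, 0.6]],
               [[0.4, 0.4], [0.4, 0.4]]]), (slice(0, 2), slice(0, 2))),
    # corner patch strongly voting class 1 at cell (0, 0)
    (np.array([[[0.1]], [[0.9]]]), (slice(0, 1), slice(0, 1))),
]
labels = soft_vote(patches, (2, 2), K=2)
```

The overlapped corner flips to class 1 (0.7 vs 1.3 accumulated probability) while the rest of the grid stays class 0, all with one pass over the patches.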
Approximate streaming aggregations (sketches)¶
The aggregations above all reduce associatively. Quantiles, mode, cardinality, and the full empirical distribution do not — exact computation needs all values per cell. Sketches sidestep this by maintaining a compact summary per cell with a bounded error guarantee.
| Aggregation | Sketch backend | Error | Per-cell memory |
|---|---|---|---|
| `ApproxQuantile` | t-digest, KLL, GK | ε-relative on rank | O(log(1/ε)/ε) |
| `ApproxCardinality` | HyperLogLog | ~1.04/√m relative | O(2^p) |
| `ApproxMode` / heavy hitters | Misra-Gries, Space-Saving | bounded frequency error | O(k) |
| `StreamingHistogram` | equi-width or t-digest | bin error | O(bins) |
| `Reservoir(k)` | reservoir sampling | sampling variance | O(k) |
Sketches substitute for, but do not equal, their exact counterparts:
| Exact (not streamable) | Approximate substitute (streamable) |
|---|---|
| Median, percentiles | ApproxQuantile(q=0.5), t-digest |
| Mode | ApproxMode via Misra-Gries |
| Unique-value count | ApproxCardinality via HyperLogLog |
| Full empirical distribution | StreamingHistogram |
| Arbitrary downstream statistic | Reservoir(k) + post-hoc computation |
This is the central tradeoff: you cannot have exact median in a streaming pipeline, but you can have ε-bounded median that does stream. Naming the operators ApproxX makes the tradeoff explicit at the call site rather than hiding it.
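Of the sketches above, `Reservoir(k)` is the one that needs no external backend at all. A sketch of Algorithm R with the stdlib (function name illustrative):

```python
import random


def reservoir(stream, k, seed=0):
    """Algorithm R: uniform sample of size k from a stream of unknown length."""
    rng = random.Random(seed)
    sample = []
    for n, x in enumerate(stream):
        if n < k:
            sample.append(x)             # fill the reservoir
        else:
            j = rng.randrange(n + 1)     # keep x with probability k/(n+1)
            if j < k:
                sample[j] = x
    return sample


sample = reservoir(range(1000), 10)      # 10 uniform draws, O(k) memory
```

Any downstream statistic can then be computed post-hoc on the sample, at the cost of sampling variance rather than a deterministic error bound.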
One practical gotcha: per-cell sketches can dominate memory. A t-digest is ~1 KB per cell; on a 720×1440 grid that’s ~1 GB just for sketches. Sketch per Zarr chunk rather than per cell is usually the right level — one t-digest summarizing a (256, 256) chunk gives you per-region percentiles without the per-cell overhead.
Bayesian merging: InvVarWeightedMean¶
Worth flagging on its own because it’s the natural answer to a common geoscience workflow: merging overlapping local-Bayesian predictions (local GPs, BNNs, neural processes) into a global field while preserving uncertainty.
When each patch produces a mean and variance, the optimal Gaussian combination is Kalman-style inverse-variance weighting:

$$
\mu_{\text{global}} = \frac{\sum_i w_i\,\mu_i/\sigma_i^2}{\sum_i w_i/\sigma_i^2},
\qquad
\sigma^2_{\text{global}} = \frac{1}{\sum_i w_i/\sigma_i^2}
$$

with patch window weights $w_i$ as an extra modeling knob. This streams with two accumulators:
```python
mu_i, var_i = operator(patch)            # per-patch Bayesian output
mu_acc[idx]   += patch.weights * mu_i / var_i
prec_acc[idx] += patch.weights / var_i

# at the end
mu_global  = mu_acc / prec_acc
var_global = 1 / prec_acc
```

For Case 2 local-GP workflows (no pooling, per-patch ), this is the right way to stitch overlapping posteriors into a global field with calibrated uncertainty. Both the mean and the variance fall out of the streaming reduction at no extra cost.
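A scalar, unit-weight version of the two-accumulator fold makes the behavior easy to check numerically. The helper name `inv_var_merge` is illustrative:

```python
def inv_var_merge(estimates):
    """Fold (mu, var) Gaussian estimates via inverse-variance weighting."""
    mu_acc, prec_acc = 0.0, 0.0
    for mu, var in estimates:            # one posterior at a time
        mu_acc += mu / var
        prec_acc += 1.0 / var
    return mu_acc / prec_acc, 1.0 / prec_acc


# equal-variance case: mean of means, variance halved
mu, var = inv_var_merge([(0.0, 1.0), (2.0, 1.0)])
```

A tighter estimate pulls the merged mean toward itself: merging `(0.0, 1.0)` with `(2.0, 0.25)` gives mean 1.6 and variance 0.2, exactly as the precision-weighted formula predicts.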
What stays non-streamable¶
Some aggregations genuinely cannot be done one patch at a time — they need either the full per-cell history or cross-patch context.
| Aggregation | Why it fails | Streaming substitute |
|---|---|---|
| Exact median, exact percentiles | Needs the full per-cell distribution | ApproxQuantile |
| Exact mode | Needs all values | ApproxMode |
| Order statistics (k-th smallest) | Needs sorted per-cell list | ApproxQuantile |
| Learned (NN) aggregation | Typically needs full batch of outputs | Two-pass or per-region |
| Bayesian Model Averaging (exact) | Needs all posteriors jointly | InvVarWeightedMean (Gaussian approx) |
| Global normalization | Needs global statistic before applying | Two-pass: compute stat, then normalize |
These are the honest streaming_safe = False cases. The two-pass pattern is the general escape hatch (compute the global stat in pass 1 with a streaming-safe aggregation, then apply in pass 2). Sketches are the specific escape hatch for distribution statistics.
Suggested framework exposure¶
```python
# ── Exact: single-accumulator monoidal ──
class Sum                (Aggregation): streaming_safe = True
class WeightedSum        (Aggregation): streaming_safe = True
class Max                (Aggregation): streaming_safe = True
class Min                (Aggregation): streaming_safe = True

# ── Exact: compound monoidal ──
class Mean               (Aggregation): streaming_safe = True
class OverlapAdd         (Aggregation): streaming_safe = True
class Variance           (Aggregation): streaming_safe = True   # Welford
class InvVarWeightedMean (Aggregation): streaming_safe = True   # Bayesian merge

# ── Exact: categorical ──
class HardVote           (Aggregation): streaming_safe = True
class SoftVote           (Aggregation): streaming_safe = True

# ── Approximate: sketches ──
class ApproxQuantile     (Aggregation): streaming_safe = True   # t-digest backend
class ApproxCardinality  (Aggregation): streaming_safe = True   # HyperLogLog
class ApproxMode         (Aggregation): streaming_safe = True   # Misra-Gries
class StreamingHistogram (Aggregation): streaming_safe = True
class Reservoir          (Aggregation): streaming_safe = True

# ── Honest non-streamable ──
class Median             (Aggregation): streaming_safe = False  # use ApproxQuantile
class Mode               (Aggregation): streaming_safe = False  # use ApproxMode
class Learned            (Aggregation): streaming_safe = False  # use two-pass
```

Users who want streaming median write `ApproxQuantile(q=0.5)`. The naming makes the exact/approximate choice explicit at the call site rather than burying it in a flag.
A bonus property of monoidal aggregations: distributability¶
Every exact streaming aggregation above is trivially distributable. Each worker maintains a local accumulator over its slice of patches; the final step is a tree-reduce of the accumulators. For compound monoidals (Mean = Sum / Count), each worker keeps both Sum and Count, the Sums are reduced separately, the Counts are reduced separately, and the division happens once at the end.
This means the hierarchical Patcher-of-Patchers pattern parallelizes with no extra machinery — one worker per super-tile, tree-reduce the accumulators when they come back, finalize once. The Aggregation classes designed for streaming get distribution for free.
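The Mean case from above can be sketched in a few lines: each worker returns a local `(Sum, Count)` pair, pairs combine associatively, and the division happens exactly once. The helper names are illustrative, and the linear loop stands in for a real tree-reduce:

```python
def worker_partial(xs):
    """Local compound accumulator over one worker's slice of patches."""
    return (sum(xs), len(xs))


def combine(a, b):
    """Monoid combine: Sums reduce with Sums, Counts with Counts."""
    return (a[0] + b[0], a[1] + b[1])


chunks = [[1, 2], [3, 4], [5, 6, 7]]          # one chunk per worker
partials = [worker_partial(c) for c in chunks]

total, count = partials[0]
for p in partials[1:]:                        # linear stand-in for a tree-reduce
    total, count = combine((total, count), p)

mean = total / count                          # finalize exactly once
```

Because `combine` is associative and commutative, any reduction order (linear, tree, or shuffled across workers) produces the same `(total, count)` and therefore the same mean.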