Generative model: events arrive in time according to an intensity $\lambda(t)$, each carries a mark $Q$ drawn from a density $f(Q)$, and each is observed with probability $P_d(Q)$. The “thinned marked temporal point process” (TMTPP) is the mathematical object on which everything else at Tier V is built; see Daley & Vere-Jones, 2003 and Daley & Vere-Jones, 2008 for foundations.
The full mathematical derivation lives in methane_pod/notebooks/01_mttpp_theory. This page gives the architectural view: the components, their interfaces, and where they plug into the rest of plumax.
The three components
Temporal: $\lambda(t)$ (events / second)
The intensity function $\lambda(t)$ tells you how rapidly events arrive at time $t$. See the catalogue in 02_intensity_zoo.md for examples.
Thirteen deterministic / Hawkes kernels are currently implemented in methane_pod.intensity. The log-Gaussian Cox process (LGCP) is the next kernel, planned for v1.5; it is the natural model when clustering is environmental rather than self-exciting.
Each kernel is an equinox.Module exposing the same __call__(t) → λ and sample_priors() interface. Adding a new kernel is a one-file PR.
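The shared interface can be illustrated with a minimal sketch. It uses plain dataclasses rather than equinox.Module so that it runs without extra dependencies, and every name in it (IntensityKernel, ConstantIntensity, DiurnalIntensity, expected_count, the prior choice, the rng argument) is hypothetical, not the methane_pod API.

```python
import math
import random
from dataclasses import dataclass
from typing import Protocol


class IntensityKernel(Protocol):
    """Hypothetical stand-in for the shared kernel interface: t -> lambda(t)."""

    def __call__(self, t: float) -> float: ...


@dataclass(frozen=True)
class ConstantIntensity:
    rate: float  # events / second

    def __call__(self, t: float) -> float:
        return self.rate

    @classmethod
    def sample_priors(cls, rng: random.Random) -> "ConstantIntensity":
        # Illustrative prior only: lognormal with median 1e-5 events / second.
        return cls(rate=rng.lognormvariate(math.log(1e-5), 1.0))


@dataclass(frozen=True)
class DiurnalIntensity:
    base: float              # mean rate, events / second
    amplitude: float         # relative modulation, in [0, 1)
    period: float = 86400.0  # seconds

    def __call__(self, t: float) -> float:
        phase = 2.0 * math.pi * t / self.period
        return self.base * (1.0 + self.amplitude * math.sin(phase))


def expected_count(kernel: IntensityKernel, t0: float, t1: float, steps: int = 1000) -> float:
    """Midpoint-rule integral of lambda(t) over [t0, t1]; works for any kernel."""
    h = (t1 - t0) / steps
    return h * sum(kernel(t0 + (k + 0.5) * h) for k in range(steps))
```

Because downstream code such as expected_count only relies on the call signature, a new kernel needs no changes elsewhere, which is the property the one-file-PR claim rests on.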
Marks: $f(Q)$ (probability density on kg/s)
The mark distribution $f(Q)$ gives the size $Q$ of an event, conditional on it happening.
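Table (1): Mark families, with form and operational fit.
| Family | Form | When to use |
|---|---|---|
| Lognormal | $\log Q \sim \mathcal{N}(\mu, \sigma^2)$ | single-class facility populations |
| Pure power-law | $f(Q) \propto Q^{-\alpha}$ for $Q \ge Q_{\min}$ | baseline only; over-emphasises the tail |
| Lognormal-Pareto | lognormal body, Pareto tail above $Q_{\text{tail}}$ | v1 default; Cusworth 2021 / Sherwin 2024 operational standard |
| Mixture-of-lognormals | $\sum_k w_k\,\mathrm{LogNormal}(\mu_k, \sigma_k^2)$ | multi-class facility populations (wells + tanks + pipelines) |
The cutoff $Q_{\min}$ (power-law) and the body-to-tail threshold $Q_{\text{tail}}$ (lognormal-Pareto) are themselves parameters.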
The mark distribution is what Tier V actually wants to recover — it’s the population-scale answer to “how big are the leaks at this kind of facility?”
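As a concrete instance of a mark family, here is a minimal sketch of the pure power-law baseline, assuming the normalised form $f(Q) = \frac{\alpha - 1}{Q_{\min}} (Q / Q_{\min})^{-\alpha}$ with $\alpha > 1$. The function names are hypothetical and are not taken from methane_pod.

```python
import math
import random


def powerlaw_logpdf(q: float, alpha: float, q_min: float) -> float:
    """log f(Q) for f(Q) = (alpha - 1) / q_min * (Q / q_min)^(-alpha), Q >= q_min."""
    if q < q_min:
        return -math.inf
    return math.log((alpha - 1.0) / q_min) - alpha * math.log(q / q_min)


def powerlaw_sample(rng: random.Random, alpha: float, q_min: float) -> float:
    """Inverse-CDF draw: F(Q) = 1 - (Q / q_min)^(-(alpha - 1))."""
    u = rng.random()  # in [0, 1), so 1 - u is in (0, 1]
    return q_min * (1.0 - u) ** (-1.0 / (alpha - 1.0))


rng = random.Random(0)
draws = sorted(powerlaw_sample(rng, alpha=2.0, q_min=1.0) for _ in range(20000))
empirical_median = draws[len(draws) // 2]
# Analytic median is q_min * 2^(1 / (alpha - 1)), i.e. 2.0 for these values.
```

The heavy tail is visible in the draws: the median stays near 2 kg/s while the largest samples are orders of magnitude bigger, which is why the table flags this family as over-emphasising the tail.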
Detection thinning: $P_d(Q)$ (probability)
Not every event is observed. Each satellite has a probability of detection (POD) that depends on the leak size, viewing geometry, surface, and atmospheric state.
The operational form in the methane literature is the Hill function (Cusworth 2021, Sherwin 2024):

$$P_d(Q) = \frac{Q^{n}}{Q_{50}^{n} + Q^{n}}$$

where $Q_{50}$ is the leak size at which the detection probability is 0.5 and $n$ controls the steepness.
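A minimal sketch of a Hill-type POD, assuming the standard two-parameter form $P_d(Q) = Q^n / (Q_{50}^n + Q^n)$. It is evaluated through its logarithm so that very small or very large leak sizes do not overflow. The function names are illustrative, not the methane_pod.pod_functions API.

```python
import math


def log_hill_pod(q: float, q50: float, n: float) -> float:
    """log P_d(Q) = -log(1 + (q50 / q)^n), computed as a stable -softplus(z)."""
    if q <= 0.0:
        return -math.inf
    z = n * (math.log(q50) - math.log(q))
    return -(max(z, 0.0) + math.log1p(math.exp(-abs(z))))


def hill_pod(q: float, q50: float, n: float) -> float:
    """Detection probability in [0, 1]."""
    return math.exp(log_hill_pod(q, q50, n))


# Usage: a detector with a 50% detection point at 100 kg/s and steepness 2.
for q in (10.0, 100.0, 1000.0):
    print(q, round(hill_pod(q, q50=100.0, n=2.0), 4))
```

Working with log P_d directly is convenient later on, since the likelihood terms are sums of logs.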
POD calibration uncertainty: hierarchical prior
Per-instrument controlled-release campaigns (Sherwin et al. 2024) deliver a posterior on the POD parameters $(Q_{50}, n)$, not a point estimate. The v1 default is a hierarchical prior that carries this calibration-campaign uncertainty into the population fit.
This is the middle ground between (a) hard-coding published values (biased when those values are uncertain) and (b) full joint inference (cleanest, but with an identifiability concern against λ). v2 promotes to joint inference when basin data warrants.
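The idea of carrying calibration uncertainty as a prior instead of a point value can be sketched as follows. The lognormal form, the numerical values, and all function names are assumptions made for illustration; the page does not specify the actual prior family.

```python
import math
import random


def sample_pod_params(rng, q50_median, q50_logsd, n_median, n_logsd):
    """Draw (Q50, n) from independent lognormals centred on campaign values.

    Assumption: lognormal priors. The real calibration prior may differ.
    """
    q50 = rng.lognormvariate(math.log(q50_median), q50_logsd)
    n = rng.lognormvariate(math.log(n_median), n_logsd)
    return q50, n


def pod_interval(q, draws, level=0.95):
    """Central interval of the Hill POD at leak size q across parameter draws."""
    vals = sorted(1.0 / (1.0 + (q50 / q) ** n) for q50, n in draws)
    k = int((1.0 - level) / 2.0 * len(vals))
    return vals[k], vals[len(vals) - 1 - k]


rng = random.Random(0)
# Hypothetical campaign summary: Q50 near 100 kg/s, steepness near 2.
draws = [sample_pod_params(rng, 100.0, 0.3, 2.0, 0.2) for _ in range(5000)]
lo, hi = pod_interval(100.0, draws)
print(f"P_d(100 kg/s) 95% band: [{lo:.2f}, {hi:.2f}]")
```

With a hard-coded point value the band would collapse to a single number; the width of the interval is the calibration uncertainty that the population fit then inherits.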
Ten POD models are currently implemented in methane_pod.pod_functions and described visually in 05_pod_gallery. Variants:
- Hill: operational standard.
- Varying-coefficient Hill.
- Spectral-aware: explicitly carries the SWIR retrieval noise floor as a function of column XCH₄.
- Full GLM: generalised linear model with multiple scene covariates.
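TMTPP likelihood: canonical form
For a set of $N$ detected events with per-event posteriors $p(Q_i \mid d_i)$ and detection times $t_i$ over a window $[0, T]$:

$$\log \mathcal{L} = \sum_{i=1}^{N} \left[ \log \lambda(t_i) + \log \int p(d_i \mid Q)\, f(Q)\, P_d(Q)\, dQ \right] - \int_0^T \lambda(t)\, dt \int f(Q)\, P_d(Q)\, dQ$$

Here $p(d_i \mid Q)$ is the per-event likelihood of the observation $d_i$. The first sum scores each detected event under (a) the temporal intensity at the detection time, and (b) the integrated mark contribution, which combines the per-event likelihood with the population mark distribution and the satellite POD. The second term is the expected number of events that would have been detected under the model. Subtracting it penalises parameter values that predict more detections than were observed, which keeps the posterior consistent.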
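The role of the expected-detections term can be checked by simulation. The sketch below assumes a constant intensity, pure power-law marks with $\alpha = 2$ and $Q_{\min} = 1$, and a Hill POD with $n = 1$, a combination chosen only because the detected fraction then has the closed form $\ln(1 + Q_{50}) / Q_{50}$. All names and values are illustrative.

```python
import math
import random


def simulate_tmtpp(rng, rate, t_end, alpha, q_min, q50, n):
    """Simulate a homogeneous-rate TMTPP; returns (all_events, detected)."""
    t = 0.0
    all_events, detected = [], []
    while True:
        t += rng.expovariate(rate)  # exponential inter-arrival times
        if t > t_end:
            break
        # Mark: inverse-CDF draw from the power law.
        q = q_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
        all_events.append((t, q))
        # Thinning: keep the event with probability P_d(q).
        p_detect = 1.0 / (1.0 + (q50 / q) ** n)
        if rng.random() < p_detect:
            detected.append((t, q))
    return all_events, detected


rate, t_end, q50 = 200.0, 100.0, 100.0  # arbitrary time units
all_events, detected = simulate_tmtpp(
    random.Random(0), rate, t_end, alpha=2.0, q_min=1.0, q50=q50, n=1.0
)
# Expected detections: integrated intensity times the detected fraction.
expected_detected = rate * t_end * math.log1p(q50) / q50
print(len(all_events), len(detected), round(expected_detected, 1))
```

Only a few percent of events survive thinning in this configuration, so a fit that ignored the second term would badly underestimate the underlying event rate.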
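Practical evaluation
The mark integral is computed via the importance-weighted Monte Carlo estimator from 06a § Mark likelihood:

$$\int p(d_i \mid Q)\, f(Q)\, P_d(Q)\, dQ \;\approx\; \frac{1}{S} \sum_{s=1}^{S} \frac{f\big(Q_i^{(s)}\big)\, P_d\big(Q_i^{(s)}\big)}{\pi_i\big(Q_i^{(s)}\big)}$$

with samples $Q_i^{(s)} \sim p(Q \mid d_i)$ and $\pi_i$ the per-event prior used at Tier I–IV. The approximation holds up to the per-event evidence $p(d_i)$, a factor that does not depend on the population parameters. The factor $1/\pi_i$ is the importance weight; without it the population fit double-counts the per-event prior.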
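A minimal sketch of such an importance-weighted estimator, evaluated in log space. The check at the end uses a deliberately simple case: the per-event data are uninformative, so the posterior samples are just draws from the per-event prior, and the answer is known in closed form. The function names, priors, and parameter values are all assumptions for illustration.

```python
import math
import random


def logsumexp(xs):
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_mark_integral_iw(samples, log_f, log_pd, log_prior):
    """log of (1/S) * sum_s f(Q_s) P_d(Q_s) / pi(Q_s) over posterior samples."""
    terms = [log_f(q) + log_pd(q) - log_prior(q) for q in samples]
    return logsumexp(terms) - math.log(len(terms))


# Population mark density: power law, alpha = 2, Q_min = 1, so f(Q) = Q^-2.
log_f = lambda q: -2.0 * math.log(q)
# Hill POD with Q50 = 100, n = 1.
log_pd = lambda q: -math.log1p(100.0 / q)
# Per-event prior: a broader power law, pi(Q) = 0.5 * Q^-1.5 on Q >= 1.
log_prior = lambda q: math.log(0.5) - 1.5 * math.log(q)

rng = random.Random(0)
# Uninformative data: posterior samples equal prior samples (inverse CDF).
samples = [(1.0 - rng.random()) ** -2.0 for _ in range(100_000)]

log_est = log_mark_integral_iw(samples, log_f, log_pd, log_prior)
exact = math.log1p(100.0) / 100.0  # closed form of the integral of f * P_d
print(math.exp(log_est), exact)
```

Dropping the `- log_prior(q)` term would turn the estimator into the posterior mean of $f P_d$, which is biased towards wherever the per-event prior put its mass.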
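Point-regime simplification
When per-event posteriors are tightly concentrated ($\sigma_i \ll \hat{Q}_i$, with $\hat{Q}_i$ the posterior point estimate and $\sigma_i$ its spread) and $f(Q)\,P_d(Q)$ is smooth on that scale, the importance-weighted MC reduces to the Point regime of 06a:

$$\log \mathcal{L}_{\text{point}} = \sum_{i=1}^{N} \left[ \log \lambda(t_i) + \log f(\hat{Q}_i) + \log P_d(\hat{Q}_i) \right] - \int_0^T \lambda(t)\, dt \int f(Q)\, P_d(Q)\, dQ$$

This is the form currently implemented in methane_pod.fitting.pod_powerlaw_model. It is the simplification, not the canonical form; the explicit regime selection in 06a § Regime selection rule decides when it is safe to use.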
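A minimal sketch of a point-regime log-likelihood, assuming point estimates for each detected event and a precomputed detected fraction $\int f(Q) P_d(Q)\, dQ$. This is not the code in methane_pod.fitting.pod_powerlaw_model; the signature and helper names are hypothetical.

```python
import math


def log_powerlaw(q, alpha, q_min):
    if q < q_min:
        return -math.inf
    return math.log((alpha - 1.0) / q_min) - alpha * math.log(q / q_min)


def log_hill(q, q50, n):
    z = n * (math.log(q50) - math.log(q))
    return -(max(z, 0.0) + math.log1p(math.exp(-abs(z))))


def point_loglik(times, q_hat, intensity, integrated_intensity,
                 detect_frac, log_f, log_pd):
    """Sum of per-event scores minus the expected number of detections."""
    score = sum(
        math.log(intensity(t)) + log_f(q) + log_pd(q)
        for t, q in zip(times, q_hat)
    )
    return score - integrated_intensity * detect_frac


# Toy data: four detections in a window of length 10 (arbitrary units).
times = [1.0, 3.5, 6.2, 9.0]
q_hat = [5.0, 50.0, 120.0, 800.0]
t_end = 10.0
detect_frac = math.log1p(100.0) / 100.0  # closed form for alpha=2, n=1, Q50=100


def ll(rate):
    return point_loglik(
        times, q_hat, lambda t: rate, rate * t_end, detect_frac,
        lambda q: log_powerlaw(q, 2.0, 1.0), lambda q: log_hill(q, 100.0, 1.0),
    )


# For a constant intensity the maximiser is N / (T * detected fraction).
rate_hat = len(times) / (t_end * detect_frac)
print(rate_hat, ll(rate_hat))
```

The maximising rate is far larger than the naive $N / T$, because the subtraction term knows that most events go undetected.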
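Numerical stability of the integrated thinned-rate term
The integral $\int f(Q)\, P_d(Q)\, dQ$ over a heavy-tailed $f(Q)$ (power-law tail, Pareto) and a saturating $P_d(Q)$ (Hill) must not be evaluated by naive quadrature in linear $Q$, because the heavy tail underflows. The planned helpers in methane_pod.fitting.integrate work in log space instead (log-space Gauss-Hermite, Pareto importance sampling).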
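The log-space idea can be sketched with a plain trapezoid rule in $u = \log Q$, accumulating terms with log-sum-exp. Note this is a simpler technique swapped in for illustration, not the Gauss-Hermite or Pareto importance-sampling helpers named in the module layout; the function name and grid settings are assumptions.

```python
import math


def logsumexp(xs):
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_thinned_fraction(log_f, log_pd, q_lo, q_hi, num=20001):
    """log of the integral of f(Q) P_d(Q) dQ over [q_lo, q_hi].

    Substituting u = log Q gives integrand f(e^u) P_d(e^u) e^u, which is
    integrated with the trapezoid rule on a uniform grid in u.
    """
    u_lo, u_hi = math.log(q_lo), math.log(q_hi)
    h = (u_hi - u_lo) / (num - 1)
    terms = []
    for k in range(num):
        u = u_lo + k * h
        q = math.exp(u)
        log_w = math.log(0.5) if k in (0, num - 1) else 0.0  # trapezoid end weights
        terms.append(log_w + log_f(q) + log_pd(q) + u)
    return logsumexp(terms) + math.log(h)


# Power law f(Q) = Q^-2 on Q >= 1, Hill POD with Q50 = 100 and n = 1.
log_f = lambda q: -2.0 * math.log(q)
log_pd = lambda q: -math.log1p(100.0 / q)

log_frac = log_thinned_fraction(log_f, log_pd, 1.0, 1e12)
exact = math.log1p(100.0) / 100.0  # closed form on [1, infinity)
print(math.exp(log_frac), exact)
```

A uniform grid in $\log Q$ spends equal effort on every decade of leak size, so twelve decades of tail cost no more than one, whereas a uniform grid in linear $Q$ would place almost all of its nodes where the integrand is negligible.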
Where it plugs into plumax
Table (2): TMTPP inputs.
| Input | Source |
|---|---|
| per detection | 06a_instantaneous.md — Tier V.A adapter |
| Per-event posterior $p(Q_i \mid d_i)$ (samples + prior $\pi_i$) | Tiers I–IV inversion + posterior export |
| Per-instrument POD calibration | Sherwin 2024 / Cusworth 2021 / Kamdar IMEO controlled-release campaigns; alternatively joint inference with the population |
| Per-instrument overpass coverage (for the integrated rate) | 06a § Non-detection events — catalog ingest |
Table (3): TMTPP outputs.
| Output | Consumer |
|---|---|
| Posterior on $\lambda(t)$ | 06c_persistency.md: wait times, dispatch windows |
| Posterior on $f(Q)$ | 06d_total_emission.md: total mass under POD-thinning correction |
| Per-instrument POD posterior | instrument-design and cross-mission calibration questions; multi-satellite fusion (06d) |
| Joint posterior | sensitivity studies, satellite-tasking optimisation |
Population vs. per-source: the v1 commitment
A TMTPP fit aggregates over a population of events. Two distinct framings:
- Across-population (v1 default for inventory accounting): fit one $f(Q)$ over all sources of a class within a basin/region. $f(Q)$ means “size of an event drawn from this class”.
- Per-source longitudinal (v1 for persistency forecasting on a known facility): fit one $f_j(Q)$ per facility $j$, with hierarchical shrinkage to the population. $f_j(Q)$ means “size of an event from this specific facility”.
Both have library support; the choice is driven by the scientific question, not by the methodology. Inventory totals (06d) use across-population; dispatch decisions for a known leak history (06c) use per-source.
Module layout
Table (4): Tier V.B module layout, with concern, target module, and status.
| Concern | Module | Status |
|---|---|---|
| Intensity registry — deterministic + Hawkes | methane_pod.intensity | ✓ (13 kernels) |
| Intensity registry — log-Gaussian Cox process | methane_pod.intensity.lgcp | ☐ — v1.5 |
| Mark registry | methane_pod.marks (currently inline in fitting) | 🚧 — power-law only; lognormal, lognormal-Pareto, mixture-of-lognormals pending |
| POD models (Hill + variants) | methane_pod.pod_functions | ✓ (10 models) |
| POD time-of-day binning (v1 time-varying POD) | methane_pod.pod_functions.tod_binned | ☐ |
| POD continuous (v2) | methane_pod.pod_functions.continuous_t | ☐ |
| Hierarchical POD calibration prior | methane_pod.pod_functions.calibration_prior | ☐ |
| TMTPP likelihood — point regime | methane_pod.fitting.pod_powerlaw_model | ✓ |
| TMTPP likelihood — full importance-corrected regime | methane_pod.fitting.tmtpp_iw | ☐ — consumes population.adapter.importance |
| Numerical integration helpers (log-space Gauss-Hermite, Pareto IS) | methane_pod.fitting.integrate | ☐ |
| Hawkes / self-exciting kernel | methane_pod.intensity.hawkes | ☐ — beyond the existing kernels |
| Spatial extension (Cox process) | methane_pod.spatial | ☐ — v2; ties to Tier III’s |
Validation strategy
- Likelihood gradient. jax.grad matches finite differences within tolerance. Cheap unit test.
- Synthetic recovery, Point regime. Already in 06_stationary_numpyro_mcmc for power-law mark.
- Synthetic recovery, Full regime with importance correction. Generate per-event posteriors with a known $f(Q)$, fit the population, and recover $f(Q)$ within the reported posterior. Mirrors the importance-correction round trip from 06a § Validation.
- SBC (simulation-based calibration), Point regime. Across 1000 simulated populations, per-parameter rank statistics are uniform.
- SBC, soft observation. Same SBC but with the soft-observation layer (per-event posteriors as input). Validates the cross-tier inference end-to-end.
- Identifiability stress test. Generate data where λ is high but $P_d$ is low (vs. the opposite). Target: strong posterior correlation between the two when confounded; weak correlation when well-separated. Confirms the model knows what it can’t disentangle.
- Log-space integration test. Compare log-space Gauss–Hermite to Pareto importance sampling for across . Linear quadrature should fail loudly past . Catches the silent thinned-rate underflow bug.
- Hierarchical POD coverage. With known controlled-release calibration injected as , the hierarchical POD prior should produce posteriors that contain the true value at 95% CI ~95% of the time across simulated basins.
Open questions
- Daley, D. J., & Vere-Jones, D. (2003). An Introduction to the Theory of Point Processes, Volume I: Elementary Theory and Methods (2nd ed.). Springer. 10.1007/b97277
- Daley, D. J., & Vere-Jones, D. (2008). An Introduction to the Theory of Point Processes, Volume II: General Theory and Structure (2nd ed.). Springer. 10.1007/978-0-387-49835-5