Comprehensive Analysis of Beer-Lambert Law Models for Atmospheric Methane Detection¶
Executive Summary¶
This report provides a complete analysis of Beer-Lambert law formulations for atmospheric methane detection using passive remote sensing. We examine five fundamental models and their normalized variants, spanning from exact nonlinear physics to various approximations. Each model is analyzed for physical accuracy, computational efficiency, and operational applicability across detection, retrieval, and quantification tasks. We provide detailed guidance on model selection based on application requirements, hardware constraints, and plume characteristics.
Models covered:
Nonlinear (exact Beer-Lambert law)
MacLaurin approximation (linearized transmittance)
Taylor expansion (linearized optical depth)
Combined (Taylor + MacLaurin)
Logarithmic transformation
For each model, we examine both absolute and relative (normalized) formulations, where normalization provides calibration independence at the cost of requiring background estimation.
1. The Foundation: Exact Nonlinear Beer-Lambert Law {#1-foundation}¶
Physical Foundation¶
Electromagnetic radiation passing through an absorbing atmosphere experiences exponential attenuation governed by the Beer-Lambert law. For atmospheric methane in the shortwave infrared (SWIR, 1600-2500 nm), solar photons reflect from Earth’s surface and traverse the atmosphere before reaching a sensor.
Fundamental equation:

$$L(\lambda) = E_0(\lambda)\,\rho(\lambda)\,e^{-\tau(\lambda)}$$
Parameter definitions:
$L(\lambda)$: at-sensor spectral radiance [W·m⁻²·sr⁻¹·nm⁻¹]
$E_0(\lambda)$: top-of-atmosphere solar irradiance [W·m⁻²·nm⁻¹]
$\rho(\lambda)$: surface reflectance (albedo) [dimensionless, 0-1]
$\tau(\lambda)$: atmospheric optical depth [dimensionless]
$\lambda$: wavelength [nm]
Optical Depth: The Core Absorption Variable¶
Optical depth quantifies cumulative absorption along the atmospheric path:

$$\tau(\lambda) = \int_0^{\ell} \sigma(\lambda)\, n_{\mathrm{CH_4}}(s)\, ds$$
where:
$\sigma(\lambda)$: absorption cross-section [cm²·molecule⁻¹]
$n_{\mathrm{CH_4}}(s)$: methane number density [molecule·cm⁻³]
$s$: path coordinate [cm]
$\ell$: total path length [cm]
For well-mixed boundary layer plumes, this simplifies to:

$$\tau(\lambda) = \sigma(\lambda)\, N_{\mathrm{total}}\, X_{\mathrm{CH_4}} \cdot 10^{-6} \cdot L \cdot \mathrm{AMF}$$
Component definitions:
$\sigma(\lambda)$: absorption cross-section [cm²·molecule⁻¹]
Obtained from HITRAN/GEISA spectroscopic databases
Temperature and pressure dependent via line broadening
Typical value: 10⁻²⁰ to 10⁻¹⁸ cm²·molecule⁻¹ at 2300 nm
[Figures: absorption cross-section profiles vs. temperature and pressure; histogram binned by wavelength; cross-section vs. wavelength, temperature, and pressure]
$N_{\mathrm{total}}$: total air number density [molecule·cm⁻³]
Calculated from the ideal gas law: $N_{\mathrm{total}} = P / (k_B T)$ (then converted from m⁻³ to cm⁻³)
$P$: pressure [Pa]
$k_B = 1.381 \times 10^{-23}$ J·K⁻¹ (Boltzmann constant)
$T$: temperature [K]
Typical value: $\approx 2.5 \times 10^{19}$ molecule·cm⁻³ at sea level
$X_{\mathrm{CH_4}}$: volume mixing ratio [ppm]
Typical background: 1.85-1.90 ppm (global average)
Plume enhancements: 10-10,000 ppm
$10^{-6}$: ppm-to-volume-fraction conversion [dimensionless]
$L$: atmospheric path length [cm]
Boundary layer: 10⁵ cm (1 km)
Full column: 10⁶ cm (10 km)
$\mathrm{AMF}$: air mass factor [dimensionless]
Definition: $\mathrm{AMF} = 1/\cos\theta$, where $\theta$ is the viewing zenith angle
Nadir viewing (0°): AMF = 1.0
30° off-nadir: AMF = 1.15
60° off-nadir: AMF = 2.0
A numeric sketch combining these components follows below.
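A minimal numeric sketch of the optical depth calculation (all values illustrative; the effective cross-section is a hypothetical band-averaged value chosen so that a 500 ppm enhancement gives $\Delta\tau \approx 0.1$, consistent with the scenarios used later in this report):

```python
import numpy as np

# Boltzmann constant and sea-level conditions (ideal gas law for N_total)
k_B = 1.381e-23                   # [J/K]
P, T = 101325.0, 288.0            # pressure [Pa], temperature [K]
N_total = P / (k_B * T) * 1e-6    # air number density [molecule/cm^3], ~2.5e19

sigma_eff = 8.0e-23               # hypothetical effective cross-section [cm^2/molecule]
alpha_ppm = 500.0                 # plume enhancement [ppm]
L_path = 1e5                      # boundary-layer path [cm] (1 km)
AMF = 1.0                         # nadir viewing

delta_tau = sigma_eff * N_total * alpha_ppm * 1e-6 * L_path * AMF
print(f"N_total = {N_total:.2e} molecule/cm^3, delta_tau = {delta_tau:.3f}")  # ~0.10
```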
The Additive Property of Optical Depth¶
For methane concentration expressed as background plus enhancement:

$$X_{\mathrm{CH_4}} = X_{\mathrm{bg}} + \alpha$$

Units: ppm
The optical depth adds linearly (no interaction between background and plume):

$$\tau = \tau_{\mathrm{bg}} + \Delta\tau$$

where:

$$\Delta\tau(\lambda) = \sigma(\lambda)\, N_{\mathrm{total}} \cdot 10^{-6} \cdot \alpha \cdot L \cdot \mathrm{AMF}$$
Multiplicative Transmittance¶
Transmittance is the fraction of radiation transmitted:

$$T(\lambda) = e^{-\tau(\lambda)}$$

Using the exponential property $e^{-(\tau_{\mathrm{bg}} + \Delta\tau)} = e^{-\tau_{\mathrm{bg}}}\, e^{-\Delta\tau}$:

$$T = T_{\mathrm{bg}} \cdot T_{\mathrm{plume}}$$
Physical interpretation:
Background atmosphere transmits fraction $e^{-\tau_{\mathrm{bg}}}$ of incident light
Plume further attenuates by the factor $e^{-\Delta\tau}$
Total effect is multiplicative, not additive
The Complete Forward Model¶

$$L(\lambda) = \underbrace{E_0(\lambda)\,\rho(\lambda)\,e^{-\tau_{\mathrm{bg}}(\lambda)}}_{L_{\mathrm{bg}}(\lambda)}\; e^{-\Delta\tau(\lambda)} = L_{\mathrm{bg}}(\lambda)\, e^{-\Delta\tau(\lambda)}$$

Key insight: Observed radiance is background radiance multiplicatively attenuated by the plume enhancement factor $e^{-\Delta\tau}$.
Why This Is Nonlinear¶
The Jacobian (sensitivity) with respect to enhancement:

$$\frac{\partial L}{\partial \alpha} = -L_{\mathrm{bg}}\, H\, e^{-H\alpha}, \qquad H(\lambda) = \sigma(\lambda)\, N_{\mathrm{total}} \cdot 10^{-6} \cdot L \cdot \mathrm{AMF}$$

Units: (W·m⁻²·sr⁻¹·nm⁻¹)·ppm⁻¹
Nonlinearity: Sensitivity depends on the current state through the $e^{-H\alpha}$ term.
Manifestations:
Saturation: As $\Delta\tau$ increases, sensitivity decreases exponentially
State-dependent Jacobian: Requires iterative optimization
Non-convex cost function: Multiple local minima possible
Saturation Example¶
| $\Delta\tau$ | VMR Enhancement | Fractional Absorption | Relative Sensitivity |
|---|---|---|---|
| 0.0 | 0 ppm | 0% | 100% |
| 0.1 | ~500 ppm | 9.5% | 90.5% |
| 0.3 | ~1500 ppm | 25.9% | 74.1% |
| 0.5 | ~2500 ppm | 39.3% | 60.7% |
| 1.0 | ~5000 ppm | 63.2% | 36.8% |
Practical impact: Strong plumes become harder to quantify accurately due to reduced sensitivity.
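These numbers follow directly from $1 - e^{-\Delta\tau}$ (fractional absorption) and $e^{-\Delta\tau}$ (relative sensitivity); a short check:

```python
import numpy as np

# Reproduce the saturation table above
for dtau in [0.0, 0.1, 0.3, 0.5, 1.0]:
    absorbed = 1.0 - np.exp(-dtau)     # fractional absorption
    sensitivity = np.exp(-dtau)        # relative sensitivity
    print(f"dtau={dtau:4.1f}  absorbed={absorbed:6.1%}  sensitivity={sensitivity:6.1%}")
```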
Inverse Problem: Nonlinear Optimization¶
Observation equation:

$$\mathbf{y} = L_{\mathrm{bg}}\, e^{-\mathbf{H}\alpha} + \boldsymbol{\epsilon}$$

where:
$\mathbf{y}$: observed radiance vector [W·m⁻²·sr⁻¹·nm⁻¹]
$\alpha$: unknown enhancement [ppm]
$\mathbf{H}$: sensitivity vector [ppm⁻¹]
$\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})$: Gaussian noise
Maximum likelihood cost function:

$$J(\alpha) = \big(\mathbf{y} - L_{\mathrm{bg}}\, e^{-\mathbf{H}\alpha}\big)^T \boldsymbol{\Sigma}^{-1} \big(\mathbf{y} - L_{\mathrm{bg}}\, e^{-\mathbf{H}\alpha}\big)$$
Solution methods:
Gauss-Newton iteration:

$$\alpha_{k+1} = \alpha_k + \frac{\mathbf{J}_k^T \boldsymbol{\Sigma}^{-1} \mathbf{r}_k}{\mathbf{J}_k^T \boldsymbol{\Sigma}^{-1} \mathbf{J}_k}$$

where:
$\mathbf{J}_k$: Jacobian at iteration $k$
$\mathbf{r}_k = \mathbf{y} - L_{\mathrm{bg}}\, e^{-\mathbf{H}\alpha_k}$: residual
Computational cost: $O(K n)$ per pixel for diagonal covariance, where $K$ = iterations, $n$ = wavelengths
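A minimal sketch of this scalar Gauss-Newton loop, assuming a diagonal covariance (the function and variable names are ours, not from any operational codebase):

```python
import numpy as np

def gauss_newton_retrieval(y, L_bg, H, sigma2, alpha0=0.0, max_iter=20, tol=1e-3):
    """Scalar Gauss-Newton for y = L_bg * exp(-H * alpha) + noise.

    y, L_bg, H, sigma2: arrays over wavelength; sigma2 is the diagonal of Sigma.
    Minimal sketch; an operational solver would add a line search and bounds.
    """
    alpha = alpha0
    for _ in range(max_iter):
        f = L_bg * np.exp(-H * alpha)                   # forward model
        J = -H * f                                      # Jacobian df/dalpha
        r = y - f                                       # residual
        step = ((J / sigma2) @ r) / ((J / sigma2) @ J)  # weighted normal equations
        alpha += step
        if abs(step) < tol:
            break
    return alpha
```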
Perfect Physics, High Computational Cost¶
Advantages:
✓ Exact physics: No approximation errors
✓ Valid for all VMR: Works for strong plumes
✓ Captures saturation: Correctly models nonlinear effects
✓ Statistically optimal: Maximum likelihood under Gaussian noise
Disadvantages:
✗ Iterative: Requires 10-20 iterations for convergence
✗ Slow: 50-100 ms per pixel
✗ Initialization sensitive: May converge to local minima
✗ Computationally expensive: Hours for large scenes
Operational reality: For 1 million pixels (typical satellite scene):
Processing time: 14-28 hours on single CPU core
Impractical for real-time operations requiring minute-scale latency
2. Motivation for Approximations {#2-motivation}¶
The Operational Dilemma¶
Modern atmospheric remote sensing missions face contradictory requirements:
| Requirement | Exact Model Reality | Operational Need |
|---|---|---|
| Latency | 14-28 hours/scene | Minutes to hours |
| Throughput | 1 scene/day | 100-1000 scenes/day |
| Coverage | Limited swath | Global daily surveys |
| Cost | Massive compute infrastructure | Modest operational budget |
Real-World Mission Examples¶
AVIRIS-NG (airborne hyperspectral):
600 flightlines per year
100,000 pixels per flightline
Total: 60 million pixels/year
Nonlinear processing: ~35 CPU-days (at 50 ms/pixel)
Current approach: Linear approximation → ~8 CPU-hours
GHGSat constellation (10 satellites):
100 images per satellite per day
1 million pixels per image
Total: 1 billion pixels/day
Nonlinear processing: ~1.6 CPU-years of compute per day (at 50 ms/pixel)
Requires: Approximations for feasibility
Carbon Mapper (upcoming):
2 satellites
1000 scenes/day planned
Operational constraint: Real-time detection during acquisition
The Physics of Weak Absorption¶
Most operational methane plumes exhibit weak to moderate absorption.
Typical scenario (SWIR at 2300 nm, boundary layer, 500 ppm enhancement): $\Delta\tau \approx 0.1$
Key observation: $\Delta\tau \lesssim 0.1$ (< 10% absorption) for most operational plumes.
Exact vs. linear approximation:
| Quantity | Exact | Linear Approx | Error |
|---|---|---|---|
| $e^{-0.09}$ | 0.9139 | 0.9100 | 0.4% |
| $e^{-0.10}$ | 0.9048 | 0.9000 | 0.5% |
| $e^{-0.15}$ | 0.8607 | 0.8500 | 1.2% |
For weak absorption, linear approximation introduces <1% error.
Three Strategies for Linearization¶
We can exploit weak absorption through different mathematical transformations:
| Strategy | Operation | Results In | Section |
|---|---|---|---|
| MacLaurin | Approximate $e^{-\tau} \approx 1 - \tau$ | Linear in VMR | 3 |
| Taylor | Linearize $\tau$ around background | Simplified exponential | 4 |
| Combined | Both MacLaurin + Taylor | Fully linear | 5 |
| Logarithm | Transform $L \to \ln L$ | Linear in log space | 6 |
Each offers different trade-offs in:
Approximation error
Valid optical depth range
Computational efficiency
Noise characteristics
3. Model 1: MacLaurin Approximation {#3-maclaurin}¶
Mathematical Foundation¶
The MacLaurin series (Taylor series centered at zero) for $e^{-x}$:

$$e^{-x} = 1 - x + \frac{x^2}{2!} - \frac{x^3}{3!} + \cdots$$

First-order truncation:

$$e^{-x} \approx 1 - x$$

Truncation error:

$$R(x) = e^{-x} - (1 - x) = \frac{x^2}{2} - \frac{x^3}{6} + \cdots$$

For small $x$, dominated by the quadratic term: $R(x) \approx x^2/2$
Relative error: $\approx x^2/2$ (for $x \ll 1$)
Application to Beer’s Law¶
Approximate transmittance:

$$T(\lambda) \approx 1 - \tau(\lambda)$$

Approximate radiance:

$$L(\lambda) \approx E_0(\lambda)\,\rho(\lambda)\,\big(1 - \tau(\lambda)\big)$$

For background plus enhancement:

$$L(\alpha) \approx E_0\,\rho\,\big(1 - \tau_{\mathrm{bg}} - H\alpha\big)$$

where $\Delta\tau = H\alpha$ with $H = \sigma\, N_{\mathrm{total}} \cdot 10^{-6} \cdot L \cdot \mathrm{AMF}$
Units: W·m⁻²·sr⁻¹·nm⁻¹
Key result: Radiance is now linear in $\tau$, hence linear in $\alpha$.
Accuracy Analysis¶
| $\tau$ | Exact $e^{-\tau}$ | Approx $1-\tau$ | Absolute Error | Relative Error |
|---|---|---|---|---|
| 0.01 | 0.99005 | 0.99000 | 0.00005 | 0.005% |
| 0.05 | 0.95123 | 0.95000 | 0.00123 | 0.13% |
| 0.10 | 0.90484 | 0.90000 | 0.00484 | 0.5% |
| 0.15 | 0.86071 | 0.85000 | 0.01071 | 1.2% |
| 0.20 | 0.81873 | 0.80000 | 0.01873 | 2.3% |
| 0.30 | 0.74082 | 0.70000 | 0.04082 | 5.5% |
Operational threshold: $\tau < 0.05$ for excellent accuracy (<0.2% error)
Practical limit: $\tau < 0.15$ for acceptable accuracy (≈1% error)
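A quick numerical check of this table:

```python
import numpy as np

# Exact e^{-tau} vs. the first-order MacLaurin approximation 1 - tau
for tau in [0.01, 0.05, 0.10, 0.15, 0.20, 0.30]:
    exact, approx = np.exp(-tau), 1.0 - tau
    rel_err = (exact - approx) / exact
    print(f"tau={tau:4.2f}  exact={exact:.5f}  approx={approx:.5f}  rel_err={rel_err:.3%}")
```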
Physical Interpretation¶
The MacLaurin approximation assumes:
Weak absorption regime: Optical depth much less than unity
Linear attenuation: Fractional change proportional to optical depth
No photon depletion: Most photons survive atmospheric transit
Unsaturated absorption: Signal-to-concentration relationship remains linear
Physical validity conditions:
Fractional absorption < 10% ($\tau < 0.1$)
Transmission > 90% ($T > 0.9$)
Beer's law in "linear region"
Breakdown mechanisms:
Strong plumes: $\tau \gtrsim 0.2$ → nonlinear effects significant
Saturation: Absorption depletes photon population → sublinear response
Quadratic error accumulation: $\tau^2/2$ terms become important
Jacobian (Constant Sensitivity)¶

$$\frac{\partial L}{\partial \alpha} \approx -E_0\,\rho\, H = \text{constant}$$

Units: (W·m⁻²·sr⁻¹·nm⁻¹)·ppm⁻¹
Key property: Constant — independent of current VMR or optical depth.
Implication: Model predicts no saturation (same sensitivity at 100 ppm as at 5000 ppm).
Reality check: The exact sensitivity includes an $e^{-\tau}$ factor that decreases with absorption. The MacLaurin approximation misses this physics.
Inverse Problem: Closed-Form Solution¶
Linear forward model:

$$\mathbf{y} = \mathbf{a} - \mathbf{b}\,\alpha + \boldsymbol{\epsilon}$$

where $\mathbf{a} = E_0\,\rho\,(1 - \tau_{\mathrm{bg}})$ and $\mathbf{b} = E_0\,\rho\,\mathbf{H}$
Maximum likelihood estimate:

$$\hat{\alpha} = \frac{\mathbf{b}^T \boldsymbol{\Sigma}^{-1} (\mathbf{a} - \mathbf{y})}{\mathbf{b}^T \boldsymbol{\Sigma}^{-1} \mathbf{b}}$$

Units: ppm
Computational cost: $O(n^3)$ general, $O(n)$ for diagonal covariance
Key advantage: Single-step solution — no iteration, no convergence issues.
Strengths and Limitations¶
Strengths:
✓ Fastest possible: Single matrix operation
✓ Closed-form: No iterative convergence required
✓ Simple implementation: Minimal code complexity
✓ No initialization: Direct solution from observations
Limitations:
✗ Very restricted range: $\tau \lesssim 0.05$ (~250 ppm typical)
✗ No saturation modeling: Overestimates sensitivity for moderate plumes
✗ Systematic bias: Underestimates VMR for $\tau > 0.1$
✗ Poor accuracy: 15-20% error for operational plumes
✗ Breaks physics: Treats absorption as subtractive, not multiplicative
Verdict: Primarily theoretical interest. Too restrictive for most operational applications. Main value is pedagogical — illustrates linearization concept.
4. Model 2: Taylor Expansion {#4-taylor}¶
Complementary Linearization Strategy¶
Instead of approximating the exponential function, we linearize the optical depth variable around background conditions.
Taylor series of $\tau(X)$ around $X = X_{\mathrm{bg}}$:

$$\tau(X) = \tau(X_{\mathrm{bg}}) + \frac{\partial \tau}{\partial X}\bigg|_{X_{\mathrm{bg}}} (X - X_{\mathrm{bg}}) + \frac{1}{2}\,\frac{\partial^2 \tau}{\partial X^2}\bigg|_{X_{\mathrm{bg}}} (X - X_{\mathrm{bg}})^2 + \cdots$$

First-order truncation:

$$\tau \approx \tau_{\mathrm{bg}} + \frac{\partial \tau}{\partial X}\bigg|_{X_{\mathrm{bg}}}\, \alpha$$

The Critical Insight: Exact Linearity¶
Since optical depth has the form:

$$\tau(X) = \sigma\, N_{\mathrm{total}} \cdot 10^{-6} \cdot X \cdot L \cdot \mathrm{AMF}$$

This is exactly linear in VMR. The second derivative is:

$$\frac{\partial^2 \tau}{\partial X^2} = 0$$

Therefore, the Taylor expansion is exact (not an approximation):

$$\tau = \tau_{\mathrm{bg}} + H\alpha$$

No mathematical approximation error!
Where the “Approximation” Lives¶
The assumptions are physical, not mathematical:
Spatial homogeneity: VMR constant over sensor footprint (30-60 m)
Vertical well-mixing: Plume uniformly distributed in boundary layer
Path-averaged parameters: Single effective $\sigma$, $N_{\mathrm{total}}$, $L$
Negligible scattering: Direct path dominates over multiply-scattered paths
Validity conditions:
✓ Boundary layer plumes: Well-mixed below 1-2 km
✓ Moderate footprints: Smaller than plume horizontal scale (~100-500 m)
✗ Elevated plumes: Vertical stratification breaks well-mixed assumption
✗ Sub-pixel structure: Sharp gradients, point sources smaller than footprint
Application: Exact Exponential Retained¶
Radiance with Taylor-expanded optical depth:

$$L(\alpha) = L_{\mathrm{bg}}\, e^{-H\alpha}$$

where $H = \sigma\, N_{\mathrm{total}} \cdot 10^{-6} \cdot L \cdot \mathrm{AMF}$ (exact)
Units: W·m⁻²·sr⁻¹·nm⁻¹
Key difference from MacLaurin: The exponential is exact, not approximated.
Physical meaning:
Linearized concentration-to-optical-depth (which was already linear)
Preserved exact optical-depth-to-transmittance (exponential decay)
Captures saturation correctly
Jacobian: Nonlinear but Simplified¶

$$\frac{\partial L}{\partial \alpha} = -L_{\mathrm{bg}}\, H\, e^{-H\alpha}$$

Units: (W·m⁻²·sr⁻¹·nm⁻¹)·ppm⁻¹
Property: Nonlinear — depends on $\alpha$ through the $e^{-H\alpha}$ factor.
Saturation correctly captured:
| $\Delta\tau$ | VMR | Sensitivity Reduction |
|---|---|---|
| 0.0 | 0 ppm | 0% (baseline) |
| 0.1 | ~500 ppm | 9.5% |
| 0.2 | ~1000 ppm | 18.1% |
| 0.3 | ~1500 ppm | 25.9% |
| 0.5 | ~2500 ppm | 39.3% |
Advantage over MacLaurin: Correctly predicts decreasing sensitivity for strong plumes.
Inverse Problem: Iterative but Faster¶
Nonlinear forward model:

$$\mathbf{y} = L_{\mathrm{bg}}\, e^{-\mathbf{H}\alpha} + \boldsymbol{\epsilon}$$

where $\mathbf{H}$ is a constant vector
Gauss-Newton iteration:

$$\alpha_{k+1} = \alpha_k + \frac{\mathbf{J}_k^T \boldsymbol{\Sigma}^{-1} \mathbf{r}_k}{\mathbf{J}_k^T \boldsymbol{\Sigma}^{-1} \mathbf{J}_k}$$

where:
$\mathbf{J}_k = -L_{\mathrm{bg}}\, \mathbf{H}\, e^{-\mathbf{H}\alpha_k}$: Jacobian
$\mathbf{r}_k = \mathbf{y} - L_{\mathrm{bg}}\, e^{-\mathbf{H}\alpha_k}$: residual
Computational cost: $O(K n)$ where $K \approx$ 5-10 iterations (vs. 10-20 for full nonlinear); a code sketch follows the convergence notes below.
Convergence properties:
Faster than full nonlinear (simpler Jacobian structure)
More robust (monotonic cost function)
Better conditioned (separable parameters)
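A sketch of this pattern in normalized space: the combined-model closed form provides the starting point, then a few Gauss-Newton steps refine it on the exact exponential (names hypothetical; diagonal covariance assumed):

```python
import numpy as np

def taylor_retrieval(y_norm, H, sigma2, n_iter=5):
    """Normalized Taylor model y_norm = exp(-H*alpha): linear init + refinement."""
    w = H / sigma2                              # Sigma^{-1} H (diagonal case)
    alpha = (w @ (1.0 - y_norm)) / (w @ H)      # combined-model starting point
    for _ in range(n_iter):                     # Gauss-Newton refinement
        f = np.exp(-H * alpha)
        J = -H * f
        r = y_norm - f
        alpha += ((J / sigma2) @ r) / ((J / sigma2) @ J)
    return alpha
```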
Validity Range¶
No approximation error in Beer’s law physics, but assumptions limit applicability:
| $\Delta\tau$ | VMR Range | Validity | Physical Limitation |
|---|---|---|---|
| < 0.1 | < 500 ppm | Excellent | None |
| 0.1-0.3 | 500-1500 ppm | Good | Moderate vertical structure OK |
| 0.3-0.5 | 1500-2500 ppm | Acceptable | Requires good vertical mixing |
| > 0.5 | > 2500 ppm | Poor | Vertical stratification significant |
Practical operational range: $\Delta\tau < 0.3$ (~1500 ppm for typical conditions)
Strengths and Limitations¶
Strengths:
✓ Exact exponential: Captures saturation perfectly
✓ Valid for moderate plumes: Up to $\Delta\tau \approx 0.3$
✓ Faster convergence: 5-10 vs. 10-20 iterations
✓ Better initialization: Linear model provides excellent starting point
✓ Physics-based: Only spatial assumptions, no mathematical approximations
Limitations:
✗ Still iterative: Not as fast as fully linear models
✗ Requires convergence: Can fail (though rarely in practice)
✗ Spatial assumptions: Breaks down for vertically stratified plumes
✗ Not real-time: 10 ms/pixel still too slow for some applications
Verdict: Excellent compromise for moderate plumes. Often overlooked but very effective — should be used more widely. Ideal when some iteration acceptable but full nonlinear too expensive.
5. Model 3: Combined Approximation (Taylor + MacLaurin) {#5-combined}¶
The Operational Standard¶
Applying both approximations sequentially:
Taylor expand optical depth: $\tau = \tau_{\mathrm{bg}} + H\alpha$ (exact)
MacLaurin expand the plume exponential: $e^{-H\alpha} \approx 1 - H\alpha$
Result:

$$L(\alpha) \approx L_{\mathrm{bg}}\,(1 - H\alpha)$$

Units: W·m⁻²·sr⁻¹·nm⁻¹
This is FULLY LINEAR in $\alpha$ — enables a closed-form matched filter solution.
Approximation Error Decomposition¶
Total error has two sources:
Taylor expansion error: 0% (exact for linear $\tau(X)$)
MacLaurin expansion error: $\approx (H\alpha)^2/2 = \Delta\tau^2/2$
Combined relative error: $\approx \Delta\tau^2/2$
Error quantification table:
| $\Delta\tau$ | VMR (typical) | Exact | Approx | Relative Error | Quality |
|---|---|---|---|---|---|
| 0.01 | ~50 ppm | 0.9900 | 0.9900 | 0.005% | Excellent |
| 0.05 | ~250 ppm | 0.9512 | 0.9500 | 0.13% | Excellent |
| 0.10 | ~500 ppm | 0.9048 | 0.9000 | 0.5% | Good |
| 0.15 | ~750 ppm | 0.8607 | 0.8500 | 1.2% | Acceptable |
| 0.20 | ~1000 ppm | 0.8187 | 0.8000 | 2.3% | Marginal |
| 0.30 | ~1500 ppm | 0.7408 | 0.7000 | 5.5% | Poor |
Operational threshold: $\Delta\tau < 0.1$ gives <1% error → acceptable for most detection/retrieval applications.
Strict threshold: $\Delta\tau < 0.05$ gives <0.2% error → suitable for flux quantification.
Physical Interpretation¶
The combined model treats absorption as:
Additive reduction (not multiplicative attenuation)
Photons removed linearly with optical depth
No saturation effects
Sensitivity constant regardless of plume strength
Linear signal-to-concentration relationship
“Weak absorption” physics
Most photons survive ($T > 0.9$)
Fractional change proportional to optical depth
Absorption doesn’t significantly deplete radiation field
When physically valid:
Plume causes < 10% absorption
Transmission > 90%
Beer’s law in linear regime
No significant photon depletion
When breaks down:
Strong plumes (> 10% absorption)
Saturation becomes important
Model overestimates sensitivity
Systematic VMR underestimation
Jacobian: Constant (Simplified)¶

$$\frac{\partial L}{\partial \alpha} \approx -L_{\mathrm{bg}}\, H = \text{constant}$$

Units: (W·m⁻²·sr⁻¹·nm⁻¹)·ppm⁻¹
Key property: Constant — same sensitivity at all VMR levels.
Comparison with exact Jacobian:
| Model | Jacobian | VMR Dependence |
|---|---|---|
| Exact | $-L_{\mathrm{bg}}\, H\, e^{-H\alpha}$ | Decreases with VMR (saturation) |
| Combined | $-L_{\mathrm{bg}}\, H$ | Constant (no saturation) |
Consequence: Combined model cannot capture saturation → systematic bias for moderate-to-strong plumes.
Inverse Problem: The Matched Filter¶
Linear forward model:

$$\mathbf{y} = L_{\mathrm{bg}}\,(\mathbf{1} - \mathbf{H}\alpha) + \boldsymbol{\epsilon}$$

Rearranging:

$$L_{\mathrm{bg}}\,\mathbf{1} - \mathbf{y} = L_{\mathrm{bg}}\,\mathbf{H}\,\alpha - \boldsymbol{\epsilon}$$

Define:
Innovation: $\mathbf{d} = L_{\mathrm{bg}}\,\mathbf{1} - \mathbf{y}$
Target spectrum: $\mathbf{t} = L_{\mathrm{bg}}\,\mathbf{H}$
Matched filter solution:

$$\hat{\alpha} = \frac{\mathbf{t}^T \boldsymbol{\Sigma}^{-1} \mathbf{d}}{\mathbf{t}^T \boldsymbol{\Sigma}^{-1} \mathbf{t}}$$

Units: ppm
This is the famous “matched filter” equation used in AVIRIS-NG, GHGSat, and most operational systems.
Computational cost:
General: $O(n^3)$ (covariance inversion)
Diagonal covariance: $O(n)$ (element-wise operations)
Typical timing: 0.5 ms per pixel on modern CPU
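A minimal sketch of this estimator in absolute radiance space with a general covariance ($L_{\mathrm{bg}}$ assumed known; names hypothetical):

```python
import numpy as np

def matched_filter(y, L_bg, H, Sigma):
    """Closed-form combined-model estimate: y ≈ L_bg * (1 - H*alpha) + noise."""
    t = L_bg * H                          # target spectrum
    d = L_bg - y                          # innovation
    Sinv_t = np.linalg.solve(Sigma, t)    # Sigma^{-1} t (the O(n^3) step)
    return (Sinv_t @ d) / (Sinv_t @ t)    # alpha_hat [ppm]
```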
Why This Became the Standard¶
Speed comparison for 1 million pixel scene:
| Model | Time/Pixel | Total Time | Speedup vs. Nonlinear |
|---|---|---|---|
| Nonlinear | 50 ms | 13.9 hours | 1× (baseline) |
| Taylor Only | 10 ms | 2.8 hours | 5× |
| Combined | 0.5 ms | 8.3 minutes | 100× |
The 100× speedup is transformative:
Real-time processing becomes possible
Daily global surveys feasible
Lower computational infrastructure costs
Enables operational missions
Accuracy is “good enough”:
5-10% retrieval error for typical plumes ($\Delta\tau$ = 0.05-0.10)
Comparable to other uncertainties:
Wind speed for flux: ±15%
Path length estimate: ±10%
Surface reflectance: ±5-10%
Sufficient for detection and initial quantification
Strengths and Limitations¶
Strengths:
✓ Fastest: Single-pass, closed-form solution
✓ Calibration-free (normalized version): Self-normalizing
✓ Robust: Immune to multiplicative errors when normalized
✓ Simple: Easy implementation and debugging
✓ Statistically optimal: For linear-Gaussian model
✓ Operationally proven: AVIRIS-NG, GHGSat, Carbon Mapper, etc.
✓ Parallelizable: Embarrassingly parallel across pixels
Limitations:
✗ Restricted range: $\Delta\tau < 0.1$ (~500 ppm typical)
✗ No saturation: Constant sensitivity assumption
✗ Systematic bias: Underestimates moderate-to-strong plumes by 5-20%
✗ Poor for quantification: Insufficient accuracy for flux estimation
✗ Requires weak absorption: Breaks down for strong plumes
Verdict: The operational standard for detection and initial screening. Use for weak-to-moderate plumes when speed is critical. Follow up strong detections with Taylor or Nonlinear for accurate quantification.
6. Model 4: Logarithmic Transformation {#6-logarithmic}¶
A Different Path to Linearity¶
Instead of approximating the physics, we can mathematically transform the measurement space to achieve linearity.
Starting from exact Beer's Law:

$$L = L_{\mathrm{bg}}\, e^{-H\alpha}$$

Apply natural logarithm to both sides:

$$\ln L = \ln L_{\mathrm{bg}} - H\alpha$$

Rearranging:

$$\ln L_{\mathrm{bg}} - \ln L = H\alpha$$

Or equivalently, using normalized radiance $y_{\mathrm{norm}} = L / L_{\mathrm{bg}}$:

$$-\ln y_{\mathrm{norm}} = H\alpha$$

Substituting optical depth:

$$-\ln y_{\mathrm{norm}} = \sigma\, N_{\mathrm{total}} \cdot 10^{-6} \cdot L \cdot \mathrm{AMF} \cdot \alpha$$

Units: dimensionless (natural log of a dimensionless ratio)
This is LINEAR in $\alpha$ with NO APPROXIMATION of Beer's Law!
The Elegant Trade-off¶
The logarithmic transformation achieves seemingly impossible combination:
Advantages:
✓ Exact physics: No approximation of Beer-Lambert law
✓ Linear algebra: Closed-form solution possible
✓ Valid for all $\Delta\tau$: No range restriction on optical depth
✓ Captures saturation: Exponential decay preserved through logarithm
Cost:
✗ Non-Gaussian noise: Transformation distorts noise statistics
✗ Noise amplification: Variance increases exponentially with absorption
✗ Bias for low SNR: Logarithm of noisy signal has systematic bias
✗ Numerical instability: Division by small numbers for dark scenes
Jacobian in Log Space¶

$$\frac{\partial}{\partial \alpha}\big({-\ln y_{\mathrm{norm}}}\big) = H = \text{constant}$$

Units: ppm⁻¹
Key property: Constant — the $e^{-H\alpha}$ dependence cancels!
Comparison:
| Space | Jacobian | Saturation |
|---|---|---|
| Linear | $-L_{\mathrm{bg}}\, H\, e^{-H\alpha}$ | Decreasing sensitivity |
| Log | $H$ | Constant sensitivity |
Physical meaning: Log transformation “undoes” the exponential, recovering the linear relationship between log-radiance and optical depth.
The Critical Issue: Noise Transformation¶
Original noise model (linear space):

$$y = \bar{y} + \epsilon$$

where $\epsilon \sim \mathcal{N}(0, \sigma^2)$ (Gaussian)
After log transformation:

$$\ln y = \ln(\bar{y} + \epsilon)$$

This is not simply $\ln \bar{y} + \epsilon$ unless $\epsilon$ is small.
First-order Taylor approximation of the logarithm around $\bar{y}$:

$$\ln(\bar{y} + \epsilon) \approx \ln \bar{y} + \frac{\epsilon}{\bar{y}}$$

Transformed noise:

$$\epsilon_{\log} \approx \frac{\epsilon}{\bar{y}}$$
Key consequences:
Non-Gaussian distribution: Log-normal, not Gaussian
Signal-dependent variance: $\sigma_{\log}^2 = \sigma^2 / \bar{y}^2$ — grows as absorption reduces $\bar{y}$
Noise amplification factor: since $\bar{y} \propto e^{-\Delta\tau}$, the log-space variance is amplified by $e^{2\Delta\tau}$:
| $\Delta\tau$ | $e^{2\Delta\tau}$ | Noise Increase |
|---|---|---|
| 0.0 | 1.00 | 0% |
| 0.05 | 1.11 | 11% |
| 0.10 | 1.22 | 22% |
| 0.15 | 1.35 | 35% |
| 0.20 | 1.49 | 49% |
| 0.30 | 1.82 | 82% |
| 0.50 | 2.72 | 172% |
Bias: For finite SNR, $\mathbb{E}[\ln y] \approx \ln \bar{y} - \sigma^2/(2\bar{y}^2) \neq \ln \bar{y}$ — the logarithm of a noisy signal is biased low
Physical interpretation: As absorption increases, fewer photons reach sensor → shot noise becomes relatively larger → log amplifies this.
Inverse Problem: Linear in Log Space¶
Log-transformed observations:

$$\mathbf{z} = -\ln \mathbf{y}_{\mathrm{norm}} = \mathbf{H}\alpha + \boldsymbol{\epsilon}_{\log}$$

Assuming approximately Gaussian noise in log space (valid for high SNR):
Maximum likelihood estimate:

$$\hat{\alpha} = \frac{\mathbf{H}^T \boldsymbol{\Sigma}_{\log}^{-1} \mathbf{z}}{\mathbf{H}^T \boldsymbol{\Sigma}_{\log}^{-1} \mathbf{H}}$$

where $\boldsymbol{\Sigma}_{\log} = \mathrm{diag}\big(\sigma_i^2 / \bar{y}_i^2\big)$
Units: ppm
Computational cost: $O(n)$ for diagonal covariance — same as the combined model
Challenge: $\boldsymbol{\Sigma}_{\log}$ depends on the unknown $\bar{y}$ → iterative refinement may be needed.
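A sketch of the log-space estimator with the dark-pixel guard and first-order noise propagation described above (the clipping floor and names are illustrative; the unknown mean is approximated by the observed $y_{\mathrm{norm}}$):

```python
import numpy as np

def log_space_retrieval(y_norm, H, sigma2_norm, eps=1e-6):
    """-ln(y_norm) = H*alpha, with log-space noise variance sigma2/y_bar^2."""
    y = np.clip(y_norm, eps, None)       # guard against dark/negative pixels
    z = -np.log(y)                       # linear in alpha, exact Beer's law
    sigma2_log = sigma2_norm / y**2      # first-order noise propagation
    w = H / sigma2_log
    return (w @ z) / (w @ H)             # alpha_hat [ppm]
```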
When Log Transform Excels¶
Performance comparison for a moderate plume ($\Delta\tau = 0.15$, SNR = 50):
| Model | Physics Error | Noise Level | Combined Uncertainty |
|---|---|---|---|
| Combined | 1.2% (approx) | 1.0× (baseline) | ~3.2% |
| Log Transform | 0% (exact) | 1.35× (amplified) | ~3.5% |
| Nonlinear | 0% (exact) | 1.0× (optimal) | ~2.5% |
Sweet spot: Moderate-to-strong plumes ($\Delta\tau \approx 0.1$-$0.3$) with good SNR (>30:1)
When log beats combined:
Combined approximation error > log noise penalty
Typically $\Delta\tau \gtrsim 0.15$ (~750 ppm)
When nonlinear beats log:
Low SNR scenarios (< 30:1)
Noise amplification becomes prohibitive
Strengths and Limitations¶
Strengths:
✓ Exact exponential: Perfect Beer-Lambert physics
✓ All optical depths: No validity range restriction
✓ Closed-form: As fast as the combined model
✓ Linear algebra: Simple inversion
✓ Theoretical elegance: Mathematical transformation, not approximation
Limitations:
✗ Non-Gaussian noise: Violates maximum likelihood optimality
✗ Noise amplification: Exponentially increasing with absorption
✗ Bias for low SNR: Logarithm of noisy signal systematically biased
✗ Unstable for dark scenes: Division by small numbers
✗ Limited adoption: Not widely used operationally (complex noise model)
Verdict: Niche application — use for moderate-to-strong plumes with high SNR when exact physics needed but iteration unacceptable. Best as refinement step after combined model identifies strong plumes.
7. The Normalization Framework {#7-normalization}¶
Why Normalize? The Calibration Challenge¶
All models above used absolute radiance [W·m⁻²·sr⁻¹·nm⁻¹], which depends on:
Challenging dependencies:
| Parameter | Typical Uncertainty | Source of Variability |
|---|---|---|
| $E_0$ (solar irradiance) | 3-5% | Solar zenith angle, Earth-Sun distance, atmospheric scattering |
| $\rho$ (surface reflectance) | 10-50% | Surface type, viewing geometry, BRDF effects, atmospheric correction |
| $\tau_{\mathrm{bg}}$ (background optical depth) | 5-15% | Background VMR variability, temperature, pressure, path length |
Total absolute calibration uncertainty: Often 15-20% or higher.
The Normalized Solution¶
Define normalized (relative) radiance:

$$y_{\mathrm{norm}}(\lambda) = \frac{L(\lambda)}{L_{\mathrm{bg}}(\lambda)} = \frac{E_0\,\rho\, e^{-\tau_{\mathrm{bg}}}\, e^{-\Delta\tau}}{E_0\,\rho\, e^{-\tau_{\mathrm{bg}}}}$$

Units: dimensionless
Key cancellations: $E_0$, $\rho$, and $e^{-\tau_{\mathrm{bg}}}$ all divide out — all scene properties cancel!
Only depends on plume properties:

$$y_{\mathrm{norm}} = e^{-\Delta\tau(\alpha)}$$
Advantages of Normalization¶
$E_0$ independent: Solar irradiance cancels
Robust to time-of-day, season, latitude
No need for solar spectrum knowledge
$\rho$ independent: Surface reflectance cancels
Works over diverse surface types
No BRDF modeling required
Robust to spatial reflectance variations
$\tau_{\mathrm{bg}}$ independent: Background atmosphere cancels
Robust to background VMR variability
Atmospheric correction simplified
Temperature/pressure effects reduced
Self-calibrating: Only relative measurements needed
Can construct from ratios directly
Reduces calibration requirements
More robust to instrument drift
Normalized Versions of All Models¶
Each absolute model has a normalized counterpart:
| Model | Absolute Forward Model | Normalized Forward Model |
|---|---|---|
| Nonlinear | $L = E_0\,\rho\, e^{-\tau_{\mathrm{bg}} - \Delta\tau}$ | $y_{\mathrm{norm}} = e^{-\Delta\tau}$ |
| MacLaurin | $L \approx E_0\,\rho\,(1 - \tau_{\mathrm{bg}} - H\alpha)$ | $y_{\mathrm{norm}} \approx 1 - H\alpha/(1 - \tau_{\mathrm{bg}})$ |
| Taylor | $L = L_{\mathrm{bg}}\, e^{-H\alpha}$ | $y_{\mathrm{norm}} = e^{-H\alpha}$ |
| Combined | $L \approx L_{\mathrm{bg}}\,(1 - H\alpha)$ | $y_{\mathrm{norm}} \approx 1 - H\alpha$ |
| Log | $\ln L = \ln L_{\mathrm{bg}} - H\alpha$ | $-\ln y_{\mathrm{norm}} = H\alpha$ |
Pattern: Normalized versions have simpler forms — the scene factors $E_0\,\rho$ and $e^{-\tau_{\mathrm{bg}}}$ are removed.
Normalized Matched Filter (Most Common)¶
For the combined approximation, the normalized matched filter:
Forward model:

$$\mathbf{y}_{\mathrm{norm}} = \mathbf{1} - \mathbf{H}\alpha + \boldsymbol{\epsilon}_{\mathrm{norm}}$$

where:
$\mathbf{1}$: vector of ones (background = unity by definition)
$\mathbf{H}$: sensitivity [ppm⁻¹]
Innovation: $\mathbf{d} = \mathbf{1} - \mathbf{y}_{\mathrm{norm}}$
Matched filter:

$$\hat{\alpha} = \frac{\mathbf{H}^T \boldsymbol{\Sigma}_{\mathrm{norm}}^{-1} \mathbf{d}}{\mathbf{H}^T \boldsymbol{\Sigma}_{\mathrm{norm}}^{-1} \mathbf{H}}$$

Units: ppm
This is the standard operational form used in:
AVIRIS-NG
GHGSat
Carbon Mapper
Most research algorithms
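A vectorized sketch of this estimator over a whole image, assuming diagonal noise covariance (names hypothetical):

```python
import numpy as np

def normalized_matched_filter(y_norm, H, sigma2_norm):
    """Normalized matched filter, all pixels at once.

    y_norm:      (n_pixels, n_bands) normalized radiances
    H:           (n_bands,) sensitivity [ppm^-1]
    sigma2_norm: (n_bands,) diagonal noise variances in normalized space
    """
    w = H / sigma2_norm              # Sigma^{-1} H, reused for every pixel
    d = 1.0 - y_norm                 # innovations
    return (d @ w) / (H @ w)         # alpha_hat [ppm] per pixel
```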
The Trade-off: Background Estimation¶
Normalization is not free — it requires estimating $L_{\mathrm{bg}}$:
Challenge: How to estimate $L_{\mathrm{bg}}$ from the scene?
8. Practical Implementation of Normalized Models {#8-practical}¶
Background Estimation Strategies¶
Method 1: Spatial Background (Most Common)¶
Approach: Select plume-free pixels from the same scene.
Procedure:
Identify background pixels (no plume)
Compute statistics (mean, median, percentile)
Use as $L_{\mathrm{bg}}$
Options:
| Method | Formula | Pros | Cons |
|---|---|---|---|
| Median | $L_{\mathrm{bg}} = \mathrm{median}(L_i)$ | Robust to outliers | May miss plume contamination |
| 10th percentile | $L_{\mathrm{bg}} = P_{10}(L_i)$ | Conservative | Requires many pixels |
| Gaussian mixture | Fit 2-component GMM, use lower mode | Automatic | Complex, can fail |
| Spatial filtering | Minimum filter (large kernel) | Spatially varying | Computationally expensive |
Best practice:
Use median of 50-100 spatially distributed background pixels
Select from same scene (same illumination, surface type)
Visual verification or automated quality control
Typical uncertainty: 1-3% in background estimation → propagates to VMR uncertainty
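A minimal sketch of this percentile-based background estimate (names hypothetical):

```python
import numpy as np

def estimate_background(L_cube, mask=None, percentile=50):
    """Per-band background spectrum from plume-free pixels.

    L_cube: (n_pixels, n_bands) radiance; mask selects background pixels.
    percentile=50 is the median; use 10 for a conservative estimate.
    """
    pixels = L_cube if mask is None else L_cube[mask]
    return np.percentile(pixels, percentile, axis=0)
```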
Method 2: Temporal Background¶
Approach: Use same location at different time without plume.
Procedure:
Acquire reference image (before plume or after dissipation)
Register to current image (geometric correction)
Use reference as
Pros:
Exact surface matching: Same pixel, same BRDF
No spatial assumptions: Works for single pixels
Plume-free guarantee: Temporal separation ensures no contamination
Cons:
Requires multiple acquisitions: Not always available
Illumination changes: Solar angle, atmospheric conditions differ
Surface changes: Vegetation growth, snow cover, soil moisture
Applications:
Persistent monitoring of facilities
Time-series analysis
Change detection studies
Correction needed: Adjust for solar zenith angle changes, seasonal variations
Method 3: Model-Based Background¶
Approach: Predict from atmospheric/surface models.
Procedure:
Run radiative transfer model (MODTRAN, 6S, VLIDORT)
Input: Surface reflectance, atmospheric profile, geometry
Output: Predicted
Pros:
No scene requirements: Works anywhere
Physically based: Incorporates atmospheric science
Flexible: Can test different scenarios
Cons:
Model errors: 5-15% typical uncertainty
Input requirements: Need surface, atmosphere characterization
Computationally expensive: Minutes per spectrum
Applications:
Absolute radiometric validation
Scenes without background pixels
Scientific studies requiring full characterization
Noise Transformation in Normalized Space¶
Absolute noise model:

$$\mathbf{y} = \bar{\mathbf{y}} + \boldsymbol{\epsilon}, \qquad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_{\mathrm{abs}})$$

After normalization:

$$\mathbf{y}_{\mathrm{norm}} = \frac{\mathbf{y}}{L_{\mathrm{bg}}} = \frac{\bar{\mathbf{y}}}{L_{\mathrm{bg}}} + \frac{\boldsymbol{\epsilon}}{L_{\mathrm{bg}}}$$

First-order approximation (treating the estimated $L_{\mathrm{bg}}$ as noise-free), the transformed noise is:

$$\boldsymbol{\epsilon}_{\mathrm{norm}} = \frac{\boldsymbol{\epsilon}}{L_{\mathrm{bg}}}$$

Noise covariance in normalized space:

$$\boldsymbol{\Sigma}_{\mathrm{norm}} = \frac{\boldsymbol{\Sigma}_{\mathrm{abs}}}{L_{\mathrm{bg}}^2}$$

For diagonal $\boldsymbol{\Sigma}_{\mathrm{abs}}$: $\sigma_{\mathrm{norm},i}^2 = \sigma_{\mathrm{abs},i}^2 / L_{\mathrm{bg},i}^2$ (element-wise)
Physical interpretation:
| Scene Type | $L_{\mathrm{bg}}$ | Normalized Noise |
|---|---|---|
| Bright (snow, desert) | Large | Small (good SNR) |
| Moderate (vegetation) | Medium | Medium |
| Dark (ocean, forest) | Small | Large (poor SNR) |
Implication: Detection easier over bright surfaces, harder over dark.
Practical Workflow for Normalized Models¶
Stage 1: Preprocessing
Input: Calibrated radiance cube y_abs [W·m⁻²·sr⁻¹·nm⁻¹]
1. Estimate background:
- Select background pixels (manual or automatic)
- Compute L_bg = median(background_pixels)
2. Normalize:
- y_norm = y_abs / L_bg
3. Estimate normalized noise covariance:
- Sigma_norm = diag(Sigma_abs / L_bg²)

Stage 2: Detection/Retrieval
For combined model:
1. Compute sensitivity vector:
H = sigma ⊙ N_total · 10⁻⁶ · L · AMF
2. Compute innovation:
d = 1 - y_norm
3. Matched filter:
alpha_hat = (H^T Sigma_norm^(-1) d) / (H^T Sigma_norm^(-1) H)
4. Detection statistic:
delta = H^T Sigma_norm^(-1) d
threshold = 3 × sqrt(H^T Sigma_norm^(-1) H)
5. Detect if: delta > threshold

Stage 3: Quality Control
1. Check validity:
- Ensure Delta_tau < 0.1 for combined model
- If Delta_tau > 0.1, flag for refinement
2. Uncertainty quantification:
- sigma_alpha = sqrt((H^T Sigma_norm^(-1) H)^(-1))
3. Physical checks:
- alpha > 0 (positive enhancement)
- alpha < 10000 ppm (reasonable range)
- Spectral residuals within noise

Implementation Tips¶
For operational systems:
Precompute constant terms:
$\mathbf{H}$: wavelength-dependent, constant for the scene
$\boldsymbol{\Sigma}^{-1}\mathbf{H}$: reused for all pixels
$\mathbf{H}^T \boldsymbol{\Sigma}^{-1} \mathbf{H}$: scalar, constant
Vectorize across pixels:
Process entire image in batches
Use numpy/JAX broadcasting
GPU acceleration straightforward
Diagonal covariance assumption:
Often sufficient (spectral correlations weak)
Reduces matched filter cost to $O(n)$
100× speedup typical
Adaptive background:
Update $L_{\mathrm{bg}}$ spatially
Window-based local background
Accounts for surface heterogeneity
Common Pitfalls¶
Plume-contaminated background:
Problem: Including plume pixels in background estimate
Effect: Underestimates plumes (biased low)
Solution: Conservative percentile (10th), visual QC
Surface type mismatch:
Problem: Background from different surface than plume
Effect: Spectral shape differences mimic plumes
Solution: Spatially local background, same surface type
Division by zero:
Problem: Very dark pixels ($L_{\mathrm{bg}} \to 0$)
Effect: Numerical instability, infinite normalized radiance
Solution: Threshold minimum $L_{\mathrm{bg}}$, mask dark pixels
Ignoring noise transformation:
Problem: Using $\boldsymbol{\Sigma}_{\mathrm{abs}}$ in normalized space
Effect: Incorrect weighting, suboptimal estimates
Solution: Transform the covariance: $\boldsymbol{\Sigma}_{\mathrm{norm}} = \boldsymbol{\Sigma}_{\mathrm{abs}} / L_{\mathrm{bg}}^2$ (see the sketch below)
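A short sketch combining these guards — radiance floor, dark-pixel mask, and covariance transformation (synthetic data; the floor value is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
L_bg = rng.uniform(0.0, 1.0, size=200)     # per-band background radiance
y_abs = 0.95 * L_bg                        # toy observation
sigma2_abs = np.full(200, 1e-4)            # absolute noise variances (diagonal)

L_min = 1e-3                               # illustrative radiance floor
safe_bg = np.maximum(L_bg, L_min)          # avoid division by ~0
valid = L_bg > L_min                       # mask dark pixels entirely
y_norm = np.where(valid, y_abs / safe_bg, np.nan)
sigma2_norm = sigma2_abs / safe_bg**2      # transform the covariance too
```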
9. Computational Performance Analysis {#9-computational}¶
Single Pixel Performance¶
Test configuration:
Intel Xeon 3.0 GHz CPU (single core)
200 wavelengths (typical hyperspectral)
Diagonal covariance matrix
Double precision floating point
| Model | Operations | Time | Memory |
|---|---|---|---|
| Nonlinear | 20 iterations × $O(n)$ | 50 ms | 5 MB |
| Taylor | 10 iterations × $O(n)$ | 10 ms | 5 MB |
| Log Transform | $O(n)$ + log operations | 0.5 ms | 2 MB |
| Combined | $O(n)$ | 0.5 ms | 2 MB |
| MacLaurin | $O(n)$ | 0.5 ms | 2 MB |
Large Scene Performance¶
1 million pixel scene (1000 × 1000, typical satellite image):
| Model | Time (Single CPU) | Time (16-core CPU) | Time (GPU A100) |
|---|---|---|---|
| Nonlinear | 13.9 hours | 52 minutes | 5 minutes |
| Taylor | 2.8 hours | 10.5 minutes | 1.5 minutes |
| Log | 8.3 minutes | 31 seconds | 3 seconds |
| Combined | 8.3 minutes | 31 seconds | 3 seconds |
Speedup factors (vs. Nonlinear single CPU):
| Model | CPU (1 core) | CPU (16 cores) | GPU |
|---|---|---|---|
| Nonlinear | 1× | 16× | 167× |
| Taylor | 5× | 79× | 554× |
| Combined | 100× | 1,605× | 16,680× |
Parallelization Efficiency¶
Strong scaling (fixed problem size, increasing processors):
| Cores | Nonlinear | Taylor | Combined |
|---|---|---|---|
| 1 | 1.00 | 1.00 | 1.00 |
| 4 | 3.85 | 3.92 | 3.98 |
| 16 | 14.3 | 15.1 | 15.8 |
| 64 | 48.2 | 56.3 | 62.1 |
Parallel efficiency:
Combined: 97% (nearly perfect)
Taylor: 88% (iterative overhead)
Nonlinear: 75% (iteration + convergence synchronization)
Why combined scales better:
Embarrassingly parallel (no inter-pixel communication)
No iterative dependencies
Uniform workload per pixel
GPU Acceleration¶
Performance on NVIDIA A100 (1 million pixels):
| Model | CPU Time | GPU Time | GPU Speedup | GPU Utilization |
|---|---|---|---|---|
| Nonlinear | 13.9 hr | 5 min | 167× | 45% |
| Taylor | 2.8 hr | 1.5 min | 112× | 60% |
| Combined | 8.3 min | 3 sec | 166× | 95% |
Why combined benefits more from GPU:
Linear algebra kernel (BLAS optimized)
High arithmetic intensity
No branching or iteration
Memory bandwidth limited (ideal for GPU)
Memory Footprint¶
Per million pixels:
| Component | Nonlinear | Taylor | Combined |
|---|---|---|---|
| Observations ($\mathbf{y}$) | 1.6 GB | 1.6 GB | 1.6 GB |
| Covariance ($\boldsymbol{\Sigma}$, diagonal) | 1.6 GB | 1.6 GB | 1.6 GB |
| Jacobian ($\mathbf{J}$) | 1.6 GB (per iteration) | 1.6 GB (per iteration) | 1.6 GB (constant) |
| State ($\alpha$) | 8 MB | 8 MB | 8 MB |
| Workspace (iteration) | 3.2 GB | 3.2 GB | 0 GB |
| Total | ~8 GB | ~8 GB | ~5 GB |
Combined model advantages:
40% less memory (no iteration workspace)
Constant Jacobian (compute once)
Enables larger batch sizes on GPU
Power Consumption¶
Energy per million pixels:
| Model | CPU Energy | GPU Energy | Embedded (Jetson) |
|---|---|---|---|
| Nonlinear | 4.2 kWh | 0.15 kWh | Infeasible (>10 hr) |
| Taylor | 0.8 kWh | 0.04 kWh | 12.5 kWh (>50 hr) |
| Combined | 0.05 kWh | 0.001 kWh | 0.2 kWh (20 min) |
Critical for:
Satellite on-board processing (power constrained)
Aircraft operations (battery-powered instruments)
Edge computing (thermal limits)
Multi-Gas Retrieval Scaling¶
Retrieve 3 gases simultaneously (CH₄, CO₂, H₂O):
State vector size: 3× (3 gases)
Nonlinear complexity: Scales roughly as $m^2$, where $m$ = number of gases
| Model | Single Gas | 3 Gases | Scaling |
|---|---|---|---|
| Nonlinear | 13.9 hr | 125 hr | $\sim m^2$ |
| Taylor | 2.8 hr | 25 hr | $\sim m^2$ |
| Combined | 8.3 min | 75 min | $\sim m^2$ |
With PCA decorrelation (combined only):
Combined + PCA: 8.3 min × 3 = 25 min (3× instead of 9×)
Why PCA helps:
Decorrelates gas signatures
Enables sequential processing
Reduces scaling from $\sim m^2$ to $\sim m$
Only works for linear models
Accuracy vs. Speed Trade-off¶
For operational plumes ($\Delta\tau$ = 0.05-0.15, 1M pixels):
| Model | Median Error | 95th Percentile | Processing Time | Efficiency* |
|---|---|---|---|---|
| Nonlinear | 0.5% | 2% | 13.9 hr | 1.0 |
| Taylor | 1.5% | 4% | 2.8 hr | 4.5 |
| Log | 2% | 5% | 8.3 min | 5.0 |
| Combined | 5% | 12% | 8.3 min | 2.8 |
*Efficiency = (Baseline Speed / Model Speed) / (Model Error / Baseline Error)
Interpretation:
Taylor: Best efficiency for iterative approach
Combined: Best efficiency for closed-form approach
Log: Highest efficiency when exact physics critical
Hardware Platform Recommendations¶
| Platform | Cores/CUDA | Memory | Power | Recommended Model |
|---|---|---|---|---|
| Laptop | 4-8 CPU | 16 GB | 45 W | Combined |
| Workstation | 16-32 CPU | 64 GB | 200 W | Combined or Taylor |
| Server | 64-128 CPU | 256 GB | 400 W | Taylor or Nonlinear |
| Consumer GPU | 2560 CUDA | 8 GB | 200 W | Combined |
| Datacenter GPU | 6912 CUDA | 80 GB | 400 W | Taylor or Combined |
| TPU Pod | 2048 cores | 128 GB | 500 W | Combined |
| Embedded (Jetson) | 512 CUDA | 8 GB | 30 W | Combined only |
| On-board satellite | 512 CUDA | 4 GB | 15 W | Combined only |
10. Application-Specific Recommendations {#10-recommendations}¶
10.1 Detection (Binary Classification)¶
Objective¶
Primary goal: Identify presence/absence of methane plumes across large areas.
Binary decision: Pixel classified as “plume” or “background”
Key requirements:
High throughput: Process millions of pixels rapidly
Low false alarm rate: Typically <1% acceptable
Sensitivity: Detect weak signals (100-500 ppm enhancements)
Robustness: Work across diverse scenes and conditions
Recommended Model: Combined (Normalized)¶
Primary choice: Combined approximation with normalization

$$\hat{\alpha} = \frac{\mathbf{H}^T \boldsymbol{\Sigma}_{\mathrm{norm}}^{-1} (\mathbf{1} - \mathbf{y}_{\mathrm{norm}})}{\mathbf{H}^T \boldsymbol{\Sigma}_{\mathrm{norm}}^{-1} \mathbf{H}} \quad [\text{ppm}]$$
Rationale:
| Factor | Why Combined Model Optimal |
|---|---|
| Speed | 100× faster than nonlinear → enables full-scene processing |
| Accuracy | 5-10% VMR error acceptable for binary decision |
| Calibration | Normalized form eliminates $E_0$ and $\rho$ dependencies |
| Simplicity | Single-pass, no convergence issues |
| Proven | Used in AVIRIS-NG, GHGSat, Carbon Mapper |
Detection Workflow¶
Step 1: Preprocessing
1. Normalize radiance:
y_norm = y_absolute / L_bg
2. Compute sensitivity:
H = sigma ⊙ N_total · 10^{-6} · L · AMF
3. Estimate noise covariance:
Sigma_norm = diag(sigma_norm_1^2, ..., sigma_norm_n^2)

Step 2: Detection statistic

$$\delta = \mathbf{H}^T \boldsymbol{\Sigma}_{\mathrm{norm}}^{-1}\, \mathbf{d}, \qquad \mathbf{d} = \mathbf{1} - \mathbf{y}_{\mathrm{norm}}$$

(dividing $\delta$ by $\mathbf{H}^T \boldsymbol{\Sigma}_{\mathrm{norm}}^{-1} \mathbf{H}$ converts it to an enhancement estimate in ppm)
Physical meaning: Weighted correlation between observation and expected plume signature.
Step 3: Threshold determination
Under the null hypothesis (no plume), $\delta \sim \mathcal{N}(0, \sigma_\delta^2)$, where:

$$\sigma_\delta^2 = \mathbf{H}^T \boldsymbol{\Sigma}_{\mathrm{norm}}^{-1} \mathbf{H}$$

Threshold: $\delta > k\,\sigma_\delta$, where:
| $k$ | Confidence | False Alarm Rate | Use Case |
|---|---|---|---|
| 3 | 99.7% | 0.13% | Standard operations |
| 4 | 99.994% | 0.003% | Conservative (regulatory) |
| 5 | 99.99994% | 0.00003% | Very conservative |
Detection decision: declare a plume if $\delta > k\,\sigma_\delta$; a code sketch follows below.
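A compact sketch of Steps 2-3 for a single pixel (diagonal covariance; names hypothetical):

```python
import numpy as np

def detect(y_norm, H, sigma2_norm, k=3.0):
    """Matched-filter detection statistic with a k-sigma threshold."""
    w = H / sigma2_norm                  # Sigma^{-1} H
    delta = w @ (1.0 - y_norm)           # detection statistic
    sigma_delta = np.sqrt(H @ w)         # its std dev under the null hypothesis
    return delta > k * sigma_delta, delta, sigma_delta
```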
Step 4: Spatial clustering
Individual pixel detections may be noisy. Apply clustering:
1. Threshold at k*sigma_delta
2. Connected component analysis (4 or 8-connectivity)
3. Size filter: Require cluster > min_size (e.g., 5-10 pixels)
4. Shape filter: Remove linear features (artifacts)Performance Metrics¶
For 1 million pixel scene, typical conditions:
| Metric | Value |
|---|---|
| Processing time | 8 minutes (single CPU) / 3 seconds (GPU) |
| True positive rate | 95-98% for |
| False alarm rate | 0.1-0.3% (adjustable with threshold) |
| Minimum detectable VMR | 150-300 ppm (3σ, scene dependent) |
Validation and Quality Control¶
Post-detection checks:
Spectral residual analysis:
Compute the residual: $\mathbf{r} = \mathbf{d} - \mathbf{H}\hat{\alpha}$
Check: residuals consistent with noise ($\chi^2$ within the expected range)
Spatial coherence:
Plumes should be spatially contiguous
Flag isolated single-pixel detections
Physical plausibility:
Check retrieved VMR > 0
Check $\Delta\tau = H\hat{\alpha} < 0.1$ (validity range)
Check spatial gradients reasonable
When to Use Alternative Models¶
Upgrade to Taylor Only:
Initial detection shows $\hat{\alpha} \gtrsim 500$ ppm
Re-process with Taylor for better accuracy
Adds ~5 minutes processing time
Upgrade to Nonlinear:
High-value targets (>1000 kg/hr emission rate)
Scientific studies requiring best accuracy
Can process just strong detections (1% of pixels)
10.2 Retrieval (Quantification)¶
Objective¶
Primary goal: Estimate VMR enhancement for detected plumes with <5% accuracy.
Key requirements:
Accuracy: Retrieve VMR within 5-10% for moderate plumes
Valid range: Handle enhancements from 100-2000 ppm
Quantitative: Provide uncertainty estimates
Efficient: Balance accuracy and speed
Model Selection Strategy¶
Adaptive approach based on plume strength:
| Plume Strength ($\Delta\tau$) | VMR Range | Recommended Model | Error | Time/pixel |
|---|---|---|---|---|
| Weak (<0.05) | <250 ppm | Combined | 5% | 0.5 ms |
| Moderate (0.05-0.10) | 250-500 ppm | Combined | 5-10% | 0.5 ms |
| Strong (0.10-0.20) | 500-1000 ppm | Taylor Only | 2-5% | 10 ms |
| Very strong (0.20-0.30) | 1000-1500 ppm | Taylor Only | 3-7% | 10 ms |
| Extreme (>0.30) | >1500 ppm | Nonlinear | <2% | 50 ms |
Workflow: Hybrid Approach¶
Stage 1: Initial screening (all pixels)
# Combined model - fast
alpha_initial = combined_matched_filter(y_norm, H, Sigma_norm)
delta_tau = H * alpha_initial
# Classify by strength
weak = (delta_tau < 0.10)
strong = (delta_tau >= 0.10) & (delta_tau < 0.30)
extreme = (delta_tau >= 0.30)

Stage 2: Refinement (strong plumes only)
# Taylor model for strong plumes
for pixel in strong_pixels:
alpha_refined[pixel] = taylor_iterative(y_norm[pixel],
alpha_initial[pixel],
H, Sigma_norm)

Stage 3: High-accuracy (extreme plumes only)
# Nonlinear for extreme plumes
for pixel in extreme_pixels:
alpha_final[pixel] = nonlinear_gauss_newton(y_norm[pixel],
alpha_refined[pixel],
H, Sigma_norm)

Processing Time Breakdown¶
For 1 million pixel scene:
| Stage | Pixels Processed | Model | Time | Percentage |
|---|---|---|---|---|
| Initial (all) | 1,000,000 | Combined | 8 min | 95% of pixels |
| Strong plumes | 40,000 (4%) | Taylor | 7 min | 4% of pixels |
| Extreme plumes | 10,000 (1%) | Nonlinear | 8 min | 1% of pixels |
| Total | - | Hybrid | 23 min | vs. 14 hr full nonlinear |
Speedup: 36× faster than full nonlinear, with comparable accuracy for 99% of pixels.
Uncertainty Quantification¶
For each retrieved VMR:
Combined model uncertainty:

$$\sigma_\alpha = \big(\mathbf{H}^T \boldsymbol{\Sigma}_{\mathrm{norm}}^{-1} \mathbf{H}\big)^{-1/2}$$

Taylor/Nonlinear uncertainty:

$$\sigma_\alpha = \big(\mathbf{J}^T \boldsymbol{\Sigma}^{-1} \mathbf{J}\big)^{-1/2}$$

where $\mathbf{J}$ is the Jacobian at the solution.
Report with confidence intervals: $\hat{\alpha} \pm 1.96\,\sigma_\alpha$ (95%); a code sketch follows the table below.
Typical uncertainties:
| Conditions | Combined | Taylor | Nonlinear |
|---|---|---|---|
| High SNR, bright | 50-100 ppm | 40-80 ppm | 30-60 ppm |
| Moderate SNR | 100-200 ppm | 80-150 ppm | 60-120 ppm |
| Low SNR, dark | 200-500 ppm | 150-350 ppm | 100-250 ppm |
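A sketch of the confidence-interval computation for the diagonal-covariance combined model (names hypothetical):

```python
import numpy as np

def retrieval_uncertainty(H, sigma2_norm, alpha_hat, z=1.96):
    """1-sigma posterior std and ~95% confidence interval."""
    sigma_alpha = 1.0 / np.sqrt(H @ (H / sigma2_norm))  # (H^T Sigma^-1 H)^{-1/2}
    return sigma_alpha, (alpha_hat - z * sigma_alpha,
                         alpha_hat + z * sigma_alpha)
```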
Validation Approaches¶
1. Controlled release experiments:
Known emission rate
Compare retrieved VMR to expected
Assess bias and precision
2. Cross-comparison:
Process same scene with multiple models
Taylor vs. Nonlinear for strong plumes
Should agree within uncertainties
3. Spectral residual analysis:
Check goodness of fit
Residuals should be white noise
4. Spatial consistency:
Plume should be spatially smooth
Abrupt changes indicate artifacts
10.3 Flux Quantification¶
Objective¶
Primary goal: Estimate emission rate in kg·hr⁻¹ or tonnes·yr⁻¹.
Key requirements:
Highest accuracy: <10% total error budget
VMR precision: <5% (wind speed typically ±15%)
Full uncertainty characterization: Propagate all error sources
Scientifically defensible: Publication quality
Why Flux Requires High Accuracy¶
Flux estimation equation (cross-sectional form, consistent with the integration step below):

$$Q = c \cdot \bar{U} \cdot \sum_{i \in \text{transect}} \alpha_i \, \Delta x_i$$

where $\bar{U}$ is wind speed, $\Delta x_i$ the pixel width, and $c$ converts the ppm-path enhancement to mass per unit length.
Error propagation:

$$\left(\frac{\sigma_Q}{Q}\right)^2 = \left(\frac{\sigma_U}{U}\right)^2 + \left(\frac{\sigma_\alpha}{\alpha}\right)^2 + \left(\frac{\sigma_h}{h}\right)^2 + \cdots$$
Typical uncertainties:
| Source | Uncertainty | Contribution to Flux Error |
|---|---|---|
| Wind speed | 15% | 15% |
| Wind direction | 5-10° | 5-10% |
| Plume height | 20-30% | 10-20% |
| VMR (Combined) | 5-10% | 5-10% |
| VMR (Nonlinear) | 1-3% | 1-3% |
Combined error (quadrature):
| VMR Model | VMR Contribution | Total Flux Error |
|---|---|---|
| Combined (5-10%) | 5-10% | 20-28% |
| Taylor (2-5%) | 2-5% | 18-22% |
| Nonlinear (1-3%) | 1-3% | 17-20% |
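A two-line quadrature check using representative values from the tables above (the specific percentages are illustrative):

```python
import numpy as np

rel_errors = {"VMR": 0.03, "wind_speed": 0.15, "wind_direction": 0.08,
              "plume_height": 0.25, "pixel_size": 0.03}
total = np.sqrt(sum(e**2 for e in rel_errors.values()))
print(f"Total relative flux uncertainty ≈ {total:.0%}")   # ≈ 31%
```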
Conclusion: VMR must be known to <5% to avoid dominating flux uncertainty.
Recommended Model: Nonlinear¶
For final flux quantification, use nonlinear model:
Rationale:
| Factor | Why Nonlinear Necessary |
|---|---|
| Accuracy | <2% VMR error achievable |
| Saturation | Correctly handles strong plumes |
| Bias | No systematic underestimation |
| Uncertainty | Proper covariance from Hessian |
| Scientific rigor | Exact physics, defensible |
Processing strategy:
Don’t process entire scene with nonlinear. Instead:
Detection: Combined model (8 min)
Identify large sources: Flux > 100 kg·hr⁻¹ (typically 100-1000 pixels)
Nonlinear for sources only: 1000 pixels × 50 ms = 50 seconds
Total time: ~9 minutes (vs. 14 hours full nonlinear)
Full Flux Workflow¶
Step 1: Plume detection and delineation
1. Detect plumes (combined model)
2. Segment into individual sources
3. Define plume boundaries (e.g., VMR > 3*sigma_bg)Step 2: Nonlinear VMR retrieval
For each plume pixel:
alpha[pixel] = nonlinear_optimization(y_norm[pixel], H, Sigma_norm)
sigma_alpha[pixel] = sqrt(inverse(Hessian[pixel]))

Step 3: Wind field estimation
Options:
1. Weather model (WRF, HRRR): 15% typical uncertainty
2. Cross-sectional method: Use plume shape
3. Co-located lidar: <5% uncertainty (rare)

Step 4: Plume integration
# Integrate along cross-plume transect
for each downwind transect:
integrated_mass = sum(alpha[pixels] * wind * pixel_area)
# Average over multiple transects
flux = mean(integrated_mass)
flux_uncertainty = sqrt(var(integrated_mass) + wind_uncertainty^2)

Step 5: Uncertainty budget
Error sources:
1. VMR: sigma_alpha (1-3% from nonlinear)
2. Wind speed: 15% (dominant)
3. Wind direction: 5-10%
4. Plume height: 20-30%
5. Pixel size: 2-5%
Total: sqrt(sum of squares) ≈ 20-35%

Validation Requirements¶
For publication-quality flux estimates:
Controlled release comparison:
Known emission rate
Multiple detection geometries
Report bias and precision
Intercomparison:
Multiple instruments/platforms
Same source, same time
Agreement within uncertainties
Meteorological validation:
Co-located wind measurements
Verify atmospheric stability
Check boundary layer height
Sensitivity analysis:
Vary all input parameters
Monte Carlo uncertainty propagation
Report 95% confidence intervals
When Computational Cost is Acceptable¶
Flux quantification is typically:
Focused on specific sources (not whole scene)
Done offline (not real-time)
For high-value targets (large emitters)
Scientific publications (accuracy critical)
Therefore: 50 seconds to 10 minutes for nonlinear processing is acceptable.
10.4 Multi-Gas Retrieval¶
Objective¶
Simultaneously retrieve multiple atmospheric species:
CH₄ (methane)
CO₂ (carbon dioxide)
H₂O (water vapor)
Potentially: O₂, CO, etc.
Key challenges:
Spectral overlap: Gases absorb at similar wavelengths
Correlated retrievals: Errors in one gas affect others
Computational cost: State vector size increases
Ill-conditioning: Singular value issues
State Vector Formulation¶
Single gas: $\alpha \in \mathbb{R}$ (scalar enhancement)
Multi-gas: $\boldsymbol{\alpha} = [\alpha_{\mathrm{CH_4}}, \alpha_{\mathrm{CO_2}}, \alpha_{\mathrm{H_2O}}]^T$
Forward model (nonlinear):

$$\mathbf{y} = L_{\mathrm{bg}}\, \exp(-\mathbf{H}\boldsymbol{\alpha}) + \boldsymbol{\epsilon}$$

Complexity: grows roughly as $m^2$ relative to the single-gas case, where $m$ = number of gases
Model Recommendations by Application¶
Scientific Multi-Gas Retrieval¶
Goal: Accurate partitioning of gases for process studies.
Recommended model: Nonlinear joint retrieval
where $\mathbf{H} = [\mathbf{H}_{\mathrm{CH_4}}, \mathbf{H}_{\mathrm{CO_2}}, \mathbf{H}_{\mathrm{H_2O}}]$ ($n \times m$ matrix)
Processing time: For 3 gases, 1M pixels:
Nonlinear: 125 hours (single CPU)
Parallelized (16 cores): ~8 hours
Practical: Process subset of pixels or use GPU
Advantages:
Accurate partitioning: Accounts for spectral overlap
Full error covariance: Cross-correlations between gases
Physically rigorous: Exact Beer’s law for all gases
Applications:
CO₂/CH₄ ratio studies (distinguish sources)
Isotopologue studies (¹²CH₄ vs. ¹³CH₄)
Validation studies
Operational Multi-Gas Retrieval¶
Goal: Rapid identification of gas composition.
Recommended model: Combined with PCA decorrelation
Procedure:
Step 1: Spectral decomposition
# Construct multi-gas Jacobian
H_multi = np.column_stack([H_CH4, H_CO2, H_H2O])  # n × 3 matrix
# Singular value decomposition
U, S, Vt = np.linalg.svd(H_multi, full_matrices=False)  # U: n×3, S: (3,), Vt: 3×3

Step 2: Retrieve in PC space
# Project the innovation onto the decorrelated left-singular directions
# (this sketch omits noise weighting; whiten with Sigma_norm^{-1/2} first in practice)
d = 1 - y_norm                 # innovation, length n
beta = (U.T @ d) / S           # independent scalar retrievals, no cross-terms

Step 3: Back-transform to gas space
# Recover individual gases
alpha_gases = Vt.T @ beta
alpha_CH4, alpha_CO2, alpha_H2O = alpha_gases

Processing time: For 3 gases, 1M pixels:
Combined + PCA: 25 minutes (3× single gas)
vs. 125 hours for nonlinear
Speedup: 300× faster than joint nonlinear
Advantages:
Decorrelates gases: Independent retrievals
Linear complexity: Scales as $m$, not $m^2$
Fast: Suitable for operations
Reasonable accuracy: 10-15% typical
Limitations:
Approximate: Assumes weak absorption for all gases
No cross-correlation: Uncertainty may be underestimated
Spectral interference: Not fully accounted for
Applications:
Operational monitoring (daily surveys)
Initial gas identification
Large-area screening
When to Use Which Approach¶
| Application | Data Volume | Accuracy Need | Recommended |
|---|---|---|---|
| Science study | 10-100 scenes | <5% per gas | Nonlinear joint |
| Validation | 1-10 scenes | <3% per gas | Nonlinear joint |
| Operations | 100-1000 scenes/day | 10-15% per gas | Combined + PCA |
| Rapid screening | Real-time | 20% per gas | Combined sequential |
10.5 Real-Time Operations¶
Objective¶
Process data during or immediately after acquisition for:
In-flight decision making (aircraft missions)
Near-real-time alerts (satellite operations)
Operational response (leak detection services)
Key requirements:
Latency: <1 minute per scene for decisions
Throughput: Keep pace with acquisition rate
Reliability: No failures, always produces result
Simplicity: Minimal human intervention
Latency Requirements by Platform¶
| Platform | Acquisition Rate | Latency Requirement | Data Volume |
|---|---|---|---|
| Aircraft (AVIRIS-NG) | 1 scene/10 sec | <10 seconds (next flight adjustment) | 300k pixels/scene |
| Satellite (GHGSat) | 100 scenes/day | <10 minutes (daily report) | 1M pixels/scene |
| UAV | Continuous | <1 second (flight path control) | 100k pixels/scene |
| Ground-based | 1 scene/minute | <1 minute (monitoring dashboard) | 500k pixels/scene |
Model Selection: Combined Only¶
For all real-time operations: Combined model (normalized)
Processing times:
| Platform | Hardware | Processing Time | Meets Requirement? |
|---|---|---|---|
| Aircraft | Embedded GPU (Jetson Xavier) | 2 seconds/scene | ✓ Yes |
| Satellite | On-board FPGA | 5 minutes/scene | ✓ Yes |
| UAV | Edge CPU (Intel NUC) | 0.5 seconds/scene | ✓ Yes |
| Ground | Workstation GPU | 0.1 seconds/scene | ✓ Yes |
Why only combined model works:
Speed: Only model fast enough for <1 second latency
Reliability: No convergence failures
Power: Low enough for embedded systems
Memory: Fits in constrained devices
Example: Aircraft Real-Time Processing¶
AVIRIS-NG airborne hyperspectral imager:
Acquisition:
600 flightlines per campaign
Each flightline: 600 scenes
Each scene: 512×600 pixels = 307,200 pixels
Total: 111 million pixels per campaign
Real-time requirement:
Pilot needs feedback within 1 minute
Decide whether to:
Adjust flight path
Make additional pass
Target specific areas
Implementation:
# On-board processing (Jetson Xavier, 30W)
while acquiring:
scene = acquire_next_scene() # 10 seconds
# Real-time processing (2 seconds)
y_norm = normalize(scene)
detections = combined_matched_filter(y_norm)
# Decision support
if large_plume_detected(detections):
alert_pilot("Large plume at lat/lon")
suggest_refly_pattern()
# Continue acquisition

Performance:
Acquisition: 10 sec
Processing: 2 sec
Total latency: 12 seconds → pilot sees results of previous scene while acquiring current
Impact:
Can adjust flight pattern in real-time
Maximize coverage of active plumes
Minimize time over non-emitting areas
Example: Satellite Near-Real-Time Alerts¶
GHGSat constellation:
Acquisition:
10 satellites
10 images per satellite per day
Each image: 1 million pixels
Total: 100 million pixels per day
Operational requirement:
Daily report to customers within 4 hours of acquisition
Email alerts for large sources within 1 hour
Implementation:
# Ground processing (cloud-based, auto-scaling)
for each acquired_image:
# Downlink (30 minutes)
download_image()
# Processing (5 minutes per image, GPU)
y_norm = preprocess(image)
detections = combined_matched_filter(y_norm)
alpha_estimates = detections['vmr']
# Large source check
large_sources = (alpha_estimates > 1000) & (cluster_size > 10)
if any(large_sources):
# Alert within 1 hour
flux_estimate = quick_flux(large_sources, wind_model)
send_alert(customer, flux_estimate, confidence)
# Daily report (batch all images)
daily_summary[image_id] = {
'n_detections': len(detections),
'max_vmr': max(alpha_estimates),
'locations': plume_locations
}

Performance:
100 images × 5 min = 500 minutes = 8.3 hours (sequential)
With 10 GPUs parallel: 50 minutes
Meets 4-hour requirement: ✓
Edge Computing Constraints¶
For on-board processing (satellite/aircraft):
| Resource | Constraint | Combined Model | Nonlinear Model |
|---|---|---|---|
| Power | 15-30 W | 5-10 W (feasible) | 25-50 W (exceeds) |
| Memory | 4-8 GB | 3 GB (fits) | 8 GB (marginal) |
| Compute | 512 CUDA cores | 2 sec/scene (OK) | 5 min/scene (too slow) |
| Thermal | Passive cooling | 10 W (OK) | 40 W (needs active) |
Verdict: Only combined model viable for edge deployment.
10.6 Scientific Studies vs. Operational Monitoring¶
Scientific Studies¶
Characteristics:
Small number of scenes (10-100)
Publication quality required
Extensive validation
Uncertainty quantification critical
Computational resources available
Weeks to months timescale
Recommended approach:
| Component | Model | Rationale |
|---|---|---|
| Initial screening | Combined | Identify ROIs quickly |
| VMR retrieval | Nonlinear | Best accuracy |
| Multi-gas | Nonlinear joint | Accurate partitioning |
| Uncertainty | Full covariance | Proper error propagation |
| Validation | Controlled releases | Ground truth |
Typical workflow:
Week 1: Data acquisition and initial processing
- Combined model for all scenes
- Identify targets of interest
Week 2-3: Detailed analysis
- Nonlinear retrieval for all plume pixels
- Multi-gas analysis for selected scenes
- Cross-validation with meteorology
Week 4: Uncertainty quantification
- Monte Carlo error propagation
- Sensitivity analysis
- Comparison with models
Week 5-8: Paper preparation
- Figures, tables, analysis
- Peer review revisionsComputational budget:
100 scenes × 14 hours = 1400 CPU-hours = acceptable for research cluster
With 100 cores: 14 hours wall-clock time
Operational Monitoring¶
Characteristics:
Large number of scenes (100-1000+ per day)
Speed critical
Good-enough accuracy acceptable
Minimal human intervention
Cost constraints
Real-time to daily timescale
Recommended approach:
| Component | Model | Rationale |
|---|---|---|
| Detection | Combined | Fast screening |
| Quantification | Combined | Sufficient for most |
| Refinement | Taylor (strong only) | Balance speed/accuracy |
| Multi-gas | PCA decorrelation | Fast separation |
| Alerts | Automated threshold | No human review |
Typical workflow:
Daily operations:
06:00 - Data acquisition begins (satellite constellation)
08:00 - Downlink complete for morning passes
08:05 - Automated processing starts (Combined model)
08:15 - Initial detections available (10 min for 100 scenes)
08:20 - Large source identification
08:30 - Automated alerts sent to customers
09:00 - Daily summary dashboard updated
Continuous:
- Monitor for new acquisitions
- Process as data arrives
- No batch waiting

Computational architecture:
Cloud auto-scaling (AWS, GCP)
Cost optimization: Spot instances
Horizontal scaling: Process scenes in parallel
Typical cost: $0.01-0.05 per scene
Decision Matrix¶
| Criterion | Scientific | Operational |
|---|---|---|
| Accuracy requirement | <2% VMR | <10% VMR |
| Processing time | Days-weeks OK | Minutes-hours |
| Cost tolerance | High ($1000s per study) | Low ($100s per day) |
| Validation | Extensive | Spot checks |
| Human oversight | Continuous | Minimal |
| Model choice | Nonlinear | Combined |
| Hardware | Research cluster | Cloud/edge |
10.7 Hardware Platform Optimization¶
CPU-Based Systems¶
When to use:
Small to medium data volumes (<100k pixels)
Workstation or server available
Taylor or Nonlinear model acceptable
Budget constraints (no GPU purchase)
Optimization strategies:
1. Multi-threading:
from concurrent.futures import ProcessPoolExecutor
def process_pixel(pixel_data):
return matched_filter(pixel_data)
with ProcessPoolExecutor(max_workers=16) as executor:
results = executor.map(process_pixel, all_pixels)

2. Vectorization:
# Bad: Loop over pixels
for i in range(n_pixels):
    alpha[i] = process(y[i])
# Good: Vectorize the matched filter across all pixels (fast BLAS operations)
w = H / sigma2                # Sigma^{-1} H for diagonal covariance
d = 1.0 - y_norm              # (n_pixels, n_bands) innovations
alpha = (d @ w) / (H @ w)

3. Memory locality:
Process pixels in spatial blocks (cache-friendly)
Pre-compute and store constant terms
Minimize memory allocations
Performance: 16-core Xeon can process Combined model at ~30k pixels/second
GPU-Based Systems¶
When to use:
Large data volumes (>1M pixels)
Real-time or near-real-time requirements
Combined or Taylor models
Budget allows GPU purchase
Consumer GPU (NVIDIA RTX 4090):
Cost: ~$1600
CUDA cores: 16,384
Memory: 24 GB
Power: 450 W
Performance: 500k pixels/second (Combined model)
Datacenter GPU (NVIDIA A100):
Cost: ~$10,000
CUDA cores: 6,912
Memory: 80 GB
Power: 400 W
Performance: 2M pixels/second (Combined model), 200k pixels/second (Taylor)
Optimization strategies:
1. Batch processing:
# Process 10,000 pixels simultaneously
batch_size = 10000
for batch in range(0, n_pixels, batch_size):
y_batch = y[batch:batch+batch_size]
alpha_batch = gpu_matched_filter(y_batch)

2. Kernel optimization:
Coalesce memory access
Minimize host-device transfers
Use shared memory for H, Sigma
3. Mixed precision:
Float16 for forward model (2× speedup)
Float32 for inversion (numerical stability)
TPU-Based Systems¶
When to use:
Very large data volumes (>10M pixels)
Cloud deployment (Google Cloud)
Combined model only
Cost optimization (TPU cheaper than GPU for linear algebra)
Google Cloud TPU v3:
Cost: ~$8/hour
Matrix multiply units: 2× 128×128
Memory: 128 GB HBM
Performance: 5M pixels/second (Combined model)
Best for:
Batch processing large datasets
Monthly/annual processing campaigns
Cost per pixel matters more than absolute speed
Embedded Systems¶
When to use:
On-board satellite processing
Aircraft real-time operations
UAV autonomous systems
Power/weight/thermal constrained
NVIDIA Jetson Xavier NX:
Cost: ~$400
CUDA cores: 384
Memory: 8 GB
Power: 10-15 W
Performance: 20k pixels/second (Combined model only)
Constraints:
Must use Combined model (only fast enough)
Thermal management critical (passive cooling)
Memory limited (can’t hold full scene)
Process in streaming fashion
Implementation:
# Streaming processing for embedded
while acquiring:
line = sensor.read_line() # One line at a time
y_norm = normalize(line)
detections = combined_mf(y_norm)
if any(detections):
store_for_downlink(detections)
# Don't store full scene (memory limited)

10.8 Comprehensive Decision Framework¶
Quick Selection Guide¶
Start here: What is your primary constraint?
Constraint: Speed (need results in <10 minutes)¶
→ Combined model (normalized)
Detection and quantification both possible
Accuracy: 5-10% for $\Delta\tau < 0.1$
Hardware: Any (CPU, GPU, embedded)
Constraint: Accuracy (need <2% VMR error)¶
→ Nonlinear model
Use normalized version if calibration uncertain
Requires iteration (minutes to hours)
Hardware: CPU cluster or datacenter GPU
Constraint: Moderate plumes (500-1500 ppm)¶
→ Taylor model or Log transform
Taylor: Better for low SNR
Log: Better for high SNR, exact physics
Both require some iteration
Constraint: Multi-gas retrieval¶
Science: Nonlinear joint retrieval
Operations: Combined + PCA decorrelation
Constraint: Power/embedded¶
→ Combined model only
Only model feasible for <30W
Real-time possible on Jetson-class hardware
Application Decision Tree¶
Start
├─ Detection only?
│ └─ YES → Combined (normalized)
│ Time: 8 min/1M pixels
│
├─ Quantification needed?
│ ├─ Flux estimation? (high accuracy)
│ │ └─ YES → Nonlinear
│ │ Time: Selective (9 min for 1% of pixels)
│ │
│ └─ VMR retrieval? (moderate accuracy)
│ ├─ Expected $\Delta\tau$ < 0.1?
│ │ └─ YES → Combined
│ │ Time: 8 min/1M pixels
│ │
│ ├─ Expected 0.1 < $\Delta\tau$ < 0.3?
│ │ └─ YES → Taylor or Log
│ │ Time: 2-3 hr/1M pixels
│ │
│ └─ Expected $\Delta\tau$ > 0.3?
│ └─ YES → Nonlinear
│ Time: 14 hr/1M pixels (or selective)
│
├─ Multi-gas?
│ ├─ Science quality?
│ │ └─ YES → Nonlinear joint
│ │ Time: 125 hr/1M pixels (3 gases)
│ │
│ └─ Operational?
│ └─ YES → Combined + PCA
│ Time: 25 min/1M pixels (3 gases)
│
└─ Real-time required?
└─ YES → Combined only
On-board: 2 sec/scene (embedded GPU)
Ground: 0.1 sec/scene (workstation GPU)

Complete Comparison Table¶
| Application | Primary Goal | Model | Time (1M px) | Accuracy | Hardware | Cost/Scene |
|---|---|---|---|---|---|---|
| Detection | Find plumes | Combined | 8 min | N/A (binary) | CPU/GPU | $0.01 |
| Screening | Rapid survey | Combined | 8 min | 5-10% | CPU/GPU | $0.01 |
| Retrieval (weak) | VMR <500 ppm | Combined | 8 min | 5-10% | CPU/GPU | $0.01 |
| Retrieval (moderate) | VMR 500-1500 ppm | Taylor | 2.8 hr | 2-5% | CPU | $0.50 |
| Retrieval (strong) | VMR >1500 ppm | Nonlinear | 14 hr | <2% | CPU cluster | $5.00 |
| Flux (science) | Emission rate | Nonlinear | 9 min† | <2% | CPU | $0.10 |
| Multi-gas (science) | Gas ratios | Nonlinear joint | 125 hr | <3% | GPU | $20.00 |
| Multi-gas (ops) | Gas ID | Combined+PCA | 25 min | 10-15% | GPU | $0.05 |
| Real-time (aircraft) | Flight guidance | Combined | 2 sec | 5-10% | Embedded | $0.001 |
| Real-time (satellite) | Alerts | Combined | 5 min | 5-10% | On-board | $0.01 |
†Selective processing: Only strong plumes processed with nonlinear
10.9 Future Directions and Recommendations¶
Machine Learning Augmentation¶
Hybrid physics-ML approaches:
Concept: Train neural network to predict nonlinear result from combined input.
Combined model (fast)
↓
Neural network (learned correction)
↓
Nonlinear-quality output

Advantages:
100× speedup with nonlinear accuracy
No iteration required
Handles saturation correctly
Requirements:
Large training dataset (10,000+ scenes)
Diverse conditions (surfaces, atmospheres, plumes)
Continuous retraining (instrument drift)
Status: Active research area, promising results
Adaptive Algorithms¶
Concept: Automatically select model based on scene characteristics.
def adaptive_retrieval(y_norm, H, Sigma):
# Stage 1: Combined (all pixels)
alpha_initial = combined_model(y_norm)
delta_tau = H * alpha_initial
# Stage 2: Classify and route
weak = (delta_tau < 0.10)
moderate = (0.10 <= delta_tau < 0.30)
strong = (delta_tau >= 0.30)
# Stage 3: Adaptive refinement
alpha_final = alpha_initial.copy()
alpha_final[moderate] = taylor_model(y_norm[moderate])
alpha_final[strong] = nonlinear_model(y_norm[strong])
return alpha_final

Advantages:
Optimal accuracy/speed for each pixel
No user decision required
Automatic quality adaptation
Implementation: Straightforward with modern frameworks
Next-Generation Instruments¶
Higher spectral resolution:
Current: 200-400 bands
Future: 1000-2000 bands
Impact: Better gas separation, but 5-10× more computation
Model adaptation needed:
Combined model scales linearly ($O(n)$) → still feasible
Nonlinear scales quadratically ($O(n^2)$) → becomes prohibitive
Solution: Dimensionality reduction (PCA, subset selection)
Higher spatial resolution:
Current: 30-60 m pixels
Future: 5-10 m pixels
Impact: 36-144× more pixels per scene
Computational requirement:
Combined model remains only viable option
Real-time processing requires GPU
On-board processing becomes critical
10.10 Final Recommendations Summary¶
By Application Type¶
| If Your Application Is... | Use This Model | With This Hardware |
|---|---|---|
| Operational detection | Combined (normalized) | GPU (consumer or cloud) |
| Scientific flux quantification | Nonlinear | CPU cluster (selective processing) |
| Moderate plume study | Taylor | Multi-core CPU |
| Multi-gas (science) | Nonlinear joint | GPU (datacenter) |
| Multi-gas (operations) | Combined + PCA | GPU (consumer) |
| Real-time airborne | Combined | Embedded GPU (Jetson) |
| Real-time satellite | Combined | On-board FPGA or GPU |
| Low-cost screening | Combined | Laptop CPU |
By Plume Strength¶
| Plume Enhancement | $\Delta\tau$ | Model | Error | Speed |
|---|---|---|---|---|
| Weak | <0.05 | Combined | 5% | Fastest |
| Moderate | 0.05-0.10 | Combined | 5-10% | Fastest |
| Strong | 0.10-0.20 | Taylor or Log | 2-5% | Fast |
| Very strong | 0.20-0.30 | Taylor or Nonlinear | 2-3% | Moderate |
| Extreme | >0.30 | Nonlinear | <2% | Slow |
By Budget Constraints¶
| Computational Budget | Recommended Strategy |
|---|---|
| Very low (<$100/month) | Combined on laptop/workstation |
| Low ($100-1000/month) | Combined on cloud + spot instances |
| Moderate ($1k-10k/month) | Taylor for most, Combined for detection |
| High ($10k+/month) | Nonlinear where needed, hybrid approach |
| Research grant | Full nonlinear, best accuracy |
Key Takeaways¶
Combined model is the operational workhorse
95% of operational applications use it
Fast, robust, proven
Use normalized version for calibration independence
Taylor model is underutilized
Excellent compromise for moderate plumes
5× faster than nonlinear
2-5% accuracy (vs. 5-10% for combined)
Should be used more in practice
Nonlinear for high-stakes quantification
Flux estimation for regulatory compliance
Scientific publications
Validation studies
Use selectively, not for entire scenes
Log transform is niche but useful
Moderate-strong plumes with high SNR
Exact physics with fast computation
Noise amplification limits applicability
Normalization is almost always preferred
Eliminates largest uncertainty sources
Requires background estimation (usually easy)
Standard practice in remote sensing
Hardware matters
GPU enables real-time for combined model
Embedded systems require combined model
CPU clusters can handle selective nonlinear
Multi-gas requires different strategies
Science: joint nonlinear (slow but accurate)
Operations: PCA decorrelation (fast but approximate)
Adaptive hybrid workflows are optimal
Combined for detection (all pixels)
Taylor for moderate plumes (5% of pixels)
Nonlinear for strong plumes (1% of pixels)
Best overall performance
End of Comprehensive Analysis
This complete framework provides guidance for model selection across the full spectrum of atmospheric methane detection applications, from rapid operational screening to high-accuracy scientific quantification. The choice of model should be driven by application requirements, computational resources, and accuracy needs, with the combined approximation serving as the operational standard and nonlinear models reserved for high-accuracy quantification tasks.