  • Data Size - Small, Medium, Large
  • Linear -> NonLinear
  • Deterministic, Probabilistic, Bayesian

0.0 - Datasets

  • Fire
  • Drought
  • Agro
  • Temperature, Precipitation

1.0 - Learning with Observation Data

Data:
  • L2 Gappy Observations
    • Global Land Surface Atmospheric Variables - CDS
    • Global Marine Surface Meteorological Variables - CDS
    • SOCAT - Website
  • L3 Gap-Filled Observations
Cases:
  • Data - L2 Obs
  • Data - L2 Obs and L3 Interpolated Obs
Formulation:
f: y(\Omega_y, \mathcal{T}_y) \times \Theta \rightarrow y(\Omega_z, \mathcal{T}_z)

1.1 - Discretization

A method that discretizes the unstructured data into a structured representation, e.g., a Cartesian, rectilinear or curvilinear grid.
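As a minimal sketch of such a discretization, the scattered observations can be binned onto a rectilinear grid by cell-averaging with `numpy.histogram2d` (the grid extent and resolution below are arbitrary choices for illustration):

```python
import numpy as np

def grid_mean(lon, lat, values, lon_edges, lat_edges):
    """Bin scattered observations onto a rectilinear grid by cell-averaging."""
    counts, _, _ = np.histogram2d(lon, lat, bins=[lon_edges, lat_edges])
    sums, _, _ = np.histogram2d(lon, lat, bins=[lon_edges, lat_edges], weights=values)
    with np.errstate(invalid="ignore"):
        return sums / counts  # NaN where a cell received no observations

rng = np.random.default_rng(0)
lon, lat = rng.uniform(0, 10, 500), rng.uniform(0, 10, 500)
vals = np.sin(lon) + np.cos(lat)
field = grid_mean(lon, lat, vals, np.linspace(0, 10, 21), np.linspace(0, 10, 21))
```

Empty cells stay NaN, which makes the gappy structure of the gridded product explicit.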

Use Case

  • Data 4 Learning -> Parameters, Interpolator
  • Data 4 Estimation -> State, Latent State

1.1.1 - Histogram


Supp. Material

  • Formulation
  • Visualization of neighbors and radius
  • Link between the histogram and the Parzen window
  • Regression
  • Gridding with geopandas
  • sparse + xarray + geopandas

1.2 - Non-Parametric Interpolator (Coordinate-Based)

A method that applies a non-parametric, coordinate-based regression algorithm to interpolate the observations based on SpatioTemporal location.

Use Case

  • Learning - Interpolated Maps
  • Estimation - Initial Conditions & Boundary Conditions 4 Data Assimilation

1.2.1 - Naive Methods

We will revisit the same methods used for discretization, including the kernel density method and the k-nearest neighbors method.

a - PyInterp Baselines - Linear, IDW, RBF, Window Function, Kriging/OI/GPs, Splines


1.2.1a - Kernel Density Estimation

In this section, we look at kernel density estimation as a nonparametric methodology to gap-fill unstructured observations. We will start with the most basic method, k-nearest neighbors. Then we will look at scalable alternatives like KD-trees, Ball trees, or FFT-based methods. We’ll also look at ways to scale via hardware with libraries like KeOps or cuML, both of which use advanced methods for taking advantage of GPUs.
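The regression view of kernel density estimation is the Nadaraya-Watson estimator: a kernel-weighted average of the observed values. A minimal sketch (the Gaussian kernel and the bandwidth `h` are illustrative choices):

```python
import numpy as np

def nadaraya_watson(x_query, x_obs, y_obs, h=0.5):
    """Kernel-weighted average of observations (Nadaraya-Watson regression)."""
    d2 = ((x_query[:, None, :] - x_obs[None, :, :]) ** 2).sum(-1)
    w = np.exp(-0.5 * d2 / h**2)          # Gaussian kernel in the coordinates
    return (w @ y_obs) / w.sum(axis=1)

rng = np.random.default_rng(0)
x_obs = rng.uniform(0, 5, size=(400, 2))           # scattered (lon, lat) locations
y_obs = np.sin(x_obs[:, 0]) + 0.1 * rng.normal(size=400)
x_query = np.array([[2.5, 2.5]])
y_hat = nadaraya_watson(x_query, x_obs, y_obs)     # close to sin(2.5)
```

The bandwidth `h` plays the same role as the bin width in the histogram of Section 1.1.1.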

Basic Methods:

Scaling

Applied Problems:


1.2.1b - KNN Interpolation

Here, we use k-nearest neighbors (KNN) to do interpolation. This is one of the simplest, most versatile learning algorithms: a more scalable method that uses the nearest neighbors to interpolate gappy data. We also showcase how to modify the weighting with inverse-distance weights or a custom kernel, e.g., a Gaussian kernel.
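A minimal numpy-only sketch of KNN interpolation with a Gaussian-kernel weighting (the choice of `k`, bandwidth `h`, and the toy field are illustrative):

```python
import numpy as np

def knn_interp(xq, x_obs, y_obs, k=10, h=0.3):
    """KNN interpolation with Gaussian-kernel distance weights."""
    d = np.linalg.norm(x_obs - xq, axis=1)
    idx = np.argpartition(d, k)[:k]               # indices of the k nearest neighbors
    w = np.exp(-0.5 * (d[idx] / h) ** 2)          # swap in 1/d for inverse-distance
    return np.sum(w * y_obs[idx]) / np.sum(w)

rng = np.random.default_rng(0)
coords = rng.uniform(0, 2 * np.pi, size=(500, 2))
y = np.sin(coords[:, 0]) * np.cos(coords[:, 1])
pred = knn_interp(np.array([np.pi / 2, 0.0]), coords, y)   # true field value is 1.0
```

Replacing the Gaussian weights with `1 / d` recovers classic inverse-distance weighting.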

Basic Methods:

  • Probabilistic Interpretation - Course

Scaling:

Example Applications:

  • Housing Interpolation w/ KNN + IDW - Medium

Strengths: K-nearest neighbors regression

  1. is a simple, intuitive algorithm,
  2. requires few assumptions about what the data must look like,
  3. works well with non-linear relationships (i.e., if the relationship is not a straight line), and
  4. is quick to compute, easy to interpret, and versatile across classification and regression problems thanks to its non-parametric nature (no distributional assumptions or tuning required).

Weaknesses: K-nearest neighbors regression

  1. becomes very slow as the training data gets larger: for every new test point, we must compute its distance to all training points, which is expensive for large datasets with many features (extensions such as KD-trees mitigate this),
  2. may not perform well with a large number of predictors,
  3. may not predict well beyond the range of values in the training data,
  4. is sensitive to irrelevant features, though this can be addressed with feature selection, e.g., performing PCA and keeping only the principal components, and
  5. needs to store all of the training data, which can be quite costly for large datasets.

1.2.2 - GPs/OI/Kriging

This will feature tutorials to build up our GP/OI/Kriging mathematical proficiency. We will start from the basics and also clarify some specific terminology, e.g., length scale vs. lag.
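The GP/OI/kriging posterior can be written down in a few lines of numpy; the sketch below uses a squared-exponential kernel and a 1D toy problem (kernel choice, length scale, and noise level are illustrative):

```python
import numpy as np

def rbf(x1, x2, ell=1.0, sigma=1.0):
    # squared-exponential kernel; `ell` is the length scale
    d2 = (x1[:, None] - x2[None, :]) ** 2
    return sigma**2 * np.exp(-0.5 * d2 / ell**2)

rng = np.random.default_rng(1)
x_obs = np.sort(rng.uniform(0, 10, 25))
y_obs = np.sin(x_obs) + 0.1 * rng.normal(size=25)
x_new = np.linspace(0, 10, 200)

noise = 0.1**2
K = rbf(x_obs, x_obs) + noise * np.eye(25)
Ks = rbf(x_new, x_obs)
mean = Ks @ np.linalg.solve(K, y_obs)                     # OI / kriging mean
cov = rbf(x_new, x_new) - Ks @ np.linalg.solve(K, Ks.T)   # posterior covariance
```

The posterior variance (the diagonal of `cov`) grows in data gaps, which is exactly the uncertainty map used in OI products.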


Applications

  • Data Assimilation - DA Window + LOWESS

We will use the LOWESS method to do interpolation on a subset of spatiotemporal data. We will look at 3 data types:

  1. sea surface height - very sparse, structured randomness
  2. sea surface temperature - dense, structured randomness
  3. land temperature data
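A minimal numpy-only LOWESS sketch (locally weighted linear regression with the standard tricube kernel; window fraction and the toy signal are illustrative choices):

```python
import numpy as np

def lowess(x_query, x, y, frac=0.15):
    """A minimal LOWESS: locally weighted linear regression, tricube weights."""
    k = max(3, int(frac * len(x)))
    out = np.empty(len(x_query))
    for i, xq in enumerate(x_query):
        d = np.abs(x - xq)
        idx = np.argsort(d)[:k]                            # k nearest in the window
        w = (1 - (d[idx] / (d[idx].max() + 1e-12)) ** 3) ** 3
        sw = np.sqrt(w)                                    # weighted least squares
        A = np.stack([np.ones(k), x[idx]], axis=1)
        beta, *_ = np.linalg.lstsq(A * sw[:, None], y[idx] * sw, rcond=None)
        out[i] = beta[0] + beta[1] * xq
    return out

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + 0.2 * rng.normal(size=200)
smooth = lowess(x, x, y)
```

In practice one would reach for `statsmodels`' LOWESS implementation; the loop above just makes the local weighted fit explicit.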

Software

  • Optimal Interpolation 4 Data Assimilation (OI4DA) - package + xarray interface + sklearn column transforms

From Scratch

Canonical Example

Scaling

Appendix


1.2.3 - Improved GPs - Moment-Based


1.2.4 - Improved GPs - Basis Functions


1.2.5 - State Space Gaussian Processes

In this improvement, we add the Markovian assumption, which improves the scalability.
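To illustrate the idea: a GP with a Matérn-1/2 kernel is equivalent to an Ornstein-Uhlenbeck process, so its posterior mean can be computed by an O(N) Kalman filter instead of an O(N³) kernel solve. A minimal sketch (length scale, noise levels, and toy signal are illustrative):

```python
import numpy as np

# Matérn-1/2 GP == Ornstein-Uhlenbeck state-space model
ell, sigma2, r = 1.0, 1.0, 0.05           # length scale, prior var, obs-noise var
dt = 0.1
phi = np.exp(-dt / ell)                   # transition coefficient
q = sigma2 * (1 - phi**2)                 # process noise keeping stationary var sigma2

rng = np.random.default_rng(3)
truth = np.sin(np.arange(100) * dt * 2)
obs = truth + np.sqrt(r) * rng.normal(size=100)

m, p = 0.0, sigma2                        # filter mean / variance
means = []
for z in obs:
    m, p = phi * m, phi**2 * p + q        # predict
    k = p / (p + r)                       # Kalman gain
    m, p = m + k * (z - m), (1 - k) * p   # update
    means.append(m)
means = np.array(means)
```

A backward smoothing pass would recover the full GP posterior; the filter alone already shows the linear-in-time cost.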


1.3 - Parametric Interpolator (Coordinate-Based)

Learns a parametric, coordinate-based, Differentiable Interpolator for fast queries and online training.

Use Case
  • Learning - Compressed Representation, Online Learning
  • Estimate - Fast Queries, Online Estimation
Formulation
y(s,t) = f(s,t;\theta)
Algorithms
  • Baseline - SIREN

  • Improvements - SpatioTemporal Encoders

  • Research - Physics Informed, Modulated, Scalable, Stochastic

  • a - SIREN

  • b - spatial coordinate encoders

  • c - temporal coordinate encoders

  • d - modulation

Scale

  • Hashing

Background - TimeEmbedding, SpatialEmbeddings
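The SIREN baseline above is just an MLP with sine activations and a specific initialization. A forward-pass sketch in numpy (layer sizes and the `w0` frequency follow the SIREN paper's conventions; training would of course use an autodiff framework):

```python
import numpy as np

rng = np.random.default_rng(0)

def siren_init(sizes, w0=30.0):
    # SIREN init: first layer U(-1/n, 1/n), later layers U(+-sqrt(6/n)/w0)
    params = []
    for i, (n_in, n_out) in enumerate(zip(sizes[:-1], sizes[1:])):
        bound = 1.0 / n_in if i == 0 else np.sqrt(6.0 / n_in) / w0
        params.append((rng.uniform(-bound, bound, (n_in, n_out)), np.zeros(n_out)))
    return params

def siren_forward(params, coords, w0=30.0):
    h = coords
    for W, b in params[:-1]:
        h = np.sin(w0 * (h @ W + b))      # sine activations on the coordinates
    W, b = params[-1]
    return h @ W + b                      # linear output head

params = siren_init([3, 64, 64, 1])       # (x, y, t) -> field value
coords = rng.uniform(-1, 1, (5, 3))
out = siren_forward(params, coords)
```

Once trained, queries at arbitrary coordinates cost one forward pass, which is what makes the interpolator fast and differentiable.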


1.4 - Parametric SpatioTemporal Field Interpolator (Field-Based)

These methods are parametric interpolators: they operate directly on the gappy fields and output a gap-free field. Being parametric implies that they will use neural networks to some degree. Because the data spans space and time, we will need physics-inspired architectures that decompose the field into a spatial operator and a TimeStepper. For the spatial operator, we can use architectures like convolutions, transformers, or graphs; for the TimeStepper, convolutions, recurrent neural networks, transformers, or graphs.

y(\Omega_u, t) = f(\Omega_y, t, \theta)

Use Cases:

  • Learning - Fast, Compressed Interpolator, ROM, PnP Priors, Anomaly Detectors, Pretraining 4 DA
  • Estimation - Latent Variable Data Assimilation

Algorithms

  • Baseline: (Spectral) Conv, UNet, DINEOF, Convolutional Neural Operator
  • Improved: Deep Equilibrium Models
  • Research: Transformers, Graphical Neural Networks

1.4.1 - Direct CNN Models

We apply some simple neural network models specifically designed to deal with masked inputs. Since we’re dealing with spatiotemporal data, we will directly apply convolutions. We can increase the difficulty by applying convolutional LSTMs, a popular architecture for spatiotemporal data. To deal with the missing data, we’ll start with some simple ad-hoc masking techniques, similar to the kernel methods, and then move to more advanced methods like partial convolutions, which are compatible with neural networks.
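The core of a partial convolution is to convolve only the observed pixels and renormalize by the fraction of valid pixels in each window. A minimal (unvectorized) sketch:

```python
import numpy as np

def partial_conv2d(field, mask, kernel):
    """Convolution that renormalizes by the fraction of valid (observed) pixels."""
    kh, kw = kernel.shape
    H, W = field.shape
    out = np.full((H - kh + 1, W - kw + 1), np.nan)
    new_mask = np.zeros_like(out)
    x = np.where(mask, field, 0.0)                # zero out the missing pixels
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            m = mask[i:i+kh, j:j+kw]
            if m.any():
                # rescale by (kernel size / number of valid pixels)
                out[i, j] = (x[i:i+kh, j:j+kw] * kernel).sum() * (kh * kw) / m.sum()
                new_mask[i, j] = 1.0
    return out, new_mask

field = np.ones((8, 8))
rng = np.random.default_rng(0)
mask = rng.uniform(size=(8, 8)) > 0.3             # ~70% of pixels observed
kernel = np.full((3, 3), 1.0 / 9.0)
out, new_mask = partial_conv2d(field, mask, kernel)
```

The updated mask propagates through the layers, so holes shrink with depth: this is the mechanism that lets the CNN ingest gappy fields directly.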


1.4.2 - Direct Transformer Models

Here, we will use more advanced models called transformers. We look at the same task of dealing with missing values; however, we can use patch embeddings to handle the missing data.

  • a - Masked AutoEncoder - keras | keras | SST | SatMAE
  • b - SpatioTemporal Masked AutoEncoder - keras
  • Appendix - Transformer, Attention, UNet, AE, PatchEmbedding Masks, Time Embeddings
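The masked-autoencoder recipe above boils down to: split the field into patches, randomly drop a large fraction, and encode only the visible patches. A minimal sketch of that masking step (patch size and mask ratio follow common MAE defaults and are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
field = rng.normal(size=(64, 64))
P = 8                                     # patch size
# (64, 64) -> 64 non-overlapping patches, each flattened to P*P values
patches = field.reshape(8, P, 8, P).swapaxes(1, 2).reshape(64, P * P)

mask_ratio = 0.75
n_keep = int(64 * (1 - mask_ratio))
perm = rng.permutation(64)
keep, drop = perm[:n_keep], perm[n_keep:]
visible = patches[keep]                   # only these are fed to the encoder
```

For naturally gappy observations, `drop` can simply be the set of patches with missing pixels instead of a random subset.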

1.4.3 - Graphical Models

We will look at Graphical Models as a different data structure for dealing with spatiotemporal data.

  • Appendix - GNN

1.4.4 - Deep Equilibrium Models

We will add an extra implicit, fixed-point layer to the interpolator.


1.4.5 - Conditional Flow Models

Here, we will use conditional flow models. These are conditional stochastic models, with architectures such as bijective, surjective, or stochastic transformations. The nice thing is that we can reuse some of the previous architectures, e.g., the convolutions, the partial convolutions, and/or the transformers.


1.5 - Parametric Dynamical Model (Field-Based)

In this application, we train a dynamical model that best fits the observations. The model complexity ranges from linear to nonlinear, and the physics can range from a PDE to a surrogate model.

Use Cases:
  • Learning - Scientific Discovery, Surrogate Model
  • Estimation - Latent Variable Data Assimilation
Formulation
\begin{aligned} z(\Omega_z, t) &= f[z;\theta](\Omega_z, t - \delta t) \\ y(\Omega_y, t) &= h[z;\theta](\Omega_z, t) \end{aligned}
Algorithms
  • Baseline: Kalman Filter Family
  • Improved: PDE, Neural ODE, UDE
  • Research: Deep Markov Model

1.5.1 - Learning Spatial Operators

We look at this from a spatiotemporal decomposition perspective. We go over the basics of a state space model, including the dynamical (transition) model and the observation (emission) model. We then talk about the complexity of the system. In the case of observations only, we keep it simple with a masked observation operator. We will use a simple TimeStepper for all models: e.g., a “continuous” time stepper like a traditional ODE solver, or a “discrete” time stepper like Euler.

  • Universal Differential Equations (UDE) - Framework
  • a - Linear Spatial Operator
  • b - Convolutional (Finite Difference) Spatial Operator
  • c - Spectral Convolutional Spatial Operator

Appendix

  • Faster Neural ODEs
  • Gradients - FD, AutoDiff, Adjoint/Implicit Diff.

1.5.2 - Probabilistic Dynamical Models

In this section, we will look at how we can perform inference with time series. This will be useful for reanalysis and forecasting.


1.5.2a - Conjugate Inference

Using conjugate priors with linear models gives us exact inference in closed form.
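The canonical example is Bayesian linear regression: a Gaussian prior and a Gaussian likelihood give a Gaussian posterior whose mean and covariance can be written down directly (prior and noise precisions below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
w_true = np.array([1.5, -0.5])
y = X @ w_true + 0.1 * rng.normal(size=50)

alpha, beta = 1.0, 1.0 / 0.1**2            # prior precision, noise precision
# Gaussian prior x Gaussian likelihood -> Gaussian posterior, no iteration needed
S_inv = alpha * np.eye(2) + beta * X.T @ X
S = np.linalg.inv(S_inv)                   # posterior covariance
m = beta * S @ X.T @ y                     # posterior mean
```

The Kalman filter of Section 1.5.2 is this same conjugate update applied sequentially in time.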


1.5.2b - Parametric Inference

a.k.a. deterministic approximate inference. This is a local approximation whereby we cover one mode of the potentially complex, multi-modal distribution really well: we approximate the posterior with a simpler distribution, $q(\theta;\alpha)$. These include staples like MLE, MAP, the Laplace approximation, VI, and EP.

  • Non-Linear Model + Deterministic Approximate Inference

1.5.2c - Stochastic Inference

a.k.a. stochastic approximate inference. We draw samples from the posterior. This includes staples like MCMC, HMC/NUTS, SGLD, Gibbs, and ESS.

Non-Linear Model + Stochastic Approximate Inference
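The simplest member of the MCMC family is random-walk Metropolis-Hastings; a minimal sketch targeting a toy standard-normal posterior (the target, step size, and chain length are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def log_post(theta):
    # toy log-posterior: standard normal (swap in any unnormalized log-density)
    return -0.5 * theta**2

theta, step = 0.0, 1.0
samples = []
for _ in range(20000):
    prop = theta + step * rng.normal()                  # random-walk proposal
    if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
        theta = prop                                    # accept; else keep theta
    samples.append(theta)
samples = np.array(samples[2000:])                      # discard burn-in
```

HMC/NUTS and SGLD replace the random-walk proposal with gradient-informed moves, which is what makes them scale to higher dimensions.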


Appendix


1.5.3 - Latent Probabilistic Dynamical Models

We look at state space models in general starting with linear models.


2.0 - Observations to Reanalysis

Data:

  • L2 Gappy Observations
  • L3 Gap-Filled Observations
  • L4 Reanalysis

Cases:

  • Data - L2 Obs
  • Data - L2 Obs and L3 Interpolated Obs
  • Data - L2 Obs, L3 Interpolated Obs, L4 Reanalysis
Formulation
f: y(\Omega_z, \mathcal{T}_z) \times u_b(\Omega_z, \mathcal{T}) \times \Theta \rightarrow u_a(\Omega_z, \mathcal{T}_z)

Ideas:

  • Sequential DA, Variational DA, Amortized DA
  • Dynamical Model - Physical, Hybrid, Surrogate
  • Bi-Level Optimization
  • Dynamical Inference - MLE, MAP, Variational, Laplace, EM, VEM
  • Amortized Model - Direct, DEQ

  • Bilevel Optimization
  • Plug n Play Prior
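The variational DA idea above can be made concrete with a toy 3D-Var: minimize the background misfit plus the observation misfit, each weighted by its error covariance (the state size, observation operator, and error levels below are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

n = 10
rng = np.random.default_rng(0)
u_true = np.sin(np.linspace(0, np.pi, n))
u_b = u_true + 0.2 * rng.normal(size=n)          # background (prior) state
H = np.eye(n)[::2]                               # observe every other grid point
y = H @ u_true + 0.05 * rng.normal(size=H.shape[0])
B_inv = np.eye(n) / 0.2**2                       # background error precision
R_inv = np.eye(H.shape[0]) / 0.05**2             # observation error precision

def cost(u):
    # standard 3D-Var objective: J(u) = ||u - u_b||_B^2 + ||y - Hu||_R^2
    db, dy = u - u_b, y - H @ u
    return 0.5 * db @ B_inv @ db + 0.5 * dy @ R_inv @ dy

u_a = minimize(cost, u_b, method="L-BFGS-B").x   # analysis state
```

Replacing `H` with a time-stepped model composed with the observation operator turns this same objective into 4D-Var.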

Physics


2.1 - Parametric Dynamical Model (Field-Based)

Use Cases
  • Estimation - Reanalysis
  • Learning - Physical Models
Formulation
\begin{aligned} z(\Omega_z, t) &= f[z;\theta](\Omega_z, t - \delta t) \\ y(\Omega_y, t) &= h[z;\theta](\Omega_z, t) \\ f(z,t) &= \alpha f_{dyn}(z,t) + \beta f_{param}(z,t) \end{aligned}
Algorithms
  • Baseline - Parametric Dynamical Model + 3D/4DVar + BiLevel Optimization
  • Improved - Hybrid Dynamical Model - 3D/4DVar + VI
  • Research - LatentVar

2.2 - Amortized Parametric Model

Use Cases
  • Learning - Surrogate Modeling, Surrogate Reanalysis
Formulation
u_a = f(u_b, y_{obs})
Algorithms
  • Baseline - Deep Equilibrium Model

3.0 - Reanalysis to X-Casting

Data:

  • L2 Gappy Observations
  • L3 Gap-Filled Observations
  • L4 Reanalysis

Cases:

  • Data - L2 Obs
  • Data - L2 Obs and L3 Interpolated Obs
  • Data - L2 Obs, L3 Interpolated Obs, L4 Reanalysis

Use Cases:

  • NowCasting
  • ForeCasting
  • Projections
Formulation
f: u_a(\Omega_z, \mathcal{T}) \times \delta_t \times \Theta \rightarrow u_a(\Omega_z, \mathcal{T}_z + \delta_t)

Parametric Surrogate Model

Algorithms

  • Baseline: Spectral Conv, UNet
  • Improvements: GNN, Transformer

Ideas:

  • Bilevel Optimization