- Data Size - Small, Medium, Large
- Linear -> NonLinear
- Deterministic, Probabilistic, Bayesian
0.0 - Datasets¶
- Fire
- Drought
- Agro
- Temperature, Precipitation
1.0 - Learning with Observation Data¶
Data:¶
- L2 Gappy Observations
- L3 Gap-Filled Observations
Cases:¶
- Data - L2 Obs
- Data - L2 Obs and L3 Interpolated Obs
Formulation:¶
1.1 - Discretization¶
A method that discretizes the unstructured data into a structured representation, e.g., a Cartesian, rectilinear or curvilinear grid.
Use Case
- Data 4 Learning -> Parameters, Interpolator
- Data 4 Estimation -> State, Latent State
1.1.1 - Histogram¶
- a - Equidistant Binning 4 Cartesian Grids - Global + Masks + Weights - Boost-Histogram | xarray-histogram | dask-histogram | xarray | xcdat
- b - Adaptive Binning 4 Rectilinear Grids - KBinsDiscretizer - sklearn tutorial | tutorial
- c - Graph-Node Binning:
- Voronoi - Voronoi w/ Python | Object Seg. w/ Voronoi | Semi-Discrete Flows
- K-Means - Ocean Clustering Example | Region Joining
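As a minimal sketch of option (a), equidistant binning aggregates scattered observations onto a Cartesian grid with a bin-mean. Everything here (function name, toy data, grid edges) is illustrative, numpy-only, not the Boost-Histogram/xarray API:

```python
import numpy as np

def bin_observations(lon, lat, values, lon_edges, lat_edges):
    """Aggregate scattered observations onto a Cartesian grid by bin-mean."""
    # Per-cell sum of observation values
    sums, _, _ = np.histogram2d(lon, lat, bins=[lon_edges, lat_edges], weights=values)
    # Per-cell observation counts
    counts, _, _ = np.histogram2d(lon, lat, bins=[lon_edges, lat_edges])
    with np.errstate(invalid="ignore"):
        return sums / counts  # NaN where a cell holds no observations

rng = np.random.default_rng(0)
lon = rng.uniform(0, 10, 500)
lat = rng.uniform(0, 5, 500)
vals = np.sin(lon) + 0.1 * rng.normal(size=500)
grid = bin_observations(lon, lat, vals, np.linspace(0, 10, 21), np.linspace(0, 5, 11))
```

Empty cells stay NaN, which is exactly the gappy structured representation the later interpolators consume.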
Supp. Material¶
1.2 - Non-Parametric Interpolator (Coordinate-Based)¶
A method that applies a non-parametric, coordinate-based regression algorithm to interpolate the observations based on SpatioTemporal location.
Use Case
- Learning - Interpolated Maps
- Estimation - Initial Conditions & Boundary Conditions 4 Data Assimilation
1.2.1 - Naive Methods¶
We will revisit the same methods used for the Discretization. This will include the kernel density method and the k nearest neighbors method.
a - PyInterp Baselines - Linear, IDW, RBF, Window Function, Kriging/OI/GPs, Splines
1.2.1a - Kernel Density Estimation¶
In this section, we look at kernel density estimation as a non-parametric methodology to gap-fill unstructured observations. We will start with the most basic brute-force method over the k nearest neighbors. Then we will look at scalable alternatives like KD-Trees, Ball-Trees, or the FFT. We'll also look at ways to scale via hardware with libraries like KeOps or cuML, both of which use advanced methods to take advantage of GPUs.
Basic Methods:
- Naive, Brute Force - sklearn tutorial | sklearn.neighbors.KernelDensity
Scaling
- Algorithm:
- Tree-Based - jakevdp tutorial | sklearn.neighbors.KernelDensity | numba-neighbors
- Advanced Approximate NN - sklearn.ann | PyNNDescent
- FFT (Equidistant) - KDEPy - kdepy
- Data Structure
- Sparse - sklearn.neighbors
- Hardware:
- cuML - KDE cuml
- KeOps - KNN Example
Applied Problems:
- KDE Regression - kdepy example | wiki | Derivation | Video Derivation | Error Analysis | pytorch example
- Connection to Attention - d2l.ai | blog
- KDE Examples with Viz - Visualizing GeoData | Point Pattern Analysis
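The KDE-regression idea above (and its connection to attention) reduces to a Nadaraya-Watson weighted average: a Gaussian kernel over distances produces normalized weights, which are applied to the training targets. A minimal numpy sketch with illustrative names and a toy 1D dataset:

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_query, bandwidth=1.0):
    """Gaussian-kernel weighted average of training targets (KDE regression)."""
    # Pairwise squared distances between query and training coordinates
    d2 = (x_query[:, None] - x_train[None, :]) ** 2
    w = np.exp(-0.5 * d2 / bandwidth**2)   # Gaussian kernel weights
    w /= w.sum(axis=1, keepdims=True)      # normalize -> attention-like weights
    return w @ y_train

x = np.linspace(0, 2 * np.pi, 50)
y = np.sin(x)
xq = np.array([np.pi / 2])
yq = nadaraya_watson(x, y, xq, bandwidth=0.3)
```

The normalized weight matrix is exactly the "attention" map discussed in the d2l.ai link: queries attend to training locations.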
1.2.1b - KNN Interpolation¶
Here, we use k-nearest neighbors (KNN) to do interpolation. This is one of the simplest, most versatile algorithms available for learning, and it scales well because it only uses the nearest neighbors to interpolate gappy data. We also showcase how we can modify the distance weighting with inverse-distance weights or a custom distance function, e.g., a Gaussian kernel.
Basic Methods:
- Probabilistic Interpretation - Course
- Naive, Brute-Force, Parallel - sklearn.neighbors.KNeighborsRegressor | sklearn.neighbors.RadiusNeighborsRegressor | From Scratch
- Distance - Uniform, IDW, Gaussian - example.ipynb
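A minimal numpy sketch of KNN interpolation with inverse-distance weighting; the names and toy linear field are illustrative, not the sklearn API:

```python
import numpy as np

def knn_idw(coords, values, query, k=5, eps=1e-12):
    """Interpolate at a query point with k nearest neighbors + IDW weights."""
    d = np.linalg.norm(coords - query, axis=1)   # distances to all training points
    idx = np.argsort(d)[:k]                      # indices of the k nearest neighbors
    w = 1.0 / (d[idx] + eps)                     # inverse-distance weights
    return np.sum(w * values[idx]) / np.sum(w)

rng = np.random.default_rng(1)
coords = rng.uniform(0, 1, size=(200, 2))
values = coords[:, 0] + coords[:, 1]             # a simple linear field
pred = knn_idw(coords, values, np.array([0.5, 0.5]))
```

Swapping the IDW weights for a Gaussian kernel of the distances gives the custom-distance variant mentioned above.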
Scaling:
- Algorithm:
- Hardware:
- cuML + Dask - Demo Blog | cuml.neighbors.KNeighborsRegressor
Example Applications:
- Housing Interpolation w/ KNN + IDW - Medium
Strengths: K-nearest neighbors regression
- is a simple, intuitive algorithm,
- requires few assumptions about what the data must look like, and
- works well with non-linear relationships (i.e., if the relationship is not a straight line).
- has quick computation time, easy interpretability, versatility across classification and regression problems, and a non-parametric nature (no distributional assumptions or data tuning required).
Weaknesses: K-nearest neighbors regression
- becomes very slow as the training data gets larger,
- may not perform well with a large number of predictors, and
- may not predict well beyond the range of values input in your training data.
- For every new test point, KNN must compute the distance to all of the training points, which becomes expensive for large datasets with many features. Extensions like KD-Trees address this.
- KNN is also sensitive to irrelevant features, but this can be addressed by feature selection; one option is to perform PCA on the data and keep only the principal components for the KNN analysis.
- KNN needs to store all of the training data, which can be quite costly for large datasets.
1.2.2 - GPs/OI/Kriging¶
This will feature tutorials to build up our GP/OI/Kriging mathematical proficiency. We will start from scratch and work our way up to full libraries. We will also look at some specific terminology, e.g., length scale vs. lag.
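To make the OI/GP machinery concrete, here is the posterior (interpolation) mean from scratch in plain numpy; the names, toy 1D data, and hyperparameters are illustrative:

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between two sets of 1D coordinates."""
    d2 = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / length_scale**2)

def gp_posterior_mean(x_obs, y_obs, x_query, noise=1e-2, length_scale=1.0):
    """OI / GP-regression posterior mean: K_* (K + sigma^2 I)^{-1} y."""
    K = rbf_kernel(x_obs, x_obs, length_scale) + noise * np.eye(len(x_obs))
    Ks = rbf_kernel(x_query, x_obs, length_scale)
    return Ks @ np.linalg.solve(K, y_obs)

x = np.linspace(0, 2 * np.pi, 30)
y = np.sin(x)
mu = gp_posterior_mean(x, y, np.array([np.pi / 2]), length_scale=1.0)
```

The same formula is what OI/Kriging computes; only the naming of the covariance (kernel vs. covariogram, length scale vs. lag) differs between communities.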
Applications
- Data Assimilation - DA Window + LOWESS
We will use the LOWESS method to do interpolation on a subset of spatiotemporal data. We will look at 3 data types:
- Sea surface height - very sparse, structured randomness
- Sea surface temperature - dense, structured randomness
- Land temperature data
Software
- Optimal Interpolation 4 Data Assimilation (OI4DA) - package + xarray interface + sklearn column transforms
From Scratch
- a - GP From Scratch - JAX + Cola - Demo NB
- b - GP w/ Libs - JAX + TinyGP + Bayesian Inference (Demo NBs)
- c - GP w/ PPLs - JAX + Cola + Numpyro
- d - Customizing GP w/ PPLs - Custom TFP Distribution | Custom Numpyro Distribution
Canonical Example
Scaling
- d - Kernel Matrix Approximations - sklearn.kernel_approximation | My kernellib
- e - Hardware - KeOps | KeOps + GPyTorch
Appendix
- jax + kernel functions + jax.vmap
- Distances - scipy overview | jax demo
- Kernel Matrices - jax demo
- Kernel Matrix Derivatives - jax demo
1.2.3 - Improved GPs - Moment-Based¶
- a - Sparse GPs w/ PPLs - My Jax Code + Bayesian Inference
- b - SVGPs w/ PPLs - GPJax | Pyro-PPL | GPyTorch
- c - Structured GPs - SKI/SKIP (Previous Work, Example)
- d - Deep Kernel Learning - DUE | GPyTorch | Pyro-PPL
1.2.4 - Improved GPs - Basis Functions¶
- Fourier Features GP - RFF | PyRFF | GPyTorch
- Spherical Harmonics GPs (SHGPs) - GPfY | SphericalHarmonics | Torch-Harmonics | LocationEncoder | kNerF List
- Sparse SHGPs - GPfY
1.2.5 - State Space Gaussian Processes¶
In this improvement, we add the Markovian assumption which improves the scalability. See this video for a better introduction.
- Markovian GPs (MGPs) - BayesNewton | MarkovFlow | Dynamax
- Sparse MGPs
1.3 - Parametric Interpolator (Coordinate-Based)¶
Learns a parametric, coordinate-based, Differentiable Interpolator for fast queries and online training.
Use Case¶
- Learning - Compressed Representation, Online Learning
- Estimate - Fast Queries, Online Estimation
Formulation¶
Algorithms¶
Baseline - SIREN
Improvements - SpatioTemporal Encoders
Research - Physics Informed, Modulated, Scalable, Stochastic
a - SIREN
b - spatial coordinate encoders
c - temporal coordinate encoders
d - modulation
Scale
- Hashing
Background - TimeEmbedding, SpatialEmbeddings
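A minimal numpy sketch of the SIREN baseline (a): a coordinate MLP with sinusoidal activations mapping (t, lat, lon) to a field value. The initialization and uniform w0 handling here are simplified, illustrative assumptions, not the paper's exact scheme:

```python
import numpy as np

def siren_forward(coords, weights, biases, w0=30.0):
    """Forward pass of a SIREN-style MLP: sine activations on hidden layers."""
    h = coords
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.sin(w0 * (h @ W + b))     # sinusoidal activation
    return h @ weights[-1] + biases[-1]  # linear output layer

rng = np.random.default_rng(0)
dims = [3, 64, 64, 1]  # (t, lat, lon) coordinates -> scalar field value
weights = [rng.uniform(-1, 1, (a, b)) / a for a, b in zip(dims[:-1], dims[1:])]
biases = [np.zeros(b) for b in dims[1:]]
out = siren_forward(rng.uniform(-1, 1, (10, 3)), weights, biases)
```

Because the interpolator is just this differentiable function of coordinates, queries at arbitrary SpatioTemporal locations are a single forward pass.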
1.4 - Parametric SpatioTemporal Field Interpolator (Field-Based)¶
These methods are parametric interpolators: they operate directly on the gappy fields and output a gap-free field. Being parametric implies that they use neural networks to some degree. Because the data span space and time, we will need physics-inspired architectures that decompose the field into a spatial operator and a TimeStepper. For the spatial operator, we can use architectures like convolutions, transformers, or graphs. For the TimeStepper, we can use convolutions, recurrent neural networks, transformers, or graphs.
Use Cases:
- Learning - Fast, Compressed Interpolator, ROM, PnP Priors, Anomaly Detectors, Pretraining 4 DA
- Estimation - Latent Variable Data Assimilation
Algorithms
- Baseline: (Spectral) Conv, UNet, DINEOF, Convolutional Neural Operator
- Improved: Deep Equilibrium Models
- Research: Transformers, Graphical Neural Networks
1.4.1 - Direct CNN Models¶
We apply some simple NN models that are specifically designed to deal with masked inputs. Since we're dealing with spatiotemporal data, we will directly apply convolutions. We can increase the difficulty by applying Convolutional LSTMs, a popular architecture for spatiotemporal data. To deal with the missing data, we'll start with some simple ad-hoc masking techniques, similar to the kernel methods, and then move to more advanced methods like partial convolutions, which are compatible with neural networks.
- a - Convolutions w/ Masks - astropy | serket
- b - Partial Convolutions - keras - partial conv | NVidia
- c - Partial Convolution + TimeStepper - LSTMs - PConvLSTM
- Appendix - Masked Losses, Interpolation Losses, Convolution Family, RNN/GRU/LSTMs
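A minimal numpy sketch of the partial-convolution rule from (b): convolve only over valid pixels, renormalize by the observed fraction under the kernel, and mark an output as valid wherever any observed pixel contributed. Single-channel and loop-based for clarity; this is an illustrative re-implementation, not the keras/NVidia code:

```python
import numpy as np

def partial_conv2d(x, mask, kernel):
    """Partial convolution: out = (W * (X . M)) * (|W| / sum(M_patch))."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x * mask, ((ph, ph), (pw, pw)))  # zero out missing pixels
    mp = np.pad(mask, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    new_mask = np.zeros_like(mask, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            patch = xp[i:i + kh, j:j + kw]
            mpatch = mp[i:i + kh, j:j + kw]
            valid = mpatch.sum()
            if valid > 0:
                # renormalize by kernel size over the number of valid pixels
                out[i, j] = (kernel * patch).sum() * (kh * kw) / valid
                new_mask[i, j] = 1.0  # mask shrinks the hole each layer
    return out, new_mask

field = np.ones((8, 8))
mask = np.ones((8, 8)); mask[3:5, 3:5] = 0  # a 2x2 gap of missing pixels
kernel = np.full((3, 3), 1 / 9)             # mean filter
filled, new_mask = partial_conv2d(field, mask, kernel)
```

Stacking such layers progressively erodes the mask, which is what makes the architecture compatible with arbitrary gap patterns.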
1.4.2 - Direct Transformer Models¶
Here, we will use more advanced models called transformers. We look at the same task of dealing with missing values; however, now we can use patch embeddings to deal with the missing data.
- a - Masked AutoEncoder - keras | keras | SST | SatMAE
- b - SpatioTemporal Masked AutoEncoder - keras
- Appendix - Transformer, Attention, UNet, AE, PatchEmbedding Masks, Time Embeddings
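The patch-embedding trick for missing data can be sketched in numpy: split the field into non-overlapping tokens and keep only a random subset, as in a masked-autoencoder encoder. The names and toy field are illustrative, not the keras implementation:

```python
import numpy as np

def patchify(field, patch=4):
    """Split a 2D field into non-overlapping, flattened patches (tokens)."""
    H, W = field.shape
    p = field.reshape(H // patch, patch, W // patch, patch)
    return p.transpose(0, 2, 1, 3).reshape(-1, patch * patch)

def random_masking(tokens, mask_ratio=0.75, seed=0):
    """Keep a random subset of tokens; the rest are 'masked out' for the encoder."""
    rng = np.random.default_rng(seed)
    n = tokens.shape[0]
    keep = int(n * (1 - mask_ratio))
    idx = rng.permutation(n)[:keep]
    return tokens[idx], idx

field = np.arange(64.0).reshape(8, 8)
tokens = patchify(field, patch=4)                 # 4 tokens of length 16
visible, idx = random_masking(tokens, mask_ratio=0.5)
```

For gappy observations, the same mechanism simply treats patches with missing pixels as the masked set, so the encoder never sees invalid values.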
1.4.3 - Graphical Models¶
We will look at Graphical Models as a different data structure for dealing with spatiotemporal data.
- Appendix - GNN
1.4.4 - Deep Equilibrium Models¶
We will add an extra implicit, fixed-point layer.
- a - DEQ from Scratch - Implicit Layers Tutorial
- b - jaxopt
- c - Optimistix
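The fixed-point view behind DEQs fits in a few lines of numpy. The contraction map z = tanh(Wz + x + b) and the naive iteration (rather than the Anderson/Newton solvers in jaxopt or Optimistix) are illustrative assumptions:

```python
import numpy as np

def deq_fixed_point(x, W, b, tol=1e-8, max_iter=500):
    """Solve z = tanh(W z + x + b) by simple fixed-point iteration."""
    z = np.zeros_like(x)
    for _ in range(max_iter):
        z_new = np.tanh(W @ z + x + b)
        if np.linalg.norm(z_new - z) < tol:
            break
        z = z_new
    return z_new

rng = np.random.default_rng(0)
n = 8
W = 0.2 * rng.normal(size=(n, n)) / np.sqrt(n)  # small norm -> contraction
b = np.zeros(n)
x = rng.normal(size=n)
z = deq_fixed_point(x, W, b)
```

The "layer" output is the equilibrium z itself; in a real DEQ the backward pass differentiates through this equilibrium implicitly instead of unrolling the loop.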
1.4.5 - Conditional Flow Models¶
Here, we will use conditional flow models. These are conditional stochastic models. They include bijective, surjective, or stochastic architectures. The nice thing here is that we can reuse some of the previous architectures, e.g., the convolutions, the partial convolutions, and/or the transformers.
- Variational AutoEncoder + Masks - pyro-ppl
- PriorCVAE
- Stochastic Interpolants - Video | Video | Conditional Flow Matching | Stochastic Interpolants
1.5 - Parametric Dynamical Model (Field-Based)¶
In this application, we train a dynamical model that best fits the observations. The model complexity ranges from linear to nonlinear. The physics can range from a PDE to a surrogate model.
Use Cases:¶
- Learning - Scientific Discovery, Surrogate Model
- Estimation - Latent Variable Data Assimilation
Formulation¶
Algorithms¶
- Baseline: Kalman Filter Family
- Improved: PDE, Neural ODE, UDE
- Research: Deep Markov Model
1.5.1 - Learning Spatial Operators¶
We look at this from a spatiotemporal-decomposition perspective. We go over the basics of a state space model, including the dynamical (transition) model and the observation (emission) model. We then talk about the complexity of the system. In the case of observations only, we keep it simple with a masked observation operator. We will use a simple TimeStepper for all models, e.g., a "continuous" time stepper like a traditional ODESolver or a "discrete" time stepper like Euler.
- Universal Differential Equations (UDE) - Framework
- a - Linear Spatial Operator
- b - Convolutional (Finite Difference) Spatial Operator
- c - Spectral Convolutional Spatial Operator
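As a minimal illustration of (b), a finite-difference Laplacian plays the role of the convolutional spatial operator and explicit Euler plays the discrete TimeStepper, here for 1D periodic diffusion with illustrative parameters:

```python
import numpy as np

def laplacian_1d(u, dx):
    """Finite-difference Laplacian: a 3-point convolutional spatial operator."""
    return (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / dx**2

def euler_step(u, dt, dx, nu=0.1):
    """Discrete Euler TimeStepper for du/dt = nu * d2u/dx2."""
    return u + dt * nu * laplacian_1d(u, dx)

x = np.linspace(0, 2 * np.pi, 64, endpoint=False)
u = np.sin(x)
dx = x[1] - x[0]
for _ in range(100):
    u = euler_step(u, dt=1e-3, dx=dx)  # sine mode decays as exp(-nu * t)
```

In the UDE setting, the stencil coefficients (or the whole spatial operator) become learnable parameters while the time stepper stays fixed.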
Appendix
1.5.2 - Probabilistic Dynamical Models¶
In this section, we will look at how we can perform inference with time series. This will be useful for Reanalysis and Forecasting. A great introduction can be found here
1.5.2a - Conjugate Inference¶
Using conjugate priors and linear models gives us exact inference in closed form.
- a - Linear Model + Exact Inference
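For the linear-Gaussian case, exact inference is the Kalman filter. A minimal numpy predict/update sketch on an illustrative constant-velocity model (the transition model A is the dynamical model, H the observation model):

```python
import numpy as np

def kalman_step(m, P, y, A, Q, H, R):
    """One predict/update cycle of the linear-Gaussian Kalman filter."""
    # Predict with the dynamical (transition) model
    m_pred = A @ m
    P_pred = A @ P @ A.T + Q
    # Update with the observation (emission) model
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    m_new = m_pred + K @ (y - H @ m_pred)
    P_new = (np.eye(len(m)) - K @ H) @ P_pred
    return m_new, P_new

A = np.array([[1.0, 1.0], [0.0, 1.0]])  # constant-velocity dynamics
H = np.array([[1.0, 0.0]])              # observe position only
Q = 1e-4 * np.eye(2)
R = np.array([[0.1]])
m, P = np.zeros(2), np.eye(2)
for t in range(50):
    m, P = kalman_step(m, P, np.array([float(t)]), A, Q, H, R)
```

Because the data here follow the model exactly, the filter recovers the latent velocity even though only position is observed.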
1.5.2b - Parametric Inference¶
a.k.a. Deterministic Approximate Inference. This is a local approximation whereby we cover one mode of the potentially complex, multi-modal distribution really well: we approximate the posterior with a simpler distribution. These include staples like MLE, MAP, the Laplace approximation, VI, and EP.
- Non-Linear Model + Deterministic Approximate Inference
- Standard Approaches - EKF, UKF, ADF - Dynamax | Neural EKF | Training
- Approximate Expectation Propagation -
- Variational Approximate Inference - Slides
- Unified - Bayes-Newton
1.5.2c - Stochastic Inference¶
a.k.a. Stochastic Approximate Inference. We draw samples from the posterior. This includes staples like MCMC, HMC/NUTS, SGLD, Gibbs, and ESS.
Non-Linear Model + Stochastic Approximate Inference
- Ensemble Kalman Filter -
- Particle Filter - pfilter | pc - tutorial
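A minimal numpy sketch of the stochastic Ensemble Kalman Filter analysis step (perturbed observations); the fully-observed toy setup and names are illustrative:

```python
import numpy as np

def enkf_analysis(ensemble, y, H, R, rng):
    """Stochastic EnKF analysis: update each member with perturbed observations."""
    n_ens = ensemble.shape[1]
    X = ensemble - ensemble.mean(axis=1, keepdims=True)
    Pf = X @ X.T / (n_ens - 1)  # sample forecast covariance
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)
    # Perturb the observation for each member to get correct posterior spread
    y_pert = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, n_ens).T
    return ensemble + K @ (y_pert - H @ ensemble)

rng = np.random.default_rng(0)
truth = np.array([1.0, -1.0])
H = np.eye(2)
R = 0.01 * np.eye(2)
ens = rng.normal(0.0, 1.0, size=(2, 200))  # prior ensemble (state dim x members)
ens = enkf_analysis(ens, truth, H, R, rng)
```

The sample covariance replaces the exact Kalman covariance, which is what makes the method scale to high-dimensional geoscience states.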
Appendix
- Sequential Model Inference - Exact, (V)EM, (V)EP,
- Packages - Nested Sampling | SGMCMC | BlackJax
1.5.3 - Latent Probabilistic Dynamical Models¶
We look at state space models in general starting with linear models.
- a - Conjugate Transform (Conditional Markov Flows)
- b - Stochastic Transform Filter
- Stochastic Inference - ROAD-EnsKF
- Variational Inference - pyro - DMM | numpyro - DMM | DMM | PgDMM
- observation operator encoder - KVAE
- c -
- d - Neural SDE
2.0 - Observations to Reanalysis¶
Data:
- L2 Gappy Observations
- L3 Gap-Filled Observations
- L4 Reanalysis
Cases:
- Data - L2 Obs
- Data - L2 Obs and L3 Interpolated Obs
- Data - L2 Obs, L3 Interpolated Obs, L4 Reanalysis
Formulation¶
Ideas:
- Sequential DA, Variational DA, Amortized DA
- Dynamical Model - Physical, Hybrid, Surrogate
- Bi-Level Optimization
- Dynamical Inference - MLE, MAP, Variational, Laplace, EM, VEM
- Amortized Model - Direct, DEQ
- Bilevel Optimization
- Plug n Play Prior
Physics¶
2.1 - Parametric Dynamical Model (Field-Based)¶
Use Cases¶
- Estimation - Reanalysis
- Learning - Physical Models
Formulation¶
Algorithms¶
- Baseline - Parametric Dynamical Model + 3D/4DVar + BiLevel Optimization
- Improved - Hybrid Dynamical Model - 3D/4DVar + VI
- Research - LatentVar
2.2 - Amortized Parametric Model¶
Use Cases¶
- Learning - Surrogate Modeling, Surrogate Reanalysis
Formulation¶
Algorithms¶
- Baseline - Deep Equilibrium Model
3.0 - Reanalysis to X-Casting¶
Data:
- L2 Gappy Observations
- L3 Gap-Filled Observations
- L4 Reanalysis
Cases:
- Data - L2 Obs
- Data - L2 Obs and L3 Interpolated Obs
- Data - L2 Obs, L3 Interpolated Obs, L4 Reanalysis
Use Cases:
- NowCasting
- ForeCasting
- Projections
Formulation¶
Parametric Surrogate Model¶
Algorithms
- Baseline: Spectral Conv, UNet,
- Improvements: GNN, Transformer
Ideas:
- Bilevel Optimization