Uncertain Inputs in Gaussian Processes¶

Motivation¶

This is my complete literature review of all the ways that GPs have been modified to allow for uncertain inputs.

Algorithms¶

Error-In-Variables Regression¶

This isn't really GPs per se, but these are probably the first papers published on this problem in the Bayesian community (that we know of).

Bayesian Analysis of Error-in-Variables Regression Models - Dellaportas & Stephens (1995)
Error in Variables Regression: What is the Appropriate Model? - Gillard et al. (2007) [Thesis]

Monte Carlo Sampling¶

Almost all of the papers from the first few years mention that you can do this, but I haven't seen a paper that explicitly walks through the pros and cons. However, most implementations of the PILCO method, as well as the Deep GP method, implement some form of it.
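As a rough illustration of the idea, here is a minimal NumPy sketch: draw samples of the uncertain test input, push each one through the ordinary GP predictive equations, and combine the results with the law of total variance. Everything here is an assumption made for the example (zero-mean GP, RBF kernel with unit signal variance, the helper names `rbf`, `gp_fit`, `gp_predict`, `mc_predict`, and the hyperparameter values); it is not taken from PILCO or any Deep GP code base.

```python
import numpy as np

def rbf(A, B, ell=1.0, sf2=1.0):
    """RBF kernel matrix between the rows of A (n, d) and B (m, d)."""
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return sf2 * np.exp(-0.5 * d2 / ell**2)

def gp_fit(X, y, noise=1e-2):
    """Precompute the Cholesky factor of K + noise*I and the weights alpha."""
    K = rbf(X, X) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return L, alpha

def gp_predict(Xs, X, L, alpha):
    """Latent predictive mean and variance at deterministic inputs Xs."""
    Ks = rbf(Xs, X)
    mean = Ks @ alpha
    V = np.linalg.solve(L, Ks.T)
    var = 1.0 - np.sum(V**2, axis=0)  # k(x, x) = sf2 = 1 for this kernel
    return mean, var

def mc_predict(mu_x, Sigma_x, X, L, alpha, n_samples=2000, seed=0):
    """Monte Carlo predictive moments for x_* ~ N(mu_x, Sigma_x)."""
    rng = np.random.default_rng(seed)
    xs = rng.multivariate_normal(mu_x, Sigma_x, size=n_samples)
    m, v = gp_predict(xs, X, L, alpha)
    # Law of total variance: E[var] + Var[mean] over the input samples.
    return m.mean(), v.mean() + m.var()

# Toy usage: noisy-input prediction on a sine curve.
X = np.linspace(-3, 3, 20)[:, None]
y = np.sin(X).ravel()
L, alpha = gp_fit(X, y)
mean_mc, var_mc = mc_predict(np.array([0.5]), np.array([[0.25]]), X, L, alpha)
```

The obvious trade-off is cost versus generality: the estimate works for any kernel and any input distribution you can sample from, but it needs many predictive evaluations per test point and only gives the first two moments up to sampling error.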

Taylor Expansion¶

Learning a Gaussian Process Model with Uncertain Inputs - Girard & Murray-Smith (2003) [Technical Report]

Moment Matching¶

Here we approximate the predictive distribution at an uncertain input by a Gaussian, computing only its mean and variance (the two moments needed to describe a Gaussian). The expressions below give these moments using a Taylor expansion around the input mean $\mu_{x_*}$.

$\begin{aligned} m(\mu_{x_*}, \Sigma_{x_*}) &= \mu(\mu_{x_*})\\ v(\mu_{x_*}, \Sigma_{x_*}) &= \nu^2(\mu_{x_*}) + \frac{\partial \mu(\mu_{x_*})}{\partial x_*}^\top \Sigma_{x_*} \frac{\partial \mu(\mu_{x_*})}{\partial x_*} + \frac{1}{2} \text{Tr}\left\{ \frac{\partial^2 \nu^2(\mu_{x_*})}{\partial x_* \partial x_*^\top} \Sigma_{x_*}\right\} \end{aligned}$
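To make the expressions above concrete, here is a small sketch that evaluates them for any pair of callables `mu_fn` (predictive mean) and `nu2_fn` (predictive variance). The function names and the use of central finite differences for the gradient and Hessian are my own choices for illustration; a real implementation would more likely use analytic kernel derivatives or automatic differentiation.

```python
import numpy as np

def grad_fd(f, x, h=1e-4):
    """Central finite-difference gradient of a scalar function f at x."""
    g = np.zeros(x.size)
    for i in range(x.size):
        e = np.zeros(x.size)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def hess_fd(f, x, h=1e-3):
    """Central finite-difference Hessian of a scalar function f at x."""
    d = x.size
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            ei = np.zeros(d)
            ej = np.zeros(d)
            ei[i] = h
            ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

def taylor_moments(mu_fn, nu2_fn, mu_x, Sigma_x):
    """Approximate predictive mean m and variance v at x_* ~ N(mu_x, Sigma_x)."""
    mu_x = np.asarray(mu_x, dtype=float)
    g = grad_fd(mu_fn, mu_x)    # d mu / d x_* evaluated at mu_x
    H = hess_fd(nu2_fn, mu_x)   # Hessian of nu^2 evaluated at mu_x
    m = mu_fn(mu_x)
    v = nu2_fn(mu_x) + g @ Sigma_x @ g + 0.5 * np.trace(H @ Sigma_x)
    return m, v
```

For a GP you would pass in functions that return $\mu(x_*)$ and $\nu^2(x_*)$ for a single test point; the same routine then gives the corrected variance without touching the training code.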

Gaussian Process Priors With Uncertain Inputs – Application to Multiple-Step Ahead Time Series Forecasting - Girard et al. (2003)
Approximate Methods for Propagation of Uncertainty in GP Models - Girard (2004) [Thesis]
Prediction at an Uncertain Input for Gaussian Processes and Relevance Vector Machines Application to Multiple-Step Ahead Time-Series Forecasting - Quiñonero-Candela et al. (2003) [Technical Report]
Analytic moment-based Gaussian process filtering - Deisenroth et al. (2009)
PILCO: A Model-Based and Data-Efficient Approach to Policy Search - Deisenroth et al. (2011)
- Code - TensorFlow | GPyTorch | MXFusion I | MXFusion II
Efficient Reinforcement Learning using Gaussian Processes - Deisenroth (2010) [Thesis]
- Chapter IV - Finding Uncertain Patterns in GPs (Lit review at the end)

Covariance Functions¶

Dallaire constructed a modification to the RBF covariance function that takes the input noise into account.

$K_{ij} = \left| 2\Lambda^{-1}\Sigma_x + I \right|^{-1/2} \sigma_f^2 \exp\left( -\frac{1}{2}(x_i - x_j)^\top (\Lambda + 2\Sigma_x)^{-1}(x_i - x_j) \right)$

for $i\neq j$ and

$K_{ij}=\sigma_f^2$

for $i=j$. This was shown to give poor results when $\Sigma_x$ is not known. The full explanation is in McHutchon's thesis (section 2.2.1), which is listed in the Iterative section below.
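A minimal sketch of this noisy-input covariance, assuming every input shares the same noise covariance $\Sigma_x$ and that $\Lambda$ is a diagonal matrix of squared length scales. The function name `noisy_input_rbf` and its arguments are hypothetical, not from Dallaire's code. The determinant factor is written with the exponent $-1/2$, which is what you get from taking the expectation of an RBF kernel under Gaussian inputs, so the off-diagonal entries shrink as the input noise grows.

```python
import numpy as np

def noisy_input_rbf(X, Sigma_x, lengthscales, signal_var=1.0):
    """Expected RBF kernel matrix for inputs X (n, d) with shared input noise Sigma_x (d, d)."""
    n, d = X.shape
    Lam = np.diag(np.asarray(lengthscales, dtype=float) ** 2)  # Lambda
    # Normalising factor |2 Lambda^{-1} Sigma_x + I|^{-1/2}
    scale = np.linalg.det(2.0 * np.linalg.solve(Lam, Sigma_x) + np.eye(d)) ** -0.5
    # Quadratic form (x_i - x_j)^T (Lambda + 2 Sigma_x)^{-1} (x_i - x_j)
    A_inv = np.linalg.inv(Lam + 2.0 * Sigma_x)
    diff = X[:, None, :] - X[None, :, :]
    quad = np.einsum('ijk,kl,ijl->ij', diff, A_inv, diff)
    K = signal_var * scale * np.exp(-0.5 * quad)
    np.fill_diagonal(K, signal_var)  # K_ii = sigma_f^2
    return K

# With zero input noise this reduces to the ordinary RBF kernel.
X = np.array([[0.0], [1.0], [2.5]])
K_clean = noisy_input_rbf(X, np.zeros((1, 1)), [1.0])
K_noisy = noisy_input_rbf(X, np.array([[0.3]]), [1.0])
```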

An approximate inference with Gaussian process to latent functions from uncertain data - Dallaire et al. (2011) | Prezi | Code

Iterative¶

Gaussian Process Training with Input Noise - McHutchon & Rasmussen (2011) | Code
Nonlinear Modelling and Control using GPs - McHutchon (2014) [Thesis]
- Chapter IV - Finding Uncertain Patterns in GPs
System Identification through Online Sparse Gaussian Process Regression with Input Noise - Bijl et al. (2017) | Code
Gaussian Process Regression Techniques - Bijl (2018) [Thesis] | Code
- Chapter V - Noisy Input GPR

Linearized (Unscented) Approximation¶

This is the linearized version of the moment-matching approach mentioned above, also known as the unscented GP. In this approximation, only the predictive variance changes. There is an example Colab notebook showing how to use this with the GPy library.

$\begin{aligned} \tilde{\mu}_f(x_*) &= \underbrace{k_*^\top K^{-1}y}_{\mu_f(x_*)} \\ \tilde{\nu}^2(x_*) &= \underbrace{k_{**} - k_*^\top K^{-1} k_*}_{\nu^2(x_*)} + \frac{\partial \mu_f(x_*)}{\partial x_*}^\top \Sigma_x \frac{\partial \mu_f(x_*)}{\partial x_*} \end{aligned}$

Note: The inspiration for this comes from the Extended Kalman Filter (links below), which approximates a non-linear transformation $f$ of $x$ when $x$ comes from a distribution $x \sim \mathcal{N}(\mu_x, \Sigma_x)$.
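The correction term is cheap to compute when the kernel has an analytic derivative. Below is a minimal sketch for an RBF kernel with a single length scale, where the gradient of the predictive mean is $\sum_i \alpha_i k(x_*, x_i)(x_i - x_*)/\ell^2$ with $\alpha = K^{-1}y$. The function names, the hyperparameter values, and the choice to fold the noise variance into $K$ are assumptions made for the example; this is not the GPy API.

```python
import numpy as np

def rbf(A, B, ell=1.0, sf2=1.0):
    """RBF kernel matrix between the rows of A (n, d) and B (m, d)."""
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return sf2 * np.exp(-0.5 * d2 / ell**2)

def linearized_predict(x_star, Sigma_x, X, y, ell=1.0, sf2=1.0, noise=1e-2):
    """Predictive mean, standard variance, and input-noise-corrected variance at x_star (d,)."""
    K = rbf(X, X, ell, sf2) + noise * np.eye(len(X))  # K includes the noise term
    k_star = rbf(x_star[None, :], X, ell, sf2)[0]     # shape (n,)
    alpha = np.linalg.solve(K, y)
    mean = k_star @ alpha
    var = sf2 - k_star @ np.linalg.solve(K, k_star)
    # Gradient of the predictive mean with respect to x_star, shape (d,).
    grad = ((X - x_star) * (k_star * alpha)[:, None]).sum(axis=0) / ell**2
    var_lin = var + grad @ Sigma_x @ grad             # add the propagated input noise
    return mean, var, var_lin, grad

# Toy usage: the correction is largest where the mean function is steep.
X = np.linspace(-3, 3, 15)[:, None]
y = np.sin(X).ravel()
mean, var, var_lin, grad = linearized_predict(np.array([0.7]), np.array([[0.1]]), X, y)
```

Note that the mean is untouched; only the variance picks up the extra `grad @ Sigma_x @ grad` term, which matches the statement above that this approximation only changes the predictive variance.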

GP-BayesFilters: Bayesian Filtering Using Gaussian Process Prediction - Ko and Fox (2008)

They originally came up with the linearized (unscented) approximation to the moment-matching method. They used it in the context of the extended Kalman filter which has a few more elaborate steps in addition to the input uncertainty propagation.
Expectation Propagation in Gaussian Process Dynamical Systems - Deisenroth & Mohamed (2012)

The authors use expectation propagation as a way to propagate the noise through the test points. They mention the two ways to account for the input uncertainty, referencing the GP-BayesFilters paper above: explicit moment matching and the linearized (unscented) version. They also give the interpretation that the moment-matching approach with the kernel expectations is analogous to minimizing the KL divergence between the prior distribution with the uncertain inputs $p(x)$ and the approximate distribution $q(x)$.
Accounting for Input Noise in Gaussian Process Parameter Retrieval - Johnson et al. (2019)

My paper where I use the unscented version to get better predictive uncertainty estimates.

Note: I didn't know about the unscented stuff until after the publication...unfortunately.
Unscented Gaussian Process Latent Variable Model: learning from uncertain inputs with intractable kernels - Souza et al. (2019) [arXiv]

A very recent paper that has been on arXiv for a while. They give a formulation for approximating the linearized (unscented) version of the moment-matching approach. Apparently it works better than the quadrature, Monte Carlo and kernel-expectation approaches.

Variational Strategies¶

Bayesian Gaussian Process Latent Variable Model - Titsias & Lawrence (2010)
Nonlinear Modelling and Control using GPs - McHutchon (2014) [Thesis]
Variational Inference for Uncertainty on the Inputs of Gaussian Process Models - Damianou et al. (2014)
Deep GPs and Variational Propagation of Uncertainty - Damianou (2015) [Thesis]
- Chapter IV - Uncertain Inputs in Variational GPs
- Chapter II (2.1) - Lit Review
Non-Stationary Surrogate Modeling with Deep Gaussian Processes - Dutordoir (2016) [Thesis]

This is a good thesis that walks through the derivations of the moment-matching approach and the Bayesian GPLVM approach. It becomes a little clearer how they are related after going through the derivations once.
Bringing Models to the Domain: Deploying Gaussian Processes in the Biological Sciences - Zwießele (2017) [Thesis]
- Chapter II (2.4, 2.5) - Sparse GPs, Variational Bayesian GPLVM

Appendix¶

Kernel Expectations¶

[Girard 2003] introduced what we now call kernel expectations, or $\{\mathbf{\xi, \Omega, \Phi}\}$ -statistics. They are calculated by taking the expectation of a kernel, or of a product of two kernels, with respect to some distribution over the inputs. Typically this distribution is Gaussian, but in the variational literature it is a variational distribution.

The three kernel expectations that surface are:

$\mathbf \xi(\mathbf{\mu, \Sigma}) = \int_X \mathbf k(\mathbf x, \mathbf x)\mathcal{N}(\mathbf x|\mathbf \mu,\mathbf \Sigma)d\mathbf x$

$\mathbf \Omega(\mathbf{y, \mu, \Sigma}) = \int_X \mathbf k(\mathbf x, \mathbf y)\mathcal{N}(\mathbf x|\mathbf \mu,\mathbf \Sigma)d\mathbf x$

$\mathbf \Phi(\mathbf{y, z, \mu, \Sigma}) = \int_X \mathbf k(\mathbf x, \mathbf y)k(\mathbf x, \mathbf z)\mathcal{N}(\mathbf x|\mathbf \mu,\mathbf \Sigma)d\mathbf x$

To my knowledge, only the following kernels have analytically computed kernel expectations: Linear, RBF, ARD and Spectral Mixture. These kernel statistics also show up in GP literature beyond uncertain inputs, for example in Bayesian GP-LVMs and Deep GPs.
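For the RBF kernel $k(x, y) = \sigma_f^2 \exp(-\frac{1}{2}(x-y)^\top \Lambda^{-1}(x-y))$ the three expectations can be written down by completing the square in the Gaussian integral. The sketch below implements those closed forms; the function names (`quad`, `rbf`, `xi_stat`, `omega_stat`, `phi_stat`) are hypothetical and the formulas are my own working for this kernel under the stated assumptions rather than copied from a specific paper, so treat them as something to sanity-check against sampling before relying on them.

```python
import numpy as np

def quad(d, A):
    """Quadratic form d^T A^{-1} d over the last axis of d."""
    return np.einsum('...i,ij,...j->...', d, np.linalg.inv(A), d)

def rbf(x, y, Lam, sf2=1.0):
    """RBF kernel with (full or diagonal) squared length-scale matrix Lam."""
    return sf2 * np.exp(-0.5 * quad(x - y, Lam))

def xi_stat(sf2=1.0):
    """xi: E[k(x, x)] is just sf2, since the RBF kernel is constant on the diagonal."""
    return sf2

def omega_stat(y, mu, Sigma, Lam, sf2=1.0):
    """Omega: E[k(x, y)] for x ~ N(mu, Sigma)."""
    D = len(mu)
    scale = np.linalg.det(np.eye(D) + np.linalg.solve(Lam, Sigma)) ** -0.5
    return sf2 * scale * np.exp(-0.5 * quad(mu - y, Lam + Sigma))

def phi_stat(y, z, mu, Sigma, Lam, sf2=1.0):
    """Phi: E[k(x, y) k(x, z)] for x ~ N(mu, Sigma)."""
    D = len(mu)
    scale = np.linalg.det(np.eye(D) + 2.0 * np.linalg.solve(Lam, Sigma)) ** -0.5
    mid = 0.5 * (y + z)  # the product of two RBFs is an RBF centred between y and z
    return (sf2**2 * scale
            * np.exp(-0.25 * quad(y - z, Lam))
            * np.exp(-0.5 * quad(mu - mid, 0.5 * Lam + Sigma)))

# Sanity check against plain Monte Carlo averages.
rng = np.random.default_rng(0)
mu = np.array([0.2, -0.1])
Sigma = np.array([[0.3, 0.05], [0.05, 0.2]])
Lam = np.diag([1.0, 0.5])
y_pt, z_pt = np.array([1.0, 0.0]), np.array([-0.5, 0.4])
xs = rng.multivariate_normal(mu, Sigma, size=200_000)
omega_mc = rbf(xs, y_pt, Lam).mean()
phi_mc = (rbf(xs, y_pt, Lam) * rbf(xs, z_pt, Lam)).mean()
```

Comparing `omega_mc` with `omega_stat(y_pt, mu, Sigma, Lam)` and `phi_mc` with `phi_stat(y_pt, z_pt, mu, Sigma, Lam)` is a quick way to catch sign or scaling mistakes when deriving these statistics for a new kernel.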

Literature¶

Oxford M:
Sampling for Inference in Probabilistic Models with Fast Bayesian Quadrature - Gunter et al. (2014)
Batch Selection for Parallelisation of Bayesian Quadrature -
- Code
Prüher et al.
On the use of gradient information in Gaussian process quadratures (2016) > A nice introduction to moments in the context of Gaussian distributions.
Gaussian Process Quadrature Moment Transform (2017)
Student-t Process Quadratures for Filtering of Non-linear Systems with Heavy-tailed Noise (2017)
- Code: Nonlinear Sigma-Point Kalman Filters based on Bayesian Quadrature
  
  This includes an implementation of the nonlinear Sigma-Point Kalman filter, along with implementations of the following transforms:
- Moment Transform
- Linearized Moment Transform
- MC Transform
- SigmaPointTransform
- Spherical Radial Transform
- Unscented Transform
- Gaussian Hermite Transform
- Fully Symmetric Student T Transform
  
  And a few experimental transforms:
- Truncated Transforms:
- Taylor GPQ+D w. RBF Kernel

Toolboxes¶

Emukit

Connecting Concepts¶

Moment Matching¶

Derivatives of GPs¶

Derivative observations in Gaussian Process Models of Dynamic Systems - Solak et al. (2003)
Differentiating GPs - McHutchon (2013)

A nice PDF with the step-by-step calculations for taking derivatives of the linear and RBF kernels.
Exploiting gradients and Hessians in Bayesian optimization and Bayesian quadrature - Wu et al. (2018)

Extended Kalman Filter¶

This is the origin of the linearized (unscented) transformation applied to GPs. It takes a first-order Taylor approximation of your function around the mean of the input distribution.

Wikipedia
Blog Posts by Harveen Singh - Kalman Filter | Unscented Kalman Filter | Extended Kalman Filter
Intro to Kalman Filter and Its Applications - Kim & Bang (2018)
Tutorial - Terejanu
Videos
Lecture by Cyrill Stachniss
Lecture by Robotics Course | Notes
Lecture explained with Python Code

Uncertain Inputs in other ML fields¶

Statistical Rethinking
Course Page
Lecture | Slides | PyMC3 Implementation

Key Equations¶

Predictive Mean and Variance for Latent Function, f

$\mu_f(x_*) = k_*^\top K^{-1}y$

$\sigma^2_f(x_*) = k_{**} - k_*^\top K^{-1} k_*$

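Predictive Mean and Variance for Output, y

$\mu_y(x_*) = k_*^\top K^{-1}y$

$\sigma^2_y(x_*) = k_{**} - k_*^\top K^{-1} k_* + \sigma_n^2$

The mean is the same as for $f$; the variance adds the noise variance of the likelihood, $\sigma_n^2$.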
