# Uncertain Inputs in Gaussian Processes

## Motivation

This is my running literature review of the ways GPs have been modified to handle uncertain inputs.

## Algorithms

### Error-In-Variables Regression

This isn't really GPs per se, but these are probably the first papers to publish about this problem in the Bayesian community (that we know of).

- Bayesian Analysis of Error-in-Variables Regression Models - Dellaportas & Stephens (1995)
- Error in Variables Regression: What is the Appropriate Model? - Gillard et al. (2007) [**Thesis**]

### Monte Carlo Sampling

Almost all of the papers from the first few years mention that this is possible, but I haven't seen a paper explicitly walking through the pros and cons of doing it. However, most implementations of the PILCO method, as well as the Deep GP method, do implement some form of it.
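As a concrete sketch of what this looks like in practice, assuming a fitted GP and a Gaussian test input (the toy data and the use of scikit-learn are my own illustration, not from any of the papers above): sample inputs from the input distribution, predict at each sample, and combine the predictions with the law of total variance.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Fit a standard GP on (assumed noise-free) training inputs.
X = np.linspace(-3, 3, 30)[:, None]
y = np.sin(X).ravel() + 0.05 * rng.standard_normal(30)
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=0.05**2).fit(X, y)

# Uncertain test input: x* ~ N(mu_x, sigma_x^2).
mu_x, sigma_x = 0.5, 0.3
x_samples = rng.normal(mu_x, sigma_x, size=(2000, 1))

# Predict at every sampled input, then combine with the law of total
# variance: Var[f] = E[Var[f|x]] + Var[E[f|x]].
means, stds = gp.predict(x_samples, return_std=True)
pred_mean = means.mean()
pred_var = (stds**2).mean() + means.var()
```

The second variance term is what a fixed-input prediction misses: it grows with the slope of the posterior mean over the input distribution.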

### Taylor Expansion

- Learning a Gaussian Process Model with Uncertain Inputs - Girard & Murray-Smith (2003) [**Technical Report**]

### Moment Matching

This is where we approximate the predictive distribution as a Gaussian by computing its mean and variance (the two moments needed to describe a Gaussian) in closed form.
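For the 1-D RBF kernel, the moment-matched predictive mean has a closed form; below is a minimal sketch (my own notation and toy data, following the general construction in the Girard et al. papers listed next), checked against a Monte Carlo estimate.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data and fixed RBF hyperparameters (signal sigma_f, lengthscale ell,
# noise sigma_n); no hyperparameter optimization, just the predictive step.
X = np.linspace(-3, 3, 25)
y = np.sin(X) + 0.05 * rng.standard_normal(25)
ell, sigma_f, sigma_n = 1.0, 1.0, 0.05

def rbf(a, b):
    return sigma_f**2 * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

beta = np.linalg.solve(rbf(X, X) + sigma_n**2 * np.eye(len(X)), y)

def mm_mean(u, s2):
    # Moment-matched predictive mean for x* ~ N(u, s2): the usual
    # sum_i beta_i k(x*, x_i), but with k replaced by its expectation
    # under the input distribution (closed form for the RBF kernel).
    q = sigma_f**2 * np.sqrt(ell**2 / (ell**2 + s2)) * np.exp(
        -0.5 * (u - X)**2 / (ell**2 + s2))
    return q @ beta

# Monte Carlo check: average the deterministic predictive mean over samples.
u, s2 = 0.5, 0.3**2
xs = rng.normal(u, np.sqrt(s2), 5000)
mc = (rbf(xs, X) @ beta).mean()
```

Note that with `s2 = 0` the expected kernel collapses back to the ordinary RBF, so this recovers the standard predictive mean.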

## Details

- Gaussian Process Priors With Uncertain Inputs – Application to Multiple-Step Ahead Time Series Forecasting - Girard et al. (2003)
- Approximate Methods for Propagation of Uncertainty in GP Models - Girard (2004) [**Thesis**]
- Prediction at an Uncertain Input for Gaussian Processes and Relevance Vector Machines - Application to Multiple-Step Ahead Time-Series Forecasting - Quinonero-Candela et al. (2003) [**Technical Report**]
- Analytic moment-based Gaussian process filtering - Deisenroth et al. (2009)
- PILCO: A Model-Based and Data-Efficient Approach to Policy Search - Deisenroth et al. (2011)
  - Code - TensorFlow | GPyTorch | MXFusion I | MXFusion II
- Efficient Reinforcement Learning using Gaussian Processes - Deisenroth (2010) [**Thesis**]
  - Chapter IV - Finding Uncertain Patterns in GPs (lit review at the end)

### Covariance Functions

## Details

Dallaire et al. constructed a modification to the RBF covariance function that takes the input noise covariance into account. Following the presentation in McHutchon's thesis, the expected RBF kernel under Gaussian input noise is

k(x_i, x_j) = \sigma_f^2 \left|I + \Lambda^{-1}(\Sigma_{x_i} + \Sigma_{x_j})\right|^{-1/2} \exp\left(-\frac{1}{2}(x_i - x_j)^\top (\Lambda + \Sigma_{x_i} + \Sigma_{x_j})^{-1}(x_i - x_j)\right)

for i \neq j, and

k(x_i, x_i) = \sigma_f^2

for i = j, where \Lambda is the diagonal matrix of squared lengthscales and \Sigma_{x_i} is the noise covariance of input i. This was shown to give poor results when \Sigma_x is not known. You can see the full explanation in the thesis of McHutchon (section 2.2.1), which can be found in the Iterative section below.

- An approximate inference with Gaussian process to latent functions from uncertain data - Dallaire et al. (2011) | Prezi | Code
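A sketch of that modified kernel, assuming the expected-RBF form given in McHutchon's thesis (section 2.2.1); the function name and toy setup are mine:

```python
import numpy as np

def dallaire_rbf(X, Sigmas, Lambda, sigma_f=1.0):
    """Expected RBF kernel under per-point Gaussian input noise.

    X: (n, d) input means; Sigmas: (n, d, d) per-input noise covariances;
    Lambda: (d, d) diagonal matrix of squared lengthscales.
    Off-diagonal entries take the expectation of the RBF kernel under the
    input noise; diagonal entries stay at sigma_f^2.
    """
    n, d = X.shape
    K = np.full((n, n), sigma_f**2)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            S = Lambda + Sigmas[i] + Sigmas[j]
            diff = X[i] - X[j]
            det = np.linalg.det(np.eye(d) + np.linalg.solve(Lambda, Sigmas[i] + Sigmas[j]))
            K[i, j] = sigma_f**2 * det**-0.5 * np.exp(-0.5 * diff @ np.linalg.solve(S, diff))
    return K

# With zero input noise this reduces to the standard ARD-RBF kernel.
Xm = np.random.default_rng(0).normal(size=(5, 2))
Lam = np.diag([1.0, 2.0])
K_clean = dallaire_rbf(Xm, np.zeros((5, 2, 2)), Lam)
K_noisy = dallaire_rbf(Xm, np.tile(0.1 * np.eye(2), (5, 1, 1)), Lam)
```

The noisy-input kernel shrinks the off-diagonal entries (wider effective lengthscale plus a determinant factor below one) while the diagonal stays at sigma_f^2.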

### Iterative

- Gaussian Process Training with Input Noise - McHutchon & Rasmussen (2011) | Code
- Nonlinear Modelling and Control using GPs - McHutchon (2014) [**Thesis**]
  - Chapter IV - Finding Uncertain Patterns in GPs
- System Identification through Online Sparse Gaussian Process Regression with Input Noise - Bijl et al. (2017) | Code
- Gaussian Process Regression Techniques - Bijl (2018) [**Thesis**] | Code
  - Chapter V - Noisy Input GPR

### Linearized (Unscented) Approximation

This is the linearized version of the moment-matching approach mentioned above, also known as the unscented GP. In this approximation, only the predictive variance is changed. You can find a colab notebook here with an example of how to use this with the GPy library.
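A minimal sketch of that variance correction, assuming the first-order form \sigma^2(x_*) + (\partial \mu/\partial x)^\top \Sigma_x (\partial \mu/\partial x) with a finite-difference slope; the helper name and toy data are my own, not taken from the notebook:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(2)
X = np.linspace(-3, 3, 40)[:, None]
y = np.sin(2 * X).ravel() + 0.05 * rng.standard_normal(40)
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=0.05**2).fit(X, y)

def unscented_predict(x_star, var_x, eps=1e-4):
    """Leave the predictive mean unchanged; inflate the predictive variance
    by the squared slope of the posterior mean times the input variance."""
    x = np.atleast_2d(x_star)
    mean, std = gp.predict(x, return_std=True)
    # Central finite difference of the posterior mean at x_star.
    slope = (gp.predict(x + eps) - gp.predict(x - eps)) / (2 * eps)
    return mean[0], std[0]**2 + slope[0]**2 * var_x

m_corr, v_corr = unscented_predict(0.3, 0.2**2)
```

The corrected variance is always at least the standard predictive variance; the inflation is largest where the posterior mean is steep.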

## Details

**Note**: The inspiration for this comes from the Extended Kalman Filter (links below), which tries to find an approximation to a nonlinear transformation f(x) when x comes from a distribution x \sim \mathcal{N}(\mu_x, \Sigma_x).

- GP-BayesFilters: Bayesian Filtering Using Gaussian Process Prediction - Ko & Fox (2008)
  They originally came up with the linearized (unscented) approximation to the moment-matching method. They used it in the context of the extended Kalman filter, which has a few more elaborate steps in addition to the input uncertainty propagation.

- Expectation Propagation in Gaussian Process Dynamical Systems - Deisenroth & Mohamed (2012)
  The authors use expectation propagation to propagate the noise through the test points. They mention the two ways to account for input uncertainty, referencing the GP-BayesFilters paper above: explicit moment matching and the linearized (unscented) version. They also give the interpretation that the moment-matching approach with kernel expectations is analogous to minimizing the KL divergence between the prior distribution with uncertain inputs p(x) and the approximate distribution q(x).

- Accounting for Input Noise in Gaussian Process Parameter Retrieval - Johnson et al. (2019)
  My paper, where I use the unscented version to get better predictive uncertainty estimates.
  **Note**: I didn't know about the unscented literature until after the publication... unfortunately.
- Unscented Gaussian Process Latent Variable Model: learning from uncertain inputs with intractable kernels - Souza et al. (2019) [**arxiv**]
  A very recent paper that has been on arxiv for a while. They give a formulation for approximating the linearized (unscented) version of the moment-matching approach. Apparently it works better than the quadrature, Monte Carlo and kernel-expectation approaches.

### Heteroscedastic Likelihood Models

- Heteroscedastic Gaussian Process Regression - Le et al. (2005)
- Most Likely Heteroscedastic Gaussian Process Regression - Kersting et al. (2007)
- Variational Heteroscedastic Gaussian Process Regression - Lázaro-Gredilla & Titsias (2011)
- Heteroscedastic Gaussian Processes for Uncertain and Incomplete Data - Almosallam (2017) [**Thesis**]
- Large-scale Heteroscedastic Regression via Gaussian Process - Liu et al. (2019) [**arxiv**] | Code

### Latent Variable Models

- Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data - Lawrence (2004)
- Generic Inference in Latent Gaussian Process Models - Bonilla et al. (2016)
- A review on Gaussian Process Latent Variable Models - Li & Chen (2016)

### Latent Covariates

- Gaussian Process Regression with Heteroscedastic or Non-Gaussian Residuals - Wang & Neal (2012)
- Gaussian Process Conditional Density Estimation - Dutordoir et al. (2018)
- Decomposing feature-level variation with Covariate Gaussian Process Latent Variable Models - Martens et al. (2019)
- Deep Gaussian Processes with Importance-Weighted Variational Inference - Salimbeni et al. (2019)

### Variational Strategies

- Bayesian Gaussian Process Latent Variable Model - Titsias & Lawrence (2010)
- Nonlinear Modelling and Control using GPs - McHutchon (2014) [**Thesis**]
- Variational Inference for Uncertainty on the Inputs of Gaussian Process Models - Damianou et al. (2014)
- Deep GPs and Variational Propagation of Uncertainty - Damianou (2015) [**Thesis**]
  - Chapter IV - Uncertain Inputs in Variational GPs
  - Chapter II (2.1) - Lit Review
- Non-Stationary Surrogate Modeling with Deep Gaussian Processes - Dutordoir (2016) [**Thesis**]
  > This is a good thesis that walks through the derivations of the moment-matching approach and the Bayesian GPLVM approach. It becomes a little clearer how they are related after going through the derivations once.
- Bringing Models to the Domain: Deploying Gaussian Processes in the Biological Sciences - Zwießele (2017) [**Thesis**]
  - Chapter II (2.4, 2.5) - Sparse GPs, Variational Bayesian GPLVM

## Appendix

### Kernel Expectations

[Girard 2003] coined the name for what we call the kernel expectations, the \{\mathbf{\xi, \Omega, \Phi}\}-statistics. These are calculated by taking the expectation of a kernel, or of a product of two kernels, w.r.t. some distribution. Typically this distribution is normal, but in the variational literature it is a variational distribution.

## Details

The three kernel expectations that surface, for an uncertain input x \sim \mathcal{N}(\mu_x, \Sigma_x) and fixed inputs \{x_i\}, are:

\xi = \mathbb{E}_x[k(x, x)], \qquad \Omega_i = \mathbb{E}_x[k(x, x_i)], \qquad \Phi_{ij} = \mathbb{E}_x[k(x, x_i)\,k(x, x_j)]

To my knowledge, the only kernels with analytically tractable kernel expectations are the Linear, RBF, ARD and Spectral Mixture kernels. Furthermore, these kernel statistics show up in much other GP literature beyond uncertain-input GPs, for example in Bayesian GP-LVMs and Deep GPs.
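As a sketch, the three statistics for a 1-D RBF kernel under a Gaussian input, checked against Monte Carlo (variable names and toy values are mine):

```python
import numpy as np

rng = np.random.default_rng(3)

# 1-D RBF kernel, Gaussian input x ~ N(mu, s2), fixed points Z.
ell, sigma_f = 0.8, 1.0
Z = np.array([-1.0, 0.0, 1.5])
mu, s2 = 0.4, 0.25
k = lambda a, b: sigma_f**2 * np.exp(-0.5 * (a - b)**2 / ell**2)

# Closed-form kernel expectations (the Psi-statistics of the
# Bayesian GP-LVM literature):
xi = sigma_f**2                                         # E[k(x, x)]
Omega = sigma_f**2 * np.sqrt(ell**2 / (ell**2 + s2)) * np.exp(
    -0.5 * (mu - Z)**2 / (ell**2 + s2))                 # E[k(x, z_i)]
half = ell**2 / 2
Phi = np.array([[sigma_f**4                             # E[k(x,z_i) k(x,z_j)]
                 * np.exp(-0.25 * (zi - zj)**2 / ell**2)
                 * np.sqrt(half / (half + s2))
                 * np.exp(-0.5 * (mu - 0.5 * (zi + zj))**2 / (half + s2))
                 for zj in Z] for zi in Z])

# Monte Carlo sanity check of the closed forms.
xs = rng.normal(mu, np.sqrt(s2), 200_000)
Omega_mc = np.array([k(xs, z).mean() for z in Z])
Phi_mc = np.array([[(k(xs, zi) * k(xs, zj)).mean() for zj in Z] for zi in Z])
```

The product of two RBF kernels is itself an unnormalized Gaussian in x, which is why \Phi stays closed-form.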

#### Literature

- Oxford M:
  - Sampling for Inference in Probabilistic Models with Fast Bayesian Quadrature - Gunter et al. (2014)
  - Batch Selection for Parallelisation of Bayesian Quadrature
- Prüher et al.:
  - On the use of gradient information in Gaussian process quadratures (2016)
    > A nice introduction to moments in the context of Gaussian distributions.
  - Gaussian Process Quadrature Moment Transform (2017)
  - Student-t Process Quadratures for Filtering of Non-linear Systems with Heavy-tailed Noise (2017)
- Code: Nonlinear Sigma-Point Kalman Filters based on Bayesian Quadrature
  This includes an implementation of the nonlinear Sigma-Point Kalman filter, with implementations of the following transforms:
  - Moment Transform
  - Linearized Moment Transform
  - MC Transform
  - Sigma-Point Transform
  - Spherical Radial Transform
  - Unscented Transform
  - Gaussian Hermite Transform
  - Fully Symmetric Student-t Transform

  And a few experimental transforms:
  - Truncated Transforms
  - Taylor GPQ+D w. RBF Kernel


#### Toolboxes

### Connecting Concepts

#### Moment Matching

#### Derivatives of GPs

- Derivative observations in Gaussian Process Models of Dynamic Systems - Solak et al. (2003)
- Differentiating GPs - McHutchon (2013)
  A nice PDF with the step-by-step calculations for taking derivatives of the linear and RBF kernels.

- Exploiting gradients and Hessians in Bayesian optimization and Bayesian quadrature - Wu et al. (2018)

#### Extended Kalman Filter

This is the origin of the linearized (unscented) approximation applied to GPs: the EKF takes a first-order Taylor approximation of a nonlinear function f(x) around the input mean, so that a Gaussian input stays (approximately) Gaussian after the transformation.
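The linearization on its own can be sketched in a few lines, using the classic polar-to-Cartesian example (the function and names are my illustration, not from any linked resource):

```python
import numpy as np

def linearize_propagate(f, jacobian, mu, Sigma):
    """EKF-style first-order propagation of x ~ N(mu, Sigma) through f:
    mean ~= f(mu), covariance ~= J Sigma J^T with J the Jacobian at mu."""
    J = jacobian(mu)
    return f(mu), J @ Sigma @ J.T

# Classic example: polar-to-Cartesian conversion (range r, bearing theta).
f = lambda p: np.array([p[0] * np.cos(p[1]), p[0] * np.sin(p[1])])
jac = lambda p: np.array([[np.cos(p[1]), -p[0] * np.sin(p[1])],
                          [np.sin(p[1]),  p[0] * np.cos(p[1])]])

mu = np.array([1.0, np.pi / 4])
Sigma = np.diag([0.01, 0.01])
m, S = linearize_propagate(f, jac, mu, Sigma)
```

At r = 1 the Jacobian is a pure rotation, so the propagated covariance equals the input covariance; for other ranges the bearing uncertainty is scaled by r.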

- Wikipedia
- Blog Posts by Harveen Singh - Kalman Filter | Unscented Kalman Filter | Extended Kalman Filter
- Intro to Kalman Filter and Its Applications - Kim & Bang (2018)
- Tutorial - Terejanu
- Videos
- Lecture by Cyrill Stachniss
- Lecture by Robotics Course | Notes
- Lecture explained with Python Code

### Uncertain Inputs in other ML fields

- Statistical Rethinking
- Course Page
- Lecture | Slides | PyMC3 Implementation

### Key Equations

## Details

Predictive Mean and Variance for Latent Function, f

## Details

Predictive Mean and Variance for mean output, y
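The details above presumably held the standard GP predictive equations; for reference, the usual forms (in Rasmussen & Williams-style notation, with k_* the vector of kernel evaluations between the test point and the training inputs) are:

```latex
% Latent function f at a deterministic test input x_*:
\mu_f(x_*) = k_*^\top \left(K + \sigma_n^2 I\right)^{-1} \mathbf{y}
\qquad
\sigma_f^2(x_*) = k(x_*, x_*) - k_*^\top \left(K + \sigma_n^2 I\right)^{-1} k_*

% Noisy output y: the same mean, with the observation noise added:
\mu_y(x_*) = \mu_f(x_*)
\qquad
\sigma_y^2(x_*) = \sigma_f^2(x_*) + \sigma_n^2
```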