# Uncertain Inputs in Gaussian Processes

## Motivation

This is my literature review of the ways that GPs have been modified to allow for uncertain inputs.

## Algorithms

### Error-In-Variables Regression

This isn't really GPs per se, but these are probably the first papers (that we know of) to publish about this problem in the Bayesian community.

### Monte Carlo Sampling

Almost all of the papers in the first few years mention that you can do this, but I haven't seen a paper that explicitly walks through the pros and cons. However, most implementations of the PILCO method as well as the Deep GP method do implement some form of this.
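As a concrete illustration, here is a minimal sketch (my own, not taken from any of the papers above) of the Monte Carlo approach: sample the uncertain test input, push each sample through an exact GP posterior, and combine the results with the law of total variance. The toy RBF GP and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(A, B, ell2=1.0, sf2=1.0):
    """RBF kernel matrix between rows of A (m, d) and B (n, d)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sf2 * np.exp(-0.5 * d2 / ell2)

# toy training set: noisy observations of sin(x)
X = rng.uniform(-3, 3, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)
K = rbf(X, X) + 0.01 * np.eye(len(X))   # kernel + noise variance

def predict(Xs):
    """Exact GP posterior mean and variance at deterministic inputs."""
    ks = rbf(Xs, X)
    mu = ks @ np.linalg.solve(K, y)
    var = rbf(Xs, Xs).diagonal() - np.einsum(
        "ij,ji->i", ks, np.linalg.solve(K, ks.T))
    return mu, var

# uncertain test input x* ~ N(mu_x, Sigma_x)
mu_x, Sigma_x = np.array([0.5]), np.array([[0.05]])
samples = rng.multivariate_normal(mu_x, Sigma_x, size=2000)

mus, vars_ = predict(samples)
mc_mean = mus.mean()                    # E[mu(x*)]
mc_var = vars_.mean() + mus.var()       # law of total variance
```

The predictive variance picks up an extra term from how much the mean wiggles over the input distribution, which is exactly what the deterministic approximations below try to capture in closed form.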

### Moment Matching

This is where we approximate the predictive distribution as a Gaussian by computing its mean and variance (the two moments needed to describe a Gaussian).

Details
\begin{aligned} m(\mu_{x_*}, \Sigma_{x_*}) &= \mu(\mu_{x_*})\\ v(\mu_{x_*}, \Sigma_{x_*}) &= \nu^2(\mu_{x_*}) + \frac{\partial \mu(\mu_{x_*})}{\partial x_*}^\top \Sigma_{x_*} \frac{\partial \mu(\mu_{x_*})}{\partial x_*} + \frac{1}{2} \text{Tr}\left\{ \frac{\partial^2 \nu^2(\mu_{x_*})}{\partial x_* \partial x_*^\top} \Sigma_{x_*}\right\} \end{aligned}
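A sketch of the corrections above on a toy 1-D RBF GP, with the derivatives taken by finite differences; the GP setup and all names are my own illustrative choices, not code from the papers.

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf(A, B, ell2=1.0, sf2=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sf2 * np.exp(-0.5 * d2 / ell2)

X = rng.uniform(-3, 3, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)
K = rbf(X, X) + 0.01 * np.eye(len(X))
alpha = np.linalg.solve(K, y)

def mu(x):
    """Predictive mean at a scalar input x."""
    ks = rbf(np.array([[x]]), X)
    return float(ks @ alpha)

def nu2(x):
    """Predictive variance at a scalar input x."""
    ks = rbf(np.array([[x]]), X)
    return float(1.0 - ks @ np.linalg.solve(K, ks[0]))

mu_x, sig2_x, h = 0.5, 0.05, 1e-3

# first derivative of the mean, second derivative of the variance
dmu = (mu(mu_x + h) - mu(mu_x - h)) / (2 * h)
d2nu2 = (nu2(mu_x + h) - 2 * nu2(mu_x) + nu2(mu_x - h)) / h**2

# moment-matched predictive moments, per the equations above
m = mu(mu_x)
v = nu2(mu_x) + dmu * sig2_x * dmu + 0.5 * d2nu2 * sig2_x
```

The mean is just evaluated at the input mean; all of the input uncertainty goes into inflating the variance.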

### Covariance Functions

Details

Dallaire et al. constructed a modification to the RBF covariance function that takes the input noise into account.

K_{ij} = \sigma_f^2 \left| 2\Lambda^{-1}\Sigma_x + I \right|^{-1/2} \exp\left( -\frac{1}{2}(x_i - x_j)^\top (\Lambda + 2\Sigma_x)^{-1}(x_i - x_j) \right)

for $i\neq j$ and

K_{ij}=\sigma_f^2

for $i=j$. This was shown to give poor results when $\Sigma_x$ is not known accurately. You can see the full explanation in McHutchon's thesis (section 2.2.1), which can be found in the Iterative section below.
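A sketch of the corrected kernel above, assuming a diagonal length-scale matrix $\Lambda$ and a single known input-noise covariance $\Sigma_x$ shared across points; the function name and setup are my own, not from the paper.

```python
import numpy as np

def dallaire_rbf(X, Lam, Sigma_x, sf2=1.0):
    """Noise-corrected RBF Gram matrix for inputs X of shape (n, d)."""
    n, d = X.shape
    # determinant prefactor |2 Lambda^{-1} Sigma_x + I|^{-1/2}
    pref = np.linalg.det(
        2 * np.linalg.solve(Lam, Sigma_x) + np.eye(d)) ** -0.5
    P = np.linalg.inv(Lam + 2 * Sigma_x)     # lengthened precision
    diff = X[:, None, :] - X[None, :, :]
    quad = np.einsum("ijk,kl,ijl->ij", diff, P, diff)
    K = sf2 * pref * np.exp(-0.5 * quad)
    np.fill_diagonal(K, sf2)                 # K_ii = sigma_f^2
    return K

X = np.array([[0.0], [1.0], [2.0]])
K = dallaire_rbf(X, Lam=np.eye(1), Sigma_x=0.1 * np.eye(1))
```

The net effect is that off-diagonal entries shrink relative to a plain RBF kernel: the input noise effectively lengthens the length scales and damps the signal variance between distinct points.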

### Linearized (Unscented) Approximation

This is the linearized version of the moment-matching approach mentioned above, also known as the unscented GP. In this approximation, we only change the predictive variance. You can find an example colab notebook here showing how to use this with the GPy library.

Details
\begin{aligned} \tilde{\mu}_f(x_*) &= \underbrace{k_*^\top K^{-1}y}_{\mu_f(x_*)} \\ \tilde{\nu}^2(x_*) &= \underbrace{k_{**} - k_*^\top K^{-1} k_*}_{\nu^2(x_*)} + \partial \mu_f \text{ } \Sigma_x \text{ } \partial \mu_f^\top \end{aligned}
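A minimal sketch of the variance-only correction above: the predictive mean is untouched and the variance gains the first-order term $\partial\mu_f \, \Sigma_x \, \partial\mu_f^\top$, with the RBF mean gradient computed analytically. The toy GP and all names are illustrative, not the GPy implementation mentioned above.

```python
import numpy as np

rng = np.random.default_rng(2)
ell2, sf2 = 1.0, 1.0

def rbf(A, B):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sf2 * np.exp(-0.5 * d2 / ell2)

X = rng.uniform(-3, 3, size=(30, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)
K = rbf(X, X) + 0.01 * np.eye(len(X))
alpha = np.linalg.solve(K, y)

def predict_corrected(xs, Sigma_x):
    """Mean (unchanged) and corrected variance at a test point xs (d,)."""
    ks = rbf(xs[None, :], X)[0]                      # (n,)
    mu = ks @ alpha
    nu2 = sf2 - ks @ np.linalg.solve(K, ks)
    # analytic gradient of the RBF predictive mean w.r.t. x*
    dks = -(xs[None, :] - X) / ell2 * ks[:, None]    # (n, d)
    dmu = dks.T @ alpha                              # (d,)
    return mu, nu2 + dmu @ Sigma_x @ dmu             # add dmu Sigma dmu^T

xs, Sigma_x = np.array([0.5, -0.2]), 0.05 * np.eye(2)
mu_star, var_star = predict_corrected(xs, Sigma_x)
```

Since $\Sigma_x$ is positive semi-definite, the correction can only inflate the predictive variance, never shrink it.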

Note: The inspiration for this comes from the Extended Kalman Filter (links below), which tries to find an approximation to a non-linear transformation, $f$, of $x$ when $x$ comes from a distribution $x \sim \mathcal{N}(\mu_x, \Sigma_x)$.

• GP-BayesFilters: Bayesian Filtering Using Gaussian Process Prediction - Ko and Fox (2008)

They originally came up with the linearized (unscented) approximation to the moment-matching method. They used it in the context of the extended Kalman filter which has a few more elaborate steps in addition to the input uncertainty propagation.

• Expectation Propagation in Gaussian Process Dynamical Systems - Deisenroth & Mohamed (2012)

The authors use expectation propagation as a way to propagate the noise through the test points. They mention the two ways to account for the input uncertainty, referencing the GP-BayesFilters paper above: explicit moment matching and the linearized (unscented) version. They also give the interpretation that the moment-matching approach with the kernel expectations is analogous to minimizing the KL divergence between the prior distribution over the uncertain inputs $p(x)$ and the approximate distribution $q(x)$.

• Accounting for Input Noise in Gaussian Process Parameter Retrieval - Johnson et al. (2019)

My paper where I use the unscented version to get better predictive uncertainty estimates.

Note: I didn't know about the unscented stuff until after the publication...unfortunately.

• Unscented Gaussian Process Latent Variable Model: learning from uncertain inputs with intractable kernels - Souza et al. (2019) [arxiv]

A recent paper that's been on arXiv for a while. They give a formulation for approximating the linearized (unscented) version of the moment-matching approach. Apparently it works better than the quadrature, Monte Carlo, and kernel-expectation approaches.

## Appendix

### Kernel Expectations

[Girard 2003] coined a name for what we call kernel expectations: the $\{\mathbf{\xi, \Omega, \Phi}\}$-statistics. These are calculated by taking the expectation of a kernel (or a product of two kernels) w.r.t. some distribution. Typically this distribution is Gaussian, but in the variational literature it is a variational distribution.

Details

The three kernel expectations that surface are:

\mathbf \xi(\mathbf{\mu, \Sigma}) = \int_X \mathbf k(\mathbf x, \mathbf x)\mathcal{N}(\mathbf x|\mathbf \mu,\mathbf \Sigma)d\mathbf x
\mathbf \Omega(\mathbf{y, \mu, \Sigma}) = \int_X \mathbf k(\mathbf x, \mathbf y)\mathcal{N}(\mathbf x|\mathbf \mu,\mathbf \Sigma)d\mathbf x
\mathbf \Phi(\mathbf{y, z, \mu, \Sigma}) = \int_X \mathbf k(\mathbf x, \mathbf y)k(\mathbf x, \mathbf z)\mathcal{N}(\mathbf x|\mathbf \mu,\mathbf \Sigma)d\mathbf x

To my knowledge, only the following kernels have analytically calculated kernel expectations: linear, RBF, ARD, and spectral mixture. Furthermore, these kernel statistics show up in much of the GP literature beyond uncertain inputs; for example, in Bayesian GP-LVMs and Deep GPs.
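For a 1-D RBF kernel, the first two expectations have closed forms (Gaussian-Gaussian convolutions). A sketch, with a Monte Carlo sanity check; the function names and test values are illustrative.

```python
import numpy as np

def rbf(x, y, ell2=1.0, sf2=1.0):
    return sf2 * np.exp(-0.5 * (x - y) ** 2 / ell2)

def xi(mu, s2, ell2=1.0, sf2=1.0):
    # E[k(x, x)] under N(mu, s2): the RBF is constant on its diagonal
    return sf2

def omega(y, mu, s2, ell2=1.0, sf2=1.0):
    # E[k(x, y)] under N(mu, s2): a Gaussian-Gaussian convolution,
    # which just widens the kernel's variance from ell2 to ell2 + s2
    return sf2 * np.sqrt(ell2 / (ell2 + s2)) * np.exp(
        -0.5 * (y - mu) ** 2 / (ell2 + s2))

# Monte Carlo sanity check of omega
rng = np.random.default_rng(3)
mu, s2, yv = 0.3, 0.2, 1.1
xs = rng.normal(mu, np.sqrt(s2), size=200_000)
mc = rbf(xs, yv).mean()
```

The $\mathbf{\Phi}$-statistic follows the same pattern with a product of two RBF terms, which is again an unnormalized Gaussian in $x$ and so integrates in closed form.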

### Connecting Concepts

#### Extended Kalman Filter

This is the origin of the linearized (unscented) transformation applied to GPs. It takes a first-order Taylor approximation of your function around the mean of the input distribution in order to propagate the input uncertainty through the non-linearity.

### Key Equations

Details

Predictive Mean and Variance for Latent Function, f

\mu_f(x_*) = k_*^\top K^{-1}y
\sigma^2_f(x_*) = k_{**} - k_*^\top K^{-1} k_*
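A minimal numerical sketch of these two equations on a toy RBF GP; the data and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

def rbf(a, b):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)

# noise-free toy data, with a small jitter for numerical stability
X = rng.uniform(-3, 3, size=25)
y = np.sin(X)
K = rbf(X, X) + 1e-6 * np.eye(25)

xs = np.array([0.5])
k_star = rbf(xs, X)[0]
mu_f = k_star @ np.linalg.solve(K, y)               # k_*^T K^{-1} y
var_f = 1.0 - k_star @ np.linalg.solve(K, k_star)   # k_** - k_*^T K^{-1} k_*
```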
Details

Predictive Mean and Variance for noisy output, y

\mu_y(x_*) = k_*^\top K^{-1}y
\sigma^2_y(x_*) = k_{**} - k_*^\top K^{-1} k_* + \sigma_n^2

The mean is unchanged; the predictive variance for the noisy output $y$ simply adds the observation-noise variance $\sigma_n^2$ to the latent variance.