
Deep Gaussian Processes

These are GP models that stack GPs one after the other, with the outputs of one layer acting as the inputs to the next. For building an understanding, the lectures I have highlighted below are the best place to start.
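
As a rough illustration of what "stacking" means, here is a minimal NumPy sketch (my own, not taken from any of the references below) that draws one sample from a three-layer deep GP prior by feeding each layer's sampled function values in as the next layer's inputs:

```python
import numpy as np

def rbf_kernel(x, lengthscale=1.0, variance=1.0):
    # x: (N, D) inputs -> (N, N) RBF covariance matrix.
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dists / lengthscale ** 2)

def sample_gp_layer(x, rng, jitter=1e-5):
    # One prior draw f ~ GP(0, k) evaluated at the rows of x.
    K = rbf_kernel(x) + jitter * np.eye(len(x))
    return rng.multivariate_normal(np.zeros(len(x)), K)

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 100)[:, None]      # layer-1 inputs
h = x
for _ in range(3):                        # f3(f2(f1(x)))
    h = sample_gp_layer(h, rng)[:, None]  # outputs become the next layer's inputs

print(h.shape)  # (100, 1): one sample from a 3-layer deep GP prior at x
```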


👨🏽‍🏫 | 👩🏽‍🏫 Resources

Neil Lawrence @ MLSS 2019

I would say this is the best lecture for understanding the nature of GPs and why we might want to use them.

Blog | Lecture | Slides

Neil Lawrence @ GPSS 2019

Notes | Lecture

Maurizio Filippone @ DeepBayes.ru 2018

I would say this is the second best lecture because Maurizio gives a nice overview of the GP methods that already existed at the time.

Lecture | Slides | New Slides


Algorithms

The literature isn't so big, but there are a number of different implementations depending on the lab:

  1. Variational Inference

    This is the most popular method and has been pursued the most. It's also the implementation you will find as standard in libraries like GPyTorch, GPflow, and Pyro.

  2. Expectation Propagation

    This group used expectation propagation to train the Deep GP. They haven't done much with it since, and I'm not entirely sure why this line of DGP work has gone a bit dry; it would be nice if they resumed it. I suspect it may be because of the software: I haven't seen much software that focuses on clever expectation propagation schemes; most libraries focus on variational inference and MC sampling schemes.

  3. MC sampling

    One lab has tackled this, where you use variants of MC sampling to train the Deep GP. You'll find sampling support as standard in many GP libraries because it's fairly easy to integrate into almost any scheme. MC sampling is famous for being slow, but the community is working on it, and I imagine a breakthrough is bound to happen.

  4. Random Feature Expansions

    This uses random Fourier features (RFF) to approximate each GP and then stacks the approximations on top of one another. I find it quite elegant and probably the simplest, but I haven't seen much research on the finer details of the algorithm, such as the training or initialization procedures.

I don't think there is a single best one, because I'm almost certain no one has done a complete comparison. I can say that the VI approach is the most studied, since that lab is still working on it. In the meantime, I would personally use implementations in standard libraries where the devs have ironed out the bugs and allow for easy customization and configuration; so, basically, the doubly stochastic method.


Variational Inference

Deep Gaussian Processes - Damianou & Lawrence (2013)

This paper introduces the original Deep GP method. It might not be useful for production, but there are still many insights to be had from the originators.

Nested Variational Compression in Deep Gaussian Processes - Hensman & Lawrence (2014)
Doubly Stochastic Variational Inference for Deep Gaussian Processes - Salimbeni & Deisenroth (2017)

This paper uses stochastic gradient descent to train the Deep GP. I think it achieves the state-of-the-art results thus far, and it has the most implementations across the standard libraries.
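
To make the idea concrete, here is a minimal sketch of the "doubly stochastic" forward pass under heavy simplifications of my own: the per-layer posterior mean/variance functions are hypothetical placeholders (in the paper they come from sparse, inducing-point GP layers), and only the sample propagation is shown, not the ELBO or its KL terms.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_posterior(h, params):
    # Hypothetical stand-in for a sparse-GP layer's marginal posterior
    # q(f(h)) = N(mean(h), var(h)); a random linear map just keeps the shapes right.
    mean = h @ params["W"]
    var = np.full_like(mean, params["noise"])
    return mean, var

def propagate_samples(x_batch, layers, num_samples=5):
    # Draw S Monte Carlo samples of the final-layer outputs by sampling each
    # layer with the reparameterisation trick and feeding the result onwards.
    samples = np.repeat(x_batch[None], num_samples, axis=0)  # (S, B, D_in)
    for params in layers:
        mean, var = layer_posterior(samples, params)
        eps = rng.standard_normal(mean.shape)
        samples = mean + np.sqrt(var) * eps                  # reparameterised sample
    return samples                                           # (S, B, D_out)

# Toy usage: a two-layer model on a minibatch of 8 one-dimensional inputs.
layers = [
    {"W": rng.standard_normal((1, 3)), "noise": 0.1},
    {"W": rng.standard_normal((3, 1)), "noise": 0.1},
]
x_batch = rng.standard_normal((8, 1))
f_samples = propagate_samples(x_batch, layers)
print(f_samples.shape)  # (5, 8, 1)
# The ELBO then averages log p(y | f_samples) over the samples and subtracts
# a KL term per layer for the inducing variables (omitted here).
```

The two sources of stochasticity are the minibatch over data points and the Monte Carlo samples drawn through the layers, hence "doubly stochastic".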


Random Fourier Features

Random Feature Expansions for Deep Gaussian Processes - Cutajar et al. (2017)

This implementation uses ideas from random Fourier features in conjunction with Deep GPs.
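
For reference, here is a small sketch (my own, following the standard Rahimi & Recht construction rather than the paper's code) of how random Fourier features approximate an RBF kernel; the deep variant replaces each GP layer with Bayesian linear regression on such features and feeds one layer's output into the next layer's feature map.

```python
import numpy as np

def rff_features(x, num_features, lengthscale=1.0, rng=None):
    # phi(x) such that phi(x) @ phi(x').T approximates the RBF kernel k(x, x').
    if rng is None:
        rng = np.random.default_rng(0)
    d = x.shape[1]
    omega = rng.standard_normal((d, num_features)) / lengthscale  # spectral samples
    b = rng.uniform(0, 2 * np.pi, num_features)                   # random phases
    return np.sqrt(2.0 / num_features) * np.cos(x @ omega + b)

rng = np.random.default_rng(1)
x = rng.standard_normal((5, 2))
phi = rff_features(x, num_features=5000, rng=rng)
K_approx = phi @ phi.T

# Compare against the exact RBF kernel (unit lengthscale and variance).
sq_dists = np.sum((x[:, None] - x[None, :]) ** 2, axis=-1)
K_exact = np.exp(-0.5 * sq_dists)
print(np.max(np.abs(K_approx - K_exact)))  # small; shrinks as num_features grows
```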


MC Sampling

Learning deep latent Gaussian models with Markov chain Monte Carlo - Hoffman (2017)
Inference in Deep Gaussian Processes Using Stochastic Gradient Hamiltonian Monte Carlo - Havasi et al. (2018)
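
Since Havasi et al. build on stochastic gradient HMC, here is a minimal sketch of the SGHMC update (Chen et al., 2014) on a toy Gaussian target; the quadratic energy below is just a stand-in for the minibatched negative log posterior over the Deep GP's latent quantities.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_energy(theta):
    # Toy target: U(theta) = 0.5 * ||theta||^2, i.e. a standard normal posterior.
    return theta

def sghmc_step(theta, momentum, step_size=1e-2, friction=0.05):
    # One SGHMC update: friction and injected noise compensate for the
    # stochasticity of the (minibatch) gradient estimate.
    noise = rng.standard_normal(theta.shape) * np.sqrt(2 * friction * step_size)
    momentum = momentum - step_size * grad_energy(theta) - friction * momentum + noise
    return theta + momentum, momentum

theta, momentum = np.ones(3), np.zeros(3)
samples = []
for _ in range(5000):
    theta, momentum = sghmc_step(theta, momentum)
    samples.append(theta.copy())
samples = np.array(samples)
print(samples.mean(axis=0), samples.std(axis=0))  # roughly 0 and 1
```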

Expectation Propagation

Deep Gaussian Processes for Regression using Approximate Expectation Propagation - Bui et al. (2016)

This paper uses an approximate expectation propagation scheme for inference in Deep GPs.

Paper | Code


Hybrids

Deep Gaussian Processes with Importance-Weighted Variational Inference - Salimbeni et al. (2019)

This paper uses the idea that noisy inputs are better treated as 'latent covariates' rather than as additive noise; in other words, the input itself is a latent covariate. The authors also propose a way to couple importance sampling with variational inference to improve single-layer and multi-layer GPs, and they show that this matches or beats standard variational inference. The latent variables alone improve performance for both the IWVI and the VI training procedures.
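
To see why the importance weighting helps, here is a toy sketch (my own, not the paper's model) of the importance-weighted bound: averaging K weights inside the log gives a bound that tightens towards log p(y) as K grows, with K = 1 recovering the standard ELBO.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_normal(y, mean, var):
    # log N(y; mean, var)
    return -0.5 * ((y - mean) ** 2 / var + np.log(2 * np.pi * var))

def iw_bound(y, K, num_outer=5000):
    # Toy model: w ~ N(0, 1), y | w ~ N(w, 1); proposal q(w) = prior, so the
    # importance weight p(y, w) / q(w) reduces to the likelihood p(y | w).
    w = rng.standard_normal((num_outer, K))
    log_w = log_normal(y, w, 1.0)                         # (num_outer, K)
    m = log_w.max(axis=1, keepdims=True)                  # stable log-mean-exp
    log_avg = m.squeeze(1) + np.log(np.exp(log_w - m).mean(axis=1))
    return log_avg.mean()

y = 1.5
for K in (1, 5, 50):
    print(K, iw_bound(y, K))                  # increases with K
print("log p(y) =", log_normal(y, 0.0, 2.0))  # exact marginal likelihood
```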


Misc

Inter-domain Deep Gaussian Processes - Rudner et al. (2020)

-> Paper

-> Slides

Interpretable Deep Gaussian Processes with Moments - Lu et al. (2020)

-> Paper

Beyond the Mean-Field: Structured Deep Gaussian Processes Improve the Predictive Uncertainties - Lindinger et al. (2020)

-> Paper

-> Code


Insights

Problems with Deep GPs

Deep Gaussian Process Pathologies

This paper shows how certain kernel compositions give very bad estimates of the functions between layers, and how connecting the input to every layer (similar in spirit to the skip connections of residual NNs) does much better.
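
Here is a small sketch (my own illustration, not the paper's experiments) that makes the pathology visible: draw a prior sample from successively deeper compositions of RBF GPs and plot them; as depth grows the samples typically become flat almost everywhere with occasional sharp jumps.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

def sample_gp(x, lengthscale=1.0, jitter=1e-5):
    # One prior draw f ~ GP(0, k_RBF) evaluated at the rows of x.
    sq = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    K = np.exp(-0.5 * sq / lengthscale ** 2) + jitter * np.eye(len(x))
    return rng.multivariate_normal(np.zeros(len(x)), K)

x = np.linspace(-3, 3, 300)[:, None]
h = x
fig, axes = plt.subplots(1, 5, figsize=(15, 2.5), sharex=True)
for depth, ax in enumerate(axes, start=1):
    h = sample_gp(h)[:, None]          # compose another GP layer
    ax.plot(x, h)
    ax.set_title(f"depth {depth}")
plt.tight_layout()
plt.show()
```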


Software

There are scattered implementations throughout the page, but this section centralizes the most common and reliable packages available.

Variational Inference

GPflow
GPyTorch
Pyro
Edward2

Expectation Propagation

NumPy

MCMC

Pyro

Random Feature Expansions

TensorFlow
Edward2