Deep Gaussian Processes¶
These are GP models that stack GPs one after the other, so that the outputs of one GP become the inputs to the next. For building an understanding, the lectures I have highlighted below are the best place to start.
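Before the resources, here is a minimal sketch (my own, NumPy-only, not tied to any particular library) of what "stacking GPs" means: draw one function from a GP prior at the inputs, then draw a second GP function evaluated at the first function's outputs.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel for 1-D inputs given as column vectors."""
    sq_dists = (a - b.T) ** 2
    return variance * np.exp(-0.5 * sq_dists / lengthscale ** 2)

def sample_gp_prior(x, lengthscale=1.0, rng=None):
    """Draw one function sample from a zero-mean GP prior evaluated at x."""
    rng = np.random.default_rng() if rng is None else rng
    K = rbf_kernel(x, x, lengthscale) + 1e-8 * np.eye(len(x))  # jitter for numerical stability
    return rng.multivariate_normal(np.zeros(len(x)), K)

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200).reshape(-1, 1)

f1 = sample_gp_prior(x, lengthscale=1.0, rng=rng)                  # layer 1, evaluated at the inputs x
f2 = sample_gp_prior(f1.reshape(-1, 1), lengthscale=0.5, rng=rng)  # layer 2, evaluated at layer 1's outputs
print(f2.shape)  # (200,): one sample from a two-layer deep GP prior at x
```

The composition f2(f1(x)) is the kind of prior a two-layer deep GP places over functions; how to do inference under it is what the algorithms below are about.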
👨🏽‍🏫 | 👩🏽‍🏫 Resources¶
Neil Lawrence @ MLSS 2019
I would say this is the best lecture to understand the nature of GPs and why we might want to use them.
Neil Lawrence @ GPSS 2019
Maurizio Filippone @ DeepBayes.ru 2018
I would say this is the second-best lecture because Maurizio gives a nice overview of the GP methods that already existed at the time.
Lecture | Slides | New Slides
Algorithms¶
The literature isn't so big, but there are a number of different implementations depending on the lab:
- Variational Inference
This is the most popular method and the one that has been pursued the most. It's also the implementation you will find as standard in libraries like GPyTorch, GPflow and Pyro.
- Expectation Propagation
This group used expectation propagation to train deep GPs. They haven't done much since then, and I'm not entirely sure why this line of DGP research has gone a bit dry; it would be nice if they resumed. I suspect it may be because of the software: I haven't seen much software that focuses on clever expectation propagation schemes; libraries mainly focus on variational inference and MC sampling schemes.
- MC sampling
One lab has tackled this, showing how variants of MC sampling can be used to train a deep GP. You'll find this standard in many GP libraries because it's fairly easy to integrate into almost any scheme. MC sampling is famous for being slow, but the community is working on it, and I imagine a breakthrough is bound to happen.
- Random Feature Expansions
This uses RFF to approximate a GP and then stacks these approximations on top of one another. I find this quite elegant and probably the simplest, but I haven't seen much research on the finer details of the algorithm, such as the training or initialization procedures.
I don't think there is any single best one, because I'm almost certain no one has done a complete comparison. I can say that the VI approach is the most studied because that lab is still working on it. In the meantime, I would personally use the implementations in standard libraries, where the devs have ironed out the bugs and allowed for easy customization and configuration; so basically the doubly stochastic variational inference approach.
Variational Inference¶
Deep Gaussian Processes - Damianou & Lawrence (2013)
This is the original Deep GP paper. It might not be useful for production, but there are still many insights to be had from the originators.
Nested Variational Compression in Deep Gaussian Processes - Hensman & Lawrence (2014)
Doubly Stochastic Variational Inference for Deep Gaussian Processes - Salimbeni & Deisenroth (2017)
This paper uses stochastic gradient descent for training the Deep GP. I think it achieves the state-of-the-art results thus far, and it also has the most implementations in standard libraries.
- Authors Code | Pyro | GPyTorch | GPFlow 2.0
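For intuition on what "doubly stochastic" means here, this is a heavily simplified NumPy sketch of my own: the ELBO estimate is stochastic both in the minibatch of data and in the Monte Carlo samples propagated through the layers via the reparameterization trick. The `layer_mean`/`layer_var` functions below are hypothetical stand-ins for the closed-form sparse-GP conditionals that the actual method computes from inducing points and variational parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_mean(f_in):
    # placeholder for the mean of q(f_l | f_{l-1}); NOT the real sparse-GP conditional
    return np.tanh(f_in)

def layer_var(f_in):
    # placeholder for the variance of q(f_l | f_{l-1})
    return 0.1 * np.ones_like(f_in)

def propagate_sample(x_batch, num_layers=2):
    """Draw one reparameterized sample of the final layer's function values."""
    f = x_batch
    for _ in range(num_layers):
        eps = rng.standard_normal(f.shape)
        f = layer_mean(f) + np.sqrt(layer_var(f)) * eps  # reparameterization trick
    return f

# Monte Carlo / minibatch estimate of the ELBO's expected log-likelihood term
x_batch = rng.standard_normal((32, 1))               # a minibatch of M = 32 inputs
y_batch = np.sin(x_batch)                            # toy targets
noise_var, num_samples, N, M = 0.05, 10, 10_000, 32  # N = hypothetical full dataset size

samples = np.stack([propagate_sample(x_batch) for _ in range(num_samples)])
log_lik = -0.5 * np.log(2 * np.pi * noise_var) - 0.5 * (y_batch - samples) ** 2 / noise_var
data_term = (N / M) * log_lik.mean(axis=0).sum()     # rescale the minibatch to the full dataset
print(data_term)  # the full ELBO would also subtract the KL terms for each layer's inducing variables
```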
Random Fourier Features¶
Random Feature Expansions for Deep Gaussian Processes - Cutajar et al. (2017)
This implementation uses ideas from random Fourier features in conjunction with deep GPs.
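As a rough sketch of the idea (NumPy only, with weights simply drawn from their priors rather than learned as in the paper): each RBF-kernel GP layer is replaced by trigonometric random features, which turns the layer into a Bayesian linear model, and the layers are then stacked.

```python
import numpy as np

rng = np.random.default_rng(0)

def rff_layer(x, num_features=100, lengthscale=1.0, variance=1.0, rng=rng):
    """One GP layer approximated with random Fourier features of an RBF kernel."""
    d_in = x.shape[1]
    omega = rng.standard_normal((d_in, num_features)) / lengthscale  # spectral frequencies
    b = rng.uniform(0, 2 * np.pi, num_features)                      # random phases
    phi = np.sqrt(2 * variance / num_features) * np.cos(x @ omega + b)
    w = rng.standard_normal((num_features, 1))                       # weights from a N(0, I) prior
    return phi @ w                                                   # approximate GP sample at x

x = np.linspace(-3, 3, 200).reshape(-1, 1)
h = rff_layer(x, num_features=200, lengthscale=1.0)   # layer 1 output
f = rff_layer(h, num_features=200, lengthscale=0.5)   # layer 2 takes layer 1's output as input
print(f.shape)  # (200, 1)
```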
MC Sampling¶
Learning deep latent Gaussian models with Markov chain Monte Carlo - Hoffman (2017)
Inference in Deep Gaussian Processes Using Stochastic Gradient Hamiltonian Monte Carlo - Havasi et al. (2018)
Expectation Propagation¶
Deep Gaussian Processes for Regression using Approximate Expectation Propagation - Bui et al. (2016)
This paper uses an approximate expectation propagation scheme for inference in deep GPs.
Hybrids¶
Deep Gaussian Processes with Importance-Weighted Variational Inference - Salimbeni et al. (2019)
This paper treats noisy inputs as 'latent covariates' rather than as additive noise; in other words, the input itself becomes a latent covariate. The authors also propose a way to couple importance sampling with variational inference to improve single-layer and multi-layer GPs, and they show that this matches or beats standard variational inference. The latent variables alone improve performance for both the IWVI and the VI training procedures.
- Paper | Code | Video | Poster | ICML 2019 Slides | Workshop Slides
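To make the importance-weighted part concrete, here is a toy sketch of the bound itself (all densities are Gaussian and entirely hypothetical; this shows the estimator's shape, not the paper's model): with K samples of the latent covariate drawn from a proposal q, the log of the averaged importance weights lower-bounds the log marginal likelihood, and K = 1 recovers the usual ELBO on average.

```python
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

rng = np.random.default_rng(0)

def log_lik(y, w):
    # toy p(y | w): the observation depends on the latent covariate w
    return norm.logpdf(y, loc=w, scale=0.5)

log_prior = lambda w: norm.logpdf(w, 0.0, 1.0)   # p(w)
q_mu, q_sigma = 0.3, 0.8                          # hypothetical variational proposal q(w)
log_q = lambda w: norm.logpdf(w, q_mu, q_sigma)

y, K = 0.7, 16
w = rng.normal(q_mu, q_sigma, size=K)             # K samples from q
log_w = log_lik(y, w) + log_prior(w) - log_q(w)   # log importance weights

iw_bound = logsumexp(log_w) - np.log(K)           # importance-weighted bound (K samples)
elbo = log_w.mean()                               # standard ELBO estimate (K = 1 on average)
print(iw_bound, elbo)                             # the IW bound is the tighter one in expectation
```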
Misc¶
Interpretable Deep Gaussian Processes with Moments - Lu et al. (2020)
- Paper
Beyond the Mean-Field: Structured Deep Gaussian Processes Improve the Predictive Uncertainties - Lindinger et al. (2020)
- Paper | Code
Insights¶
Problems with Deep GPs¶
Deep Gaussian Process Pathologies
This paper shows how some kernel compositions give very bad estimates of the functions between layers; the suggested fix of feeding the input into every layer is similar in spirit to how residual NNs do much better than plain deep networks.
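A quick, informal way to eyeball this (a NumPy/Matplotlib sketch of my own, not code from the paper): compose GP prior samples for a few layers with and without feeding the original input into every layer, and plot both.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

def rbf_kernel(a, b, lengthscale=1.0):
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dists / lengthscale ** 2)

def sample_gp(x, lengthscale=1.0):
    """One sample from a zero-mean GP prior evaluated at the rows of x."""
    K = rbf_kernel(x, x, lengthscale) + 1e-8 * np.eye(len(x))
    return rng.multivariate_normal(np.zeros(len(x)), K).reshape(-1, 1)

x = np.linspace(-3, 3, 300).reshape(-1, 1)
plain, with_input = x, x
for _ in range(5):
    plain = sample_gp(plain)                             # plain composition of GP samples
    with_input = sample_gp(np.hstack([with_input, x]))   # also feed the original input into each layer

plt.plot(x, plain, label="plain 5-layer composition")
plt.plot(x, with_input, label="input fed into every layer")
plt.legend()
plt.show()
```

With enough layers, the plain composition tends to look nearly flat over most of the input with a few sharp transitions, while the input-connected version keeps varying with x; that is roughly the pathology, and the fix, that the paper discusses.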
Software¶
There are scattered implementations throughout the page, but this section centralizes them and suggests the most common/reliable packages available.
Variational Inference
Pyro
- Blog Post by a Pyro Developer
Edward2
- Bayesian Layers - Keras-like layers with exact and sparse GPs
Expectation Propagation
Numpy
MCMC
Pyro
- Blog Post by a Pyro Developer
Random Feature Expansions
TensorFlow
Edward2
- Bayesian Layers - Keras-like layers