Sparse GPs

πŸ“œ Papers

Subset Methods

All of these methods, in one way or another, try to reduce the effective size of the kernel matrix: exact GP inference requires inverting the n × n matrix K_nn at O(n³) cost, while these approximations work with a rank-m surrogate for m ≪ n, bringing the cost down to O(nm²).

Nyström Approximation

-> Using the Nyström Method to Speed Up Kernel Machines - Williams & Seeger (2001)


-> scikit-learn

-> KRR Setting

-> Nice Blog (Slow to Load)
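
The core trick is easy to state: pick m landmark points and approximate the full n × n kernel matrix by K_nn ≈ K_nm K_mm⁻¹ K_mn, which has rank m. Below is a minimal NumPy sketch of this idea; the variable names and the RBF kernel choice are my own, not taken from any of the papers above.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    # Squared-exponential kernel: k(a, b) = exp(-||a - b||^2 / (2 * l^2)).
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale**2)

rng = np.random.default_rng(0)
n, m = 500, 50
X = rng.normal(size=(n, 2))
Z = X[rng.choice(n, size=m, replace=False)]   # m landmark ("subset") points

K_nm = rbf(X, Z)                              # (n, m) cross-kernel
K_mm = rbf(Z, Z) + 1e-8 * np.eye(m)           # (m, m) kernel + jitter
# Nystrom: K_nn ~= K_nm K_mm^{-1} K_mn, a rank-m surrogate for the full matrix.
K_approx = K_nm @ np.linalg.solve(K_mm, K_nm.T)

K_exact = rbf(X, X)
print("relative error:",
      np.linalg.norm(K_exact - K_approx) / np.linalg.norm(K_exact))
```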

Random Fourier Features
Fastfood

-> Fastfood: Approximate Kernel Expansions in Loglinear Time by Le et al. (2014) | Video | Code

-> À la Carte - Learning Fast Kernels by Yang et al. (2014)

-> Efficient Approximate Inference with Walsh-Hadamard Variational Inference by Rossi et al. (2020)
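
Random Fourier features approximate a shift-invariant kernel by Monte Carlo sampling its spectral density (Bochner's theorem): for the RBF kernel, draw ω ~ N(0, I/ℓ²) and map x ↦ √(2/D) cos(ωᵀx + b). Fastfood then replaces the dense Gaussian projection with structured Walsh-Hadamard transforms to cut the per-point cost from O(Dd) to O(D log d). Here is a minimal sketch of the plain RFF construction (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
d, D = 2, 2000                 # input dimension, number of random features
lengthscale = 1.0

# Sample spectral frequencies of the RBF kernel: omega ~ N(0, I / lengthscale^2).
omega = rng.normal(scale=1.0 / lengthscale, size=(d, D))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

def phi(X):
    # Feature map with phi(x) . phi(y) ~= k(x, y) in expectation.
    return np.sqrt(2.0 / D) * np.cos(X @ omega + b)

X = rng.normal(size=(5, d))
K_rff = phi(X) @ phi(X).T
sq = ((X[:, None] - X[None, :]) ** 2).sum(-1)
K_exact = np.exp(-0.5 * sq / lengthscale**2)
print("max abs error:", np.abs(K_rff - K_exact).max())  # shrinks as D grows
```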

Mixture of Experts

Deep Structured Mixtures of Gaussian Processes - Trapp et al. (2020)

-> Paper

-> Code

Inducing Points

Fully Independent Training Conditional (FITC)

-> Sparse Gaussian Processes Using Pseudo-Inputs - Snelson and Ghahramani (2006)

-> Flexible and Efficient GP Models for Machine Learning - Snelson (2007)

-> Variational Orthogonal Features - Burt et al. (2020)
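
FITC keeps the exact diagonal of the kernel matrix but replaces the off-diagonals with the Nyström approximation Q_nn = K_nm K_mm⁻¹ K_mn, which makes the effective likelihood factorize across data points. A minimal NumPy sketch of the FITC predictive mean, assuming a unit-variance RBF kernel and fixed pseudo-inputs (everything here is illustrative, not the papers' notation):

```python
import numpy as np

def rbf(A, B, l=0.7):
    return np.exp(-0.5 * ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1) / l**2)

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
Z = np.linspace(-3, 3, 15)[:, None]        # m = 15 pseudo-inputs
Xs = np.linspace(-3, 3, 100)[:, None]      # test inputs
noise = 0.1**2

K_mm = rbf(Z, Z) + 1e-8 * np.eye(len(Z))
K_nm = rbf(X, Z)
V = np.linalg.solve(K_mm, K_nm.T)          # K_mm^{-1} K_mn
q_diag = np.einsum("nm,mn->n", K_nm, V)    # diag of Q_nn = K_nm K_mm^{-1} K_mn
# FITC per-point "noise": exact diagonal minus Nystrom diagonal, plus obs noise.
lam = 1.0 - q_diag + noise                 # k(x, x) = 1 for this RBF kernel

Sigma = K_mm + (K_nm / lam[:, None]).T @ K_nm
K_sm = rbf(Xs, Z)
mean = K_sm @ np.linalg.solve(Sigma, (K_nm / lam[:, None]).T @ y)
```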

Parametric Gaussian Process Regressors by Jankowiak et al. (2020)

-> Paper

-> GPyTorch Code (The Predictive Log Likelihood)

Rethinking Sparse Gaussian Processes: Bayesian Approaches to Inducing-Variable Approximations by Rossi et al. (2020)

-> Paper

Posterior Approximation

Variational Free Energy

Variational Learning of Inducing Variables in Sparse GPs - Titsias (2009)

The OG for this method. You'll see this paper cited a lot!

-> Paper
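
The collapsed bound from this paper is compact enough to write out: L = log N(y | 0, Q_nn + σ²I) − tr(K_nn − Q_nn)/(2σ²), i.e. the DTC marginal likelihood minus a trace regularizer that penalizes inducing points which fail to explain the data. A hedged NumPy sketch (variable names are mine):

```python
import numpy as np

def titsias_elbo(K_nn_diag, K_nm, K_mm, y, noise):
    # Collapsed VFE bound (Titsias, 2009):
    #   L = log N(y | 0, Q_nn + noise * I) - trace(K_nn - Q_nn) / (2 * noise)
    # where Q_nn = K_nm K_mm^{-1} K_mn is the Nystrom approximation.
    n = len(y)
    V = np.linalg.solve(K_mm, K_nm.T)               # K_mm^{-1} K_mn
    Q_nn = K_nm @ V
    cov = Q_nn + noise * np.eye(n)
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    log_marginal = -0.5 * (n * np.log(2 * np.pi) + logdet + quad)  # DTC part
    trace_term = (K_nn_diag - np.diag(Q_nn)).sum() / (2 * noise)
    return log_marginal - trace_term
```

(A real implementation would use the Woodbury identity to avoid forming the n × n covariance; this version just states the bound directly.)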

Understanding Probabilistic Sparse Gaussian Process Approximations - Bauer et al. (2016)

A good paper that highlights some important differences between FITC, DTC, and VFE. It provides clear notational distinctions and also mentions how VFE is a special case of DTC.

-> Paper
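
To make the comparison concrete, the three approximations can be summarized by the effective training covariance each one uses, with Q_nn = K_nm K_mm⁻¹ K_mn (this is the standard presentation from the Quiñonero-Candela & Rasmussen unifying view):

```latex
\begin{aligned}
\text{DTC:}\quad  & \log \mathcal{N}\!\left(y \mid 0,\; Q_{nn} + \sigma^2 I\right) \\
\text{FITC:}\quad & \log \mathcal{N}\!\left(y \mid 0,\; Q_{nn}
                    + \operatorname{diag}(K_{nn} - Q_{nn}) + \sigma^2 I\right) \\
\text{VFE:}\quad  & \log \mathcal{N}\!\left(y \mid 0,\; Q_{nn} + \sigma^2 I\right)
                    - \tfrac{1}{2\sigma^2}\operatorname{tr}(K_{nn} - Q_{nn})
\end{aligned}
```

So VFE and DTC share the same Gaussian in the objective; VFE differs only by the trace penalty, while FITC corrects the diagonal instead.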

Other Papers

-> On Sparse Variational Methods and the KL Divergence between Stochastic Processes - Matthews et al. (2015)

Stochastic Variational Inference (SVI)

Gaussian Processes for Big Data - Hensman et al. (2013)

-> Paper
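
Hensman et al.'s key move is to keep q(u) explicit rather than collapsing it out, so the bound becomes a sum over data points and can be estimated on minibatches with stochastic gradients. A sketch of what this looks like with GPflow's SVGP model, assuming GPflow 2's API (the data and hyperparameters here are arbitrary):

```python
import numpy as np
import tensorflow as tf
import gpflow

rng = np.random.default_rng(3)
N = 10_000
X = rng.uniform(-3, 3, size=(N, 1))
Y = np.sin(X) + 0.1 * rng.normal(size=(N, 1))
Z = np.linspace(-3, 3, 20)[:, None]            # inducing inputs

model = gpflow.models.SVGP(
    kernel=gpflow.kernels.SquaredExponential(),
    likelihood=gpflow.likelihoods.Gaussian(),
    inducing_variable=Z,
    num_data=N,                                # rescales minibatch ELBO terms
)
opt = tf.optimizers.Adam(0.01)
data = tf.data.Dataset.from_tensor_slices((X, Y)).shuffle(N).batch(256).repeat()

for batch in data.take(1_000):
    with tf.GradientTape() as tape:
        loss = -model.elbo(batch)              # unbiased stochastic estimate
    grads = tape.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(grads, model.trainable_variables))
```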

Expectation Propagation (EP)

A Unifying Framework for Gaussian Process Pseudo-Point Approximations using Power Expectation Propagation - Bui (2017)

A good summary of all of the methods under one unifying framework: Power Expectation Propagation. With power α, the framework recovers FITC/EP at α = 1 and VFE in the limit α → 0.

-> Paper


-> Code: Exact and Sparse Power EP

-> Updated | Other

-> Related Code

Variational

Rates of Convergence for Sparse Variational Gaussian Process Regression - Burt et al. (2019)

All you need to do is cite this paper whenever people don't believe that Sparse GPs are good at approximating Exact GPs.

-> Paper | πŸ’» Code

-> Extended version: Convergence of Sparse Variational Inference in Gaussian Processes Regression | Code


Other

Adversarial Robustness Guarantees for Classification with Gaussian Processes - Blaas et al. (2020)

-> Paper


Theses Explained

Oftentimes the papers that people publish in conferences and journals don't have enough information in them. Sometimes it's really difficult to work through the mathematics, especially with cryptic explanations like "it's easy to show that..." or "trivially it can be shown that...". For most of us it's neither easy nor trivial. So I've included a few theses that help explain some of the finer details. I've arranged them in order from easiest to most difficult.

Sparse Gaussian Process Approximations and Applications by Van der Wilk (2018)

-> Thesis

Gory Details Blogs

Resources that break down the intricate mathematical details that sometimes get lost in the literature.

Code Examples

Examples where people have implemented the algorithms in a clear, didactic way.

  • SVGP - Recyclable GP