Kernel Functions


Extrapolation

One interesting problem related to uncertainty is how well a GP extrapolates to unseen regions (whether spatial or temporal).


Structures

Kernel Interpolation for Scalable Structured Gaussian Processes (KISS-GP) by Wilson & Nickisch (2015)

-> Code

-> Paper

Product Kernel Interpolation for Scalable Gaussian Processes by Gardner et al. (2018)

-> Paper

-> GPyTorch Code


Deep Kernel Learning

This is a probabilistic neural network (PNN): we learn features through a neural network and then, on the last layer, we fit a Gaussian process. It's a great idea and I think it has a lot of potential. One criticism from within the GP community (Bonilla et al., 2016) is that we typically don't use very expressive kernels, even though the kernel is where the power of GPs comes from. So if we can build kernels from neural networks (one of the most expressive ML methods available to date), then we can get a potentially great ML algorithm. In practice, developers have stated that we can get state-of-the-art results with some minimal tweaking of the architecture.
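
A minimal sketch of the idea in GPyTorch (the architecture, layer sizes, and training loop below are illustrative choices, not the exact setup from the paper):

```python
import torch
import gpytorch

class FeatureExtractor(torch.nn.Sequential):
    """A small NN; the architecture here is an arbitrary placeholder."""
    def __init__(self, input_dim):
        super().__init__(
            torch.nn.Linear(input_dim, 100),
            torch.nn.ReLU(),
            torch.nn.Linear(100, 2),  # 2D learned features fed to the GP
        )

class DKLRegression(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.feature_extractor = FeatureExtractor(train_x.size(-1))
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.RBFKernel(ard_num_dims=2)
        )

    def forward(self, x):
        # The "deep kernel": a standard RBF kernel on NN-learned features
        z = self.feature_extractor(x)
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(z), self.covar_module(z)
        )

# Toy data; NN weights and GP hyperparameters are trained *jointly*
# by maximizing the exact marginal likelihood.
train_x = torch.randn(100, 5)
train_y = torch.sin(train_x.sum(-1))
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = DKLRegression(train_x, train_y, likelihood)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

model.train(); likelihood.train()
for _ in range(50):
    optimizer.zero_grad()
    loss = -mll(model(train_x), train_y)
    loss.backward()
    optimizer.step()
```

The joint training is the point: the kernel hyperparameters and the network weights are optimized together through the marginal likelihood, rather than pre-training the features separately.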

Comment

  • I've also heard this called "Deep Feature Extraction".
  • This is NOT a Deep GP. I've seen one paper that incorrectly called it that. A deep GP is where we stack GPs on top of each other. See the deep GP guide for more details.

Literature

Deep Kernel Learning - Wilson et al. (2015)

-> Paper

-> GPyTorch

Stochastic Variational Deep Kernel Learning - Wilson et al. (2016)

-> Paper

-> GPyTorch

-> Pyro

-> GPflow


Misc

Function-Space Distributions over Kernels - Benton et al. (2019)

Smart Combination of Kernels

A GP will be better at extrapolating if one uses a kernel that is well suited to extrapolation.

For example, we can use a combination of well-defined kernels, chosen with actual thought about the trends we expect, like they did for the classic Mauna Loa dataset: (sklearn demo), (tensorflow demo). They ended up using a combination of RBF + RBF * ExpSinSquared + RQ + RBF + White, with arbitrary scaling constants in front of all of the terms. I have no clue how in the world they came up with that…
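
A hedged sketch of that composition in scikit-learn (the scale factors and length scales are illustrative initial values, not the fitted ones from the demo; `fit` refines them by maximizing the marginal likelihood):

```python
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import (
    RBF, ExpSineSquared, RationalQuadratic, WhiteKernel,
)

# Each term targets one trend we expect in the CO2 record:
long_term = 50.0**2 * RBF(length_scale=50.0)           # smooth rising trend
seasonal = 2.0**2 * RBF(length_scale=100.0) * ExpSineSquared(
    length_scale=1.0, periodicity=1.0                  # yearly cycle that slowly decays
)
irregular = 0.5**2 * RationalQuadratic(length_scale=1.0, alpha=1.0)  # medium-term wiggles
noise = 0.1**2 * RBF(length_scale=0.1) + WhiteKernel(noise_level=0.1**2)

kernel = long_term + seasonal + irregular + noise
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
# gp.fit(X_train, y_train)  # X_train: dates as a column vector, y_train: CO2 ppm
```

Multiplying a kernel by a scalar like `50.0**2` wraps it in a `ConstantKernel`, which is where those arbitrary scaling constants live.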


Fourier Basis Functions

In general, any GP approximated with Fourier basis functions will be good at finding periodic trends. For example, the Sparse Spectrum GP (SSGP) (mentioned briefly in Gustau's paper, pg. 68; original paper for SSGP) is essentially the random Fourier features method done as a GP.
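
For intuition, here is a minimal random Fourier features sketch in NumPy (Rahimi & Recht style). Note the simplification: in SSGP the spectral frequencies are optimized, whereas here they are just sampled from the RBF kernel's spectral density:

```python
import numpy as np

def rff_features(X, n_features=100, lengthscale=1.0, seed=0):
    """Map X (n, d) to random Fourier features approximating an RBF kernel."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies drawn from the spectral density of the RBF kernel
    W = rng.normal(0.0, 1.0 / lengthscale, size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# phi(x) @ phi(x') approximates k(x, x'); Bayesian linear regression
# on these features gives the trigonometric-basis model behind SSGP.
X = np.random.randn(200, 1)
Phi = rff_features(X, n_features=500)
K_approx = Phi @ Phi.T  # approximates the RBF Gram matrix
```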

Resources
  • Gustau Paper

    Gustaus paper summarizing GPs in the context of Earth science. Briefly mentions SSGPs.

  • Presentation

    A lab that does a lot of work on spectral kernels did a nice presentation with a decent summary of the literature.

  • SSGP Paper

Original paper on SSGP by Lázaro-Gredilla et al.

  • SSGP w. Uncertain Inputs Paper

    Paper detailing how one can do the Taylor expansion to propagate the errors.

  • VSSGP Paper | Poster | ICML Presentation

Original paper by Yarin Gal and a presentation on the variational approach to the SSGP method. The original author's code is in Theano, so I don't recommend you use it or try to understand it. It's ugly...

  • Variational Fourier Features for GPs - Paper

Original paper by James Hensman et al. using variational Fourier features for GPs. To be honest, I'm still not 100% sure what the difference is between this method and Yarin Gal's... According to the paper: "Gal and Turner proposed variational inference in a sparse spectrum model that is derived from a GP model. Our work aims to directly approximate the posterior of the true model using a variational representation." I still don't get it.


Spectral Mixture Kernel

Alternatively, use a kernel designed with all the parameters necessary to find such patterns, like the spectral mixture kernel (paper, pg. 2, eq. 12). Lastly, you can just use a neural network, slap a GP layer on the end, and let the data tell you the pattern.
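
A minimal sketch using GPyTorch's built-in `SpectralMixtureKernel` (the toy data and number of mixture components are arbitrary choices):

```python
import math
import torch
import gpytorch

class SpectralMixtureGP(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood, num_mixtures=4):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.SpectralMixtureKernel(
            num_mixtures=num_mixtures
        )
        # Initialize mixture weights/scales/means from the empirical spectrum
        self.covar_module.initialize_from_data(train_x, train_y)

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )

train_x = torch.linspace(0, 1, 100)
train_y = torch.sin(train_x * (4 * math.pi)) + 0.1 * torch.randn(100)
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = SpectralMixtureGP(train_x, train_y, likelihood)
```

The data-driven initialization matters in practice: the marginal likelihood surface for this kernel is highly multimodal, so a bad start tends to land in a bad local optimum.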

Resources
  • Original Paper

Original paper with the derivation of the spectral mixture kernel.

  • Paper

Some extensions to multi-task / multi-output / multi-fidelity problems.

  • Thesis of Vincent Dutordoir (chapter 6, pg. 67)

    Does the derivation for the spectral mixture kernel extension to incorporate uncertain inputs. Uses exact moment matching (which is an expensive operation...) but in theory, it should better propagate the uncertain inputs. It's also a really good thesis that explains how to propagate uncertain inputs through Gaussian processes (chapter 4, pg. 42). Warning: the equations are nasty...

  • GPyTorch Demo

    The fastest implementation you'll find on the internet.


Other Kernels

Random Forest Density Kernel


Software

  • Multiple Kernel Learning - MKLpy
  • Kernel Methods - kernelmethods
  • pykernels

    A huge suite of different Python kernels.
  • kernpy

    Library focused on statistical tests

  • keops

    Use kernel methods on the GPU with autograd and without memory overflows. Has NumPy and PyTorch backends. See the sketch after this list.

  • pyGPs

    This is a GP library, but I saw quite a few graph kernels implemented with different Laplacian matrices.

  • megaman

    A library for large scale manifold learning. I saw quite a few different Laplacian matrices implemented.
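
As a taste of the keops pattern mentioned above, here is a sketch of a kernel matrix-vector product with pykeops that never materializes the full Gram matrix (the shapes, sigma, and RBF kernel choice are all illustrative):

```python
import numpy as np
from pykeops.numpy import LazyTensor

N, M, D = 10000, 10000, 3
x = np.random.randn(N, D)
y = np.random.randn(M, D)
b = np.random.randn(M, 1)
sigma = 1.0

x_i = LazyTensor(x[:, None, :])        # (N, 1, D) symbolic variable
y_j = LazyTensor(y[None, :, :])        # (1, M, D) symbolic variable
D_ij = ((x_i - y_j) ** 2).sum(-1)      # symbolic squared distances
K_ij = (-D_ij / (2 * sigma**2)).exp()  # symbolic RBF kernel matrix
a = K_ij @ b  # reduction is compiled and run without an N x M allocation
```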