Kernel Functions
Extrapolation
One interesting problem related to uncertainty is how well a GP extrapolates to unseen regions (whether spatially or temporally).
Structures
Kernel Interpolation for Scalable Structured Gaussian Processes (KISS-GP) by Wilson & Nickisch (2015) (see the code sketch after this list)
-> Code
-> Paper
Product Kernel Interpolation for Scalable Gaussian Processes by Gardner et al. (2018)
-> Paper
-> GPyTorch Code
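For a feel of what the structured-interpolation idea looks like in code, here is a minimal GPyTorch sketch using its GridInterpolationKernel (the grid size, base kernel, and dimensionality are placeholder choices, not the papers' settings):

```python
import gpytorch

class KISSGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        # KISS-GP: interpolate the kernel onto a regular grid of inducing points,
        # which yields structured (Toeplitz/Kronecker) algebra for fast solves
        self.covar_module = gpytorch.kernels.GridInterpolationKernel(
            gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel()),
            grid_size=128,
            num_dims=1,
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)
```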
Deep Kernel Learning
This is a Probabilistic Neural Network (PNN): we learn features through a neural network and then, on the last layer, we fit a Gaussian process. It's a great idea and I think it has a lot of potential. One criticism from within the GP community (Bonilla et al., 2016) is that we don't typically use very expressive kernels, and expressive kernels are where the power of GPs comes from. So if we can get kernels from neural networks (one of the most expressive ML methods available to date), then we can get a potentially great ML algorithm. In practice, one of the developers has even stated that we can get state-of-the-art results with minimal tweaking of the architecture.
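Here's a minimal sketch of the idea in GPyTorch, assuming exact GP regression and a small fully connected feature extractor (the architecture, dimensions, and kernel choice are placeholders, not the settings from the papers below):

```python
import torch
import gpytorch

class FeatureExtractor(torch.nn.Sequential):
    """A small, made-up feature extractor; any differentiable network works."""
    def __init__(self, input_dim, latent_dim=2):
        super().__init__(
            torch.nn.Linear(input_dim, 64),
            torch.nn.ReLU(),
            torch.nn.Linear(64, latent_dim),
        )

class DKLRegression(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.feature_extractor = FeatureExtractor(train_x.size(-1))
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        z = self.feature_extractor(x)   # the NN learns the features...
        mean_x = self.mean_module(z)
        covar_x = self.covar_module(z)  # ...and the GP kernel acts on them
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)
```

Training maximizes the marginal log-likelihood end-to-end, so the network weights are fit jointly with the kernel hyperparameters.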
Literature
Stochastic Variational Deep Kernel Learning - Wilson et al. (2016)
-> Paper
-> GPyTorch
-> Pyro
-> GPFlow
A Representer Theorem for Deep Kernel Learning - Bohn et al. (2019)
Misc
Function-Space Distributions over Kernels - Benton et al. (2019)
Smart Combination of Kernels
A GP would be better at extrapolating if one uses a kernel that is better suited to extrapolation.
For example, we can just use a combination of well-defined kernels, with actual thought put into the trends we expect, like they did for the classic Mauna Loa CO₂ dataset: (sklearn demo), (tensorflow demo). They ended up using the combination

RBF + RBF * ExpSinSquared + RQ + RBF + White

with a constant scale factor in front of each term. I have no clue how in the world they came up with that…
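As a rough sketch in scikit-learn (the constants and length-scales below are placeholders, not the tuned values from the demos):

```python
from sklearn.gaussian_process.kernels import (
    RBF, ExpSinSquared, RationalQuadratic, WhiteKernel, ConstantKernel as C,
)

# Placeholder hyperparameters; the demos fit these by maximizing the
# marginal likelihood.
k_trend = C(50.0) * RBF(length_scale=50.0)  # long-term smooth rise
k_seasonal = (
    C(2.0) * RBF(length_scale=100.0)
    * ExpSinSquared(length_scale=1.0, periodicity=1.0)  # yearly periodicity
)
k_medium = C(0.5) * RationalQuadratic(length_scale=1.0, alpha=1.0)  # medium-term irregularities
k_noise = C(0.1) * RBF(length_scale=0.1) + WhiteKernel(noise_level=0.1)  # short-term + noise

kernel = k_trend + k_seasonal + k_medium + k_noise
```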
Fourier Basis Functions
In general, any GP that is approximated with Fourier basis functions will be good at finding periodic trends. For example, the Sparse Spectrum GP (SSGP) (mentioned in Gustau's paper, pg. 68; original paper for SSGP) is essentially the random Fourier features method, but GP-ed.
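To make the Fourier-features connection concrete, here's a tiny NumPy sketch of random Fourier features for an RBF kernel (Rahimi & Recht style). The SSGP is roughly a Bayesian linear model on features like these, except the frequencies are treated as kernel hyperparameters and optimized:

```python
import numpy as np

def rff_features(X, n_features=100, lengthscale=1.0, seed=0):
    """Random Fourier features approximating an RBF kernel."""
    rng = np.random.default_rng(seed)
    n_dims = X.shape[1]
    # Frequencies sampled from the RBF kernel's spectral density (a Gaussian)
    W = rng.normal(scale=1.0 / lengthscale, size=(n_dims, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# k(x, y) ≈ phi(x) · phi(y)
X = np.random.randn(5, 2)
Phi = rff_features(X)
K_approx = Phi @ Phi.T  # approximate RBF Gram matrix
```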
Resources
- Gustau Paper
Gustau's paper summarizing GPs in the context of Earth science. Briefly mentions SSGPs.
- Presentation
A lab that does a lot of work on spectral kernels did a nice presentation with a decent summary of the literature.
- SSGP Paper
Original paper on SSGP (Lázaro-Gredilla et al., 2010).
- SSGP w. Uncertain Inputs Paper
Paper detailing how one can use a Taylor expansion to propagate the input errors.
- VSSGP Paper | Poster | ICML Presentation
Original paper by Yarin Gal and a presentation about the variational approach to the SSGP method. The original code is in Theano, so I don't recommend using it or trying to understand it. It's ugly...
- Variational Fourier Features for GPs - Paper
Original paper by James Hensman et al. using variational Fourier features for GPs. To be honest, I'm still not 100% sure what the difference is between this method and the method by Yarin Gal... According to the paper: "Gal and Turner proposed variational inference in a sparse spectrum model that is derived from a GP model. Our work aims to directly approximate the posterior of the true models using a variational representation." As far as I can tell, Gal and Turner do variational inference within an already-approximated (sparse spectrum) model, whereas Hensman et al. use the Fourier features as inducing features and variationally approximate the posterior of the exact GP.
Spectral Mixture Kernel
Alternatively, use a kernel that was designed with enough parameters to find the patterns for you, like the spectral mixture kernel (paper, pg. 2, eq. 12). Lastly, you can always use a neural network, slap a GP layer at the end, and let the data tell you the pattern (see Deep Kernel Learning above).
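In one dimension, the spectral mixture kernel has the form k(τ) = Σ_q w_q exp(−2π² τ² v_q) cos(2π τ μ_q): a mixture of Q Gaussians in the spectral domain with means μ_q, variances v_q, and weights w_q. Here's a minimal GPyTorch sketch (num_mixtures=4 is an arbitrary placeholder):

```python
import gpytorch

class SpectralMixtureGP(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.SpectralMixtureKernel(num_mixtures=4)
        # Heuristic initialization from the empirical spectrum of the data
        self.covar_module.initialize_from_data(train_x, train_y)

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)
```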
Resources
- Original Paper
Original paper with the derivation of the spectral mixture kernel.
- Paper
Some extensions to multi-task / multi-output / multi-fidelity problems.
- Thesis of Vincent Dutordoir (chapter 6, pg. 67)
Derives the extension of the spectral mixture kernel to incorporate uncertain inputs. Uses exact moment matching (an expensive operation...), but in theory it should propagate the uncertain inputs better. It's also a really good thesis that explains how to propagate uncertain inputs through Gaussian processes (chapter 4, pg. 42). Warning: the equations are nasty...
- GPyTorch Demo
The fastest implementation you'll find on the internet.
Other Kernels
Software
- Multiple Kernel Learning - MKLpy
- Kernel Methods - kernelmethods
- pykernels
A huge suite of different Python kernels.
- kernpy
A library focused on statistical tests.
- keops
Kernel methods on the GPU, with autograd and without memory overflows. Has NumPy and PyTorch backends.
- pyGPs
This is a GP library, but I saw quite a few graph kernels implemented with different Laplacian matrices.
- megaman
A library for large-scale manifold learning. I saw quite a few different Laplacian matrices implemented.