
Similarity Measures

Description

I am very interested in the notion of similarity: what it means, how we can estimate it, and how it works in practice. Below are some of the main projects I have been working on, which include an empirical study, some applications, and some software that was developed.

Kernel Parameter Estimation

In this project, I look at how one can estimate the parameters of the RBF kernel for several variations of the HSIC method: kernel alignment and centered kernel alignment. Unsupervised kernel methods can suffer if the parameters are not estimated correctly, so I empirically examine different ways to represent the data and different ways to estimate the kernel parameters for these methods.

I investigate the following questions:

Will standardizing the data beforehand affect the results?
How does the parameter estimator affect the results?
Which variation of HSIC gives the best representation of the similarity (center the kernel, normalize the score)?
How does this all compare to mutual information for known high-dimensional, multivariate distributions?
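To make the quantities in these questions concrete, here is a minimal NumPy sketch of HSIC, kernel alignment (KA), and centered kernel alignment (CKA) with an RBF kernel. It is an illustration under stated assumptions, not the code used in the study: the function names are hypothetical, the length scale is set with the median heuristic (only one of many possible estimators), and HSIC uses the biased estimate with a 1/n^2 normalization.

```python
import numpy as np

def sq_dists(X):
    """Pairwise squared Euclidean distances between the rows of X."""
    sq = np.sum(X**2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.maximum(D, 0.0)  # clip tiny negatives caused by round-off

def median_sigma(X):
    """Median heuristic (an assumed choice): median pairwise distance."""
    D = np.sqrt(sq_dists(X))
    return np.median(D[np.triu_indices_from(D, k=1)])

def rbf_kernel(X, sigma):
    """RBF kernel matrix k(x, x') = exp(-||x - x'||^2 / (2 sigma^2))."""
    return np.exp(-sq_dists(X) / (2.0 * sigma**2))

def center(K):
    """Center a kernel matrix in feature space: H K H."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def similarity_scores(X, Y, standardize=True):
    """Return HSIC, KA and CKA between samples X and Y (rows are samples)."""
    if standardize:
        X = (X - X.mean(axis=0)) / X.std(axis=0)
        Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    Kx = rbf_kernel(X, median_sigma(X))
    Ky = rbf_kernel(Y, median_sigma(Y))
    Kxc, Kyc = center(Kx), center(Ky)
    n = X.shape[0]
    # Biased HSIC estimate: tr(Kx H Ky H) / n^2 (centered, not normalized)
    hsic = np.sum(Kxc * Ky) / n**2
    # Kernel alignment: normalized, not centered
    ka = np.sum(Kx * Ky) / (np.linalg.norm(Kx) * np.linalg.norm(Ky))
    # Centered kernel alignment: centered and normalized
    cka = np.sum(Kxc * Kyc) / (np.linalg.norm(Kxc) * np.linalg.norm(Kyc))
    return {"hsic": hsic, "ka": ka, "cka": cka}
```

Toggling `standardize` or swapping `median_sigma` for another estimator shows how strongly the three scores depend on those choices.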

Important Links

Information Measures for Climate Model Comparisons

In this project, I used a Gaussianization model to compare some CMIP5 models the spatial-temporal repre

Important Links

Information Measures for Drought Factors

Summary

In this project, I look at how one can estimate the parameters of the RBF kernel for several variations of the HSIC method: kernel alignment and centered kernel alignment. In particular, I investigate the

Important Links

Software: PySim

Some highlights include:

A scikit-learn-style format to allow for pipelines, cross-validation and scoring
HSIC and all of its variations, including the randomized implementation
Some basics for visualizations using the Taylor Diagram
Some other methods for estimating similarity
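For readers unfamiliar with the randomized variant mentioned above, the sketch below shows one common way to approximate HSIC: replace the RBF kernel matrices with random Fourier features. This is an assumption about what "randomized" means here and is not PySim's actual API. The names `rff_features` and `randomized_hsic` and their parameters are hypothetical.

```python
import numpy as np

def rff_features(X, n_features, sigma, rng):
    """Random Fourier features whose inner products approximate an RBF kernel."""
    d = X.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def randomized_hsic(X, Y, n_features=200, sigma_x=1.0, sigma_y=1.0, seed=0):
    """Approximate biased HSIC as the squared Frobenius norm of the
    cross-covariance between the random feature maps of X and Y."""
    rng = np.random.default_rng(seed)
    Zx = rff_features(X, n_features, sigma_x, rng)
    Zy = rff_features(Y, n_features, sigma_y, rng)
    # Centering the features plays the role of centering the kernel matrices
    Zx = Zx - Zx.mean(axis=0)
    Zy = Zy - Zy.mean(axis=0)
    n = X.shape[0]
    C = Zx.T @ Zy / n  # (n_features, n_features) cross-covariance
    return np.sum(C**2)
```

The cost scales with the number of samples times the number of features rather than with the square of the number of samples, which is the usual motivation for this kind of approximation.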

Important Links

GitHub Repository