Similarity Measures
Description
I am very interested in the notion of similarity: what it means, how we can estimate it, and how it works in practice. Below are some of the main projects I have been working on, which include an empirical study, some applications, and some software that was developed.
Kernel Parameter Estimation
In this project, I look at how one can estimate the parameters of the RBF kernel for several variations of the HSIC method, including kernel alignment and centered kernel alignment. Unsupervised kernel methods can suffer if the parameters are not estimated correctly. So I empirically examine different ways we can represent our data and different ways we can estimate the parameters for the unsupervised kernel method.
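To make the parameter estimation problem concrete, here is a minimal sketch of an RBF kernel whose length scale is set by the median heuristic. The median heuristic is only one common estimator and is an assumption here, not necessarily one of the estimators compared in the study; the function names (`pairwise_sq_dists`, `median_sigma`, `rbf_kernel`) and the toy data are illustrative.

```python
import numpy as np


def pairwise_sq_dists(X):
    """Squared Euclidean distances between all pairs of rows of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    D = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    return np.maximum(D, 0.0)  # clip small negatives caused by round-off


def median_sigma(X):
    """Median heuristic (assumed estimator): median pairwise distance."""
    D = pairwise_sq_dists(X)
    upper = np.triu_indices_from(D, k=1)  # exclude the zero diagonal
    return np.sqrt(np.median(D[upper]))


def rbf_kernel(X, sigma):
    """RBF kernel matrix: K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    return np.exp(-pairwise_sq_dists(X) / (2.0 * sigma ** 2))


# Toy data, for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))

sigma = median_sigma(X)
K = rbf_kernel(X, sigma)
print(f"sigma = {sigma:.3f}, K shape = {K.shape}")
```

A length scale that is far too small pushes the kernel matrix toward the identity, and one that is far too large pushes it toward a constant matrix. In both cases the kernel carries little information about the data, which is why the choice of estimator matters.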
I investigate the following questions:
- Will standardizing the data beforehand affect the results?
- How does the parameter estimator affect the results?
- Which variation of HSIC gives the best representation of the similarity (centering the kernel, normalizing the score)?
- How does this all compare to mutual information for known high-dimensional, multivariate distributions?
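The variations named in these questions can be sketched in a few lines of NumPy. This is a rough illustration under stated assumptions, not the code used in the study: HSIC is the biased empirical estimate scaled by 1/n^2 (other conventions use 1/(n-1)^2), the length scale defaults to the median heuristic, and the toy data and helper names are hypothetical.

```python
import numpy as np


def rbf_kernel(X, sigma=None):
    """RBF kernel matrix; sigma defaults to the median pairwise distance."""
    sq_norms = np.sum(X ** 2, axis=1)
    D = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T), 0.0)
    if sigma is None:
        sigma = np.sqrt(np.median(D[np.triu_indices_from(D, k=1)]))
    return np.exp(-D / (2.0 * sigma ** 2))


def standardize(X):
    """Zero mean and unit variance per feature."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def center(K):
    """Center a kernel matrix in feature space: H K H."""
    n = K.shape[0]
    H = np.eye(n) - np.full((n, n), 1.0 / n)
    return H @ K @ H


def hsic(K, L):
    """Biased empirical HSIC: trace(K H L H) / n^2 (centered, unnormalized)."""
    n = K.shape[0]
    return np.sum(center(K) * center(L)) / n ** 2


def kernel_alignment(K, L):
    """Kernel alignment: normalized, uncentered."""
    return np.sum(K * L) / (np.linalg.norm(K) * np.linalg.norm(L))


def centered_kernel_alignment(K, L):
    """Centered kernel alignment: normalized and centered."""
    return kernel_alignment(center(K), center(L))


# Toy data: Y depends on X, Z is independent of X
rng = np.random.default_rng(0)
n = 300
X = rng.normal(size=(n, 2))
Y = X + 0.1 * rng.normal(size=(n, 2))
Z = rng.normal(size=(n, 2))

Kx, Ky, Kz = (rbf_kernel(standardize(A)) for A in (X, Y, Z))
for name, L in [("dependent", Ky), ("independent", Kz)]:
    print(f"{name:12s} HSIC={hsic(Kx, L):.4f} "
          f"KA={kernel_alignment(Kx, L):.4f} "
          f"CKA={centered_kernel_alignment(Kx, L):.4f}")
```

Because every RBF kernel entry is positive, the uncentered kernel alignment stays well above zero even for independent samples. Centering removes that baseline, and normalizing bounds the score by 1.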
Important Links
Information Measures for Climate Model Comparisons
In this project, I used a Gaussianization model to compare some CMIP5 models the spatial-temporal repre
Important Links
Information Measures for Drought Factors
Summary
In this project, I look at how one can estimate the parameters of the RBF kernel for several variations of the HSIC method, including kernel alignment and centered kernel alignment. In particular, I investigate the
Important Links
Software: PySim
Some highlights include:
- Scikit-learn format to allow for pipelines, cross-validation and scoring
- HSIC and all of its variations, including the randomized implementation
- Some basics for visualizations using the Taylor Diagram
- Some other methods for estimating similarity
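As an illustration of what a scikit-learn format can look like for a similarity measure, here is a hypothetical estimator sketch. The class name `HSICSketch`, its parameters, and its attributes are assumptions made for this example and are not PySim's actual API; only `BaseEstimator` and `clone` come from scikit-learn.

```python
import numpy as np
from sklearn.base import BaseEstimator, clone


def _rbf(X, sigma):
    """RBF kernel matrix with length scale sigma."""
    sq_norms = np.sum(X ** 2, axis=1)
    D = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T), 0.0)
    return np.exp(-D / (2.0 * sigma ** 2))


def _center(K):
    """H K H, written with row, column and grand means."""
    return (K - K.mean(axis=0, keepdims=True)
            - K.mean(axis=1, keepdims=True) + K.mean())


class HSICSketch(BaseEstimator):
    """Hypothetical scikit-learn-style wrapper (not PySim's actual class)."""

    def __init__(self, sigma_X=1.0, sigma_Y=1.0, center=True, normalize=False):
        self.sigma_X = sigma_X
        self.sigma_Y = sigma_Y
        self.center = center
        self.normalize = normalize

    def _measure(self, X, Y):
        K = _rbf(np.asarray(X, dtype=float), self.sigma_X)
        L = _rbf(np.asarray(Y, dtype=float), self.sigma_Y)
        if self.center:
            K, L = _center(K), _center(L)
        value = np.sum(K * L)
        if self.normalize:
            return value / (np.linalg.norm(K) * np.linalg.norm(L))
        return value / K.shape[0] ** 2

    def fit(self, X, Y):
        # Store the similarity of the training pair as a fitted attribute
        self.hsic_value_ = self._measure(X, Y)
        return self

    def score(self, X, Y):
        # Similarity of any pair of datasets under the current parameters
        return self._measure(X, Y)


# Toy data, for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
Y = X ** 2 + 0.1 * rng.normal(size=(100, 3))

model = HSICSketch(center=True, normalize=True).fit(X, Y)
print(model.get_params())
print(f"CKA-style score: {model.hsic_value_:.4f}")

# Parameters can be copied and changed through the usual estimator interface
wider = clone(model).set_params(sigma_X=2.0)
print(f"score with sigma_X=2.0: {wider.score(X, Y):.4f}")
```

Storing every constructor argument under the same attribute name is what lets `get_params`, `set_params` and `clone` work without extra code, and that is the hook parameter searches rely on.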