Similarity Measures

Description

I am very interested in the notion of similarity: what it means, how we can estimate it, and how it works in practice. Below are the main projects I have been working on, which include an empirical study, some applications, and some software.


Kernel Parameter Estimation

In this project, I look at how one can estimate the parameters of the RBF kernel for several variations of the HSIC method, including kernel alignment and centered kernel alignment. Unsupervised kernel methods can suffer if the parameters are not estimated correctly, so I empirically examine different ways to represent the data and different ways to estimate the kernel parameters.

I investigate the following questions:

  • Will standardizing the data beforehand affect the results?
  • How does the parameter estimator affect the results?
  • Which variation of HSIC gives the best representation of the similarity (center the kernel, normalize the score)?
  • How does this all compare to mutual information for known high-dimensional, multivariate distributions?
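
To make the quantities behind these questions concrete, here is a minimal NumPy sketch of HSIC, kernel alignment and centered kernel alignment with an RBF kernel. Several things here are assumptions for illustration and not taken from the study itself: the function names are hypothetical, the median heuristic is just one common choice of bandwidth estimator, and the biased HSIC estimate is normalized by 1/n².

```python
import numpy as np


def standardize(X):
    """Scale each feature to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def sq_dists(X):
    """Pairwise squared Euclidean distances between the rows of X."""
    sq = np.sum(X**2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.maximum(D2, 0.0)  # clip tiny negatives caused by round-off


def median_sigma(X):
    """Median heuristic (an assumed estimator): median pairwise distance."""
    D2 = sq_dists(X)
    upper = D2[np.triu_indices_from(D2, k=1)]  # distinct pairs only
    return np.median(np.sqrt(upper))


def rbf_kernel(X, sigma):
    """RBF kernel matrix, k(x, x') = exp(-||x - x'||^2 / (2 sigma^2))."""
    return np.exp(-sq_dists(X) / (2.0 * sigma**2))


def center(K):
    """Center a kernel matrix in feature space: H K H with H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.full((n, n), 1.0 / n)
    return H @ K @ H


def hsic(K, L):
    """Biased empirical HSIC, tr(K H L H) / n^2."""
    n = K.shape[0]
    return np.sum(center(K) * L) / n**2


def kernel_alignment(K, L):
    """Normalized score without centering: <K, L>_F / (||K||_F ||L||_F)."""
    return np.sum(K * L) / (np.linalg.norm(K) * np.linalg.norm(L))


def centered_kernel_alignment(K, L):
    """Normalized score computed on the centered kernel matrices."""
    return kernel_alignment(center(K), center(L))


def scores(X, Y):
    """Standardize, estimate one sigma per dataset, return (HSIC, KA, CKA)."""
    X, Y = standardize(X), standardize(Y)
    K = rbf_kernel(X, median_sigma(X))
    L = rbf_kernel(Y, median_sigma(Y))
    return hsic(K, L), kernel_alignment(K, L), centered_kernel_alignment(K, L)


# Toy data: Y_dep is a noisy copy of X, Y_ind is drawn independently of X.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
Y_dep = X + 0.1 * rng.normal(size=(200, 2))
Y_ind = rng.normal(size=(200, 2))

hsic_dep, ka_dep, cka_dep = scores(X, Y_dep)
hsic_ind, ka_ind, cka_ind = scores(X, Y_ind)
print(f"dependent:   HSIC={hsic_dep:.4f}  KA={ka_dep:.3f}  CKA={cka_dep:.3f}")
print(f"independent: HSIC={hsic_ind:.4f}  KA={ka_ind:.3f}  CKA={cka_ind:.3f}")
```

Because every RBF kernel entry is positive, the uncentered alignment stays well above zero even for the independent pair, which is one reason the centering question matters. Swapping `median_sigma` for another estimator, or dropping `standardize`, is a simple way to probe the first two questions on toy data.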

Important Links

Information Measures for Climate Model Comparisons

In this project, I used a Gaussianization model to compare some CMIP5 models the spatial-temporal repre

Important Links

Information Measures for Drought Factors

Summary

In this project, I look at how one can estimate the parameters of the RBF kernel for several variations of the HSIC method, including kernel alignment and centered kernel alignment. In particular, I investigate the

Important Links

Software: PySim

Some highlights include:

  • A scikit-learn-style format to allow for pipelines, cross-validation and scoring
  • HSIC and all of its variations, including the randomized implementation
  • Some basics for visualizations using the Taylor Diagram
  • Some other methods for estimating similarity
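
For readers unfamiliar with randomized HSIC, the sketch below shows one common way to do it, using random Fourier features to approximate the RBF kernel so that no n × n kernel matrix is ever formed. This is an illustration under my own assumptions and is not PySim's API: the function names, the default number of features and the fixed bandwidths are all hypothetical, and the actual implementation may use a different scheme.

```python
import numpy as np


def rff_features(X, sigma, n_features, rng):
    """Random Fourier features whose inner products approximate an RBF kernel
    with bandwidth sigma, k(x, x') = exp(-||x - x'||^2 / (2 sigma^2))."""
    d = X.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(d, n_features))  # random frequencies
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)       # random phases
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)


def randomized_hsic(X, Y, sigma_x=1.0, sigma_y=1.0, n_features=200, seed=0):
    """Approximate the biased HSIC estimate tr(K H L H) / n^2.

    With K ~ Zx Zx^T and L ~ Zy Zy^T, the trace equals the squared Frobenius
    norm of the cross-covariance of the column-centered feature maps.
    """
    rng = np.random.default_rng(seed)
    Zx = rff_features(X, sigma_x, n_features, rng)
    Zy = rff_features(Y, sigma_y, n_features, rng)
    Zx = Zx - Zx.mean(axis=0)  # centering the features plays the role of H
    Zy = Zy - Zy.mean(axis=0)
    n = X.shape[0]
    return np.linalg.norm(Zx.T @ Zy) ** 2 / n**2


# Toy data: Y_dep is a noisy copy of X, Y_ind is drawn independently of X.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 2))
Y_dep = X + 0.1 * rng.normal(size=(300, 2))
Y_ind = rng.normal(size=(300, 2))

rhsic_dep = randomized_hsic(X, Y_dep)
rhsic_ind = randomized_hsic(X, Y_ind)
print(f"randomized HSIC, dependent:   {rhsic_dep:.4f}")
print(f"randomized HSIC, independent: {rhsic_ind:.4f}")
```

The cost grows linearly in the number of samples and quadratically in `n_features`, compared with the quadratic cost in samples of the exact kernel-matrix version.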

Important Links