Comparing HSIC versus Mutual Information measures¶
The HSIC measure is computed as follows:
- Calculate kernel matrices for X and Y
- Center both kernel matrices
- Find the Frobenius norm between the kernel matrices
This works well, but there is no principled way to estimate the kernel parameter for each of the kernel matrices. Another method, from a separate paper, is Kernel Target Alignment (KTA). It is calculated differently:
- Calculate the kernel matrices for X and Y
- Find the Frobenius norm between the kernel matrices
- Normalize the value by the Frobenius norm of X and Y individually
This works similarly to the HSIC method; the difference is the normalization step. The final algorithm is the Centered Kernel Target Alignment (cKTA) method, which combines the previous two. The algorithm is as follows:
- Calculate the kernel matrices for X and Y
- Center both kernel matrices
- Find the Frobenius norm between the kernel matrices
- Normalize the value by the Frobenius norm of X and Y individually
As you can see, this method appears to be the most complete in the sense that it incorporates all of the steps needed to scale the data so that the results are directly comparable. This notebook attempts to determine which of these methods provides a good estimate for the sigma value of the kernel matrices, and how their output values compare to mutual information measures.
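To make the three recipes concrete, here is a minimal NumPy sketch of all three scores, assuming RBF kernels and a 1/(n-1)^2 scaling for HSIC (conventions vary); the helper names (rbf_kernel, center, hsic, kta, ckta) are illustrative and are not the implementations in models.dependence.HSIC used later.

import numpy as np
from scipy.spatial.distance import pdist, squareform

def rbf_kernel(A, sigma):
    # K_ij = exp(-||a_i - a_j||^2 / (2 sigma^2))
    A = np.asarray(A).reshape(len(A), -1)  # ensure 2-D (n, d)
    return np.exp(-squareform(pdist(A, 'sqeuclidean')) / (2 * sigma ** 2))

def center(K):
    # double-center: H K H with H = I - (1/n) 1 1^T
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def hsic(Kx, Ky):
    # HSIC: center both kernels, then take the Frobenius inner product
    n = Kx.shape[0]
    return np.sum(center(Kx) * center(Ky)) / (n - 1) ** 2

def kta(Kx, Ky):
    # KTA: Frobenius inner product, normalized by the individual Frobenius norms
    return np.sum(Kx * Ky) / (np.linalg.norm(Kx) * np.linalg.norm(Ky))

def ckta(Kx, Ky):
    # cKTA: center both kernels first, then apply the KTA normalization
    return kta(center(Kx), center(Ky))

With Kx = rbf_kernel(X, sigma) and Ky = rbf_kernel(Y, sigma), kta and ckta return cosine-like similarities bounded above by 1, while hsic is unnormalized; that distinction is exactly what the experiments below explore.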
Covariance vs Correlation¶
The above methods can be viewed through the lens of the difference between covariance measures and correlation measures. In this section, I will explain why. We can write out the full definitions for covariance and correlation. The definition of covariance is:

$$\text{cov}(X, Y) = \mathbb{E}\left[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])\right]$$

The definition of correlation is:

$$\rho(X, Y) = \frac{\text{cov}(X, Y)}{\sigma_X \, \sigma_Y}$$

The correlation measure is dimensionless, whereas the covariance is in units obtained by multiplying the units of the two variables.
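A quick numerical illustration of that point (a sketch with arbitrary numbers, not taken from the experiments below): rescaling one variable rescales the covariance but leaves the correlation unchanged.

import numpy as np

rng = np.random.RandomState(123)
x = rng.randn(1000)
y = 0.5 * x + rng.randn(1000)

# covariance carries the product of the units, so rescaling y rescales it
print(np.cov(x, y)[0, 1])             # roughly 0.5
print(np.cov(x, 100 * y)[0, 1])       # roughly 100x larger

# correlation is dimensionless and invariant to the rescaling
print(np.corrcoef(x, y)[0, 1])        # roughly 0.45
print(np.corrcoef(x, 100 * y)[0, 1])  # identical

In the same spirit, HSIC behaves like a covariance between the centered kernel matrices, while cKTA behaves like the corresponding correlation.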
Code¶
import sys, os
import warnings
import tqdm
import random
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# insert path to the model directory
cwd = os.getcwd()
path = f"{cwd}/../../src"
sys.path.insert(0, path)
# toy datasets
from data.toy import generate_dependence_data
# Kernel Dependency measure
from models.dependence import HSIC
from models.kernel import estimate_sigma, sigma_to_gamma, gamma_to_sigma, get_param_grid
# RBIG IT measures
from models.ite_algorithms import run_rbig_models
import matplotlib
import seaborn as sns
%matplotlib inline
warnings.filterwarnings('ignore') # get rid of annoying warnings
%load_ext autoreload
%autoreload 2
Experiment Class¶
class ExperimentKTA:
def __init__(
self,
seed=123,
n_trials=10,
hsic_points=1000,
n_noise=10,
n_gamma=50,
factor=2,
mi_points=100_000,
sigma_est='silverman',
save_path=None,
save_name='test'
):
# fixed experimental params
self.seed = seed
self.hsic_points = hsic_points
self.n_noise = n_noise
self.n_gamma = n_gamma
self.factor = factor
self.mi_points = mi_points
self.sigma_est = sigma_est
self.n_trials = n_trials
self.save_path = save_path
self.save_name = save_name
# free experimental params
self.noise_params = np.logspace(-3, -.3, n_noise)
self.func_params = [
'line',
'sine',
'circ',
'rand',
]
self.scorers = [
'hsic',
'tka',
'ctka',
]
self.seeds = [i for i in range(1, n_trials + 1)]
# placeholder for the results dataframe (populated in run_experiment)
self.results_df = None
def run_experiment(self):
# initialize results dataframe
self.results_df = self.generate_results_df()
# Loop through functions
for ifunction in self.func_params:
print(f"Function: {ifunction}")
# Loop through noise parameters
for inoise in self.noise_params:
# Loop through random seeds
for iseed in self.seeds:
# generate data for MI measure
X, Y = self._generate_data(
inoise, ifunction, iseed, dataset='mi'
)
# calculate MI
mi_score, _ = run_rbig_models(
X, Y,
measure='mi',
verbose=None,
random_state=self.seed
)
# =======================
# HSIC MEASURES
# =======================
# initialize sigma
init_sigma, sigma_params = self._estimate_sigmas(X, Y)
# convert to gamma
init_gamma = sigma_to_gamma(init_sigma)
gamma_params = sigma_to_gamma(sigma_params)
# Loop through HSIC scoring methods
for hsic_method in self.scorers:
# Loop through gamma parameters
for igamma in gamma_params:
# generate data for the HSIC measure
X, Y = self._generate_data(
inoise, ifunction, iseed, dataset='hsic'
)
# Calculate HSIC
hsic_score = self._get_hsic(X, Y, igamma, None, hsic_method)
# append results to results dataframe
self.results_df = self.append_results(
self.results_df,
ifunction,
iseed,
inoise,
init_gamma,
igamma,
hsic_method,
hsic_score,
mi_score
)
# save results to csv
self.save_data(self.results_df)
return self
def _generate_data(self, noise, function, seed, dataset='hsic'):
if dataset == 'hsic':
num_points = self.hsic_points
elif dataset == 'mi':
num_points = self.mi_points
else:
raise ValueError(f'Unrecognized dataset: {dataset}')
# get dataset
X, Y = generate_dependence_data(
dataset=function,
num_points=num_points,
seed=seed,
noise_x=noise,
noise_y=noise
)
return X, Y
def _estimate_sigmas(self, X, Y):
# estimate an initial sigma for each variable
sigma_x = estimate_sigma(X, method=self.sigma_est)
sigma_y = estimate_sigma(Y, method=self.sigma_est)
# the overall initial sigma is the mean of the two
init_sigma = np.mean([sigma_x, sigma_y])
# get parameter grid
sigmas = get_param_grid(init_sigma, self.factor, self.n_gamma)
# the caller (run_experiment) converts these sigmas to gammas
return init_sigma, sigmas
def _get_hsic(self, X, Y, gamma, subsample, scorer):
# initialize hsic
clf_hsic = HSIC(gamma=gamma, scorer=scorer, subsample=subsample)
# calculate HSIC value
clf_hsic.fit(X, Y)
# return HSIC score
return clf_hsic.score(X)
def generate_results_df(self):
return pd.DataFrame(columns=[
'trial',
'function',
'noise',
'init_gamma',
'gamma',
'scorer',
'value',
'mi',
])
def append_results(
self,
results_df,
function,
trial,
noise,
init_gamma,
gamma,
hsic_method,
hsic_score,
mi_score
):
# append data
return results_df.append({
'function': function,
'trial': trial,
'noise': noise,
'init_gamma': init_gamma,
'gamma': gamma,
'scorer': hsic_method,
'value': hsic_score,
'mi': mi_score
}, ignore_index=True)
def load_data(self):
pass
def save_data(self, results_df):
results_df.to_csv(f"{self.save_path}{self.save_name}.csv")
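A note on the conversion helpers: I assume sigma_to_gamma and gamma_to_sigma (imported from models.kernel above) implement the usual RBF reparameterization

$$k(x, y) = \exp\left(-\gamma \lVert x - y \rVert^2\right) = \exp\left(-\frac{\lVert x - y \rVert^2}{2\sigma^2}\right), \qquad \gamma = \frac{1}{2\sigma^2},$$

so the grid of sigmas from _estimate_sigmas maps monotonically (and inversely) onto the grid of gammas used in the HSIC loop.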
Experiment Run¶
# experimental params
seed = 123 # reproducibility
n_trials = 1 # number of trials
hsic_points = 10_000 # number of points used for HSIC
mi_points = 10_000 # number of points used for MI
n_noise = 50 # number of points in noise param grid
n_gamma = 50 # number of points in gamma param grid
factor = 1 # log factor for gamma param grid bounds
sigma_est = 'mean' # sigma initialization
save_path = f'{cwd}/../../results/hsic/'
save_name = 'trial_v2_s10k'
# initialize experiment class
clf_exp = ExperimentKTA(
seed=seed,
n_trials=n_trials,
hsic_points=hsic_points,
factor=factor,
mi_points=mi_points,
sigma_est=sigma_est,
n_noise=n_noise,
n_gamma=n_gamma,
save_path=save_path,
save_name=save_name,
)
# run full experiment
clf_exp.run_experiment()
Visualize the Results¶
Figure I - Mutual Information & Noise¶
This first figure demonstrates how the mutual information compares with the amount of noise for each of the functions: Linear, Sinusoidal, Circular, and Random.
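The cells below read from results_df, which is not defined in the notebook as shown; a minimal sketch that reloads the CSV written by save_data, assuming save_path and save_name are still in scope from the run above:

results_df = pd.read_csv(f"{save_path}{save_name}.csv", index_col=0)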
res_noise = results_df['noise'].unique().tolist()
res_gammas = results_df['gamma'].unique().tolist()
res_lines = results_df['function'].unique().tolist()
fig, ax = plt.subplots()
# scatter MI vs noise for each function
functions = [
    ('line', 'Line'),
    ('sine', 'Sine'),
    ('circ', 'Circle'),
    ('rand', 'Random'),
]
for ifunction, ilabel in functions:
    sub_df = results_df[results_df['function'] == ifunction]
    ax.scatter(sub_df['noise'], sub_df['mi'], label=ilabel)
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlabel(r'Noise, $\sigma_y$')
ax.set_ylabel('Mutual Information, MI$(X,Y)$')
ax.set_title('Experimental Parameter Space')
plt.legend()
plt.show()
Line
This has the most MI at low noise, but it decreases log-linearly as we increase the noise.
Sine
This has the second-most MI at low noise, and it also decreases as we increase the noise. Curiously, it has more MI than the linear function at around $10^{-2}$ noise, which could be due to the amount of structure still present within the sine function.
Circle
The circle has less MI overall than the linear and sine functions, and it also decreases with added noise.
Random
There is no mutual information between X and Y, no matter how much noise we inject.
Plot Function¶
def plot_res_gamma(results_df, function='line', hsic_method='hsic'):
save_path = f'{cwd}/../../results/hsic/figures/'
# Set Title stuff
if function == 'line':
title = 'Linear Function'
elif function == 'sine':
title = 'Sine Function'
elif function == 'circ':
title = 'Circle Function'
elif function == 'rand':
title = 'Random Function'
else:
raise ValueError(f'Unrecognized function: {function}')
sub_results_df = results_df[results_df['function'] == function]
free_params = [
# 'gamma',
'function'
]
fixed_params = [
'gamma',
'value',
'scorer',
'mi'
]
groups = sub_results_df.groupby(free_params)
hue = 'gamma'
fig, ax = plt.subplots(nrows=1, figsize=(7, 3))
for iparams, idata in groups:
# Plot I - HSIC
pts = ax.scatter(
x=idata[idata['scorer'] == hsic_method]['value'],
y=idata[idata['scorer'] == hsic_method]['mi'],
c=idata[idata['scorer'] == hsic_method]['gamma'],
s=20, cmap='Spectral',
norm=matplotlib.colors.LogNorm()
)
ax.set_xlabel( hsic_method.upper() )
ax.set_ylabel('Mutual Information')
fig.colorbar(pts, ax=ax, label='Gamma')
# ax[0].get_legend().remove()
# ax[1].get_legend().remove()
# ax[2].get_legend().remove()
ax.set_yscale('log')
ax.set_title(title)
plt.tight_layout()
plt.show()
fig.savefig(f"{save_path}trialv1_{function}_{hsic_method}.png")
return None
Figure II - MI vs Gamma vs HSIC (Linear Function)¶
plot_res_gamma(results_df, function='line', hsic_method='hsic')
plot_res_gamma(results_df, function='line', hsic_method='tka')
plot_res_gamma(results_df, function='line', hsic_method='ctka')
Observation I: The scale of the values
The scale of the HSIC method is different from that of the other two methods. Both KTA and cKTA are bounded above by 1, whereas HSIC is only bounded below by 0, at least from what we can see in the graphs given. The interpretation is slightly different in the sense that KTA/cKTA measure whether the two kernels are the same, whereas HSIC measures whether X and Y are dependent at all.
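As an aside, the upper bound on KTA/cKTA follows directly from the Cauchy-Schwarz inequality applied to the Frobenius inner product $\langle A, B \rangle_F = \sum_{ij} A_{ij} B_{ij}$:

$$\text{KTA}(K_x, K_y) = \frac{\langle K_x, K_y \rangle_F}{\lVert K_x \rVert_F \, \lVert K_y \rVert_F} \leq 1,$$

and the same holds for cKTA with the centered matrices. HSIC omits the normalization, so it has no fixed upper bound.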
Observation II: The shape of the parameter space
We can see in the figures above that each of the HSIC estimators has a different overall shape. HSIC always has a minimum of 0, whereas the other two are bounded above by 1. KTA always reaches a maximum of 1, but cKTA's maximum changes as more noise is injected into the system. I would say that the shapes of the maximum values for HSIC and the minimum values of KTA/cKTA are similar with respect to the amount of noise, but HSIC is inverted relative to the others.
Observation III: Reasonable values for minimum noise
The KTA/cKTA methods have values close to 1 for almost all gamma values until the MI reaches about 3; after that, the values spread out more. HSIC can take a large range of values, but it seems that only with noise does that range decrease.
Case II - Sine¶
plot_res_gamma(results_df, function='sine', hsic_method='hsic')
plot_res_gamma(results_df, function='sine', hsic_method='tka')
plot_res_gamma(results_df, function='sine', hsic_method='ctka')
It appears that there is a maximum for HSIC and cKTA but not for KTA. Yet I think the maximum of cKTA corresponds to the same values as KTA.
Case III - Circle¶
plot_res_gamma(results_df, function='circ', hsic_method='hsic')
plot_res_gamma(results_df, function='circ', hsic_method='tka')
plot_res_gamma(results_df, function='circ', hsic_method='ctka')
Neither cKTA nor KTA seems to have a maximum for higher values of MI.
Case IV - Random¶
plot_res_gamma(results_df, function='rand', hsic_method='hsic')
plot_res_gamma(results_df, function='rand', hsic_method='tka')
plot_res_gamma(results_df, function='rand', hsic_method='ctka')
Both HSIC's and cKTA's ranges of gamma values seem reasonable.