Information Theory Measures using the ITE Toolbox¶
- Author: J. Emmanuel Johnson
- Email: jemanjohnson34@gmail.com
- Date: 4th September, 2019
Literature Review (what we previously did)¶
Entropy¶
In our experiments, we only looked at Shannon entropy. It is the limiting case of Renyi's entropy as \alpha \rightarrow 1. We chose not to look at Renyi's entropy because we did not want to go down a rabbit hole of measures that we cannot understand nor justify, so we stuck to the basics. It is also important to keep in mind that we were looking at measures that can calculate the joint entropy, i.e. for multivariate, multi-dimensional datasets.
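For reference, Renyi's entropy of order \alpha is

H_\alpha(X) = \frac{1}{1 - \alpha} \log \int p^{\alpha}(x) \, dx

and taking the limit \alpha \rightarrow 1 recovers the Shannon differential entropy, H(X) = -\int p(x) \log p(x) \, dx.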
Algorithms¶
KnnK¶
This uses the KNN method to estimate the entropy. From what I understand, it is the simplest method, though it may have issues in higher dimensions and with large numbers of samples (as is typical of KNN estimators). Compared to the other standard density-estimation approaches, it is the most robust in higher dimensions due to its adaptive-like binning.
- A new class of random vector entropy estimators and its applications in testing statistical hypotheses - Goria et al. (2005) - Paper
- Nearest neighbor estimates of entropy - Singh et al. (2003) - Paper
- A statistical estimate for the entropy of a random vector - Kozachenko & Leonenko (1987) - Paper
KDP¶
This is the logical progression from KnnK: it uses a k-d partitioning tree (KDTree), presumably to speed up the calculations (a sketch of the partition-based estimate follows the reference below).
- Fast multidimensional entropy estimation by k-d partitioning - Stowell & Plumbley (2009) - Paper
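As a rough sketch of the idea (not the exact bias-corrected estimator from the paper), a partition-based plug-in estimate takes the form

\hat{H} \approx \sum_{j} \frac{n_j}{N} \log \frac{N \, V_j}{n_j}

where the k-d tree splits the data into cells j containing n_j of the N samples in a volume V_j.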
expF¶
This is the closed-form expression for the Sharma-Mittal entropy of exponential families. It estimates the parameters of Y using maximum likelihood estimation and then applies the analytical formula for the exponential family.
- A closed-form expression for the Sharma-Mittal entropy of exponential families - Nielsen & Nock (2012) - Paper
vME¶
This estimates the Shannon differential entropy (H) using the von Mises expansion.
- Nonparametric von Mises estimators for entropies, divergences and mutual informations - Kandasamy et. al. (2015) - Paper
Ensemble¶
Estimates the entropy as the average of entropy estimates over groups of samples. This is a simple implementation with the freedom to choose the estimator estimate_H; a runnable sketch follows the reference below.
# split into groups and average the entropy estimates
H = 0
for igroup in batches:
    H += estimate_H(igroup)
H /= len(batches)
- High-dimensional mutual information estimation for image registration - Kybic (2004) - Paper
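A minimal runnable sketch of this idea, assuming some entropy estimator estimate_H (a hypothetical callable) and using sklearn's gen_batches to form the groups:

import numpy as np
from sklearn.utils import gen_batches

def ensemble_entropy(X: np.ndarray, estimate_H, batch_size: int = 100) -> float:
    """Average an entropy estimator over batches of samples."""
    H_batches = [
        estimate_H(X[batch])  # entropy estimate on each group of samples
        for batch in gen_batches(X.shape[0], batch_size)
    ]
    return float(np.mean(H_batches))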
Potential New Experiments¶
Voronoi¶
Estimates Shannon entropy using Voronoi regions, which reportedly works well for multi-dimensional densities (a rough sketch of the idea follows the reference below).
- A new class of entropy estimators for multi-dimensional densities - Miller (2003) - Paper
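As a rough sketch of the idea (not the exact estimator from the paper), the plug-in density estimate at sample x_i is roughly 1 / (N V_i), where V_i is the volume of the Voronoi cell containing x_i, giving

\hat{H} \approx \frac{1}{N} \sum_{i=1}^{N} \log (N \, V_i)

with unbounded boundary cells requiring special handling.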
Mutual Information¶
Total Correlation¶
Code¶
import sys, os
cwd = os.getcwd()
sys.path.insert(0, f'{cwd}/../../src')
sys.path.insert(0, f'{cwd}/../../src/itetoolbox')
import numpy as np
import ite
# from data.load_TishbyData import load_TishbyData
%matplotlib inline
%load_ext autoreload
%autoreload 2
Data¶
np.random.seed(123) # reproducibility
n_samples = 1000
d_dimensions = 3
# create dataset X
X = np.random.randn(n_samples, d_dimensions)
# apply a random linear (mixing) transformation
A = np.random.rand(d_dimensions, d_dimensions)
# create dataset Y
Y = X @ A
Entropy¶
Algorithms¶
KnnK¶
This uses the KNN method to estimate the entropy. From what I understand, it is the simplest method, though it may have issues in higher dimensions and with large numbers of samples (as is typical of KNN estimators).
- A new class of random vector entropy estimators and its applications in testing statistical hypotheses - Goria et al. (2005) - Paper
- Nearest neighbor estimates of entropy - Singh et al. (2003) - Paper
- A statistical estimate for the entropy of a random vector - Kozachenko & Leonenko (1987) - Paper
This method works by calculating the k-nearest-neighbour distances and plugging them into the estimator formula given in the Algorithm section below.
KDP¶
This is the logical progression from KnnK: it uses a k-d partitioning tree (KDTree), presumably to speed up the calculations.
- Fast multidimensional entropy estimation by k-d partitioning - Stowell & Plumbley (2009) - Paper
Algorithm¶
- Calculate the KNN Distances using the distance matrix
- Calculate the Volume of the unit ball wrt d_dimensions
- Calculate the entropy measure
H = \log (N - 1) - \psi(k) + \log (v_d) + \frac{D}{N} \sum_{i=1}^{N} \log \varepsilon_i(k)

where N is the number of samples, D the dimensionality, k the number of neighbours, \psi the digamma function, v_d the volume of the d-dimensional unit ball, and \varepsilon_i(k) the distance from sample i to its k-th nearest neighbour.
def entropy_gaussian(C):
    '''
    Entropy of a Gaussian variable with covariance matrix C (in nats).
    '''
    if np.isscalar(C):  # C is the variance
        return .5 * (1 + np.log(2 * np.pi)) + .5 * np.log(C)
    else:
        n = C.shape[0]  # dimension
        return .5 * n * (1 + np.log(2 * np.pi)) + .5 * np.log(abs(np.linalg.det(C)))
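Since X was drawn from a standard Gaussian and Y is a linear map of it, entropy_gaussian gives analytical reference values for the estimators below; under that construction Cov(X) = I and Cov(Y) = A^{\top}A (values in nats).

# analytical reference entropies under the Gaussian assumption
H_x_true = entropy_gaussian(np.eye(d_dimensions))
H_y_true = entropy_gaussian(A.T @ A)
print(f"True H(X): {H_x_true:.3f} nats")
print(f"True H(Y): {H_y_true:.3f} nats")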
Shannon Entropy (KNN/KDP)¶
from scipy.special import gamma, psi
from sklearn.neighbors import NearestNeighbors
from typing import Optional
# volume of unit ball
def volume_unit_ball(d_dimensions: int)-> float:
"""Volume of the d-dimensional unit ball
Parameters
----------
d_dimensions : int
Number of dimensions to estimate the volume
Returns
-------
vol : float
The volume of the d-dimensional unit ball
"""
return ( np.pi**(.5 * d_dimensions) ) / gamma(.5 * d_dimensions + 1)
# KNN Distances
def knn_distance(X: np.ndarray, n_neighbors: int=20, algorithm: str='brute', n_jobs: int=-1, kwargs: Optional[dict]=None)-> np.ndarray:
"""Light wrapper around sklearn library.
Parameters
----------
X : np.ndarray, (n_samples x d_dimensions)
The data to find the nearest neighbors for.
n_neighbors : int, default=20
The number of nearest neighbors to find.
algorithm : str, default='brute',
The knn algorithm to use.
('brute', 'ball_tree', 'kd_tree', 'auto')
n_jobs : int, default=-1
The number of cores to use to find the nearest neighbors
kwargs : dict, Optional
Any extra keyword arguments.
Returns
-------
distances : np.ndarray, (n_samples x n_neighbors)
"""
if kwargs:
clf_knn = NearestNeighbors(
n_neighbors=n_neighbors,
algorithm=algorithm,
n_jobs=n_jobs,
**kwargs
)
else:
clf_knn = NearestNeighbors(
n_neighbors=n_neighbors,
algorithm=algorithm,
n_jobs=n_jobs,
)
clf_knn.fit(X);
dists, _ = clf_knn.kneighbors(X)
return dists
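A quick sanity check of the two helpers on the toy data from the Data section: the unit-ball volume should be \pi for d=2 and 4\pi/3 for d=3, and the first column of the returned distances is each point's distance to itself (zero), since the query set equals the training set.

# unit-ball volumes for d = 2, 3
print(volume_unit_ball(2))  # ~3.1416
print(volume_unit_ball(3))  # ~4.1888
# k-nearest-neighbour distances for the toy data
dists = knn_distance(X, n_neighbors=5)
print(dists.shape)        # (n_samples, 5)
print(dists[:, 0].max())  # 0.0 -- each point is its own nearest neighbour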
from sklearn.base import BaseEstimator
from sklearn.utils import gen_batches
class Ensemble:
def __init__(self):
pass
def _fit_ensemble(self, X: np.ndarray, batch_size: int=100)-> float:
Hs = list()
        for idx in gen_batches(X.shape[0], batch_size, min_batch_size=10):
Hs.append(self._fit(X[idx]))
return np.mean(Hs)
class EntropyKNN(BaseEstimator, Ensemble):
def __init__(self, n_neighbors: int=20, algorithm: str='brute', n_jobs: int=-1, ensemble=False, batch_size=100, kwargs: Optional[dict]=None)-> None:
self.n_neighbors = n_neighbors
self.algorithm = algorithm
self.n_jobs = n_jobs
self.ensemble = ensemble
self.kwargs = kwargs
self.batch_size = batch_size
def fit(self, X: np.ndarray)-> BaseEstimator:
self.vol = volume_unit_ball(X.shape[1])
if self.ensemble:
self.H_x = self._fit_ensemble(X, self.batch_size)
else:
self.H_x = self._fit(X)
return self
    def _fit(self, X: np.ndarray)-> float:
        n_samples, d_dimensions = X.shape
        # 1. Calculate the K-nearest neighbour distances
        dist = knn_distance(
            X,
            n_neighbors=self.n_neighbors,
            algorithm=self.algorithm,
            n_jobs=self.n_jobs,
            kwargs=self.kwargs
        )
        # 2. KnnK entropy estimate from the k-th neighbour distances
        return np.log(n_samples - 1) - psi(self.n_neighbors) + np.log(self.vol) + (d_dimensions / n_samples) * np.log(dist[:, self.n_neighbors - 1]).sum()
def score(self, X):
return self.H_x
# parameters (default)
n_neighbors = 20
algorithm = 'brute'
n_jobs = -1
ensemble = False
batch_size = 50
kwargs = {'metric': 'euclidean'}
# initialize the estimator
clf_knnK = EntropyKNN(
n_neighbors=n_neighbors,
algorithm=algorithm,
n_jobs=n_jobs,
ensemble=ensemble,
batch_size=batch_size,
kwargs=kwargs,
)
# estimate entropy
H_x = clf_knnK.fit(X).score(X)
H_y = clf_knnK.fit(Y).score(Y)
print(f"H(X): {H_x:.3f} bits")
print(f"H(Y): {H_y:.3f} bits")
ITE Toolbox implementation¶
# parameters (default)
mult = True
knn_method = 'cKDTree' # fast version (slower version KNN)
k_neighbors = 10 # free parameter
eps = 0.1 # free parameter
# initialize the ITE estimator
clf_knnK = ite.cost.BHShannon_KnnK(
mult=mult,
knn_method=knn_method,
k=k_neighbors,
eps=eps
)
# estimate entropy
H_x = clf_knnK.estimation(X)
H_y = clf_knnK.estimation(Y)
print(f"H(X): {H_x:.3f} bits")
print(f"H(Y): {H_y:.3f} bits")
Shannon Entropy: expF¶
- A closed-form expression for the Sharma-Mittal entropy of exponential families - Nielsen & Nock (2012) - Paper
- Statistical exponential families: A digest with flash cards - Nielsen & Garcia (2009) - Paper
Source Parameters

\lambda = (\mu, \Sigma), where \mu \in \mathbb{R}^{d} and \Sigma \succ 0

Natural Parameters (computed as eta_1, eta_2 in the code below)

\theta = (\theta_1, \theta_2) = \left( \Sigma^{-1}\mu, \; \tfrac{1}{2}\Sigma^{-1} \right)

Log Normalizer

F(\theta) = \frac{1}{4} \mathrm{tr}\left( \theta_2^{-1}\theta_1\theta_1^{\top} \right) - \frac{1}{2} \log \det \theta_2 + \frac{d}{2} \log \pi

Gradient Log Normalizer

\nabla F(\theta) = \left( \tfrac{1}{2}\theta_2^{-1}\theta_1, \; -\tfrac{1}{2}\theta_2^{-1} - \tfrac{1}{4}(\theta_2^{-1}\theta_1)(\theta_2^{-1}\theta_1)^{\top} \right)

Final Entropy Calculation

H = F(\theta) - \langle \theta, \nabla F(\theta) \rangle
n_samples, n_dims = X.shape
# source params, theta
theta_1 = X.mean(axis=0)
theta_2 = np.cov(X.T)
print('Source:', theta_1.shape, theta_2.shape)
# natural params, eta
eta_1 = np.linalg.inv(theta_2) @ theta_1[:, None]
eta_2 = .5 * np.linalg.inv(theta_2)
print('Natural:', eta_1.shape, eta_2.shape)
# log-normalizer, F(eta)
f_eta = .25 * np.trace(eta_1.T @ np.linalg.inv(eta_2) @ eta_1) - .5 * np.linalg.slogdet(eta_2)[1] + (n_dims / 2.) * np.log(np.pi)
print('Log Norm:', f_eta.shape)
# gradient log normalizer, dF(eta)
df_eta_1 = .5 * np.linalg.inv(eta_2) @ eta_1
df_eta_2 = -.5 * np.linalg.inv(eta_2) - .25 * (np.linalg.inv(eta_2) @ eta_1) @ (np.linalg.inv(eta_2) @ eta_1).T
print('Grad Log Norm:', df_eta_1.shape, df_eta_2.shape)
# inner product <eta, dF(eta)> = eta_1 . dF_1 + tr(eta_2 dF_2)
inner = (eta_1 * df_eta_1).sum() + np.trace(eta_2 @ df_eta_2)
# closed-form Shannon entropy: H = F(eta) - <eta, dF(eta)>
H_x_expF = f_eta - inner
print('H(X) via expF:', H_x_expF)
from scipy import stats

def expF_entropy(X):
    # estimate Gaussian parameters (maximum likelihood)
    mean = X.mean(axis=0)
    cov = np.cov(X.T)
    # make a Gaussian distribution with the estimated parameters
    norm_dist = stats.multivariate_normal(mean=mean, cov=cov)
    # estimate the entropy from the closed-form solution
    H_x = norm_dist.entropy()
    return H_x
H_x = expF_entropy(X)
H_y = expF_entropy(Y)
print(f"H(X): {H_x:.3f} bits")
print(f"H(Y): {H_y:.3f} bits")
mean = X.mean(axis=0)
cov = np.cov(X.T)
seed = 1
norm_dist = stats.multivariate_normal(mean=mean, cov=cov, seed=seed)
H_x = norm_dist.entropy()
print(f"H(X): {H_x:.3f} bits")
# 1. estimate the maximum likelihood params
mean = X.mean(axis=0)[:, None]
cov = np.cov(X.T)
inv_cov = np.linalg.inv(cov)
# 2. natural parameters: t1 = Sigma^-1 mu, t2 = 0.5 * Sigma^-1
t1 = inv_cov @ mean
t2 = .5 * inv_cov
mean.shape, cov.shape, inv_cov.shape, t1.shape, t2.shape
# 3. log normalizer, F(theta)
F = .25 * np.trace(np.linalg.inv(t2) @ t1 @ t1.T) - .5 * np.linalg.slogdet(t2)[1] + (X.shape[1] / 2) * np.log(np.pi)
# 4. gradient log normalizer, dF(theta)
grad_t1 = .5 * np.linalg.inv(t2) @ t1
grad_t2 = -.5 * np.linalg.inv(t2) - .25 * (np.linalg.inv(t2) @ t1) @ (np.linalg.inv(t2) @ t1).T
grad_t1.shape, grad_t2.shape
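To close the loop on this cell, a short sketch that assembles the closed-form entropy H = F(\theta) - \langle \theta, \nabla F(\theta) \rangle from the quantities above; under the Gaussian assumption it should match the scipy multivariate_normal entropy computed earlier.

# inner product <theta, dF(theta)> = t1 . grad_t1 + tr(t2 grad_t2)
inner = (t1 * grad_t1).sum() + np.trace(t2 @ grad_t2)
# closed-form Shannon entropy of the fitted Gaussian
H_x_expF = F - inner
print(f"H(X) via expF: {H_x_expF:.3f} nats")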
vME¶
This is a nonparametric method that estimates the Shannon differential entropy (H) using the von Mises expansion. It has a faster convergence rate than the KDE- and KNN-based methods, and its parameters can be tuned using cross-validation techniques. It is also less expensive than KDE-based estimators in terms of numerical integration, since some families of von Mises expansions admit closed-form solutions.
- Nonparametric von Mises estimators for entropies, divergences and mutual informations - Kandasamy et. al. (2015) - Paper
Ensemble¶
Estimates the entropy as the average of entropy estimates over groups of samples. This is a simple implementation with the freedom to choose the estimator estimate_H; a concrete usage example with the EntropyKNN estimator defined above follows the reference below.
# split into groups and average the entropy estimates
H = 0
for igroup in batches:
    H += estimate_H(igroup)
H /= len(batches)
- High-dimensional mutual information estimation for image registration - Kybic (2004) - Paper
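As a concrete usage example, the EntropyKNN estimator defined above already exposes this ensemble option; the batch size of 200 here is just an illustrative choice.

# ensemble version: average the KnnK estimate over batches of samples
clf_ensemble = EntropyKNN(
    n_neighbors=10,
    ensemble=True,
    batch_size=200,
)
H_x_ens = clf_ensemble.fit(X).score(X)
print(f"H(X), ensemble KnnK: {H_x_ens:.3f} nats")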
Potential New Experiments¶
Voronoi¶
Estimates Shannon entropy using Voronoi regions, which reportedly works well for multi-dimensional densities.
- A new class of entropy estimators for multi-dimensional densities - Miller (2003) - Paper
Mutual Information¶
The pseudo-code is fairly simple (in the MATLAB version of the toolbox):

- Organize the components

XY = [X, Y]

- Estimate the joint entropy, H(X,Y)

H_xy = estimate_H(
    np.hstack(XY)  # stack the vectors dimension-wise
)

- Estimate the marginal entropies of XY, i.e. estimate X and Y individually, then sum them

H_x_y = np.sum(
    [estimate_H(imarginal) for imarginal in XY]  # entropy of each marginal
)

- Subtract the joint entropy from the sum of marginals

MI_XY = H_x_y - H_xy
Naive¶
class Ensemble:
def _fit_ensemble(self, X: np.ndarray, vol: float, batch_size: int=100)-> float:
Hs = list()
        for idx in gen_batches(X.shape[0], batch_size, min_batch_size=10):
Hs.append(self._fit(X[idx], vol))
return np.mean(Hs)
class MutualInfoKNN(BaseEstimator):
def __init__(self, n_neighbors: int=20, algorithm: str='brute', n_jobs: int=-1, kwargs: Optional[dict]=None)-> None:
self.n_neighbors = n_neighbors
self.algorithm = algorithm
self.n_jobs = n_jobs
self.kwargs = kwargs
def fit(self, X: np.ndarray, Y: np.ndarray)-> BaseEstimator:
# Calculate Volumes
vol_xy = volume_unit_ball(X.shape[1] + Y.shape[1])
vol_x = volume_unit_ball(X.shape[1])
vol_y = volume_unit_ball(Y.shape[1])
        # Calculate the joint entropy (stack X and Y dimension-wise)
        H_xy = self._fit(np.hstack([X, Y]), vol_xy)
        # Calculate the marginal entropies
        H_x = self._fit(X, vol_x)
        H_y = self._fit(Y, vol_y)
# Calculate Mutual Information
self.MI = H_x + H_y - H_xy
return self
    def _fit(self, X: np.ndarray, vol: float)-> float:
        n_samples, d_dimensions = X.shape
        # 1. Calculate the K-nearest neighbour distances
        dist = knn_distance(
            X,
            n_neighbors=self.n_neighbors,
            algorithm=self.algorithm,
            n_jobs=self.n_jobs,
            kwargs=self.kwargs
        )
        # 2. KnnK entropy estimate from the k-th neighbour distances
        return np.log(n_samples - 1) - psi(self.n_neighbors) + np.log(vol) + (d_dimensions / n_samples) * np.log(dist[:, self.n_neighbors - 1]).sum()
def score(self, X):
return self.MI
# parameters (default)
n_neighbors = 10
algorithm = 'brute'
n_jobs = -1
ensemble = False
batch_size = 50
kwargs = {'metric': 'euclidean'}
# initialize the estimator
clf_mi = MutualInfoKNN(
n_neighbors=n_neighbors,
algorithm=algorithm,
n_jobs=n_jobs,
kwargs=kwargs,
)
# estimate mutual information
MI_xy = clf_mi.fit(X, Y).score(X)
print(f"MI(X,Y): {MI_xy:.3f} nats")
ITE Toolbox¶
# parameters (default)
mult = True # ??
kl_co_name = 'BDKL_KnnK' # KLD calculator
kl_co_pars = None # parameters for the KLD calculator
# initialize the ITE estimator
clf_mi = ite.cost.MIShannon_DKL(
# mult=mult,
# kl_co_name=kl_co_name,
# kl_co_pars=kl_co_pars,
)
# concat data
XY = np.concatenate((X, Y), axis=1)
# individual dimensions per subspace (block)
sub_dimensions = np.array([X.shape[1], Y.shape[1]])
# estimate mutual information
mi_XY = clf_mi.estimation(XY, sub_dimensions)
print(f"MI(X,Y): {mi_XY:.3f} bits")
# parameters (default)
mult = True # ??
kernel = {'name': 'RBF','sigma': 1} # kernel specification
eta = 0.01 # small positive parameter (regularization/tolerance)
# initialize the ITE estimator
clf_mi = ite.cost.BIKGV(
# mult=mult,
# kernel=kernel,
# eta=eta,
)
# concat data
XY = np.concatenate((X, Y), axis=1)
# individual dimensions per subspace (block)
sub_dimensions = np.array([X.shape[1], Y.shape[1]])
# estimate mutual information
mi_XY = clf_mi.estimation(XY, sub_dimensions)
print(f"MI(X,Y): {mi_XY:.3f} bits")
Total Correlation (Multi-Information, Co-Information)¶
I(x_1, x_2, \ldots, x_D) = \sum_{d=1}^D H(x_d) - H(X)
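This is equivalent to the KL divergence between the joint density and the product of its marginals, which is why the KL-based Shannon mutual information estimator below can be reused by treating every dimension as its own one-dimensional subspace:

I(x_1, x_2, \ldots, x_D) = D_{KL}\left( p(x_1, x_2, \ldots, x_D) \,\Big\|\, \prod_{d=1}^D p(x_d) \right)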
# parameters (default)
mult = True
kl_co_name = 'BDKL_KnnK'
kl_co_pars = None
# initialize the ITE estimator
clf_mi = ite.cost.MIShannon_DKL(
mult=mult,
kl_co_name=kl_co_name,
kl_co_pars=kl_co_pars,
)
# each dimension is treated as its own one-dimensional subspace
sub_dimensions = np.ones(X.shape[1], dtype=int)
# estimate total correlation
tc_X = clf_mi.estimation(X, sub_dimensions)
tc_Y = clf_mi.estimation(Y, sub_dimensions)
print(f"Shannon Total Correlation, TC(X): {tc_X:.3f} bits")
print(f"Shannon Total Correlation, TC(Y): {tc_Y:.3f} bits")