Kernels and Information Measures¶
This post is based on the following papers:
- Measures of Entropy from Data Using Infinitely Divisible Kernels - Giraldo et al. (2014)
- Multivariate Extension of Matrix-based Renyi's \alpha-order Entropy Functional - Yu et al. (2018)
Short Overview¶
The following information-theoretic (IT) measures can be computed with the scheme mentioned above:
- Entropy
- Joint Entropy
- Conditional Entropy
- Mutual Information
Kernel Matrices¶
Entropy¶
\alpha=1¶
In this case (the limit \alpha \rightarrow 1), the Renyi entropy reduces to the Shannon entropy. For kernel matrices, the matrix-based formulation becomes the von Neumann entropy, which is computed from the eigenvalue decomposition of the (normalized) kernel matrix.
$$\begin{aligned}
H_\alpha(X) &= \frac{1}{1-\alpha} \log \int_\mathcal{X} f^\alpha(x) \, dx \\
H_1(X) &= \lim_{\alpha \rightarrow 1} H_\alpha(X) = -\int_\mathcal{X} f(x) \log f(x) \, dx
\end{aligned}$$
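A rough numerical sketch of the matrix-based estimator, assuming an RBF kernel and the trace-one normalization A_{ij} = \frac{1}{N} K_{ij} / \sqrt{K_{ii} K_{jj}} from Giraldo et al. (the function names and bandwidth choice are my own):

```python
import numpy as np
from scipy.spatial.distance import cdist


def normalized_gram_matrix(X, sigma=1.0):
    """Trace-one Gram matrix A_ij = (1/N) K_ij / sqrt(K_ii K_jj), RBF kernel assumed."""
    K = np.exp(-cdist(X, X, "sqeuclidean") / (2 * sigma ** 2))
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d) / K.shape[0]


def renyi_entropy(A, alpha=2.0):
    """Matrix-based Renyi entropy: 1/(1-alpha) * log2(sum_i lambda_i(A)^alpha)."""
    lam = np.linalg.eigvalsh(A)
    lam = np.clip(lam, 0.0, None)            # guard against tiny negative eigenvalues
    if np.isclose(alpha, 1.0):
        # alpha -> 1 limit: von Neumann entropy -sum_i lambda_i log2(lambda_i)
        lam = lam[lam > 0]
        return float(-np.sum(lam * np.log2(lam)))
    return float(np.log2(np.sum(lam ** alpha)) / (1.0 - alpha))


# toy usage
rng = np.random.RandomState(0)
X = rng.randn(100, 2)
A = normalized_gram_matrix(X, sigma=1.0)
print(renyi_entropy(A, alpha=1.0))           # Shannon / von Neumann limit
```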
\alpha=2¶
In this case, we recover the kernel density estimation (KDE) procedure.
Note: we have to use the convolution theorem for Gaussian functions, i.e. the convolution of two Gaussians is another Gaussian with summed variances, which is what allows the estimate to be written directly in terms of the kernel matrix.
Practically¶
We can calculate the above formulation by simply pre- and post-multiplying the kernel matrix \mathbf{K}_x by the ones vector:

$$V = \frac{1}{N^2}\, \mathbf{1}_N^\top \mathbf{K}_x \mathbf{1}_N$$

where \mathbf{1}_N \in \mathbb{R}^{N} is the vector of ones. The quantity V is known as the information potential, and the quadratic entropy estimate is \hat{H}_2(X) = -\log V.
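A minimal sketch of this estimate, again assuming a Gaussian kernel (function names are my own):

```python
import numpy as np
from scipy.spatial.distance import cdist


def information_potential(X, sigma=1.0):
    """V = (1/N^2) * 1_N^T K_x 1_N, i.e. the mean of all kernel entries."""
    K = np.exp(-cdist(X, X, "sqeuclidean") / (2 * sigma ** 2))
    ones = np.ones(K.shape[0])
    return ones @ K @ ones / K.shape[0] ** 2


def quadratic_entropy(X, sigma=1.0):
    """Renyi quadratic entropy estimate: H_2 = -log V."""
    return -np.log(information_potential(X, sigma))


rng = np.random.RandomState(0)
X = rng.randn(100, 2)
print(quadratic_entropy(X, sigma=1.0))
```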
Cross Information Potential RKHS¶
- Cross Information Potential \mathcal{V}
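A hedged sketch, assuming the standard ITL definition of the cross information potential, \mathcal{V}(X, Y) = \frac{1}{N_x N_y}\sum_{i,j} K_\sigma(x_i, y_j), with a Gaussian kernel (the function name is my own):

```python
import numpy as np
from scipy.spatial.distance import cdist


def cross_information_potential(X, Y, sigma=1.0):
    """V(X, Y) = (1 / (N_x N_y)) * sum_ij K_sigma(x_i, y_j), Gaussian kernel assumed."""
    K_xy = np.exp(-cdist(X, Y, "sqeuclidean") / (2 * sigma ** 2))
    return K_xy.mean()
```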
Joint Entropy¶
This formula uses the above entropy formulation. To incorporate both r.v.'s X and Y, we construct two kernel matrices, A and B respectively:

$$H_\alpha(A, B) = H_\alpha\left( \frac{A \circ B}{\text{tr}(A \circ B)} \right)$$
Note:
- The trace is there for normalization.
- The matrices A and B have to be the same size (due to the Hadamard product).
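A small sketch of the joint entropy, reusing the hypothetical `renyi_entropy` helper from the entropy section above:

```python
import numpy as np


def joint_entropy(A, B, alpha=2.0):
    """Matrix-based joint entropy: entropy of the trace-normalized
    Hadamard (element-wise) product of the two normalized Gram matrices."""
    AB = A * B                               # Hadamard product
    return renyi_entropy(AB / np.trace(AB), alpha=alpha)
```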
Multivariate¶
This extends to multiple variables. Let's say we have L variables; then we can calculate the joint entropy like so:

$$H_\alpha(A_1, \ldots, A_L) = H_\alpha\left( \frac{A_1 \circ A_2 \circ \cdots \circ A_L}{\text{tr}(A_1 \circ A_2 \circ \cdots \circ A_L)} \right)$$
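A sketch of the multivariate case, again in terms of the hypothetical `renyi_entropy` helper:

```python
from functools import reduce

import numpy as np


def multivariate_joint_entropy(As, alpha=2.0):
    """Joint entropy of L variables: Hadamard product of all normalized
    Gram matrices, re-normalized by its trace before taking the entropy."""
    H = reduce(np.multiply, As)
    return renyi_entropy(H / np.trace(H), alpha=alpha)
```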
Conditional Entropy¶
This formula respects the traditional formula for conditional entropy: the joint entropy of r.v.'s X,Y minus the entropy of r.v. Y, i.e. H(X|Y) = H(X,Y) - H(Y). Assume we have the kernel matrix for r.v. X as A and the kernel matrix for r.v. Y as B. The following formula shows how this is calculated using kernel matrices:

$$H_\alpha(A|B) = H_\alpha(A, B) - H_\alpha(B)$$
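A sketch, reusing the hypothetical `joint_entropy` and `renyi_entropy` helpers from above:

```python
def conditional_entropy(A, B, alpha=2.0):
    """H(X|Y) = H(X, Y) - H(Y), matrix-based."""
    return joint_entropy(A, B, alpha=alpha) - renyi_entropy(B, alpha=alpha)
```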
Mutual Information¶
The classic Shannon definition is the sum of the marginal entropies minus the joint entropy of the r.v.'s X,Y, i.e. MI(X;Y) = H(X) + H(Y) - H(X,Y). The following formula shows the MI with kernels:

$$I_\alpha(A; B) = H_\alpha(A) + H_\alpha(B) - H_\alpha(A, B)$$
The definition is exactly the same and uses the entropy and joint entropy formulations above.
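A sketch, built on the same hypothetical helpers:

```python
def mutual_information(A, B, alpha=2.0):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), matrix-based."""
    return (renyi_entropy(A, alpha=alpha)
            + renyi_entropy(B, alpha=alpha)
            - joint_entropy(A, B, alpha=alpha))
```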
Multivariate¶
This can be extended to multiple variables. Let's use the same setup as in the multivariate joint entropy above: assume B is univariate but A is multivariate, i.e. A \in \{A_1, A_2, \ldots, A_L \}. We can write the MI as:

$$I_\alpha(\{A_1, \ldots, A_L\}; B) = H_\alpha(A_1, \ldots, A_L) + H_\alpha(B) - H_\alpha(A_1, \ldots, A_L, B)$$
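A sketch, reusing the hypothetical `multivariate_joint_entropy` and `renyi_entropy` helpers:

```python
def multivariate_mutual_information(As, B, alpha=2.0):
    """I({A_1..A_L}; B) = H(A_1..A_L) + H(B) - H(A_1..A_L, B), matrix-based."""
    return (multivariate_joint_entropy(As, alpha=alpha)
            + renyi_entropy(B, alpha=alpha)
            - multivariate_joint_entropy(list(As) + [B], alpha=alpha))
```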
Total Correlation¶
This is a measure of redundancy for multivariate data. It is basically the sum of the entropies of the marginals minus the joint entropy of the multivariate distribution. Let's assume A is multivariate, i.e. A \in \{A_1, A_2, \ldots, A_L \}. Thus we can write this measure using kernel matrices:

$$T_\alpha(A_1, \ldots, A_L) = \sum_{l=1}^{L} H_\alpha(A_l) - H_\alpha(A_1, \ldots, A_L)$$
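A sketch, again in terms of the hypothetical helpers defined above:

```python
def total_correlation(As, alpha=2.0):
    """T(A_1..A_L) = sum_l H(A_l) - H(A_1..A_L), matrix-based."""
    return (sum(renyi_entropy(A, alpha=alpha) for A in As)
            - multivariate_joint_entropy(As, alpha=alpha))
```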