Kernels and Information Measures¶
This post is based on the following papers:
- Measures of Entropy from Data Using Infinitely Divisible Kernels - Giraldo et al. (2014)
- Multivariate Extension of Matrix-based Renyi's $\alpha$-order Entropy Functional - Yu et al. (2018)
Short Overview¶
The following information-theoretic (IT) measures are possible with the matrix-based scheme from the papers above:
- Entropy
- Joint Entropy
- Conditional Entropy
- Mutual Information
- Total Correlation
Kernel Matrices¶
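Everything below starts from a Gram (kernel) matrix of the samples, normalized so that its trace equals one, i.e. $A_{ij} = \frac{1}{n} \frac{K_{ij}}{\sqrt{K_{ii} K_{jj}}}$. Here is a minimal NumPy sketch; the RBF kernel choice and the function names are my own, not prescribed by the papers:

```python
import numpy as np


def rbf_kernel(X: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Gram matrix with entries K_ij = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    sq_dists = (
        np.sum(X**2, axis=1, keepdims=True)
        + np.sum(X**2, axis=1)
        - 2.0 * X @ X.T
    )
    return np.exp(-np.clip(sq_dists, 0.0, None) / (2.0 * sigma**2))


def normalize_kernel(K: np.ndarray) -> np.ndarray:
    """A_ij = K_ij / (n * sqrt(K_ii * K_jj)), so that tr(A) = 1."""
    d = np.sqrt(np.diag(K))
    return (K / np.outer(d, d)) / K.shape[0]
```

With an RBF kernel the diagonal of $K$ is already all ones, so the normalization reduces to dividing by $n$; the general form also covers other (infinitely divisible) kernels.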
Entropy¶
For kernel matrices, we can show that the Renyi entropy formulation becomes a function of the eigenvalue decomposition of the (normalized) kernel matrix. Recall Renyi's $\alpha$-order entropy and its quadratic ($\alpha = 2$) special case:
$$ \begin{aligned}
H_\alpha(X) &= \frac{1}{1-\alpha} \log \int_\mathcal{X} f^\alpha(x) \, dx \\
H_2(X) &= -\log \int_\mathcal{X} f^2(x) \, dx
\end{aligned}$$
The matrix-based analogue from Giraldo et al. (2014) replaces the density integral with the eigenvalues $\lambda_i$ of a normalized kernel matrix $A$:
$$ S_\alpha(A) = \frac{1}{1-\alpha} \log_2 \left[ \text{tr}\left(A^\alpha\right) \right] = \frac{1}{1-\alpha} \log_2 \left[ \sum_{i=1}^{n} \lambda_i(A)^\alpha \right] $$
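As a minimal sketch, the matrix-based entropy can be computed directly from the eigenvalues of the normalized kernel matrix; this assumes the hypothetical `normalize_kernel` helper above has already produced a symmetric, trace-one matrix `A`:

```python
import numpy as np


def renyi_entropy(A: np.ndarray, alpha: float = 2.0) -> float:
    """Matrix-based Renyi entropy: log2(sum_i lambda_i^alpha) / (1 - alpha)."""
    # Eigenvalues of the symmetric, trace-one matrix; clip tiny round-off negatives.
    lam = np.clip(np.linalg.eigvalsh(A), 0.0, None)
    if np.isclose(alpha, 1.0):
        # alpha -> 1 limit: a Shannon/von Neumann-style entropy of the eigenvalues.
        lam = lam[lam > 0.0]
        return float(-np.sum(lam * np.log2(lam)))
    return float(np.log2(np.sum(lam**alpha)) / (1.0 - alpha))
```

For example, `renyi_entropy(normalize_kernel(rbf_kernel(X)), alpha=2)` would give the second-order entropy estimate of a sample `X`.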
For $\alpha = 2$, we can build a practical estimator with the kernel density estimation (Parzen window) procedure: estimate the density $f$ with Gaussian kernels $G_\sigma$ centered on the samples and plug that estimate into the integral, which collapses the double integral into a double sum over the samples:
$$ \hat{H}_2(X) = -\log \left[ \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} G_{\sigma\sqrt{2}}(x_i - x_j) \right] $$
Note: we have to use the convolution theorem for Gaussian functions here, i.e. $\int_\mathcal{X} G_\sigma(x - x_i) \, G_\sigma(x - x_j) \, dx = G_{\sigma\sqrt{2}}(x_i - x_j)$.
Practically¶
We can calculate the above formulation by simply multiplying the kernel matrix with a vector of ones on each side and summing:
$$ \hat{H}_2(X) = -\log \left( \frac{1}{n^2} \mathbf{1}^\top K \mathbf{1} \right) $$
where $K$ is the $n \times n$ Gram matrix with entries $K_{ij} = G_{\sigma\sqrt{2}}(x_i - x_j)$ and $\mathbf{1}$ is the vector of ones.
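A sketch of this practical route, reusing the hypothetical `rbf_kernel` helper from above and assuming Gaussian kernels of bandwidth $\sigma$ (so the convolved kernel has bandwidth $\sigma\sqrt{2}$):

```python
import numpy as np


def quadratic_entropy(X: np.ndarray, sigma: float = 1.0) -> float:
    """H_2 estimate: negative log of the information potential (1/n^2) * 1^T K 1."""
    n = X.shape[0]
    # Bandwidth sigma * sqrt(2) comes from the Gaussian convolution theorem.
    K = rbf_kernel(X, sigma=np.sqrt(2.0) * sigma)
    information_potential = float(K.sum()) / n**2
    return -float(np.log(information_potential))
```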
Cross Information Potential RKHS¶
- Cross Information Potential: $V(f, g) = \int_\mathcal{X} f(x) \, g(x) \, dx$, the inner product between two densities. With Parzen estimates and Gaussian kernels this becomes $\frac{1}{nm} \sum_{i=1}^{n} \sum_{j=1}^{m} G_{\sigma\sqrt{2}}(x_i - y_j)$, which is the inner product between the empirical mean embeddings of the two samples in the RKHS.
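A sketch of the corresponding estimator under the same Gaussian-kernel assumptions (the function name is mine):

```python
import numpy as np


def cross_information_potential(X: np.ndarray, Y: np.ndarray, sigma: float = 1.0) -> float:
    """Estimate V(f, g) = int f(x) g(x) dx as the mean of G_{sigma*sqrt(2)}(x_i - y_j)."""
    sq_dists = (
        np.sum(X**2, axis=1, keepdims=True)
        + np.sum(Y**2, axis=1)
        - 2.0 * X @ Y.T
    )
    bandwidth = np.sqrt(2.0) * sigma
    K = np.exp(-np.clip(sq_dists, 0.0, None) / (2.0 * bandwidth**2))
    return float(K.mean())  # mean = sum / (n * m)
```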
Joint Entropy¶
This formula uses the entropy formulation above. To incorporate both r.v.'s $X$ and $Y$, we take the Hadamard (element-wise) product of their normalized kernel matrices and renormalize before computing the entropy:
$$ S_\alpha(A, B) = S_\alpha\left( \frac{A \circ B}{\text{tr}(A \circ B)} \right) $$
Note:
* The trace is there for normalization, so the product matrix again has unit trace.
* The matrices $A$ and $B$ must be built from the same, aligned set of samples, so that entry $(i, j)$ in each matrix refers to the same pair of observations.
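A minimal sketch, assuming `A` and `B` are normalized kernel matrices built from the same samples and reusing the hypothetical `renyi_entropy` helper above:

```python
import numpy as np


def joint_entropy(A: np.ndarray, B: np.ndarray, alpha: float = 2.0) -> float:
    """S_alpha(A, B) = S_alpha((A o B) / tr(A o B)), with 'o' the Hadamard product."""
    AB = A * B                 # element-wise (Hadamard) product
    AB = AB / np.trace(AB)     # renormalize so the trace is one again
    return renyi_entropy(AB, alpha=alpha)
```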
Multivariate¶
This extends to multiple variables. Let's say we have $L$ variables with normalized kernel matrices $A_1, \dots, A_L$; then we can calculate the joint entropy like so:
$$ S_\alpha(A_1, \dots, A_L) = S_\alpha\left( \frac{A_1 \circ A_2 \circ \dots \circ A_L}{\text{tr}(A_1 \circ A_2 \circ \dots \circ A_L)} \right) $$
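The same sketch generalizes by taking the Hadamard product over the whole list of matrices (again assuming the hypothetical helpers above):

```python
from functools import reduce

import numpy as np


def joint_entropy_multi(As: list, alpha: float = 2.0) -> float:
    """S_alpha(A_1, ..., A_L): Hadamard product of all normalized kernel matrices."""
    H = reduce(lambda P, A: P * A, As)  # A_1 o A_2 o ... o A_L
    H = H / np.trace(H)
    return renyi_entropy(H, alpha=alpha)
```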
Conditional Entropy¶
This formula respects the traditional formula for conditional entropy: the joint entropy of the two r.v.'s minus the marginal entropy of the conditioning variable.
$$ S_\alpha(A|B) = S_\alpha(A, B) - S_\alpha(B) $$
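As a sketch, this is a one-liner on top of the hypothetical `joint_entropy` and `renyi_entropy` helpers above:

```python
import numpy as np


def conditional_entropy(A: np.ndarray, B: np.ndarray, alpha: float = 2.0) -> float:
    """S_alpha(A | B) = S_alpha(A, B) - S_alpha(B)."""
    return joint_entropy(A, B, alpha=alpha) - renyi_entropy(B, alpha=alpha)
```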
Mutual Information¶
The classic Shannon definition is the sum of the marginal entropies minus the joint entropy of the two r.v.'s. The matrix-based definition is exactly the same and utilizes the entropy and joint entropy formulations above:
$$ I_\alpha(A; B) = S_\alpha(A) + S_\alpha(B) - S_\alpha(A, B) $$
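Again as a sketch with the hypothetical helpers above:

```python
import numpy as np


def mutual_information(A: np.ndarray, B: np.ndarray, alpha: float = 2.0) -> float:
    """I_alpha(A; B) = S_alpha(A) + S_alpha(B) - S_alpha(A, B)."""
    return (
        renyi_entropy(A, alpha=alpha)
        + renyi_entropy(B, alpha=alpha)
        - joint_entropy(A, B, alpha=alpha)
    )
```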
Multivariate¶
This can be extended to multiple variables. Let's use the same example as for the multivariate joint entropy: assume we have $L$ variables with kernel matrices $A_1, \dots, A_L$ plus one variable of interest with kernel matrix $B$. The mutual information between the group and $B$ combines the multivariate joint entropy with the bivariate definition:
$$ I_\alpha(\{A_1, \dots, A_L\}; B) = S_\alpha(A_1, \dots, A_L) + S_\alpha(B) - S_\alpha(A_1, \dots, A_L, B) $$
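A sketch of this group-versus-single-variable form (my reading of the multivariate extension), reusing the hypothetical helpers above:

```python
import numpy as np


def mutual_information_multi(As: list, B: np.ndarray, alpha: float = 2.0) -> float:
    """I_alpha({A_1,...,A_L}; B) = S_alpha(A_1,...,A_L) + S_alpha(B) - S_alpha(A_1,...,A_L,B)."""
    return (
        joint_entropy_multi(As, alpha=alpha)
        + renyi_entropy(B, alpha=alpha)
        - joint_entropy_multi(list(As) + [B], alpha=alpha)
    )
```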
Total Correlation¶
This is a measure of redundancy for multivariate data. It is basically the sum of the entropies of each of the marginals minus the joint entropy of the multivariate distribution. Let's assume we have $L$ variables with kernel matrices $A_1, \dots, A_L$; then:
$$ T_\alpha(A_1, \dots, A_L) = \sum_{l=1}^{L} S_\alpha(A_l) - S_\alpha(A_1, \dots, A_L) $$
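As a final sketch with the hypothetical helpers above:

```python
def total_correlation(As: list, alpha: float = 2.0) -> float:
    """T_alpha(A_1,...,A_L) = sum_l S_alpha(A_l) - S_alpha(A_1,...,A_L)."""
    marginals = sum(renyi_entropy(A, alpha=alpha) for A in As)
    return marginals - joint_entropy_multi(As, alpha=alpha)
```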