Classic Methods#


Parametric#

Assume Gaussian#


Single Variate#

For a zero-mean Gaussian random variable with variance \(\sigma^2\), the density is

\[ f(x) = \frac{1}{\sqrt{2\pi}\sigma}\exp\left( \frac{-x^2}{2\sigma^2} \right) \]
Entropy#
\[ h(X) = \frac{1}{2}\log (2\pi e\sigma^2) \]
\[\begin{split} \begin{aligned} h(X) &= - \int_\mathcal{X} f(x) \log f(x)\, dx \\ &= - \int_\mathcal{X} f(x) \log \left( \frac{1}{\sqrt{2\pi}\sigma}\exp\left( \frac{-x^2}{2\sigma^2} \right) \right) dx \\ &= - \int_\mathcal{X} f(x) \left[ -\frac{1}{2}\log (2\pi \sigma^2) - \frac{x^2}{2\sigma^2}\log e \right] dx \\ &= \frac{1}{2} \log (2\pi\sigma^2) + \frac{\sigma^2}{2\sigma^2}\log e \\ &= \frac{1}{2} \log (2\pi e \sigma^2) \end{aligned} \end{split}\]

where the last two steps use \(\int_\mathcal{X} f(x)\, dx = 1\) and \(\int_\mathcal{X} x^2 f(x)\, dx = \sigma^2\).

From Scratch

import numpy as np

def entropy_gauss(sigma: float) -> float:
    return 0.5 * np.log(2 * np.pi * np.e * sigma**2)  # ½·log(2πeσ²), in nats

Scipy

from scipy import stats

H_g = stats.norm(scale=sigma).entropy()
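
Both should agree; for example, entropy_gauss(2.0) and stats.norm(scale=2.0).entropy() both return \(\frac{1}{2}\log(8\pi e) \approx 2.11\) nats.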
  • Lecture 8: Density Estimation: Parametric Approach - Lecture Notes


Histogram#
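
A minimal sketch of a histogram density estimate with NumPy; the sample data and bin count are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(scale=1.0, size=1_000)

# density=True normalises the counts so the histogram integrates to 1
hist, edges = np.histogram(x, bins=30, density=True)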


Kernel Density Estimation#

Software
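
One option is scipy's stats.gaussian_kde. A minimal sketch, assuming illustrative sample data and an evaluation grid; the bandwidth defaults to Scott's rule.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(scale=1.0, size=1_000)

kde = stats.gaussian_kde(x)        # bandwidth chosen by Scott's rule by default
grid = np.linspace(-4, 4, 200)
density = kde(grid)                # estimated pdf evaluated on the grid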


K-Nearest Neighbours#

  • Paper - k-NEAREST NEIGHBOUR KERNEL DENSITY ESTIMATION, THE CHOICE OF OPTIMAL k

  • Helpful Presentation

  • Lecture 7: Density Estimation: k-Nearest Neighbor and Basis Approach - Prezi

  • KNN Density Estimation, a slecture by Qi Wang - Video

  • Non-parametric density estimation - 3: k nearest neighbor - Video

  • Mod-05 Lec-12 Nonparametric estimation, Parzen Windows, nearest neighbour methods - Video

  • Modal-set Estimation using kNN graphs, and Applications to Clustering - Video

Entropy#

The full entropy expression of the \(k\)-NN (Kozachenko–Leonenko) estimator:

\[ \hat{H}(\mathbf{X}) = \psi(N) - \psi(k) + \log{c_d} + \frac{d}{N}\sum_{i=1}^{N} \log{\epsilon(i)} \]

where:

  • \(N\) is the number of samples and \(d\) the dimensionality.

  • \(\psi\) is the digamma function.

  • \(c_d=\frac{\pi^{\frac{d}{2}}}{\Gamma(1+\frac{d}{2})}\) is the volume of the unit ball in \(d\) dimensions.

  • \(\Gamma\) is the gamma function.

  • \(\epsilon(i)\) is the distance from the \(i^{th}\) sample to its \(k^{th}\) nearest neighbour.
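
A minimal sketch of this estimator, assuming Euclidean distances, samples stored as an (N, d) array, and no duplicate points (a zero distance would break the log); it uses scipy's cKDTree, digamma and gammaln.

import numpy as np
from scipy.special import digamma, gammaln
from scipy.spatial import cKDTree

def entropy_knn(X: np.ndarray, k: int = 3) -> float:
    """k-NN entropy estimate (in nats) for samples X of shape (N, d)."""
    N, d = X.shape
    # distance from each sample to its k-th nearest neighbour
    # (query with k + 1 because each point is returned as its own nearest neighbour)
    eps = cKDTree(X).query(X, k=k + 1)[0][:, -1]
    log_c_d = (d / 2) * np.log(np.pi) - gammaln(1 + d / 2)  # log volume of the unit d-ball
    return digamma(N) - digamma(k) + log_c_d + (d / N) * np.sum(np.log(eps))

For a standard 1-D Gaussian sample, e.g. entropy_knn(np.random.default_rng(0).normal(size=(1_000, 1))), the result should be close to the analytic \(\frac{1}{2}\log(2\pi e) \approx 1.42\) nats.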