Gaussian Distribution
\[f(x)=
\frac{1}{\sqrt{(2\pi)^D|\Sigma|}}
\exp\left( -\frac{1}{2} (x-\mu)^\top \Sigma^{-1} (x-\mu)\right)\]
Likelihood
\[- \ln L = \frac{1}{2}\ln|\Sigma| + \frac{1}{2}(x-\mu)^\top \Sigma^{-1} (x - \mu) + \frac{D}{2}\ln 2\pi \]
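As a numerical sanity check (a sketch with numpy; the values of \(\mu\), \(\Sigma\), and \(x\) are arbitrary), we can evaluate the density and confirm that its negative log matches the term-by-term negative log-likelihood above:

```python
import numpy as np

# Arbitrary example values; any symmetric positive-definite Sigma works.
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.5, -1.0])
D = len(mu)

diff = x - mu
maha = diff @ np.linalg.inv(Sigma) @ diff   # (x-mu)^T Sigma^{-1} (x-mu)
_, logdet = np.linalg.slogdet(Sigma)        # numerically stable ln|Sigma|

# Density f(x)
f = np.exp(-0.5 * maha) / np.sqrt((2 * np.pi) ** D * np.exp(logdet))

# Negative log-likelihood, term by term as in the formula above
nll = 0.5 * logdet + 0.5 * maha + 0.5 * D * np.log(2 * np.pi)

assert np.isclose(-np.log(f), nll)
```

Using `slogdet` instead of `np.log(np.linalg.det(Sigma))` avoids overflow/underflow of the determinant in higher dimensions.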
Alternative Representation
\[X \sim \mathcal{N}(\mu, \Sigma)\]
where \(\mu\) is the mean vector and \(\Sigma\) is the covariance matrix. Let's decompose \(\Sigma\) with an eigendecomposition like so:
\[\Sigma = U\Lambda U^\top = U \Lambda^{1/2}(U\Lambda^{1/2})^\top\]
Now we can represent our Normal distribution as:
\[X = \mu + U\Lambda^{1/2}Z\]
where:
- \(U\) is a rotation matrix
- \(\Lambda^{1/2}\) is a scale matrix
- \(\mu\) is a translation vector
- \(Z \sim \mathcal{N}(0,I)\)
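This representation is exactly how one samples from \(\mathcal{N}(\mu, \Sigma)\): draw standard normals, rotate and scale by \(U\Lambda^{1/2}\), then translate by \(\mu\). A quick empirical check (a sketch with numpy; the example \(\Sigma\) is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# Eigendecomposition Sigma = U Lambda U^T (eigh assumes symmetric input)
lam, U = np.linalg.eigh(Sigma)

# Transform standard normals: x = mu + U Lambda^{1/2} z
Z = rng.standard_normal((100_000, 2))
X = mu + Z @ (U * np.sqrt(lam)).T   # columns of U scaled by sqrt(eigenvalues)

# The empirical mean and covariance should recover mu and Sigma
assert np.allclose(X.mean(axis=0), mu, atol=0.05)
assert np.allclose(np.cov(X.T), Sigma, atol=0.05)
```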
or also
\[X = \mu + UZ\]
where:
- \(U\) is a rotation matrix
- \(\mu\) is a translation vector
- \(Z \sim \mathcal{N}(0,\Lambda)\), i.e. the scaling by \(\Lambda\) is absorbed into \(Z\)
Reparameterization
In deep learning we often learn this distribution via a reparameterization like so:
\[X = \mu + AZ \]
where:
- \(\mu \in \mathbb{R}^{d}\)
- \(A \in \mathbb{R}^{d\times l}\)
- \(Z \sim \mathcal{N}(0, I)\)
- \(\Sigma = AA^\top\); when \(A\) is square and lower triangular this is the Cholesky decomposition
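The point of the reparameterization is that \(x = \mu + Az\) is a deterministic function of the parameters, so gradients flow through \(\mu\) and \(A\) while the randomness lives only in \(z\). A minimal sketch with numpy (the example values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# Lower-triangular Cholesky factor: Sigma = A A^T
A = np.linalg.cholesky(Sigma)
assert np.allclose(A @ A.T, Sigma)

# Reparameterized sample: x = mu + A z with z ~ N(0, I).
# In a framework with autodiff, gradients w.r.t. mu and A pass
# through this map, which is what makes the trick usable in training.
z = rng.standard_normal(2)
x = mu + A @ z
```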
Entropy
1 dimensional
\[H(X) = \frac{1}{2} \log(2\pi e \sigma^2)\]
D dimensional
\[H(X) = \frac{D}{2} + \frac{D}{2} \ln(2\pi) + \frac{1}{2}\ln|\Sigma|\]
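A quick consistency check (a numpy sketch; \(\Sigma\) and \(\sigma^2\) are arbitrary): the D-dimensional formula with \(D=1\), \(\Sigma=[\sigma^2]\) should reduce to the 1-dimensional one:

```python
import numpy as np

# D-dimensional entropy for an arbitrary SPD Sigma
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
D = Sigma.shape[0]
_, logdet = np.linalg.slogdet(Sigma)
H = 0.5 * D + 0.5 * D * np.log(2 * np.pi) + 0.5 * logdet

# Same formula with D = 1 and Sigma = [[s2]] ...
s2 = 1.7
H1 = 0.5 + 0.5 * np.log(2 * np.pi) + 0.5 * np.log(s2)

# ... matches the 1-dimensional closed form H = 0.5 log(2 pi e s2)
assert np.isclose(H1, 0.5 * np.log(2 * np.pi * np.e * s2))
```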
KL-Divergence (Relative Entropy)
\[
D_\text{KL}(\mathcal{N}_0||\mathcal{N}_1) = \frac{1}{2}
\left[
\text{tr}(\Sigma_1^{-1}\Sigma_0) +
(\mu_1 - \mu_0)^\top \Sigma_1^{-1} (\mu_1 - \mu_0) -
D + \ln \frac{|\Sigma_1|}{|\Sigma_0|}
\right]
\]
if \(\mu_1=\mu_0\) then:
\[
D_\text{KL}(\mathcal{N}_0||\mathcal{N}_1) = \frac{1}{2} \left[
\text{tr}(\Sigma_1^{-1} \Sigma_0) - D + \ln \frac{|\Sigma_1|}{|\Sigma_0|} \right]
\]
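The closed form is straightforward to evaluate (a numpy sketch; the example parameters are arbitrary). Two properties worth checking: the divergence of a Gaussian with itself is zero, and it is always non-negative:

```python
import numpy as np

mu0, mu1 = np.array([0.0, 0.0]), np.array([1.0, -1.0])
Sigma0 = np.array([[1.0, 0.2], [0.2, 1.0]])
Sigma1 = np.array([[2.0, 0.0], [0.0, 0.5]])
D = len(mu0)

def gaussian_kl(mu0, Sigma0, mu1, Sigma1):
    """KL(N0 || N1), term by term as in the formula above."""
    Sigma1_inv = np.linalg.inv(Sigma1)
    diff = mu1 - mu0
    _, ld0 = np.linalg.slogdet(Sigma0)
    _, ld1 = np.linalg.slogdet(Sigma1)
    return 0.5 * (np.trace(Sigma1_inv @ Sigma0)
                  + diff @ Sigma1_inv @ diff
                  - len(mu0) + ld1 - ld0)

kl = gaussian_kl(mu0, Sigma0, mu1, Sigma1)
assert kl >= 0.0                                            # non-negativity
assert np.isclose(gaussian_kl(mu0, Sigma0, mu0, Sigma0), 0.0)  # KL(p||p) = 0
```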
Mutual Information
\[I(X)= - \frac{1}{2} \ln | \rho_0 |\]
where \(\rho_0\) is the correlation matrix derived from \(\Sigma_0\).
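Computing this amounts to normalizing \(\Sigma_0\) into a correlation matrix and taking its log-determinant (a numpy sketch; the example \(\Sigma_0\) is arbitrary). For \(D = 2\) with correlation \(r\), the expression reduces to the familiar \(-\frac{1}{2}\ln(1 - r^2)\):

```python
import numpy as np

Sigma0 = np.array([[2.0, 0.6],
                   [0.6, 1.0]])

# Correlation matrix: rho_ij = Sigma_ij / sqrt(Sigma_ii * Sigma_jj)
d = np.sqrt(np.diag(Sigma0))
rho0 = Sigma0 / np.outer(d, d)

_, logdet_rho = np.linalg.slogdet(rho0)
I = -0.5 * logdet_rho

# For D = 2: |rho0| = 1 - r^2, so I = -0.5 ln(1 - r^2)
r = Sigma0[0, 1] / (d[0] * d[1])
assert np.isclose(I, -0.5 * np.log(1 - r**2))
assert I >= 0.0  # independent components give rho0 = I and I(X) = 0
```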