Gaussian Distributions¶
Univariate Gaussian¶
Multivariate Gaussian¶
Joint Gaussian Distribution¶
Marginal Distribution \mathcal{P}(\cdot)¶
Writing the joint Gaussian as \mathcal{P}(x,y) = \mathcal{N}\left(\begin{bmatrix} a \\ b \end{bmatrix}, \begin{bmatrix} A & B \\ B^\top & C \end{bmatrix}\right), we have the marginal distribution of x
\mathcal{P}(x) = \mathcal{N}(a, A)
and in integral form:
\mathcal{P}(x) = \int_y \mathcal{P}(x,y)dy
and we have the marginal distribution of y
\mathcal{P}(y) = \mathcal{N}(b, C) = \int_x \mathcal{P}(x,y)dx
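As a quick sanity check, here is a minimal NumPy sketch comparing the analytic marginal \mathcal{N}(a, A) against samples from the joint (the particular blocks a, b, A, B, C below are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative joint Gaussian over (x, y): mean [a, b], covariance [[A, B], [B^T, C]]
a, b = np.array([1.0]), np.array([-2.0])
A, B, C = np.array([[2.0]]), np.array([[0.8]]), np.array([[1.0]])

mean = np.concatenate([a, b])
cov = np.block([[A, B], [B.T, C]])

# Draw joint samples and keep only the x-coordinate
x_samples = rng.multivariate_normal(mean, cov, size=100_000)[:, 0]

# The empirical mean/variance of x should match the analytic marginal N(a, A)
print(x_samples.mean(), a[0])    # ~1.0
print(x_samples.var(), A[0, 0])  # ~2.0
```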
Conditional Distribution \mathcal{P}(\cdot | \cdot)¶
We have the conditional distribution of x given y
\mathcal{P}(x|y) = \mathcal{N}(\mu_{a|b}, \Sigma_{a|b})
where:
- \mu_{a|b} = a + BC^{-1}(y-b)
- \Sigma_{a|b} = A - BC^{-1}B^T
and we have the conditional distribution of y given x
\mathcal{P}(y|x) = \mathcal{N}(\mu_{b|a}, \Sigma_{b|a})
where:
- \mu_{b|a} = b + B^TA^{-1}(x-a)
- \Sigma_{b|a} = C - B^TA^{-1}B
The two cases mirror each other. This will be useful to know later when we find the marginal distributions of Gaussian process functions.
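Here is a minimal NumPy sketch of the conditioning formulas above (the helper name condition_gaussian and the block values are illustrative, not from the text):

```python
import numpy as np

def condition_gaussian(a, b, A, B, C, y):
    """Parameters of P(x | y) for a joint Gaussian N([a, b], [[A, B], [B^T, C]])."""
    C_inv = np.linalg.inv(C)
    mu_cond = a + B @ C_inv @ (y - b)      # mu_{a|b} = a + B C^{-1} (y - b)
    Sigma_cond = A - B @ C_inv @ B.T       # Sigma_{a|b} = A - B C^{-1} B^T
    return mu_cond, Sigma_cond

# Illustrative example where x and y are each 1-dimensional
a, b = np.array([0.0]), np.array([1.0])
A, B, C = np.array([[1.0]]), np.array([[0.5]]), np.array([[2.0]])
print(condition_gaussian(a, b, A, B, C, y=np.array([2.0])))
```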
Sources:
- Sampling from a Normal Distribution - blog
  A really nice blog with nice plots of joint distributions.
- Two ways to derive the conditional distributions - stack
- How to generate Gaussian samples - blog
- Multivariate Gaussians and Determinant - Lecture Notes
Bandwidth Selection¶
Scott's
sigma = np.power(n_samples, -1.0 / (d_dimensions + 4))
Silverman's
sigma = np.power(n_samples * (d_dimensions + 2.0) / 4.0, -1.0 / (d_dimensions + 4))
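Putting both rules together, here is a small NumPy sketch (the function names and the example values n_samples=1000, d_dimensions=3 are illustrative); these are the same factors that scipy.stats.gaussian_kde computes via its scotts_factor and silverman_factor methods:

```python
import numpy as np

def scott_factor(n_samples, d_dimensions):
    # Scott's rule: n^(-1 / (d + 4))
    return np.power(n_samples, -1.0 / (d_dimensions + 4))

def silverman_factor(n_samples, d_dimensions):
    # Silverman's rule: (n * (d + 2) / 4)^(-1 / (d + 4))
    return np.power(n_samples * (d_dimensions + 2.0) / 4.0, -1.0 / (d_dimensions + 4))

print(scott_factor(1000, 3))      # ~0.373
print(silverman_factor(1000, 3))  # ~0.361
```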
Gaussian Distribution¶
PDF¶
Likelihood¶
Alternative Representation¶
where \mu is the mean and \Sigma is the covariance. Let's decompose \Sigma with an eigendecomposition like so
\Sigma = U \Lambda U^\top
Now we can represent our Normal distribution as:
x = \mu + U\Lambda^{1/2}Z \quad \Longleftrightarrow \quad Z = \Lambda^{-1/2}U^\top(x - \mu)
where:
- U is a rotation matrix
- \Lambda^{-1/2} is a scale matrix
- \mu is a translation vector
- Z \sim \mathcal{N}(0,I)
or alternatively
x = \mu + UZ_n
where:
- U is a rotation matrix
- \Lambda is a scale matrix
- \mu is a translation vector
- Z_n \sim \mathcal{N}(0,\Lambda)
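A minimal NumPy sketch of sampling through the eigendecomposition (the 2-dimensional mean and covariance are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

# Eigendecomposition: Sigma = U diag(lam) U^T
lam, U = np.linalg.eigh(Sigma)

# x = mu + U Lambda^{1/2} Z with Z ~ N(0, I)
Z = rng.standard_normal((100_000, 2))
X = mu + Z @ (U * np.sqrt(lam)).T

# The empirical covariance of the samples should be close to Sigma
print(np.cov(X, rowvar=False))
```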
Reparameterization¶
In deep learning we often learn this distribution through a reparameterization:
x = \mu + AZ_n
where:
- \mu \in \mathbb{R}^{d}
- A \in \mathbb{R}^{d\times l}
- Z_n \sim \mathcal{N}(0, I)
- \Sigma = AA^\top, e.g. with A the Cholesky factor of \Sigma
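A minimal NumPy sketch of the reparameterization using the Cholesky factor for A (the mean and covariance below are illustrative); in a deep-learning setting \mu and A would be network outputs, and gradients flow through this deterministic transform while Z_n stays random:

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([0.5, -0.5])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.5]])

# A A^T = Sigma; here A is the lower-triangular Cholesky factor
A = np.linalg.cholesky(Sigma)

# Reparameterization: x = mu + A z with z ~ N(0, I)
Z = rng.standard_normal((100_000, 2))
X = mu + Z @ A.T

print(X.mean(axis=0))           # ~mu
print(np.cov(X, rowvar=False))  # ~Sigma
```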
Entropy¶
1-dimensional: H(X) = \frac{1}{2} + \frac{1}{2}\ln(2\pi) + \frac{1}{2}\ln\sigma^2 = \frac{1}{2}\ln(2\pi e\sigma^2)
D-dimensional: H(X) = \frac{D}{2} + \frac{D}{2} \ln(2\pi) + \frac{1}{2}\ln|\Sigma|
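A quick NumPy check of the D-dimensional formula against scipy (the covariance matrix is illustrative):

```python
import numpy as np
from scipy import stats

Sigma = np.array([[2.0, 0.4],
                  [0.4, 1.0]])
D = Sigma.shape[0]

# H(X) = D/2 + (D/2) ln(2*pi) + (1/2) ln|Sigma|
H = 0.5 * D + 0.5 * D * np.log(2 * np.pi) + 0.5 * np.linalg.slogdet(Sigma)[1]

# Should agree with scipy's closed-form Gaussian entropy
print(H, stats.multivariate_normal(mean=np.zeros(D), cov=Sigma).entropy())
```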
KL-Divergence (Relative Entropy)¶
For two Gaussians \mathcal{N}_0(\mu_0, \Sigma_0) and \mathcal{N}_1(\mu_1, \Sigma_1),
D_{KL}(\mathcal{N}_0 \| \mathcal{N}_1) = \frac{1}{2}\left[\text{tr}(\Sigma_1^{-1}\Sigma_0) + (\mu_1 - \mu_0)^\top \Sigma_1^{-1}(\mu_1 - \mu_0) - D + \ln\frac{|\Sigma_1|}{|\Sigma_0|}\right]
If \mu_1=\mu_0 then:
D_{KL}(\mathcal{N}_0 \| \mathcal{N}_1) = \frac{1}{2}\left[\text{tr}(\Sigma_1^{-1}\Sigma_0) - D + \ln\frac{|\Sigma_1|}{|\Sigma_0|}\right]
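A minimal NumPy sketch of the closed-form KL between two Gaussians (the means and covariances are illustrative):

```python
import numpy as np

def kl_gaussians(mu0, Sigma0, mu1, Sigma1):
    """KL(N0 || N1) between two multivariate Gaussians."""
    D = mu0.shape[0]
    Sigma1_inv = np.linalg.inv(Sigma1)
    diff = mu1 - mu0
    term_trace = np.trace(Sigma1_inv @ Sigma0)
    term_quad = diff @ Sigma1_inv @ diff
    term_logdet = np.linalg.slogdet(Sigma1)[1] - np.linalg.slogdet(Sigma0)[1]
    return 0.5 * (term_trace + term_quad - D + term_logdet)

mu0, mu1 = np.zeros(2), np.array([1.0, 0.0])
Sigma0 = np.array([[1.0, 0.2], [0.2, 1.0]])
Sigma1 = np.array([[2.0, 0.0], [0.0, 1.5]])
print(kl_gaussians(mu0, Sigma0, mu1, Sigma1))
print(kl_gaussians(mu0, Sigma0, mu0, Sigma0))  # 0.0: KL of a distribution with itself
```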
Mutual Information¶
For x \sim \mathcal{N}(\mu_0, \Sigma_0), the mutual information (total correlation) between its components is
I(X) = -\frac{1}{2}\ln|\rho_0|
where \rho_0 is the correlation matrix from \Sigma_0.
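A minimal NumPy sketch computing I(X) = -\frac{1}{2}\ln|\rho_0| from a covariance matrix (the covariance is illustrative):

```python
import numpy as np

Sigma0 = np.array([[2.0, 0.8],
                   [0.8, 1.0]])

# Correlation matrix: rho_ij = Sigma_ij / sqrt(Sigma_ii * Sigma_jj)
std = np.sqrt(np.diag(Sigma0))
rho0 = Sigma0 / np.outer(std, std)

# I(X) = -1/2 ln|rho_0|; zero when the components are uncorrelated
I = -0.5 * np.linalg.slogdet(rho0)[1]
print(I)
```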