Linear Layers#
Overview#
Permutations#
\[
\boldsymbol{f}(\mathbf{x}) = \mathbf{Px}
\]
where \(\mathbf{P}\) is a permutation matrix.
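A permutation just reorders the entries of \(\mathbf{x}\), so it is trivially invertible and volume-preserving (\(|\det \mathbf{P}| = 1\), hence a log-determinant Jacobian of zero). A minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
perm = rng.permutation(4)           # forward: f(x) = P x, i.e. x[perm]
inv_perm = np.argsort(perm)         # inverse permutation

x = rng.normal(size=4)
y = x[perm]                         # forward pass
assert np.allclose(y[inv_perm], x)  # inverse recovers x exactly
# |det P| = 1, so the log-determinant Jacobian is 0
```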
Free#
\[
\boldsymbol{f}(\mathbf{x}) = \mathbf{Ax}
\]
where \(\mathbf{A}\) is an arbitrary (invertible) square matrix.
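A sketch of this unconstrained linear layer, where both the inverse and the log-determinant cost \(\mathcal{O}(D^3)\):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4
A = rng.normal(size=(D, D))          # unconstrained weight (invertible a.s.)

x = rng.normal(size=D)
y = A @ x                            # forward
x_rec = np.linalg.solve(A, y)        # inverse: O(D^3) linear solve
_, logabsdet = np.linalg.slogdet(A)  # log|det A|, also O(D^3)
assert np.allclose(x_rec, x)
```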
Orthogonal#
\[
\boldsymbol{f}(\mathbf{x}) = \mathbf{Ax}
\]
where \(\mathbf{A}\mathbf{A}^\top = \mathbf{I}\). There are two ways to accomplish this:

- Randomly sample a matrix and orthogonalize it, e.g. via a QR decomposition (see the sketch below)
- Householder parameterization (see the next section)
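A sketch of the first option: sample a Gaussian matrix and take the \(\mathbf{Q}\) factor of its QR decomposition (sign-corrected so the sample is Haar-distributed):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4
W = rng.normal(size=(D, D))
Q, R = np.linalg.qr(W)
Q = Q * np.sign(np.diag(R))   # fix column signs -> Haar-distributed
assert np.allclose(Q @ Q.T, np.eye(D), atol=1e-8)
# |det Q| = 1, so the log-determinant Jacobian is 0
```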
Householder Parameterization#
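A minimal sketch: build an orthogonal matrix as a product of Householder reflections \(\mathbf{H}_k = \mathbf{I} - 2\mathbf{v}_k\mathbf{v}_k^\top / \|\mathbf{v}_k\|^2\), where the vectors \(\mathbf{v}_k\) are the free parameters:

```python
import numpy as np

def householder_orthogonal(vs):
    """Orthogonal matrix Q = H_K ... H_1 from reflection vectors vs (K, D)."""
    D = vs.shape[1]
    Q = np.eye(D)
    for v in vs:
        v = v / np.linalg.norm(v)
        Q = Q - 2.0 * np.outer(v, v @ Q)  # apply H = I - 2 v v^T without forming it
    return Q

rng = np.random.default_rng(0)
Q = householder_orthogonal(rng.normal(size=(3, 4)))  # K=3 reflections in D=4
assert np.allclose(Q @ Q.T, np.eye(4), atol=1e-8)
```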
Sylvester#
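Assuming the Sylvester flow form of van den Berg et al. (2018), \(\boldsymbol{f}(\mathbf{z}) = \mathbf{z} + \mathbf{A}\,h(\mathbf{B}\mathbf{z} + \mathbf{b})\) with \(\mathbf{A} \in \mathbb{R}^{D \times M}\) and \(\mathbf{B} \in \mathbb{R}^{M \times D}\), Sylvester's determinant identity \(\det(\mathbf{I}_D + \mathbf{U}\mathbf{V}) = \det(\mathbf{I}_M + \mathbf{V}\mathbf{U})\) reduces the \(D \times D\) Jacobian determinant to an \(M \times M\) one. A minimal sketch of that computation:

```python
import numpy as np

rng = np.random.default_rng(0)
D, M = 5, 2
A = rng.normal(size=(D, M)) * 0.1   # illustrative small-scale parameters
B = rng.normal(size=(M, D)) * 0.1
b = rng.normal(size=M)

z = rng.normal(size=D)
h_prime = 1.0 - np.tanh(B @ z + b) ** 2          # tanh derivative
# Jacobian of f is I_D + A diag(h') B; Sylvester's identity gives
# det(I_D + A diag(h') B) = det(I_M + diag(h') B A)  -- only M x M
ldj = np.linalg.slogdet(np.eye(M) + np.diag(h_prime) @ B @ A)[1]
```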
1x1 Convolution#
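Assuming the Glow-style invertible \(1 \times 1\) convolution (Kingma & Dhariwal, 2018): the same \(C \times C\) matrix \(\mathbf{W}\) mixes the channels at every spatial position, so the log-determinant is \(H \cdot W \cdot \log|\det \mathbf{W}|\). A sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, Wd = 3, 8, 8
W = rng.normal(size=(C, C))               # shared C x C channel-mixing weight
x = rng.normal(size=(C, H, Wd))

y = np.einsum('ij,jhw->ihw', W, x)        # 1x1 conv = per-pixel matmul
x_rec = np.einsum('ij,jhw->ihw', np.linalg.inv(W), y)
ldj = H * Wd * np.linalg.slogdet(W)[1]    # log-det scales with spatial size
assert np.allclose(x_rec, x)
```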
Convolutional Exponential#
Forward
\[
\boldsymbol{f}(\mathbf{x}) = \exp(\mathbf{M})\mathbf{x}
\]
Inverse
\[
\boldsymbol{f}^{-1}(\mathbf{x}) = \exp(-\mathbf{M})\mathbf{x}
\]
This is because \(\exp(\mathbf{M})^{-1} = \exp(-\mathbf{M})\) (Golinski et al., 2019).
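A sketch of the forward and inverse maps using `scipy.linalg.expm`:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4)) * 0.5

x = rng.normal(size=4)
y = expm(M) @ x            # forward
x_rec = expm(-M) @ y       # inverse: expm(M)^{-1} = expm(-M)
assert np.allclose(x_rec, x)
```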
Log Determinant Jacobian
\[
\log \left|\det \boldsymbol{\nabla}_{\mathbf{x}}\boldsymbol{f}(\mathbf{x})\right| = \log |\det \exp(\mathbf{M})| = \text{trace}(\mathbf{M})
\]
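A quick numerical check of Jacobi's formula, \(\det \exp(\mathbf{M}) = \exp(\text{trace}(\mathbf{M}))\):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4)) * 0.5
assert np.allclose(np.linalg.slogdet(expm(M))[1], np.trace(M))
```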
We can apply the same strategy to convolutions: convolution with a kernel \(\mathbf{m}\) is itself a linear operation, so we can define a convolution exponential through the same power series.
\[
\mathbf{m} *_e \mathbf{x} = \mathbf{x} + \frac{\mathbf{m} * \mathbf{x}}{1!} + \frac{\mathbf{m} * (\mathbf{m} * \mathbf{x})}{2!} + \ldots
\]
Each higher-order term is obtained by convolving the previous term with \(\mathbf{m}\) once more, so the truncated series can be evaluated with a sequence of consecutive convolutions.
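A minimal sketch of the truncated series with consecutive convolutions (using zero-padded `'same'` convolutions for simplicity; the kernel is scaled small so a short truncation is accurate):

```python
import numpy as np
from scipy.signal import convolve2d

def conv_exp(m, x, terms=15):
    """Truncated convolution exponential m *_e x."""
    out = x.copy()
    term = x.copy()
    for k in range(1, terms):
        term = convolve2d(term, m, mode='same') / k  # builds m^k * x / k!
        out = out + term
    return out

rng = np.random.default_rng(0)
m = rng.normal(size=(3, 3)) * 0.1   # small kernel -> fast convergence
x = rng.normal(size=(8, 8))
y = conv_exp(m, x)
x_rec = conv_exp(-m, y)             # inverse just negates the kernel
assert np.allclose(x_rec, x, atol=1e-8)
```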