Linear Layers#


Overview#

Linear layers \(\boldsymbol{f}(\mathbf{x}) = \mathbf{Ax}\) are invertible exactly when \(\mathbf{A}\) is, but a general inverse and log-determinant each cost \(\mathcal{O}(D^3)\). The parameterizations below constrain \(\mathbf{A}\) so that both become cheap.


Permutations#

\[ \boldsymbol{f}(\mathbf{x}) = \mathbf{Px} \]

where \(\mathbf{P}\) is a permutation matrix. Since \(\mathbf{P}^{-1} = \mathbf{P}^\top\) and \(|\det \mathbf{P}| = 1\), the inverse is a transpose and the log-determinant Jacobian is zero.
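
A minimal sketch (variable names illustrative): a permutation is just fancy indexing, and its inverse is the argsort of the index array:

```python
import numpy as np

rng = np.random.default_rng(0)
perm = rng.permutation(4)            # forward: f(x) = Px is x[perm]
inv_perm = np.argsort(perm)          # inverse permutation, i.e. P^T

x = rng.normal(size=4)
y = x[perm]
assert np.allclose(x, y[inv_perm])   # round trip; log-det Jacobian is 0
```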


Free#

\[ \boldsymbol{f}(\mathbf{x}) = \mathbf{Ax} \]

where \(\mathbf{A}\) is an unconstrained (free) square matrix. Inversion means solving a linear system and the log-determinant Jacobian is \(\log|\det \mathbf{A}|\), each costing \(\mathcal{O}(D^3)\) in general.
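
A minimal sketch of the free case (there is no structure to exploit, so we pay the cubic costs directly):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))          # free matrix: invertible w.p. 1
x = rng.normal(size=4)

y = A @ x                            # forward
x_rec = np.linalg.solve(A, y)        # inverse: solve rather than invert
_, ldj = np.linalg.slogdet(A)        # log |det A|, O(D^3)
assert np.allclose(x, x_rec)
```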

Orthogonal#

\[ \boldsymbol{f}(\mathbf{x}) = \mathbf{Ax} \]

where \(\mathbf{A}\mathbf{A}^\top = \mathbf{I}\). Orthogonal matrices satisfy \(\mathbf{A}^{-1} = \mathbf{A}^\top\) and \(|\det \mathbf{A}| = 1\), so inversion is a transpose and the log-determinant Jacobian is zero. There are a few ways to accomplish this:

  • randomly sample a matrix such that it is orthogonal

  • QR decomposition of a free matrix (sketched after this list)

  • Householder parameterization
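
A minimal sketch of the QR route (function name illustrative): push a free matrix through QR and sign-correct the columns so the map is unique. Feeding in a Gaussian matrix also covers the first bullet, since it yields a random orthogonal matrix:

```python
import numpy as np

def orthogonal_from_free(W: np.ndarray) -> np.ndarray:
    """Map a free square matrix W to an orthogonal matrix via QR.

    Scaling each column of Q by the sign of R's diagonal makes the
    decomposition unique (and the map differentiable a.e.).
    """
    Q, R = np.linalg.qr(W)
    return Q * np.sign(np.diag(R))

rng = np.random.default_rng(0)
A = orthogonal_from_free(rng.normal(size=(4, 4)))
assert np.allclose(A @ A.T, np.eye(4), atol=1e-8)
```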

Householder Parameterization#

\[ \mathbf{H} = \mathbf{I} - 2\frac{\mathbf{v}\mathbf{v}^\top}{\|\mathbf{v}\|^2} \]

A Householder reflection \(\mathbf{H}\) is orthogonal for any nonzero \(\mathbf{v}\), and any \(D \times D\) orthogonal matrix can be written as a product of at most \(D\) such reflections, so a product of \(K\) learned vectors \(\mathbf{v}_1, \ldots, \mathbf{v}_K\) gives a cheap orthogonal parameterization.
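
A minimal sketch (names illustrative): applying \(K\) reflections costs \(\mathcal{O}(KD)\) per vector, and since each reflection is its own inverse, the inverse simply applies them in reverse order:

```python
import numpy as np

def householder_apply(x: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Apply H_K ... H_1 x, where H_k = I - 2 v_k v_k^T / ||v_k||^2.

    V has shape (K, D): one unnormalized reflection vector per row.
    Each reflection costs O(D), so the full product is O(K * D).
    """
    for v in V:
        x = x - 2.0 * v * (v @ x) / (v @ v)
    return x

rng = np.random.default_rng(0)
V = rng.normal(size=(3, 5))             # K=3 reflections in D=5
x = rng.normal(size=5)
y = householder_apply(x, V)
x_rec = householder_apply(y, V[::-1])   # inverse: reflections reversed
assert np.allclose(x, x_rec)
```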


Sylvester#
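
Sylvester's determinant identity, the basis of Sylvester flows (van den Berg et al., 2018), trades a \(D \times D\) determinant for a cheaper \(M \times M\) one when the transformation has low-rank structure:

\[ \det(\mathbf{I}_D + \mathbf{A}\mathbf{B}) = \det(\mathbf{I}_M + \mathbf{B}\mathbf{A}), \quad \mathbf{A} \in \mathbb{R}^{D \times M}, \; \mathbf{B} \in \mathbb{R}^{M \times D} \]

A quick numerical check of the identity:

```python
import numpy as np

rng = np.random.default_rng(0)
D, M = 6, 2
A = rng.normal(size=(D, M))
B = rng.normal(size=(M, D))

# det(I_D + A B) == det(I_M + B A); the M x M side is much cheaper.
lhs = np.linalg.det(np.eye(D) + A @ B)
rhs = np.linalg.det(np.eye(M) + B @ A)
assert np.allclose(lhs, rhs)
```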


1x1 Convolution#
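
An invertible \(1 \times 1\) convolution (popularized by Glow; Kingma & Dhariwal, 2018) applies a single \(c \times c\) weight matrix \(\mathbf{W}\) across the channels at every spatial position of an \(h \times w\) feature map, so it is a linear layer whose log-determinant Jacobian is \(h \cdot w \cdot \log|\det \mathbf{W}|\). A minimal NumPy sketch (names illustrative):

```python
import numpy as np

def conv1x1(x: np.ndarray, W: np.ndarray):
    """Invertible 1x1 convolution: mix channels at every pixel.

    x: (c, h, w) feature map, W: (c, c) weight shared over positions.
    Returns the transformed map and the log-determinant Jacobian.
    """
    c, h, w = x.shape
    y = np.einsum('ij,jhw->ihw', W, x)
    _, logdet = np.linalg.slogdet(W)        # log |det W|
    return y, h * w * logdet                # one copy of W per pixel

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 3))
x = rng.normal(size=(3, 8, 8))
y, ldj = conv1x1(x, W)
x_rec, _ = conv1x1(y, np.linalg.inv(W))     # inverse uses W^{-1}
assert np.allclose(x, x_rec)
```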


Convolutional Exponential#

Forward

\[ \boldsymbol{f}(\mathbf{x}) = \exp(\mathbf{M})\mathbf{x} \]

Inverse

\[ \boldsymbol{f}^{-1}(\mathbf{x}) = \exp(-\mathbf{M})\mathbf{x} \]

This is because \(\exp(\mathbf{M})^{-1} = \exp(-\mathbf{M})\) (Golinski et al., 2019).

Log Determinant Jacobian

\[ \log \left|\det \boldsymbol{\nabla}_{\mathbf{x}}\boldsymbol{f}(\mathbf{x})\right| = \log |\det \exp(\mathbf{M})| = \text{trace}(\mathbf{M}) \]

which follows from Jacobi's formula, \(\det \exp(\mathbf{M}) = \exp(\text{trace}(\mathbf{M}))\).
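
A quick numerical check of the forward, inverse, and log-determinant relations, using SciPy's dense matrix exponential (the convolutional case follows below):

```python
import numpy as np
from scipy.linalg import expm   # dense matrix exponential

rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4)) * 0.5

x = rng.normal(size=4)
y = expm(M) @ x                  # forward: f(x) = exp(M) x
x_rec = expm(-M) @ y             # inverse: exp(M)^{-1} = exp(-M)
assert np.allclose(x, x_rec)

# Jacobi's formula: log |det exp(M)| = trace(M).
_, logdet = np.linalg.slogdet(expm(M))
assert np.allclose(logdet, np.trace(M))
```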

We can apply the same strategy to convolutions: a convolution with kernel \(\mathbf{m}\) is a linear operator, so its exponential is defined by the usual power series,

\[ \mathbf{m} \star_e \mathbf{x} = \mathbf{x} + \frac{\mathbf{m} \star \mathbf{x}}{1!} + \frac{\mathbf{m} \star (\mathbf{m} \star \mathbf{x})}{2!} + \ldots \]

Each higher-order term reuses the previous one, so the series is evaluated by repeatedly applying the same convolution and accumulating the terms.
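
A minimal sketch of the truncated series (names and the 1-D zero-padded convolution are illustrative choices): each term is one more application of the same kernel, and the inverse exponentiates the negated kernel:

```python
import numpy as np

def conv_exp(m: np.ndarray, x: np.ndarray, terms: int = 12) -> np.ndarray:
    """Truncated power series for the convolution exponential m *_e x.

    term_k = (m * term_{k-1}) / k, so evaluating K terms costs K
    applications of the same convolution. The kernel should be small
    enough (or `terms` large enough) for the truncation to be accurate.
    """
    term, out = x, x.copy()
    for k in range(1, terms + 1):
        term = np.convolve(term, m, mode='same') / k
        out = out + term
    return out

rng = np.random.default_rng(0)
m = rng.normal(size=3) * 0.3     # small kernel for fast convergence
x = rng.normal(size=16)

y = conv_exp(m, x)
x_rec = conv_exp(-m, y)          # inverse: exponentiate the negated kernel
assert np.allclose(x, x_rec, atol=1e-6)
```

The log-determinant Jacobian is again the trace of the underlying convolution operator; in this 1-D zero-padded sketch that is the center kernel tap times the signal length.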