# Coupling Layers

## Overview
There are three ingredients to a coupling layer:

- A split function
- A coupling function, \(\boldsymbol{h}\)
- A conditioner function, \(\boldsymbol{\Theta}\)

Coupling transformations are one of three broad families of flow transformations, alongside element-wise and autoregressive transformations.
## Algorithm (TLDR)
Step 1: Split the features into two disjoint partitions, \(A\) and \(B\).
Step 2: Apply the identity to partition \(A\).
Step 3: Apply the conditioner, \(\boldsymbol{\Theta}\), to partition \(A\) to obtain the parameters \(\boldsymbol{\theta}\).
Step 4: Apply the bijection, \(\boldsymbol{h}\), to partition \(B\), given the parameters from the conditioner.
Step 5: Concatenate the two partitions, \(A\) and \(B\).
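As a minimal sketch, the five steps map onto NumPy code as follows; the conditioner and bijection here are toy stand-ins (a fixed scaling and an additive shift), not learned networks:

```python
import numpy as np

def coupling_forward(x, conditioner, h):
    """One coupling layer, following the five steps above."""
    D = x.shape[-1]
    # Step 1: split the features into two disjoint sets.
    x_a, x_b = x[: D // 2], x[D // 2:]
    # Step 2: identity on partition A.
    y_a = x_a
    # Step 3: conditioner on partition A gives the parameters theta.
    theta = conditioner(x_a)
    # Step 4: bijection h on partition B, parameterized by theta.
    y_b = h(x_b, theta)
    # Step 5: concatenate the two partitions.
    return np.concatenate([y_a, y_b])

# Toy stand-ins: theta = 2 * x^A, additive bijection.
conditioner = lambda x_a: 2.0 * x_a
h = lambda x_b, theta: x_b + theta

x = np.array([1.0, 2.0, 3.0, 4.0])
y = coupling_forward(x, conditioner, h)   # y == [1., 2., 5., 8.]
```

Note that partition \(A\) passes through unchanged, which is what makes the inverse (and the Jacobian, below) so cheap.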
## Formulation
We have a bijective, diffeomorphic, parameterized function, \(\boldsymbol{T}_{\boldsymbol \theta}\), which maps inputs, \(\mathbf{x} \in \mathbb{R}^D\), to outputs, \(\mathbf{y} \in \mathbb{R}^D\), i.e. \(\boldsymbol{T}:\mathcal{X}\subseteq\mathbb{R}^{D} \rightarrow \mathcal{Y}\subseteq\mathbb{R}^D\). More compactly, we can write this as:

$$
\mathbf{y} = \boldsymbol{T}_{\boldsymbol{\theta}}(\mathbf{x})
$$

Let's partition the inputs, \(\mathbf{x}\), into two disjoint subspaces,

$$
\mathbf{x} = \left[\mathbf{x}^A, \mathbf{x}^B\right],
$$

where \(\mathbf{x}^A \in \mathbb{R}^{D_A}\) and \(\mathbf{x}^B \in \mathbb{R}^{D_B}\) with \(D = D_A + D_B\).

We do not transform the \(A\) features; for the \(B\) features we use a bijective, diffeomorphic coupling function, \(\boldsymbol{h}\), whose parameters are given by a conditioner function, \(\boldsymbol{\Theta}: \mathbb{R}^{D_A} \rightarrow \mathbb{R}^{D_{\boldsymbol{\theta}}}\). Explicitly:

$$
\begin{aligned}
\mathbf{y}^A &= \mathbf{x}^A \\
\mathbf{y}^B &= \boldsymbol{h}\left(\mathbf{x}^B; \boldsymbol{\theta}\right),
\end{aligned}
$$

where

$$
\boldsymbol{\theta} = \boldsymbol{\Theta}\left(\mathbf{x}^A\right).
$$
### Simplification

### Log Determinant Jacobian
To see how we can calculate the log determinant Jacobian (LDJ), we can demonstrate this with the partition above:

$$
\begin{aligned}
\mathbf{y}^A &= \mathbf{x}^A \\
\mathbf{y}^B &= \boldsymbol{h}\left(\mathbf{x}^B; \boldsymbol{\Theta}(\mathbf{x}^A)\right).
\end{aligned}
$$

The \(B\) partition is a function of the input partition, \(\mathbf{x}^A\), but the derivative is taken with respect to \(\mathbf{x}^B\). The Jacobian, \(\mathbf{J}\), is therefore block triangular:

$$
\mathbf{J} =
\begin{bmatrix}
\mathbf{I}_{D_A} & \mathbf{0} \\
\frac{\partial \mathbf{y}^B}{\partial \mathbf{x}^A} & \frac{\partial \mathbf{y}^B}{\partial \mathbf{x}^B}
\end{bmatrix}.
$$

Since the determinant of a block-triangular matrix is the product of the determinants of its diagonal blocks, we end up with a very simple formulation:

$$
\log \left| \det \mathbf{J} \right| = \log \left| \det \frac{\partial \boldsymbol{h}}{\partial \mathbf{x}^B} \right|.
$$
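As a quick check of this formula, here is a NumPy sketch of an affine coupling layer (with a hypothetical sin/cos conditioner) whose analytic LDJ is compared against a full Jacobian built by finite differences:

```python
import numpy as np

# Affine coupling: y^A = x^A, y^B = exp(log_s(x^A)) * x^B + t(x^A).
# Its log determinant Jacobian is simply sum(log_s(x^A)).
def affine_coupling(x):
    x_a, x_b = x[:2], x[2:]
    log_s, t = np.sin(x_a), np.cos(x_a)  # hypothetical conditioner outputs
    y = np.concatenate([x_a, np.exp(log_s) * x_b + t])
    return y, log_s.sum()

x = np.array([0.3, -0.7, 1.5, 2.0])
y, ldj = affine_coupling(x)

# Build the full 4x4 Jacobian column by column with forward differences;
# its log |det| should match the analytic LDJ.
eps = 1e-6
J = np.stack([(affine_coupling(x + eps * e)[0] - y) / eps
              for e in np.eye(4)], axis=1)
assert np.isclose(np.log(abs(np.linalg.det(J))), ldj, atol=1e-4)
```

The lower-left block of `J` is nonzero (partition \(B\) depends on \(\mathbf{x}^A\)), but it never enters the determinant.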
### General Form
We can write this more generally if we consider a masked transformation with a binary mask, \(\mathbf{b} \in \{0,1\}^D\):

$$
\mathbf{y} = \mathbf{b} \odot \mathbf{x} + (1 - \mathbf{b}) \odot \boldsymbol{h}\left(\mathbf{x}; \boldsymbol{\Theta}(\mathbf{b} \odot \mathbf{x})\right).
$$

This formulation was introduced in the original RealNVP paper.
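Assuming an additive coupling function and a toy scalar conditioner (both hypothetical stand-ins for learned networks), the masked formulation can be sketched as:

```python
import numpy as np

# Masked form of a coupling layer (RealNVP-style):
#   y = b * x + (1 - b) * h(x; Theta(b * x))
# with a binary mask b selecting the identity partition.
def masked_coupling(x, b, conditioner, h):
    theta = conditioner(b * x)           # conditioner sees only masked input
    return b * x + (1 - b) * h(x, theta)

b = np.array([1.0, 0.0, 1.0, 0.0])       # alternating (checkerboard-style) mask
conditioner = lambda xa: xa.sum()         # hypothetical scalar shift
h = lambda x, theta: x + theta            # additive coupling function

x = np.array([1.0, 2.0, 3.0, 4.0])
y = masked_coupling(x, b, conditioner, h)  # y == [1., 6., 3., 8.]
```

The mask makes the split implicit, which is convenient for images, where \(\mathbf{b}\) can encode spatial patterns such as checkerboards.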
## Whirlwind Tour
### Additive

$$
\boldsymbol{h}(x; \theta_a) = x + \theta_a,
$$

where \(\theta_a \in \mathbb{R}\).
### Affine

$$
\boldsymbol{h}(x; \boldsymbol{\theta}) = \theta_s x + \theta_a,
$$

where \(\theta_s \neq 0\) and \(\theta_a \in \mathbb{R}\).
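A minimal sketch of the affine coupling function, its inverse, and its LDJ contribution (the parameter values are arbitrary illustrations):

```python
import numpy as np

# Affine coupling function h(x; theta) = theta_s * x + theta_a with
# theta_s != 0: the inverse divides the shift back out, and each element
# contributes log|theta_s| to the log determinant Jacobian.
forward = lambda x, s, a: s * x + a
inverse = lambda y, s, a: (y - a) / s

x_b = np.array([3.0, 4.0])
s, a = np.array([2.0, 0.5]), np.array([1.0, -1.0])

y_b = forward(x_b, s, a)          # y_b == [7., 1.]
assert np.allclose(inverse(y_b, s, a), x_b)

ldj = np.log(np.abs(s)).sum()     # here log(2) + log(0.5) == 0
```

In practice \(\theta_s\) is usually parameterized as \(\exp(\cdot)\) of a network output so the nonzero constraint holds by construction.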
### Neural Splines

Spline-based flows parameterize \(\boldsymbol{h}\) as a monotonic spline, e.g. the rational-quadratic splines of NSF (2019).
### Mixture CDF

$$
\boldsymbol{h}(x; \boldsymbol{\theta}_p) = F\left(x; \boldsymbol{\theta}_p\right),
$$

where \(\boldsymbol{\theta}_p = \left[ \boldsymbol{\pi}, \boldsymbol{\mu}, \boldsymbol{s}\right] \in \mathbb{R}^K\times\mathbb{R}^{K}\times\mathbb{R}^K\) are the parameters of the mixture CDF function, \(F\). The function, \(F\), is the mixture CDF transformation given by:

$$
F(x; \boldsymbol{\theta}_p) = \sum_{k=1}^K \pi_k \, F_k\!\left(\frac{x - \mu_k}{s_k}\right),
$$

where \(F_k\) is the CDF of the component distribution (e.g. logistic or Gaussian).
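As a sketch, assuming logistic components (as used in Flow++; a Gaussian CDF works the same way), the forward transform is:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Logistic mixture CDF: F(x) = sum_k pi_k * sigmoid((x - mu_k) / s_k),
# a strictly monotone map from R into (0, 1).
def mixture_cdf(x, pi, mu, s):
    return (pi * sigmoid((x - mu) / s)).sum()

pi = np.array([0.3, 0.7])          # mixture weights, sum to 1
mu = np.array([-1.0, 2.0])         # component locations
s  = np.array([0.5, 1.0])          # component scales (> 0)

u = mixture_cdf(0.0, pi, mu, s)
assert 0.0 < u < 1.0               # output lives in the unit interval
```

Because the output lives in \((0, 1)\), it is typically mapped back to \(\mathbb{R}\) (e.g. via an inverse sigmoid) before the next layer.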
### Inverse CDF

We have the inverse CDF transform function, \(\text{InvCDF}: [0, 1]\rightarrow \mathbb{R}\), which maps a unit-interval input, \(x_u \in [0, 1]\), back to the real line:

$$
\boldsymbol{h}(x_u; \boldsymbol{\theta}) = F^{-1}(x_u; \boldsymbol{\theta}).
$$
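Since a mixture CDF is strictly monotone, its inverse can be recovered numerically; a simple bisection sketch (the bracket and iteration count are arbitrary choices, and the logistic components are an assumption):

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
mixture_cdf = lambda x, pi, mu, s: (pi * sigmoid((x - mu) / s)).sum()

# Invert F by bisection: F is strictly increasing, so a bracketing
# search recovers x from u in (0, 1).
def inv_cdf(u, pi, mu, s, lo=-50.0, hi=50.0, iters=100):
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if mixture_cdf(mid, pi, mu, s) < u else (lo, mid)
    return 0.5 * (lo + hi)

pi, mu, s = np.array([0.3, 0.7]), np.array([-1.0, 2.0]), np.array([0.5, 1.0])
x = 0.75
u = mixture_cdf(x, pi, mu, s)
assert abs(inv_cdf(u, pi, mu, s) - x) < 1e-6   # round-trips to x
```

Production implementations use the same idea with vectorized bisection or Newton iterations.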
### Flow++

The Flow++ algorithm composes the mixture CDF flow with the affine flow.
### NAF

The neural autoregressive flow (NAF) family parameterizes the transformation directly with a monotonic neural network (NAF, 2018; BNAF, 2019).
## Extensive Literature Review

### Coupling Layers
The concept of coupling layers was introduced in the NICE paper, whereby the authors used an additive coupling layer. This type of coupling layer is volume preserving because the determinant of the Jacobian is equal to 1 (i.e. the log determinant Jacobian is 0). The RealNVP paper later extended this to affine coupling layers, which are non-volume preserving.
- Additive (NICE, 2014)
- Affine (RealNVP, 2016)
- Spline Function (NSF, 2019; LRS, 2020)
- Neural Network (NAF, 2018; BNAF, 2019)
- Affine MixtureCDF (Flow++, 2019)
- Hierarchical (HINT, 2019)
- Incompressible (GIN, 2020)
- Lopsided (IRN, 2020)
- Bipartite (DenseFlow, 2021)
- MixtureCDF (Gaussianization, 2022)
### Conditioners

- Fully Connected (NICE, 2014)
- Convolutional (NICE, 2014)
- ResNet (NSF, 2019)
- (Gated) Self-Attention (Flow++, 2019)
- Transformer (Nystroemer) (DenseFlow, 2021)
- AutoEncoder (Highway) (OODF, 2020)
- Equivariant (NeuralFlows, 2021)
- Fourier Neural Operator (Gaussianization, 2022)
- Wavelet Neural Operator (Gaussianization, 2022)
### Masks

- Half (NICE, 2014)
- Permutation (RealNVP, 2016)
- Checkerboard (RealNVP, 2016)
- Horizontal (OODFlows, 2020)
- Vertical (OODFlows, 2020)
- Center (OODFlows, 2020)
- Cycle (OODFlows, 2020)
- Augmentation (DenseFlow, 2021)