
Fair Gaussianization — input-side follow-up experiments

Seven alternatives that put the flow on inputs instead of outputs


0. Why a follow-up

The original experiment puts the Gaussianization flows on the predictor’s output side ($T_z(f_\theta(X))$ and $T_q(q)$) and measures their dependence as a training-time fairness penalty. That works, but it has two structural problems we surfaced empirically in notebooks 06–07:

  1. Moving target on $T_z$. The predictor’s output distribution shifts during training, so $T_z$ (pretrained on a baseline) goes off-support. Gradients keep flowing, but they encode “distance from the baseline distribution” rather than “distance from independence.” G-TC’s constant-predictor collapse is the symptom.
  2. Inputs are untouched. $X$ never moves during predictor training, so a flow on $X$ is always in-support. We’re not using that.

This doc proposes seven follow-up experiments that move the flow’s role around the pipeline. In their basic form, five are pure preprocessing (A, B, C, D, F: the flow is fit once, offline, and frozen). B and C also have soft variants that train against stable, precomputed quantities. E is counterfactual data augmentation. G, a stretch goal, is a training-time penalty on an intermediate representation. Each comes with math, pseudocode, an explicit “ask” of what new infrastructure is needed, honest tradeoffs, and a falsifiable hypothesis.

1. TL;DR — the seven approaches at a glance

Table 1: The seven follow-up approaches. Each row is a separate downstream-training recipe; the flow’s job changes from row to row.

| Approach | Flow’s job | When it runs | Predictor sees | Fairness mechanism |
|---|---|---|---|---|
| A. Input whitening | $T_X$ Gaussianises features | Offline | $T_X(X)$ | indirect (better conditioning) |
| B. Fair feature selection | Per-feature dependence score | Offline | $X_{:, S_K}$ (top-$K$ independent) | hard subset selection |
| C. Subspace projection | Joint $T_X$ + $q$-orthogonal projection $P$ | Offline | $P^\top T_X(X)$ | hard linear projection |
| D. Conditional flow $T_{X \mid q}$ | Gaussianises $X$ given $q$ | Offline | $T_{X \mid q}(X, q)$ | structural ($Z \perp q$ by construction) |
| E. Counterfactual augmentation | Generates $\tilde X$ with $q$ flipped | Offline (data prep) | $X \cup \tilde X$ | data-level + consistency loss |
| F. Density-ratio reweighting | Estimates $p(X \mid q)$ | Offline (weight prep) | $X$ with weights $w_i$ | classical importance weighting |
| G. Representation bottleneck (stretch) | $T_R$ on encoder output | Training-time (with refresh) | encoded $R$, then head | soft penalty on intermediate representation |

Approaches A–F leave the predictor’s training-loop architecture unchanged (or nearly so). The work happens before training, and the predictor sees a tweaked feature space or a weighted task loss. Compare this to the original experiment, where the fairness logic sat inside the optimisation loop. These follow-ups are easier to compose, easier to debug, and don’t carry the moving-target risk. G is the exception: it modifies the predictor and reintroduces a moving target on the representation (Section 8).

Notation throughout: $X \in \mathbb{R}^{n \times d}$ inputs, $y \in \mathbb{R}^n$ targets, $q \in \mathbb{R}^{n \times d_q}$ sensitive attribute, $f_\theta$ the trainable predictor.


2. Approach A — Input whitening

The simplest and most boring of the seven. It uses a frozen Gaussianization flow as a preprocessor, a direct descendant of RBIG-style whitening (Laparra et al., 2011), with no fairness machinery added on top.

2.1 Math

Pretrain a joint Gaussianization flow

$$T_X : \mathbb{R}^d \to \mathbb{R}^d, \qquad T_X(X) \approx \mathcal{N}(0, I_d) \text{ marginally and jointly.}$$

Freeze. The predictor sees Gaussianised inputs:

$$f_\theta\bigl(T_X(X)\bigr) \approx y.$$

This is information-preserving ($T_X$ is a diffeomorphism, so $T_X(X)$ carries the same information as $X$), but the predictor operates on features with controlled marginals and a closer-to-isotropic joint.
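To build intuition for what the frozen preprocessor does, here is a minimal NumPy sketch of one RBIG-style iteration in the spirit of Laparra et al. (2011): rank-based marginal Gaussianization followed by a PCA rotation. It is an illustrative stand-in for a trained $T_X$, not what `fit_and_freeze` does. The helper names are hypothetical. Because the sketch uses the empirical CDF, it is only invertible on the fitted points, unlike a smooth flow.

```python
import numpy as np
from statistics import NormalDist

# Vectorised standard-normal quantile function (inverse CDF)
_INV_CDF = np.vectorize(NormalDist().inv_cdf)


def marginal_gaussianize(X):
    """Map each column to approximately N(0, 1) through its empirical CDF."""
    n = X.shape[0]
    # argsort of argsort gives 0-based ranks; (rank + 0.5) / n lies strictly in (0, 1)
    ranks = np.argsort(np.argsort(X, axis=0), axis=0)
    return _INV_CDF((ranks + 0.5) / n)


def rbig_iteration(X):
    """One RBIG-style step: marginal Gaussianization, then a PCA rotation."""
    G = marginal_gaussianize(X)
    G = G - G.mean(axis=0)
    # Eigenvectors of the sample covariance form an orthogonal rotation matrix
    _, R = np.linalg.eigh(np.cov(G, rowvar=False))
    return G @ R


rng = np.random.default_rng(0)
n, d = 2000, 3
# Skewed, correlated inputs: exponential marginals mixed by a random matrix
X = rng.exponential(size=(n, d)) @ rng.normal(size=(d, d))
Z = rbig_iteration(X)
print(np.round(np.cov(Z, rowvar=False), 3))  # diagonal up to rounding
```

One iteration leaves a diagonal covariance, but the marginals are no longer exactly Gaussian. RBIG repeats the marginal-plus-rotation step until the joint is close to $\mathcal{N}(0, I_d)$, which is the role the stacked blocks play in the flow.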

2.2 Pipeline

2.3 Pseudocode

import keras
from gaussianization.fair import fit_and_freeze

# Stage 1: pretrain once
T_X, _ = fit_and_freeze(
    X_train, num_blocks=8, num_components=12, epochs=200, seed=0,
)

# Stage 2: standard Keras training on Gaussianised inputs
X_train_whitened = T_X(X_train)
X_test_whitened  = T_X(X_test)

mlp = keras.Sequential([
    keras.Input(shape=(d,)),
    keras.layers.Dense(32, "relu"),
    keras.layers.Dense(1),
])
mlp.compile(optimizer="adam", loss="mse")
mlp.fit(X_train_whitened, y_train, ...)

Or, if you want $T_X$ inside the predictor graph (so saliency explanations live in original-$X$ space), wrap it as a frozen layer:

inputs = keras.Input(shape=(d,))
whitened = GaussianizationLayer(T_X)(inputs)   # NEW: thin wrapper
out = keras.layers.Dense(32, "relu")(whitened)
out = keras.layers.Dense(1)(out)
mlp = keras.Model(inputs, out)

2.4 Asks (new infrastructure)

| Item | Effort | Notes |
|---|---|---|
| `gauss_keras.GaussianizationLayer(flow, trainable=False)` | S (one wrapper) | A `keras.layers.Layer` that forwards through the flow and refuses gradient updates on its params. |
| Notebook `08_input_whitening_baseline.ipynb` | M | Adult + synthetic; ablation of whitening on/off. |

No new losses. The existing fit_and_freeze handles the offline step.

2.5 Tradeoffs

Plus

Minus

2.6 Hypothesis


3. Approach B — Fair feature selection

Use per-feature Gaussianised dependence to rank features by their “$q$-leakage” and select the most-independent subset. This is a non-linear generalisation of the linear-CKA filter (Cortes et al., 2012). The flow’s job is to compute a score per feature, not to enter the predictor’s forward pass.

3.1 Math

For each feature dimension $i \in \{1, \ldots, d\}$:

$$\rho_i^{\text{G}} = \widehat{\mathrm{Corr}}\bigl(T_{X_i}(X_i), T_q(q)\bigr) \in [-1, 1],$$

where $T_{X_i}$ is a per-feature 1-D Gaussianization flow (or, more efficiently, the $i$-th marginal of a joint $T_X$). Rank features by $|\rho_i^{\text{G}}|$ in ascending order and keep the $K$ smallest:

$$S_K = \operatorname*{arg\,min}_{|S| = K} \sum_{i \in S} |\rho_i^{\text{G}}|.$$

Train the predictor on $X_{:, S_K}$.
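Here is a minimal NumPy sketch of the hard-selection score. It assumes a binary $q$ and uses rank-based marginal Gaussianization as a stand-in for the per-feature flows $T_{X_i}$. For simplicity, $T_q$ is taken as the identity. The helper names are hypothetical and are not the proposed `score_features_g` API.

```python
import numpy as np
from statistics import NormalDist

_INV_CDF = np.vectorize(NormalDist().inv_cdf)


def gaussianize_columns(X):
    """Rank-based marginal Gaussianization of each column (stand-in for T_{X_i})."""
    ranks = np.argsort(np.argsort(X, axis=0), axis=0)
    return _INV_CDF((ranks + 0.5) / X.shape[0])


def score_features(X, q):
    """rho_i^G: Pearson correlation between Gaussianized feature i and q."""
    G = gaussianize_columns(X)
    Gc = G - G.mean(axis=0)
    qc = q - q.mean()
    return (Gc.T @ qc) / (np.linalg.norm(Gc, axis=0) * np.linalg.norm(qc))


def select_least_dependent(rho, K):
    """S_K: indices of the K features with the smallest |rho_i^G|."""
    return np.argsort(np.abs(rho))[:K]


rng = np.random.default_rng(1)
n = 5000
q = rng.integers(0, 2, size=n).astype(float)
X = np.column_stack([
    rng.normal(size=n),                    # independent of q
    np.exp(rng.normal(size=n) + 1.5 * q),  # strong, skewed q-leakage
    rng.normal(size=n) + 0.3 * q,          # mild q-leakage
])
rho = score_features(X, q)
print(np.round(rho, 3))
print(select_least_dependent(rho, K=2))    # features 0 and 2
```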

Soft variant. Replace the hard top-$K$ with a learnable sigmoid mask $m \in [0, 1]^d$:

$$\mathcal{L}(\theta, m) = \mathcal{L}_{\text{task}}\bigl(f_\theta(m \odot X), y\bigr) + \lambda \sum_i m_i\, |\rho_i^{\text{G}}| + \gamma \|m\|_1,$$

with $\rho_i^{\text{G}}$ pre-computed (frozen). This is end-to-end differentiable in $m$ and $\theta$, with $\rho^{\text{G}}$ as a fixed weight vector.

Higher-order variant. Replace $\rho_i^{\text{G}}$ with G-MI or G-TC per feature. This picks up non-monotone dependence that the Pearson-correlation analogue ($|\mathrm{corr}(X_i, q)|$) misses.

3.2 Pipeline

3.3 Pseudocode

Hard selection:

import numpy as np
from gaussianization.fair import (
    fit_and_freeze,
    score_features_g,      # NEW
)

# Stage 1: per-feature flows + dependence scores
flows = [fit_and_freeze(X_train[:, i:i+1], ...)[0] for i in range(d)]
T_q, _ = fit_and_freeze(q_train.reshape(-1, 1), ...)
rho_g = score_features_g(X_train, q_train, flows, T_q, metric="g_xcov")
# rho_g : np.ndarray of shape (d,), values in [0, 1]

# Stage 2: select top-K least dependent features
S_K = np.argsort(np.abs(rho_g))[:K]

# Stage 3: standard training on the selected subset
mlp = build_mlp(input_dim=K)
mlp.fit(X_train[:, S_K], y_train, ...)

Soft selection:

import keras
from keras import ops

# rho_g pre-computed as above; freeze it as a non-trainable constant
class FeatureMaskedMLP(keras.Model):
    def __init__(self, d, rho_g, lam=1.0, gamma=0.01):
        super().__init__()
        # Trainable logits for the mask; sigmoid pushes to [0, 1]
        self.mask_logits = self.add_weight(shape=(d,), initializer="zeros")
        self.rho_g = ops.convert_to_tensor(rho_g, dtype="float32")
        self.mlp = build_mlp(d)
        self.lam, self.gamma = lam, gamma

    def call(self, x, training=False):
        m = ops.sigmoid(self.mask_logits)
        if training:
            self.add_loss(self.lam * ops.sum(m * ops.abs(self.rho_g)))
            self.add_loss(self.gamma * ops.sum(m))   # sparsity
        return self.mlp(x * m)

3.4 Asks (new infrastructure)

ItemEffortNotes
gaussianization.fair.score_features_g(X, q, flows, T_q, metric)SVectorised per-feature scoring; returns shape (d,).
gaussianization.fair.fit_marginals(X, ...)SConvenience: fit one 1-D flow per feature in parallel.
Notebook 08_fair_feature_selection.ipynbLBar chart of `

No new losses; the soft-variant uses standard add_loss plumbing.

3.5 Tradeoffs

Plus

Minus

3.6 Hypothesis


4. Approach C — Gaussianised subspace projection

Generalise B from “drop features” to “project out the $q$-direction in Gaussianised space”, a Gaussianisation analogue of fair PCA (Olfat & Aswani, 2019). Same flow, more powerful selection.

4.1 Math

Pretrain a joint flow $T_X : X \to Z \in \mathbb{R}^d$ with $Z \sim \mathcal{N}(0, I)$. The flow’s rotation layers have implicitly chosen a basis for the Gaussianised latent space. In that basis, the “$q$-direction” is the cross-covariance:

$$u_q = \widehat{\mathrm{Cov}}(Z, T_q(q)) \in \mathbb{R}^{d \times d_q}.$$

Hard projection (single sensitive attribute, $d_q = 1$). The unit-length $q$-direction in $Z$-space is $\hat u_q = u_q / \|u_q\|_2$. The orthogonal projection is

$$P = I_d - \hat u_q \hat u_q^\top, \qquad Z' = Z P.$$

By construction $\widehat{\mathrm{Cov}}(Z', T_q(q)) = 0$, so the linear component of the dependence is removed. Because $Z$ and $T_q(q)$ are marginally Gaussian, the linear component is plausibly most of the dependence. However, zero covariance implies independence only if they are also jointly Gaussian, so some non-linear dependence can survive.

Hard projection (multi-class, $d_q > 1$). Take the SVD of $u_q$ and project onto the orthogonal complement of its $k$ largest left singular vectors. This strips the top-$k$ $q$-correlated directions.
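This NumPy sketch of the hard projection covers both cases through the SVD of $u_q$. The name mirrors the proposed `q_orthogonal_projection`, but the signature is an assumption: here `rank=None` removes every direction with a nonzero singular value. It also assumes $Z$ and $Q = T_q(q)$ are precomputed.

```python
import numpy as np


def q_orthogonal_projection(Z, Q, rank=None):
    """Return P = I - U_k U_k^T, removing the top-`rank` q-correlated directions of Z.

    Z: (n, d) Gaussianised inputs; Q: (n, d_q) Gaussianised sensitive attribute.
    """
    Zc = Z - Z.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    u_q = Zc.T @ Qc / (Z.shape[0] - 1)            # (d, d_q) cross-covariance
    U, s, _ = np.linalg.svd(u_q, full_matrices=False)
    k = int(np.sum(s > 1e-12 * max(s.max(), 1.0))) if rank is None else rank
    U_k = U[:, :k]                                # left singular vectors live in Z-space
    return np.eye(Z.shape[1]) - U_k @ U_k.T


rng = np.random.default_rng(2)
n, d = 4000, 5
Q = rng.normal(size=(n, 2))                       # d_q = 2
Z = rng.normal(size=(n, d))
Z[:, :2] += 0.8 * Q                               # plant q-dependence in two directions
P = q_orthogonal_projection(Z, Q)
Z_proj = Z @ P
cross = (Z_proj - Z_proj.mean(0)).T @ (Q - Q.mean(0)) / (n - 1)
print(np.abs(cross).max())                        # numerically zero
```

With $d_q = 1$, the single left singular vector is $\pm \hat u_q$, so the result reduces to $P = I_d - \hat u_q \hat u_q^\top$. Choosing a smaller `rank` trades residual linear dependence for retained information.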

Soft projection (learnable basis). Parameterise a linear map $P \in \mathbb{R}^{d \times k}$ with an orthogonality constraint, and train end-to-end with the task loss and a G-XCOV penalty on $P^\top Z$:

$$\min_{\theta,\, P : P^\top P = I_k} \mathcal{L}_{\text{task}}\bigl(f_\theta(P^\top Z), y\bigr) + \mu\, \mathcal{L}_{\text{G-XCOV}}\bigl(P^\top Z, T_q(q)\bigr).$$

The orthogonality constraint can be enforced via Stiefel-manifold optimisation or a soft penalty $\|P^\top P - I_k\|_F^2$.
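The two ways of enforcing $P^\top P = I_k$ can be compared in a few lines of NumPy. The soft penalty matches what the FairProjMLP pseudocode below adds via add_loss. The QR retraction is one standard Stiefel-manifold technique: project back onto the manifold after each gradient step. It is shown here as an assumption, not as a decided implementation choice.

```python
import numpy as np


def ortho_penalty(P):
    """Soft constraint ||P^T P - I_k||_F^2 (zero exactly on the Stiefel manifold)."""
    k = P.shape[1]
    return float(np.sum((P.T @ P - np.eye(k)) ** 2))


def qr_retraction(P):
    """Return an orthonormal (d, k) frame spanning the same columns as P (thin QR)."""
    Q_mat, R = np.linalg.qr(P)
    # Flip column signs so diag(R) > 0, which makes the retraction continuous in P
    return Q_mat * np.sign(np.diag(R))


rng = np.random.default_rng(3)
d, k = 6, 3
P = np.linalg.qr(rng.normal(size=(d, k)))[0]   # start on the Stiefel manifold
grad = rng.normal(size=(d, k))                 # stand-in for a loss gradient
P_step = P - 0.1 * grad                        # a plain gradient step leaves the manifold
print(ortho_penalty(P), ortho_penalty(P_step), ortho_penalty(qr_retraction(P_step)))
```

The soft penalty only discourages drift, so $P$ can be slightly non-orthogonal at convergence. The retraction keeps it exactly orthonormal at the cost of an extra QR per step.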

4.2 Pipeline

4.3 Pseudocode

Hard projection:

import numpy as np
import keras
from keras import ops
from gaussianization.fair import fit_and_freeze, q_orthogonal_projection

T_X, _ = fit_and_freeze(X_train, num_blocks=8, ...)        # joint flow
T_q, _ = fit_and_freeze(q_train.reshape(-1, 1), ...)

Z_train = np.asarray(T_X(X_train))
Q_train = np.asarray(T_q(q_train.reshape(-1, 1)))
P = q_orthogonal_projection(Z_train, Q_train)             # NEW: (d, d) matrix

# Predictor sees the projected representation
mlp = build_mlp(input_dim=d)
def features(X):
    return ops.matmul(T_X(X), P)
mlp.fit(features(X_train), y_train, ...)

Soft variant (Stiefel-soft):

We can’t reuse `GaussianizedXCovLoss` here: it would re-apply `T_X` to its `z_pred` input, and its shapes would not match when `k != d`. Instead, we compute the cross-covariance penalty directly on the already-projected $Z_p$ against $T_q(q)$:

class FairProjMLP(keras.Model):
    def __init__(self, d, k, T_X, T_q, mu=1.0, ortho_lam=10.0):
        super().__init__()
        self.T_X, self.T_q = T_X, T_q
        self.P = self.add_weight(
            shape=(d, k), initializer="orthogonal"
        )
        self.mlp = build_mlp(input_dim=k)
        self.mu, self.ortho_lam = mu, ortho_lam

    def _xcov_penalty(self, Zp, q):
        # ||Cov(Zp, T_q(q))||_F^2  / (||Cov(Zp)||_F · ||Cov(T_q(q))||_F)
        qg = self.T_q(q)
        Zp_c = Zp - ops.mean(Zp, axis=0, keepdims=True)
        qg_c = qg - ops.mean(qg, axis=0, keepdims=True)
        n = ops.cast(ops.shape(Zp_c)[0], Zp_c.dtype)
        denom = ops.maximum(n - 1.0, 1.0)
        C    = ops.matmul(ops.transpose(Zp_c), qg_c) / denom
        S_z  = ops.matmul(ops.transpose(Zp_c), Zp_c) / denom
        S_q  = ops.matmul(ops.transpose(qg_c), qg_c) / denom
        fz = ops.sqrt(ops.sum(S_z * S_z)); fq = ops.sqrt(ops.sum(S_q * S_q))
        return ops.sum(C * C) / (fz * fq + 1e-12)

    def call(self, inputs, training=False):
        x, q = inputs["x"], inputs["q"]
        Z = self.T_X(x)
        Zp = ops.matmul(Z, self.P)
        if training:
            # Fairness penalty on the projected latent (k-dim, not d-dim)
            self.add_loss(self.mu * self._xcov_penalty(Zp, q))
            # Orthogonality constraint as soft penalty
            PtP = ops.matmul(ops.transpose(self.P), self.P)
            self.add_loss(self.ortho_lam *
                          ops.sum((PtP - ops.eye(self.P.shape[1])) ** 2))
        return self.mlp(Zp)

(For the eventual implementation, factor out `_xcov_penalty` as a free function. It is the same linear-CKA computation used in `GaussianizedXCovLoss`, just without the leading $T_z$ pass.)

4.4 Asks

| Item | Effort | Notes |
|---|---|---|
| `gaussianization.fair.q_orthogonal_projection(Z, Q, rank=1)` | S | SVD-based; handles multi-dim $q$. |
| `gaussianization.fair.FairProjModel` (soft variant) | M | Composes `GaussianizationLayer` (Approach A) + Stiefel-soft trainable $P$. |
| Notebook `09_subspace_projection.ipynb` | L | Hard vs soft on Adult; orthogonality monitoring. |

4.5 Tradeoffs

Plus

Minus

4.6 Hypothesis


5. Approach D — Conditional flow $T_{X \mid q}$

The most ambitious approach, close in spirit to the conditional normalising flows of Winkler et al. (2019) and the invariant-representation objective of Moyer et al. (2018). A flow whose parameters depend on $q$ Gaussianises $X$ given $q$, and the residual is structurally independent of $q$.

5.1 Math

Train a conditional Gaussianization flow

$$T_{X \mid q}: \mathbb{R}^d \times \mathbb{R}^{d_q} \to \mathbb{R}^d$$

such that for every value of $q$,

$$T_{X \mid q}(X, q) \sim \mathcal{N}(0, I_d) \quad \text{when } X \sim p(X \mid q).$$

Freeze. Train a predictor on the residual $Z = T_{X \mid q}(X, q)$:

$$f_\theta(Z) \approx y \qquad \text{with} \qquad Z \perp q \text{ by construction.}$$

Mechanism. Coupling-layer Gaussianization with FiLM conditioning: each coupling layer’s conditioner MLP takes both the active half of $X$ and the conditioning $q$, and produces shift/scale parameters that depend on both. The marginal Gaussianization layers’ mixture-CDF parameters also become $q$-dependent (a small MLP from $q$ to mixture parameters).

Why it’s “by construction”. $Z = T_{X \mid q}(X, q)$ has the same distribution $\mathcal{N}(0, I)$ for every value of $q$, because that is the training objective. So $p(Z \mid q) = p(Z)$, i.e. $Z \perp q$, to the extent that the flow actually fits $p(X \mid q)$ for each $q$.
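For binary $q$ and Gaussian group-conditionals, the conditional flow has a closed form, which makes the “by construction” argument easy to check numerically. This is an assumption-laden toy, not the planned `ConditionalGaussianizationFlow`. Here $T_{X \mid q}$ is a separate affine whitening per group, $z = L_q^{-1}(x - \mu_q)$ with $\Sigma_q = L_q L_q^\top$, which is exactly the Gaussianizing map when $X \mid q$ is Gaussian. The helper names are hypothetical.

```python
import numpy as np


def fit_affine_conditional(X, q):
    """Per-group mean and Cholesky factor: T_{X|q}(x) = L_q^{-1} (x - mu_q)."""
    params = {}
    for g in np.unique(q):
        Xg = X[q == g]
        params[g] = (Xg.mean(axis=0), np.linalg.cholesky(np.cov(Xg, rowvar=False)))
    return params


def forward(params, X, q):
    """Apply the group-specific whitening to every row."""
    Z = np.empty_like(X)
    for g, (mu, L) in params.items():
        m = q == g
        # Solve L z = (x - mu) for every row in group g
        Z[m] = np.linalg.solve(L, (X[m] - mu).T).T
    return Z


rng = np.random.default_rng(4)
n, d = 6000, 3
q = rng.integers(0, 2, size=n)
A = [rng.normal(size=(d, d)), rng.normal(size=(d, d))]   # group-specific mixing
shift = np.array([[0.0, 0.0, 0.0], [2.0, -1.0, 0.5]])    # group-specific means
X = np.stack([A[g] @ rng.normal(size=d) for g in q]) + shift[q]
params = fit_affine_conditional(X, q)
Z = forward(params, X, q)
for g in (0, 1):
    # Zero mean and identity covariance in both groups, up to float error
    print(g, np.round(Z[q == g].mean(axis=0), 3), np.round(np.cov(Z[q == g], rowvar=False), 3))
```

Both groups’ $Z$ have identical first and second moments, so no linear function of $Z$ has a different mean across groups. With non-Gaussian group-conditionals, an affine map would leave higher-order differences. Removing those is the job of the mixture-CDF and coupling layers.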

5.2 Pipeline

5.3 Pseudocode

import keras
from gaussianization.gauss_keras.conditional import (
    ConditionalGaussianizationFlow,     # NEW class
)
from gaussianization.fair import freeze_flow

# Stage 1: pretrain conditional flow
T_xq = ConditionalGaussianizationFlow(
    input_dim=d,
    cond_dim=d_q,
    num_blocks=8,
    num_components=12,
)
T_xq.compile(optimizer=keras.optimizers.Adam(1e-3), loss=base_nll_loss)
T_xq.fit(
    [X_train, q_train],   # input is a (data, condition) pair
    X_train,              # NLL target
    epochs=200,
    batch_size=256,
)
freeze_flow(T_xq)

# Stage 2: standard predictor on the residual
def residual(X, q):
    return T_xq([X, q])

mlp = build_mlp(input_dim=d)
mlp.compile(optimizer="adam", loss="mse")
mlp.fit(residual(X_train, q_train), y_train, ...)

5.4 Asks

| Item | Effort | Notes |
|---|---|---|
| `gauss_keras.bijectors.MixtureCDFGaussianization` accepts a condition input | M | FiLM-style conditioning on mixture params via a small head MLP. |
| `gauss_keras.bijectors.MixtureCDFCoupling` already takes a conditioner; extend its conditioner to accept `q` alongside the active half. | S | One arg change. |
| `gauss_keras.flows.ConditionalGaussianizationFlow` | M | Threads `q` through every layer; subclasses or wraps existing `GaussianizationFlow`. |
| `fair.fit_and_freeze_conditional(X, q, ...)` | S | Convenience helper. |
| Notebook `10_conditional_flow.ipynb` | L | Comparison against A–C on Adult; sanity check $Z \perp q$ after freezing. |

This is the most invasive change: it touches the core gauss_keras library, not just fair/. Worth doing because conditional flows are broadly useful (density estimation conditional on covariates).

5.5 Tradeoffs

Plus

Minus

5.6 Hypothesis


6. Approach E — Counterfactual sample augmentation

Use the (conditional) flow’s inverse pass to generate a counterfactual $\tilde X$ with $q$ flipped, then train a predictor to make the same decision on both. This targets individual counterfactual fairness in the sense of Kusner et al. (2017), not just population-level statistics.

6.1 Math

For each training example $(X_i, q_i, y_i)$, define the counterfactual

$$\tilde X_i = T_{X \mid 1 - q_i}^{-1}\bigl(T_{X \mid q_i}(X_i, q_i),\, 1 - q_i\bigr),$$

i.e. Gaussianise $X_i$ given $q_i$, then invert the Gaussianisation under the opposite $q$-value. The result has the same “position in the Gaussianised latent” but the distribution of the opposite group.

(For continuous or multi-class $q$, swap $1 - q_i$ for a chosen reference value or sample of values.)
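Continuing with a closed-form toy, assume Gaussian group-conditionals and a per-group affine map $T_{X \mid q}(x) = L_q^{-1}(x - \mu_q)$ as a hypothetical stand-in for the trained conditional flow. The counterfactual is then $\tilde X_i = \mu_{1-q_i} + L_{1-q_i} L_{q_i}^{-1}(X_i - \mu_{q_i})$. The sketch checks the two properties stated above: the latent position is preserved, and the counterfactuals follow the other group’s distribution. The helpers are illustrative and only loosely mirror the proposed `generate_counterfactuals`.

```python
import numpy as np


def fit_groups(X, q):
    """Per-group (mean, Cholesky factor of covariance) for a binary q."""
    return {g: (X[q == g].mean(axis=0), np.linalg.cholesky(np.cov(X[q == g], rowvar=False)))
            for g in (0, 1)}


def to_latent(params, x, g):
    """T_{X|q=g}(x) = L_g^{-1} (x - mu_g), applied row-wise."""
    mu, L = params[g]
    return np.linalg.solve(L, (x - mu).T).T


def from_latent(params, z, g):
    """T_{X|q=g}^{-1}(z) = L_g z + mu_g, applied row-wise."""
    mu, L = params[g]
    return z @ L.T + mu


def generate_counterfactuals(params, X, q):
    """Keep each row's latent position, decode it under the opposite group."""
    X_tilde = np.empty_like(X)
    for g in (0, 1):
        m = q == g
        X_tilde[m] = from_latent(params, to_latent(params, X[m], g), 1 - g)
    return X_tilde


rng = np.random.default_rng(5)
n, d = 4000, 2
q = rng.integers(0, 2, size=n)
# Group 1 is shifted by 3 and has twice the spread of group 0
X = rng.normal(size=(n, d)) * np.where(q[:, None] == 1, 2.0, 1.0) + 3.0 * q[:, None]
params = fit_groups(X, q)
X_tilde = generate_counterfactuals(params, X, q)
# Group-0 counterfactuals should look like group-1 data (mean near 3, std near 2)
print(X_tilde[q == 0].mean(axis=0), X_tilde[q == 0].std(axis=0))
```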

Augmented dataset: $D' = \{(X_i, q_i, y_i), (\tilde X_i, 1 - q_i, y_i)\}$. Train with a consistency loss:

$$\mathcal{L}(\theta) = \mathcal{L}_{\text{task}}(f_\theta(X), y) + \lambda\, \mathbb{E}_i \bigl\|f_\theta(X_i) - f_\theta(\tilde X_i)\bigr\|^2.$$

The consistency term explicitly says: an individual’s prediction must not change if you flip their sensitive attribute (Kusner et al. 2017, “Counterfactual Fairness”).

6.2 Pipeline

6.3 Pseudocode

import keras
from keras import ops
from gaussianization.fair import (
    fit_and_freeze_conditional,    # from Approach D
    generate_counterfactuals,      # NEW
    CounterfactualConsistencyLoss, # NEW
)

# Stage 1: pretrain a conditional flow (Approach D's machinery)
T_xq, _ = fit_and_freeze_conditional(X_train, q_train, ...)

# Stage 2: generate counterfactuals for the training set
X_tilde = generate_counterfactuals(T_xq, X_train, q_train)
# X_tilde[i] is the counterfactual of X_train[i] with q flipped

# Stage 3: train predictor with consistency loss
class FairCFMLP(keras.Model):
    def __init__(self, d, lam=1.0):
        super().__init__()
        self.mlp = build_mlp(d)
        self.lam = lam

    def call(self, inputs, training=False):
        x, x_tilde = inputs["x"], inputs["x_tilde"]
        y_hat = self.mlp(x)
        if training:
            y_hat_tilde = self.mlp(x_tilde)
            self.add_loss(self.lam * ops.mean((y_hat - y_hat_tilde) ** 2))
        return y_hat

model = FairCFMLP(d, lam=1.0)
model.compile(optimizer="adam", loss="mse")
model.fit({"x": X_train, "x_tilde": X_tilde}, y_train, ...)

6.4 Asks

| Item | Effort | Notes |
|---|---|---|
| Approach D’s `ConditionalGaussianizationFlow` and its inverse | L | Big; see Approach D. |
| `generate_counterfactuals(flow, X, q)` | S | One forward + one inverse pass per batch; precomputable. |
| `CounterfactualConsistencyLoss` (or use plain `add_loss` as above) | S | Just an MSE between two predictor calls. |
| Notebook `11_counterfactual_fairness.ipynb` | L | Visualise counterfactual quality (a few $(X_i, \tilde X_i)$ pairs); individual fairness metrics. |

Building blocks: needs Approach D first.

6.5 Tradeoffs

Plus

Minus

6.6 Hypothesis


7. Approach F — Density-ratio reweighting

Use the (conditional) flow’s log-density to estimate per-sample weights that rebalance the training set. Classical importance weighting in the style of Calders & Verwer (2010), with flow-based densities replacing the usual kernel-density or logistic-regression-style propensity estimates.

7.1 Math

Two related formulations:

Group-balanced reweighting. Estimate the group-conditional density ratio:

$$w_i = \frac{p(X_i \mid q = 0)}{p(X_i \mid q = q_i)} \quad\Longrightarrow\quad \mathbb{E}_{w}\bigl[L_{\text{task}}(f_\theta(X), y) \mid q = q_i\bigr] = \mathbb{E}\bigl[L_{\text{task}} \mid q = 0\bigr] \quad \forall q_i.$$

That is, the weighted task loss is the same across groups, closing the population-level disparity without a fairness penalty. (The implication assumes $p(y \mid X, q)$ does not depend on $q$: the weights correct only the covariate distribution.)

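Inverse-propensity reweighting. Use the flow to estimate $p(q \mid X)$ via Bayes, then set $w_i = 1 / p(q_i \mid X_i)$.

Both forms are built from the same densities $p(X \mid q)$, and both reweight every group toward a common covariate distribution. The first targets group 0’s $p(X \mid q = 0)$. The second targets the pooled $p(X)$, up to the per-group constant $1/p(q_i)$. Beyond that choice of target, the decision is mostly about which factorisation is more numerically stable.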
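Given per-group log-densities $\log p(X_i \mid q = g)$ from any density model (the flow, in this proposal), both weight families take only a few lines. This sketch makes several assumptions. It uses a log-sum-exp for the Bayes step to avoid underflow and estimates $p(q)$ from group frequencies. It also applies the clip-then-normalise-to-mean-1 post-processing that the proposed `density_ratio_weights` describes. The function names are hypothetical.

```python
import numpy as np


def _clip_normalise(w, clip):
    """Cap extreme weights, then rescale to mean 1."""
    w = np.minimum(w, clip)
    return w / w.mean()


def group_balanced_weights(logp, q, target_q=0, clip=10.0):
    """w_i = p(X_i | q=target_q) / p(X_i | q=q_i); logp[i, g] = log p(X_i | q=g)."""
    rows = np.arange(len(q))
    return _clip_normalise(np.exp(logp[:, target_q] - logp[rows, q]), clip)


def ipw_weights(logp, q, clip=10.0):
    """w_i = 1 / p(q_i | X_i), with p(q | X) from Bayes' rule and empirical p(q)."""
    rows = np.arange(len(q))
    log_prior = np.log(np.bincount(q, minlength=logp.shape[1]) / len(q))
    log_joint = logp + log_prior                           # log p(X_i, q=g)
    log_evidence = np.logaddexp.reduce(log_joint, axis=1)  # log p(X_i)
    log_post = log_joint[rows, q] - log_evidence           # log p(q_i | X_i)
    return _clip_normalise(np.exp(-log_post), clip)


# Toy data: 1-D Gaussian groups whose exact log-densities stand in for the flow
rng = np.random.default_rng(6)
n = 3000
q = rng.integers(0, 2, size=n)
x = rng.normal(loc=1.0 * q, scale=1.0)
mus = np.array([0.0, 1.0])
logp = -0.5 * (x[:, None] - mus[None, :]) ** 2 - 0.5 * np.log(2 * np.pi)
w_gb = group_balanced_weights(logp, q)
w_ipw = ipw_weights(logp, q)
print(w_gb.mean(), w_ipw.mean())
```

Working in log space matters once $d$ is large, because flow log-densities of individual samples can be far outside the range where `exp` is representable.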

The flow estimates $p(X \mid q)$ directly via $\log p(X \mid q) = \log \mathcal{N}\bigl(T_{X \mid q}(X, q); 0, I\bigr) + \log \bigl|\det \nabla_X T_{X \mid q}(X, q)\bigr|$.
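The change-of-variables formula can be checked on the simplest possible flow, an affine map $T(x) = L^{-1}(x - \mu)$ with $L$ lower-triangular, for which $\log|\det \nabla T| = -\sum_j \log|L_{jj}|$. This NumPy sketch is a toy stand-in for the conditional flow, not the library’s log-density method. It compares the flow-style computation against the closed-form multivariate normal density.

```python
import numpy as np


def flow_log_density(x, mu, L):
    """log p(x) = log N(T(x); 0, I) + log|det dT/dx| for T(x) = L^{-1}(x - mu)."""
    z = np.linalg.solve(L, (x - mu).T).T
    d = x.shape[1]
    log_base = -0.5 * np.sum(z ** 2, axis=1) - 0.5 * d * np.log(2 * np.pi)
    log_det_jac = -np.sum(np.log(np.abs(np.diag(L))))   # L is lower-triangular
    return log_base + log_det_jac


def gaussian_log_density(x, mu, Sigma):
    """Closed-form log N(x; mu, Sigma) for comparison."""
    d = x.shape[1]
    diff = x - mu
    sol = np.linalg.solve(Sigma, diff.T).T
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (np.sum(diff * sol, axis=1) + logdet + d * np.log(2 * np.pi))


rng = np.random.default_rng(7)
d = 3
A = rng.normal(size=(d, d))
Sigma = A @ A.T + np.eye(d)          # a well-conditioned covariance
mu = rng.normal(size=d)
L = np.linalg.cholesky(Sigma)
x = rng.multivariate_normal(mu, Sigma, size=5)
print(np.max(np.abs(flow_log_density(x, mu, L) - gaussian_log_density(x, mu, Sigma))))
```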

7.2 Pipeline

7.3 Pseudocode

from gaussianization.fair import (
    fit_and_freeze_conditional,
    density_ratio_weights,         # NEW
)

# Stage 1: conditional flow gives log p(X | q)
T_xq, _ = fit_and_freeze_conditional(X_train, q_train, ...)

# Stage 2: per-sample weights w_i = p(X_i | q=0) / p(X_i | q=q_i)
w_train = density_ratio_weights(
    T_xq, X_train, q_train, target_q=0, clip=10.0,
)
# Returns shape (n,), positive, normalised to mean 1.

# Stage 3: standard weighted training
mlp = build_mlp(d)
mlp.compile(optimizer="adam", loss="mse")
mlp.fit(X_train, y_train, sample_weight=w_train, ...)

7.4 Asks

| Item | Effort | Notes |
|---|---|---|
| Conditional flow log-density (Approach D’s machinery) | L | Needed first. |
| `density_ratio_weights(flow, X, q, target_q, clip)` | S | One-liner once log-density is available; clipping handles the tail. |
| Notebook `12_density_ratio_reweighting.ipynb` | M | Pareto: G-XCOV penalty vs IPW weighting. |

7.5 Tradeoffs

Plus

Minus

7.6 Hypothesis


8. Approach G — Information-bottleneck on representations (stretch)

A seventh idea worth recording, even if it’s the most speculative: apply the fairness penalty to an intermediate layer of the predictor, not to its output.

8.1 Sketch

Most fair-representation literature does exactly this; see e.g. the VFAE of Louizos et al. (2016). The setup is an encoder $e_\phi : X \to R$, a head $h_\psi : R \to y$, and a penalty $\mu \cdot \mathrm{Dep}(R, q)$ on the bottleneck representation. The encoder learns a representation that is task-useful but $q$-uninformative.

Drop in any of our existing losses on $R$ and $T_q(q)$:

$$\min_{\phi, \psi}\ \mathcal{L}_{\text{task}}\bigl(h_\psi(e_\phi(X)), y\bigr) + \mu\, \mathcal{L}_{\text{G-MI}}\bigl(e_\phi(X),\, q\bigr).$$

For G-MI we need a flow on $R$, but $R$ is high-dimensional and moves during training, which is the same problem as the original output-side experiment. Two workarounds:

  1. Periodic refresh: refit $T_R$ on $\{e_\phi(X)\}$ every $N$ epochs. This costs a few extra minutes and gives a fresh dependence probe.
  2. VAE-style structural fix: push $R$ toward Gaussian by construction (e.g. with a KL-to-$\mathcal{N}(0, I)$ regulariser on $e_\phi$, as in a VAE encoder). Then $T_R \approx \mathrm{id}$, and G-XCOV reduces to plain linear cross-covariance on the bottleneck. This is cheap, and it is as accurate as the KL term makes $R$ Gaussian.
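When $T_R = \mathrm{id}$, the dependence probe is just the normalised cross-covariance computed in `_xcov_penalty` (Section 4.3), i.e. linear CKA. A NumPy version, written as a free function as that section suggests, makes its behaviour easy to sanity-check. The name `linear_cka` is ours, not a library symbol.

```python
import numpy as np


def linear_cka(R, Q):
    """||Cov(R, Q)||_F^2 / (||Cov(R)||_F * ||Cov(Q)||_F); lies in [0, 1]."""
    Rc = R - R.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    denom = max(R.shape[0] - 1, 1)
    C = Rc.T @ Qc / denom
    S_r = Rc.T @ Rc / denom
    S_q = Qc.T @ Qc / denom
    # np.linalg.norm on a 2-D array defaults to the Frobenius norm
    return float(np.sum(C * C) / (np.linalg.norm(S_r) * np.linalg.norm(S_q) + 1e-12))


rng = np.random.default_rng(8)
n = 5000
Q = rng.normal(size=(n, 1))
R_indep = rng.normal(size=(n, 4))   # bottleneck carrying no q information
R_dep = R_indep.copy()
R_dep[:, 0] += 2.0 * Q[:, 0]        # one coordinate leaks q linearly
print(linear_cka(R_indep, Q), linear_cka(R_dep, Q))
```

Because it only sees second moments, this probe is blind to non-linear leakage. That is acceptable only insofar as the KL term really makes $R$ jointly Gaussian with $q$.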

8.2 Tradeoffs (briefly)

Plus: composes with any downstream architecture; lets a small classifier head sit on top of a strongly-fair representation; the representation itself can be reused for multiple downstream tasks.

Minus: moving target on $T_R$ (same risk as the original experiment); requires architectural surgery on the predictor.

8.3 Hypothesis


9. Cross-cutting comparison matrix

| | A (whiten) | B (select) | C (project) | D (cond. flow) | E (CF aug.) | F (IPW) | G (bottleneck) |
|---|---|---|---|---|---|---|---|
| Flow on inputs? | ✅ joint | ✅ marginals | ✅ joint | ✅ conditional | ✅ conditional | ✅ conditional | ❌ representations |
| Flow frozen? | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ⚠️ refresh |
| Predictor sees fairness penalty during training? | ❌ | ❌ (soft: ✅) | ❌ (soft: ✅) | ❌ | ✅ (consistency) | ❌ | ✅ |
| Pretraining cost | low | medium ($d$ marginals) | medium | high | high | high | medium |
| Information loss | none | hard | rank-$k$ | total $q$-conditioned | none | importance-weighted | task-driven |
| Granularity of fairness | n/a | population | population | structural | individual | population | representation |
| Needs new core (`gauss_keras`) infra? | thin layer | n/a | n/a | yes | yes | yes | maybe |
| Composes with G-XCOV / G-MI? | | | | redundant | | | |
| Effort estimate (S/M/L) | S | M | M | L | L | M | L |

10. Recommended sequencing

Round 1 alone is plausibly the strongest paper of the set: A+B+C give three preprocessing-only fairness baselines that the original output-side losses can be benchmarked against. If Round 1 turns out to be sufficient in practice, Rounds 2–4 become a test of “did you really need to add a fairness loss at all?”, which is a sharp result either way.


11. New library additions, summarised

If we eventually ship Rounds 1–3, the public API of gaussianization.fair grows by:

from gaussianization.fair import (
    # existing
    GaussianizedXCovLoss,
    GaussianizedMutualInfoLoss,
    GaussianizedTotalCorrelationLoss,
    fit_and_freeze,
    fit_and_freeze_joint,
    freeze_flow,
    is_fully_frozen,
    demographic_parity_difference,
    equalized_odds_difference,
    pearson_corr,

    # Round 1
    score_features_g,                  # B
    q_orthogonal_projection,           # C
    fit_marginals,                     # B convenience

    # Round 2 (infrastructure)
    fit_and_freeze_conditional,        # D

    # Round 3
    generate_counterfactuals,          # E
    density_ratio_weights,             # F
)

And gauss_keras grows:

from gaussianization.gauss_keras import (
    # existing
    GaussianizationFlow,
    make_gaussianization_flow,
    make_coupling_flow,
    ...

    # Round 1
    GaussianizationLayer,              # A: frozen-flow wrapper as keras.Layer

    # Round 2
    ConditionalGaussianizationFlow,    # D
)

Six new public symbols in gaussianization.fair (score_features_g, q_orthogonal_projection, fit_marginals, fit_and_freeze_conditional, generate_counterfactuals, density_ratio_weights), two new classes in gauss_keras (GaussianizationLayer — a thin frozen-flow wrapper for Approach A; ConditionalGaussianizationFlow — the conditional flow for D/E/F), and one extended bijector (MixtureCDFGaussianization gains an optional condition input). None of the existing API breaks.


12. Open questions

  1. Joint flow vs $d$ marginal flows for Approach B. Per-feature flows are independent and parallelisable, but they ignore cross-feature structure. A joint flow scores features via its marginal projections but is harder to fit. Worth a small ablation in the B notebook.

  2. Pre- vs post-Gaussianisation for $q$. All approaches assume $q$ has its own Gaussianisation flow $T_q$. For binary $q$ this is overkill: $T_q$ is essentially an affine map of $q$ (a shift, a scale, possibly a sign flip). Is there an ablation showing $T_q$ matters? Or can we use raw $q$ for the sensitive side when $d_q = 1$?

  3. Continuous $q$ semantics. For age as a sensitive attribute (a continuous variable), what does “DP-diff” even mean? Approaches D and E need to specify a counterfactual policy: do we flip a 25-year-old to a 45-year-old, to the average, or to the marginal distribution?

  4. Flow capacity vs predictor capacity. All approaches assume the flow has “enough” capacity to faithfully model $p(X)$, $p(X \mid q)$, etc. An under-capacity flow gives bad scores, bad projections, and bad counterfactuals, and the failure mode is silent: the predictor just inherits the flow’s blind spots. Possible mitigation: a diagnostic that monitors per-feature $T_X$ log-likelihood on held-out data.

  5. Combining approaches. Nothing prevents stacking — e.g. whiten inputs (A), select features (B), project out residual q-direction (C), and then add a small G-XCOV penalty for defence in depth. Does that compound, or does each subsequent step add nothing?
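A quick check bears on open question 2 for binary $q$. Any transform $T_q$ maps the two values of $q$ to two numbers, so it is affine in $q$: $T_q(q) = a + bq$. As long as $b \neq 0$, $|\rho|$ and the normalised cross-covariance are unchanged. This suggests raw $q$ suffices for the linear (G-XCOV-style) scores in that case. It says nothing about scores that use more than second moments, or about multi-valued $q$. The two-valued map below is an arbitrary stand-in for a fitted $T_q$.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 2000
q = rng.integers(0, 2, size=n).astype(float)
x = rng.normal(size=n) + 0.5 * q


def abs_corr(a, b):
    """Absolute Pearson correlation between two 1-D arrays."""
    return abs(np.corrcoef(a, b)[0, 1])


# An arbitrary two-valued map of q, standing in for what a fitted T_q might do to {0, 1}
t_q = np.where(q == 1, -0.7, 1.3)
print(abs_corr(x, q), abs_corr(x, t_q))   # identical up to floating point
```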


References
  1. Laparra, V., Camps-Valls, G., & Malo, J. (2011). Iterative Gaussianization: From ICA to Random Rotations. IEEE Transactions on Neural Networks, 22(4), 537–549. 10.1109/TNN.2011.2106511
  2. Cortes, C., Mohri, M., & Rostamizadeh, A. (2012). Algorithms for Learning Kernels Based on Centered Alignment. Journal of Machine Learning Research, 13, 795–828.
  3. Olfat, M., & Aswani, A. (2019). Convex Formulations for Fair Principal Component Analysis. AAAI Conference on Artificial Intelligence. 10.1609/aaai.v33i01.3301663
  4. Winkler, C., Worrall, D. E., Hoogeboom, E., & Welling, M. (2019). Learning Likelihoods with Conditional Normalizing Flows. arXiv Preprint arXiv:1912.00042.
  5. Moyer, D., Gao, S., Brekelmans, R., Galstyan, A., & Ver Steeg, G. (2018). Invariant Representations without Adversarial Training. Advances in Neural Information Processing Systems (NeurIPS).
  6. Kusner, M. J., Loftus, J. R., Russell, C., & Silva, R. (2017). Counterfactual Fairness. Advances in Neural Information Processing Systems (NeurIPS).
  7. Calders, T., & Verwer, S. (2010). Three Naive Bayes Approaches for Discrimination-Free Classification. Data Mining and Knowledge Discovery, 21(2), 277–292. 10.1007/s10618-010-0190-x
  8. Louizos, C., Swersky, K., Li, Y., Welling, M., & Zemel, R. (2016). The Variational Fair Autoencoder. International Conference on Learning Representations (ICLR).