Fair learning with frozen Gaussianization flows

Overview, reading order, and status

This sub-project replaces the CKA fairness penalty in keras-fairkl with a family of penalties built from a frozen Gaussianization flow. The flow is trained once, frozen, and reused as a differentiable Gaussian-space probe inside any downstream predictor’s optimisation loop.

The one-paragraph pitch

A Gaussianization flow $T : \mathbb{R}^d \to \mathbb{R}^d$ turns arbitrary marginals into approximately standard normals while preserving all statistical dependence ($T(Z) \perp T(Q) \iff Z \perp Q$). Once trained and frozen, $T$ acts as a fixed, scale-normalising, differentiable preprocessor: it absorbs the kernel/bandwidth choices of CKA and HSIC into its mixture-CDF parameters, and turns “measure non-linear dependence between $z$ and $q$” into “measure linear dependence between near-Gaussian variables.” Three concrete penalties exploit this: G-XCOV, G-MI, and G-TC.
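To make the pitch concrete, the sketch below uses a rank-based empirical-CDF transform as a stand-in for one coordinate of a trained flow. This is an assumption for illustration only: the actual flow uses learned mixture-CDF parameters, and the helper names `gaussianize` and `pearson` are hypothetical, not part of keras-fairkl. The example builds a monotone but strongly non-linear relation between $z$ and $q$ and compares the Pearson correlation before and after Gaussianising both variables.

```python
import math
import random
from statistics import NormalDist


def gaussianize(xs):
    """Rank-based marginal Gaussianization: x -> Phi^{-1}(rank / (n + 1)).

    Stand-in (assumption) for a learned mixture-CDF layer followed by the
    inverse standard-normal CDF.
    """
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    std_normal = NormalDist()
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        out[i] = std_normal.inv_cdf(rank / (n + 1))
    return out


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(0)
q = [rng.gauss(0.0, 1.0) for _ in range(2000)]
# z depends on q through a heavy-tailed, monotone, non-linear map.
z = [math.exp(2.0 * qi + 0.1 * rng.gauss(0.0, 1.0)) for qi in q]

corr_raw = pearson(z, q)                              # weakened by the heavy tail
corr_gauss = pearson(gaussianize(z), gaussianize(q))  # close to 1
print(f"raw corr = {corr_raw:.3f}, Gaussianised corr = {corr_gauss:.3f}")
```

The sketch only covers monotone dependence. A non-monotone relation (for example $z = q^2$) is not linearised by marginal Gaussianization alone, so a second-moment penalty would still miss it.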

Reading order

| # | Page | What it is for |
|---|------|----------------|
| 1 | Fair learning with frozen Gaussianization flows — design doc | Design doc. Mental model, math, hypotheses, experiment plan, risks, milestones. Read first. |
| 2 | Pretrain & freeze a Gaussianization flow | Notebook 05. Pretrain + freeze a flow on a 2-D dataset; four diagnostics that prove the flow Gaussianises, freezes, and inverts. |
| 3 | Fair MLP regression with a frozen Gaussianization flow | Notebook 06. Fair MLP regression on synthetic data; Pareto curve of $(\text{RMSE}, \lvert\mathrm{corr}(\hat y, q)\rvert)$ across G-XCOV, G-MI, G-TC, and CKA. |
| 4 | UCI Adult Census — real-data fair classification | Notebook 07. Same setup on UCI Adult Census; Pareto curves on AUC vs. DP-diff and EO-diff. |
| 5 | Fair Gaussianization — input-side follow-up experiments | Follow-up doc. Seven input-side alternatives that move the flow from the predictor’s output to its input / representation / data pipeline. |

The three penalties at a glance

Table (2): Output-side fairness penalties built on a frozen Gaussianization flow.

| Loss | Captures | Closed form? | Joint flow needed? |
|------|----------|--------------|--------------------|
| G-XCOV | 2nd-moment dependence in Gaussianised space (linear CKA) | yes | no (two marginal flows) |
| G-MI | MI assuming joint-Gaussian after Gaussianisation | yes | no (two marginal flows) |
| G-TC | Full MI / total correlation, no joint-Gaussian assumption | no (via flow NLL) | yes (one joint flow over $(z, q)$) |

All three are differentiable in the predictor’s parameters and plug into `FairModelWrapper` via its `fairness_loss=...` argument. See §4 of the design doc for the math and §4.4 (the comparison table) for the property comparison.
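As a rough illustration of what the three penalties measure, the sketch below evaluates scalar versions of each on samples that are assumed to be already Gaussianised. The function names `g_xcov`, `g_mi`, and `g_tc` are hypothetical and the exact definitions live in the design doc; here G-XCOV is read as the squared cross-correlation (which is what linear CKA reduces to for scalars), G-MI as the Gaussian closed form $-\tfrac{1}{2}\log(1 - \rho^2)$, and G-TC as a Monte Carlo average of $\log p(z, q) - \log p(z) - \log p(q)$. Analytic Gaussian densities stand in for the log-likelihoods a trained joint flow would supply.

```python
import math
import random


def sample_pair(rho, n, rng):
    """Draw n samples (z, q) from a standard bivariate normal with correlation rho."""
    pairs = []
    for _ in range(n):
        q = rng.gauss(0.0, 1.0)
        z = rho * q + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        pairs.append((z, q))
    return pairs


def correlation(pairs):
    """Sample Pearson correlation between the two coordinates."""
    n = len(pairs)
    mean_z = sum(z for z, _ in pairs) / n
    mean_q = sum(q for _, q in pairs) / n
    cov = sum((z - mean_z) * (q - mean_q) for z, q in pairs)
    var_z = sum((z - mean_z) ** 2 for z, _ in pairs)
    var_q = sum((q - mean_q) ** 2 for _, q in pairs)
    return cov / math.sqrt(var_z * var_q)


def g_xcov(pairs):
    """Hypothetical scalar G-XCOV: squared cross-correlation."""
    return correlation(pairs) ** 2


def g_mi(pairs):
    """Hypothetical scalar G-MI: Gaussian mutual information in nats."""
    return -0.5 * math.log(1.0 - correlation(pairs) ** 2)


def log_normal_pdf(x):
    """Log-density of the standard normal."""
    return -0.5 * (x * x + math.log(2.0 * math.pi))


def log_joint_pdf(z, q, rho):
    """Log-density of the standard bivariate normal (stand-in for a joint flow)."""
    det = 1.0 - rho ** 2
    quad = (z * z - 2.0 * rho * z * q + q * q) / det
    return -0.5 * quad - math.log(2.0 * math.pi) - 0.5 * math.log(det)


def g_tc(pairs, rho):
    """Hypothetical G-TC: mean of log p(z, q) - log p(z) - log p(q)."""
    total = sum(
        log_joint_pdf(z, q, rho) - log_normal_pdf(z) - log_normal_pdf(q)
        for z, q in pairs
    )
    return total / len(pairs)


rng = random.Random(0)
dependent = sample_pair(0.6, 5000, rng)
independent = sample_pair(0.0, 5000, rng)

for name, pairs, rho in [("dependent", dependent, 0.6), ("independent", independent, 0.0)]:
    print(name, round(g_xcov(pairs), 3), round(g_mi(pairs), 3), round(g_tc(pairs, rho), 3))
```

On jointly Gaussian data the G-MI and G-TC estimates should roughly agree, which matches the table: the joint flow only becomes necessary when the joint-Gaussian assumption fails after marginal Gaussianisation.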

Status

| Milestone | Acceptance |
|-----------|------------|
| Skeleton: `fair/{losses,freeze,pretrain,metrics}.py` + tests | `pytest tests/test_fair.py` green |
| Notebook 05: pretrain + freeze + 4 diagnostics | Executed and committed |
| Notebook 06: synthetic Pareto with G-XCOV vs CKA | Pareto curve from RMSE 0.11 → 1.35 |
| Notebook 07: Adult Pareto with G-XCOV vs CKA | Pareto traced |
| 🟡 G-MI + G-TC losses + tests | In flight |
| 🟡 Notebooks 06/07 re-executed with G-MI and G-TC curves | Pending |
| H3 quadratic-dependence experiment (`08_quadratic_dependence.ipynb`) | Pending |
| Input-side follow-ups (see Fair Gaussianization — input-side follow-up experiments) | Pending |