Bayesian Models & Neural Networks Tutorial Master List
A reconciled, exhaustive curriculum for Bayesian linear models, parametric regression, parametric classification, and Bayesian neural networks. The progression mirrors the textbook arc: Bayesian linear regression → basis functions → random / spectral features → shallow MLPs → deep BNNs. Each block lays out its own likelihood / constraint zoo and its own inference zoo so the curriculum can be entered at any depth.
Companion lists:
Cross-listed items (RFF, deep kernels, last-layer-Bayes, BLR, Laplace, VI guides) are flagged 🔁.
Legend — Source columns:

- G = exists in gaussx (docs/notebooks/<name>)
- P = exists in pyrox (docs/notebooks/<name>)
- R = exists in research_notebook (projects/<area>/notebooks/<path>)
- — = does not exist yet (gap)

Scope tag:

- 🧱 fundamental — small, library-API demo (gaussx/pyrox docs)
- 🔬 research — applied / dataset-driven (research_notebook/projects/bayesian_nns)
- 🌉 bridge — useful in either; cross-link
- 🔁 cross-listed — also in GP or neural-fields master list

Refs column: gh:<repo>#N = open GitHub issue (e.g., gh:pyrox#71) · dd:path = pyrox design_docs/pyrox/<path> · mc# = numbered model from examples/nn/regression_masterclass_eqx.md · xref:GP#X.Y = pointer into GP master list.
Curriculum at a glance

- Part A — Bayesian Regression
  - A.A — Bayesian linear regression
  - A.B — Fixed feature maps
  - A.C — Random Fourier features
  - A.D — Spectral basis layers (HSGP)
  - A.E — Bridges to GP
  - A.F — Likelihood zoo (regression heads)
  - A.G — Constrained & physics-informed losses
  - A.H — 9-model regression masterclass (pick-apart)
  - A.I — NN MAP baseline & library patterns
- Part B — Bayesian Classification
  - B.A — Bayesian logistic regression
  - B.B — Multinomial / softmax
  - B.C — Inference variants for classification
  - B.D — Feature-based & last-layer classifiers
  - B.E — Calibration for Bayesian classifiers
- Part C — NN ↔ GP Bridges
  - C.A — Theory (NNGP, NTK, ArcCosine)
  - C.B — Deep kernels
  - C.C — Functional priors
  - C.D — Pathwise BNN sampling
  - C.E — Shared infrastructure
- Part D — Bayesian Inference for Neural Networks
  - D.I — Point estimates: MLE / MAP / regularisation-as-prior
  - D.II — Gaussian (Laplace) approximations
  - D.III — Variational inference
  - D.IV — Sampling-based inference
  - D.V — Last-layer & functional posteriors
  - D.VI — Stochastic / implicit Bayes
  - D.VII — Tempering, prior choice, diagnostics
  - D.VIII — Distance-aware uncertainty
- Part E — Ensembles
- Part F — Calibration, OOD, Active Learning
- Part G — Bayesian Neural Fields
- Part H — Applied Case Studies (research_notebook/projects/bayesian_nns)

Part A — Bayesian Regression

A.A — Bayesian linear regression

Key equations / models:
- BLR posterior (zero prior mean, $\mu_0 = 0$): $\Sigma = (\Phi^\top R^{-1}\Phi + S_0^{-1})^{-1}$, $\mu = \Sigma\,\Phi^\top R^{-1}y$
- Precision (natural) form: $\Lambda = \Phi^\top R^{-1}\Phi + S_0^{-1}$, $\eta = \Phi^\top R^{-1} y + S_0^{-1}\mu_0$, so that $\Sigma = \Lambda^{-1}$ and $\mu = \Lambda^{-1}\eta$
- Sequential / online update via Sherman–Morrison: rank-1 covariance update per new observation
- Predictive (homoscedastic noise, $R = \sigma^2 I$): $p(y_*\mid x_*, \mathcal{D}) = \mathcal{N}(\phi(x_*)^\top\mu,\, \phi(x_*)^\top\Sigma\phi(x_*) + \sigma^2)$

| # | Tutorial | Source | Scope | Refs / Notes |
|---|---|---|---|---|
| A.1 | Bayesian linear regression from scratch (mean-cov form) | — | 🧱 | GAP — dd:mc#1 polynomial features + Vandermonde + MCMC |
| A.2 | BLR in precision / natural form | G numpyro_precision | 🧱 🔁 | xref:GP#11.2 |
| A.3 | Sequential / online BLR updates | — | 🧱 🔁 | GAP — api: blr_diag_update, blr_full_update; moved from GP#6.19 |
| A.4 | Polynomial basis regression with uncertainty | — | 🧱 | GAP — dd:mc#1 |
| A.5 | Empirical Bayes for BLR — type-II MLE for noise + prior scales | — | 🧱 | GAP |

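
The A.A equations can be checked numerically. The following is a minimal NumPy sketch, not the gaussx or pyrox implementation: the toy data, the cubic polynomial features, and every variable name are illustrative assumptions. It computes the posterior in precision form, converts it to mean-covariance form, and confirms that one rank-1 Sherman–Morrison update per observation reproduces the batch result.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumed for illustration): cubic polynomial features, noise R = sigma^2 I.
N, D, sigma = 50, 4, 0.3
x = rng.uniform(-1.0, 1.0, size=N)
Phi = np.vander(x, D, increasing=True)           # (N, D) design matrix
w_true = np.array([0.5, -1.0, 0.0, 2.0])
y = Phi @ w_true + sigma * rng.normal(size=N)

S0 = np.eye(D)                                   # prior covariance S_0
mu0 = np.zeros(D)                                # prior mean mu_0
R_inv = np.eye(N) / sigma**2

# Precision (natural) form.
Lam = Phi.T @ R_inv @ Phi + np.linalg.inv(S0)
eta = Phi.T @ R_inv @ y + np.linalg.solve(S0, mu0)

# Mean-covariance form: Sigma = Lambda^{-1}, mu = Sigma eta.
Sigma = np.linalg.inv(Lam)
mu = Sigma @ eta

# Sequential form: start at the prior, apply one rank-1 update per observation.
Sigma_seq, mu_seq = S0.copy(), mu0.copy()
for phi_n, y_n in zip(Phi, y):
    Sphi = Sigma_seq @ phi_n
    gain = Sphi / (sigma**2 + phi_n @ Sphi)      # Sherman-Morrison gain vector
    mu_seq = mu_seq + gain * (y_n - phi_n @ mu_seq)
    Sigma_seq = Sigma_seq - np.outer(gain, Sphi)

# Predictive mean and variance at a new input.
phi_star = np.vander(np.array([0.25]), D, increasing=True)[0]
pred_mean = phi_star @ mu
pred_var = phi_star @ Sigma @ phi_star + sigma**2
```

The batch and sequential posteriors agree to floating-point precision, which is the property an online-update tutorial such as A.3 would rely on.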
A.B — Fixed feature maps

Key equations / models:

- Generic linear model: $f(x) = \phi(x)^\top w$ with fixed $\phi:\mathbb{R}^d\to\mathbb{R}^D$
- Polynomial, Vandermonde, Chebyshev, Fourier, wavelet, Gaussian-bump bases

| # | Tutorial | Source | Scope | Refs / Notes |
|---|---|---|---|---|
| A.6 | Fixed feature maps: Fourier, polynomial, wavelet, Gaussian-bump | — | 🧱 | GAP |
| A.7 | Spectral kernel models — visual guide | P spectral_kernel_models | 🧱 🔁 | xref:GP#7.5 |

A.C — Random Fourier features

Key equations / models:

- Rahimi–Recht: $\phi(x) = \sqrt{2/D}\cos(\omega^\top x + b)$, $\omega\sim S(\omega)$, $k(x,x')\approx \phi(x)^\top\phi(x')$
- SSGP: BLR in RFF space, $O(D^2 N)$
- VSSGP: variational posterior over frequencies ω
- ORF: $\omega_i$ on the sphere → variance reduction

| # | Tutorial | Source | Scope | Refs / Notes |
|---|---|---|---|---|
| A.8 | Random Fourier Features → SSGP → VSSGP | P random_fourier_features | 🧱 🔁 | dd:mc#5, xref:GP#7.8 |
| A.9 | RFF as a (shallow) neural network — fixed / learned / ensemble | P rff_as_neural_networks | 🌉 🔁 | |
| A.10 | SSGP — Sparse Spectrum GP via RFF + BLR, $O(D^2 N)$ | — | 🧱 | GAP — dd:examples/nn/models.md |
| A.11 | Heteroscedastic RFF — dual-head (mean + log-noise) | — | 🧱 | GAP — dd:features/nn/random_features.md |
| A.12 | Approximate GP via RFF + hierarchical prior on signal variance | — | 🧱 | GAP — dd:mc#6 |
| A.13 | Variational Fourier Features (VSSGP) — learnable posterior over RFF freqs | — | 🧱 | GAP — pyrox .plans/spectral-inducing-features.md |
| A.14 | Orthogonal Random Features (ORF) | — | 🧱 | GAP — pyrox .plans/spectral-inducing-features.md |

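
The Rahimi–Recht feature map in A.C can be illustrated for one concrete kernel. This sketch assumes an RBF kernel with a hypothetical lengthscale, whose spectral density $S(\omega)$ is Gaussian, and assumes the usual uniform phase $b\sim\mathcal{U}[0, 2\pi]$, which the list above does not state. Function and variable names are illustrative and do not come from pyrox.

```python
import numpy as np

def rff_features(X, omega, b):
    """phi(x) = sqrt(2/D) * cos(omega^T x + b), one row per input point."""
    D = omega.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ omega + b)

rng = np.random.default_rng(0)
d, D, lengthscale = 2, 10_000, 0.7               # assumed sizes and lengthscale

# RBF spectral density is Gaussian: omega ~ N(0, I / lengthscale^2).
omega = rng.normal(size=(d, D)) / lengthscale
b = rng.uniform(0.0, 2.0 * np.pi, size=D)        # assumed uniform phases

X = rng.normal(size=(6, d))
Phi = rff_features(X, omega, b)
K_approx = Phi @ Phi.T                           # k(x, x') ~= phi(x)^T phi(x')

# Exact RBF Gram matrix for comparison.
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K_exact = np.exp(-0.5 * sq_dists / lengthscale**2)
max_err = np.max(np.abs(K_approx - K_exact))
```

The Monte Carlo error shrinks at roughly $1/\sqrt{D}$, so `max_err` is small but nonzero. Running BLR on `Phi` instead of on a Gram matrix is what gives SSGP its $O(D^2 N)$ cost.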
A.D — Spectral basis layers (HSGP)

Key equations / models:

- HSGP (Solin–Särkkä): $k(x,x')\approx \sum_{j=1}^M S(\sqrt{\lambda_j})\,\phi_j(x)\phi_j(x')$, $(\lambda_j,\phi_j)$ Laplacian eigenpairs
- Deterministic basis (vs random RFF) → diagonal $K_{uu}$, $O(NM + M^3)$

| # | Tutorial | Source | Scope | Refs / Notes |
|---|---|---|---|---|
| A.15 | HSGP — Hilbert-Space GP layer, deterministic Laplacian basis + spectral-density prior | — | 🧱 🔁 | GAP — dd:features/nn/random_features.md; xref:GP#7.12 |

A.E — Bridges to GP

Key equations / models:

- Whitened SVGP-as-BLR view: inducing variables $u = L_{mm}\tilde u$ → BLR on $\tilde u$
- Equivalence map: kernel ridge ↔ MAP-BLR ↔ exact GP posterior mean (with appropriate features)

| # | Tutorial | Source | Scope | Refs / Notes |
|---|---|---|---|---|
| A.16 | Whitened SVGP as Bayesian linear regression | G whitened_svgp | 🌉 🔁 | xref:GP#5.7 |
| A.17 | Kernel ridge ↔ MAP-BLR ↔ exact GP mean — three views, one estimator | — | 🧱 | GAP |

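
The equivalence map in A.E can be verified on a small example. The sketch below assumes a hypothetical Gaussian-bump feature map, an isotropic prior $w\sim\mathcal{N}(0, s^2 I)$, and the induced kernel $k(x,x') = s^2\phi(x)^\top\phi(x')$. All names and sizes are illustrative choices rather than anything taken from the planned A.17 notebook.

```python
import numpy as np

rng = np.random.default_rng(1)
N, D, sigma, prior_scale = 30, 8, 0.2, 1.5       # assumed sizes and scales

# Hypothetical fixed feature map: Gaussian bumps on a regular grid.
centers = np.linspace(-2.0, 2.0, D)

def features(x):
    return np.exp(-0.5 * (x[:, None] - centers[None, :]) ** 2 / 0.5**2)

x = rng.uniform(-2.0, 2.0, size=N)
y = np.sin(2.0 * x) + sigma * rng.normal(size=N)
x_star = np.linspace(-2.0, 2.0, 7)
Phi, Phi_star = features(x), features(x_star)

# View 1: MAP-BLR in weight space with prior w ~ N(0, prior_scale^2 I).
A = Phi.T @ Phi / sigma**2 + np.eye(D) / prior_scale**2
w_map = np.linalg.solve(A, Phi.T @ y / sigma**2)
f_blr = Phi_star @ w_map

# View 2: kernel ridge with k(x, x') = prior_scale^2 phi(x)^T phi(x'), ridge = sigma^2.
K = prior_scale**2 * Phi @ Phi.T
K_star = prior_scale**2 * Phi_star @ Phi.T
f_krr = K_star @ np.linalg.solve(K + sigma**2 * np.eye(N), y)

# View 3: exact GP posterior mean with the same kernel, via a Cholesky solve.
L = np.linalg.cholesky(K + sigma**2 * np.eye(N))
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
f_gp = K_star @ alpha
```

All three predictions coincide up to round-off. The weight-space solve costs $O(D^3)$ and the function-space solves cost $O(N^3)$, which is the practical reason to pick one view over another.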
A.F — Likelihood zoo (regression heads)

Key equations / models:

- Gaussian: $y = f(x) + \epsilon$, $\epsilon\sim\mathcal{N}(0,\sigma^2)$
- Student-t: heavy-tailed, ν controls robustness
- Laplace: $|y-f|$ noise → L1 / Huber-like
- Heteroscedastic: $\sigma(x)$ predicted alongside $\mu(x)$
- Mixture density: $p(y\mid x) = \sum_k \pi_k(x)\,\mathcal{N}(y; \mu_k(x), \sigma_k^2(x))$
- Quantile / pinball: $\rho_\tau(u) = u(\tau - \mathbb{1}\{u<0\})$
- Censored / Tobit / survival: likelihood truncated / right-censored
- Log-Gaussian Cox Process: $\lambda(x) = \exp(f(x))$, Poisson observations
- Warped GP / NF-warped: $g(y) = f(x)$, $g$ monotone bijection (Box–Cox, NF)

| # | Tutorial | Source | Scope | Refs / Notes |
|---|---|---|---|---|
| A.18 | Gaussian, Student-t, Laplace likelihoods — robust regression | — | 🧱 | GAP |
| A.19 | Heteroscedastic NLL — dual-head (mean + log-noise) | — | 🧱 | GAP |
| A.20 | Mixture density networks (MDN) | — | 🌉 | GAP |
| A.21 | Quantile / pinball / expectile regression | — | 🌉 | GAP |
| A.22 | Censored / Tobit / survival likelihoods | — | 🔬 | GAP |
| A.23 | Log-Gaussian Cox Process — spatial point-process intensity | — | 🔬 🔁 | GAP — moved from GP#6.20; dd:examples/gp/moments.md |
| A.24 | Warped regression (Box–Cox) — skewed targets | — | 🧱 🔁 | GAP — moved from GP#6.21 |
| A.25 | Warped regression with normalizing-flow bijection | — | 🔬 🔁 | GAP — moved from GP#6.24; xref to gaussianization list |

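
The pinball loss from the A.F list has a simple defining property: over constant predictors, it is minimized at the τ-quantile of the targets. The sketch below checks this on synthetic data. The sample size, the value of τ, and the function name `pinball_loss` are illustrative assumptions.

```python
import numpy as np

def pinball_loss(u, tau):
    """rho_tau(u) = u * (tau - 1{u < 0}), elementwise on residuals u = y - f."""
    return u * (tau - (u < 0).astype(float))

rng = np.random.default_rng(2)
y = rng.normal(size=2001)                        # synthetic targets
tau = 0.9

# Scan constant predictors q over the observed values and keep the best one.
candidates = np.sort(y)
avg_loss = np.array([pinball_loss(y - q, tau).mean() for q in candidates])
q_best = candidates[np.argmin(avg_loss)]

# Reference: the empirical tau-quantile of the same sample.
q_emp = np.quantile(y, tau)
```

Under-prediction is penalized with weight τ and over-prediction with weight 1 − τ. That asymmetry is why replacing the constant `q` with a network output $f(x)$ gives conditional quantile regression.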
A.G — Constrained & physics-informed losses

Key equations / models:

- Positivity / monotone / convex output via reparameterization (softplus, cumulative)
- Equality constraint via augmented Lagrangian: $\mathcal{L}_\rho(\theta,\lambda) = \mathcal{L}(\theta) + \lambda^\top c(\theta) + \tfrac{\rho}{2}\|c(\theta)\|^2$
- PDE residual (PINN): $\mathcal{L} = \mathcal{L}_\text{data} + \alpha\|\mathcal{N}[u_\theta](x_i)\|^2$ on collocation points
- Boundary / initial penalty: extra weighted term on $\partial\Omega$
- Conservation / symmetry: divergence-free reparameterization, equivariant layers
- Smoothness / TV regularisation: $\|\nabla u\|_2$, $\|\nabla u\|_1$
- KL annealing / β-VAE: trade reconstruction vs KL

| # | Tutorial | Source | Scope | Refs / Notes |
|---|---|---|---|---|
| A.26 | Positivity, monotone, convex output constraints via reparameterization | — | 🧱 | GAP |
| A.27 | Equality constraints via augmented Lagrangian | — | 🌉 | GAP |
| A.28 | Bayesian PINN — PDE residual + data likelihood under prior | — | 🔬 | GAP |
| A.29 | Boundary / initial-condition penalties | — | 🔬 | GAP |
| A.30 | Conservation / symmetry penalties (divergence-free, equivariant) | — | 🔬 | GAP |
| A.31 | Smoothness / TV regularisation as prior on outputs | — | 🌉 | GAP |
| A.32 | KL annealing & β-tempered ELBO ablation | — | 🧱 | GAP — see also D.VII |

A.H — 9-model regression masterclass (pick-apart)

The pyrox examples/nn/regression_masterclass_eqx.md (~927 lines) is a single monolithic notebook. We break it into nine standalone tutorials, each running on the same dataset so a learner can ablate one block at a time.

Key equations / models: see Models 1–9 below — each row hyperlinks to the corresponding row in A.A–A.C / D.

| # | Tutorial | Source | Scope | Refs / Notes |
|---|---|---|---|---|
| A.33 | Model 1 — Bayesian linear regression with polynomial features (NUTS) | — | 🧱 | dd:mc#1; pairs with A.1, A.4 |
| A.34 | Model 2 — Neural network MAP — single hidden MLP via SVI + AutoDelta | — | 🧱 | dd:mc#2 (deterministic baseline); pairs with D.1 |
| A.35 | Model 3 — MC-Dropout NN — dropout as approximate Bayes | — | 🧱 | dd:mc#3; pairs with D.30 |
| A.36 | Model 4 — Bayesian NN via HMC/NUTS — small-scale weight-space inference | — | 🧱 | dd:mc#4; pairs with D.21 |
| A.37 | Model 5 — SVR via Random Fourier Features (BLR on $\phi(x)$) | — | 🧱 | dd:mc#5; pairs with A.8, A.10 |
| A.38 | Model 6 — Approximate GP via RFF + hierarchical prior on signal variance | — | 🧱 | dd:mc#6; pairs with A.12 |
| A.39 | Model 7 — Deep GP via stacked RFF layers (Cutajar 2017) | — | 🧱 🔁 | dd:mc#7; pairs with C.5 |
| A.40 | Model 8 — Last-layer Bayes (BLR on penultimate features) | — | 🧱 | NEW — dd:mc extension; pairs with D.27 |
| A.41 | Model 9 — Mean-field VI BNN — full weight-space VI on the same MLP | — | 🧱 | NEW — dd:mc extension; pairs with D.16 |
| A.42 | Capstone — calibration & predictive-distribution shootout across Models 1–9 | — | 🔬 | GAP — leads into Part F |

A.I — NN MAP baseline & library patterns

| # | Tutorial | Source | Scope | Refs / Notes |
|---|---|---|---|---|
| A.43 | NN MAP baseline — single hidden MLP via SVI + AutoDelta (standalone) | — | 🧱 | GAP — dd:mc#2 |
| A.44 | Three-pattern regression masterclass — tree_at / pyrox_sample / Parameterized | P regression_masterclass_treeat, _pyrox_sample, _parameterized | 🧱 🔁 | xref:GP#11.3 |
| A.45 | sklearn-style EstimatorBase facade for parametric Bayes | — | 🧱 | GAP — gh:pyrox#71 |

Part B — Bayesian Classification

B.A — Bayesian logistic regression

Key equations / models:

- Binary likelihood: $y_i\sim\mathrm{Bernoulli}(\sigma(\phi(x_i)^\top w))$
- Probit alternative: $y_i\sim\mathrm{Bernoulli}(\Phi(\phi(x_i)^\top w))$, where $\Phi$ here denotes the standard normal CDF
- Pólya–Gamma augmentation (Polson et al. 2013): $\sigma(\eta)^y(1-\sigma(\eta))^{1-y} = 2^{-1}\exp(\kappa\eta)\int_0^\infty\exp(-\omega\eta^2/2)\,p(\omega)\,d\omega$, with $\kappa = y - \tfrac{1}{2}$ and $p(\omega) = \mathrm{PG}(1, 0)$
- Augmented model → conjugate Gaussian update on $w \mid \omega$

| # | Tutorial | Source | Scope | Refs / Notes |
|---|---|---|---|---|
| B.1 | Bayesian logistic regression from scratch (Bishop §4.5 walkthrough) | — | 🧱 | GAP |
| B.2 | Probit vs logit — link-function comparison | — | 🧱 | GAP |
| B.3 | Pólya–Gamma augmentation → conjugate Gibbs for logistic | — | 🧱 | GAP |
| B.4 | Jaakkola–Jordan variational bound for logistic | — | 🧱 | GAP |
| B.5 | Online / sequential logistic update — Laplace + Sherman–Morrison | — | 🧱 | GAP |

B.B — Multinomial / softmax classification

Key equations / models:

- Multinomial: $y\sim\mathrm{Cat}(\mathrm{softmax}(W^\top\phi(x)))$
- One-vs-rest / multinomial-probit alternatives
- Augmentation schemes (Pólya–Gamma stick-breaking, Albert–Chib)

| # | Tutorial | Source | Scope | Refs / Notes |
|---|---|---|---|---|
| B.6 | Bayesian softmax / multinomial logistic regression | — | 🧱 | GAP |
| B.7 | Stick-breaking + Pólya–Gamma for multinomial | — | 🧱 | GAP |
| B.8 | Ordinal regression — cumulative-link Bayesian model | — | 🌉 | GAP |
| B.9 | Multi-label classification — independent vs structured priors | — | 🔬 | GAP |

B.C — Inference variants for classification

| # | Tutorial | Source | Scope | Refs / Notes |
|---|---|---|---|---|
| B.10 | Laplace approximation for logistic regression | — | 🧱 🔁 | GAP — canonical Bishop example; uses D.II machinery |
| B.11 | Variational logistic / softmax — mean-field & full-rank | — | 🧱 🔁 | GAP — uses D.III |
| B.12 | HMC / NUTS for small Bayesian classifiers | — | 🧱 🔁 | GAP — uses D.IV |
| B.13 | Expectation Propagation for GP classification — re-used here for BLR-classification | — | 🧱 🔁 | xref:GP#6.17 |

B.D — Feature-based & last-layer classifiers

| # | Tutorial | Source | Scope | Refs / Notes |
|---|---|---|---|---|
| B.14 | RFF + Bayesian logistic regression (kernel classification) | — | 🌉 | GAP |
| B.15 | Last-layer Bayesian classifier (Laplace / BLR head on a deterministic MLP) | — | 🔬 | GAP |
| B.16 | SNGP for classification — distance-aware uncertainty | — | 🔬 | GAP — gh:pyrox#42; pairs with D.48 |
| B.17 | Random-feature GP classifier (LaplaceRandomFeatureCovariance) | — | 🧱 | GAP — dd:features/nn/edward2_layers.md |

B.E — Calibration for Bayesian classifiers

| # | Tutorial | Source | Scope | Refs / Notes |
|---|---|---|---|---|
| B.18 | Reliability diagrams & ECE for Bayesian classifiers | — | 🔬 | GAP — see also F.1 |
| B.19 | Temperature scaling on top of Bayesian posteriors | — | 🔬 | GAP |
| B.20 | Predictive entropy / mutual information for classification uncertainty | — | 🔬 | GAP |

Part C — NN ↔ GP Bridges

C.A — Theory

Key equations / models:

- NNGP (Lee et al. 2018): infinite-width MLP with i.i.d. priors → GP with recursive kernel $K^{(\ell)} = T_\sigma(K^{(\ell-1)})$
- NTK (Jacot et al. 2018): $\Theta(x,x') = \mathbb{E}\langle\partial_\theta f(x),\partial_\theta f(x')\rangle$, frozen in the infinite-width limit
- ArcCosine (Cho & Saul 2009): $k_n(x,x') = \tfrac{1}{\pi}\|x\|^n\|x'\|^n J_n(\theta)$, with $\theta$ here the angle between $x$ and $x'$ (not the network parameters)

| # | Tutorial | Source | Scope | Refs / Notes |
|---|---|---|---|---|
| C.1 | Infinite-width NN as a GP (NNGP) | — | 🧱 | GAP — Lee et al. 2018 |
| C.2 | Neural Tangent Kernel intro | — | 🧱 | GAP |
| C.3 | ArcCosine kernel — NN-correspondence via infinite-width limits | — | 🧱 🔁 | GAP — dd:features/gp/gpflow.md; xref:GP#2.7 |

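
The NN-correspondence behind the ArcCosine kernel can be checked by Monte Carlo. The sketch below uses the closed forms $J_0(\theta) = \pi - \theta$ and $J_1(\theta) = \sin\theta + (\pi - \theta)\cos\theta$ from Cho & Saul (2009), which the list above does not spell out. It compares $k_1$ against the expectation $2\,\mathbb{E}_w[\mathrm{relu}(w^\top x)\,\mathrm{relu}(w^\top x')]$ under $w\sim\mathcal{N}(0, I)$. The inputs and the function name are illustrative assumptions.

```python
import numpy as np

def arccos_kernel(x, y, n):
    """Arc-cosine kernel of order n in {0, 1}: (1/pi) |x|^n |y|^n J_n(theta)."""
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    theta = np.arccos(np.clip(x @ y / (nx * ny), -1.0, 1.0))
    if n == 0:
        J = np.pi - theta
    elif n == 1:
        J = np.sin(theta) + (np.pi - theta) * np.cos(theta)
    else:
        raise ValueError("only orders 0 and 1 are implemented in this sketch")
    return (nx * ny) ** n * J / np.pi

rng = np.random.default_rng(4)
x = np.array([1.0, 0.5, -0.3])                   # arbitrary test inputs
y = np.array([0.2, -1.0, 0.8])

# Infinitely wide single ReLU layer with Gaussian weights, estimated by sampling.
W = rng.normal(size=(200_000, 3))
k1_mc = 2.0 * np.mean(np.maximum(W @ x, 0.0) * np.maximum(W @ y, 0.0))
k1_exact = arccos_kernel(x, y, n=1)
```

The sampled average converges to the closed form as the number of hidden units grows. This is the single-layer case of the recursion $K^{(\ell)} = T_\sigma(K^{(\ell-1)})$ listed for NNGP.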
C.B — Deep kernels

| # | Tutorial | Source | Scope | Refs / Notes |
|---|---|---|---|---|
| C.4 | Deep kernels — NN-warped GP inputs | R pyroxgp/04_svgp_rff_nn | 🌉 🔁 | xref:GP#2.6 |
| C.5 | Deep RFF / stacked spectral GPs (Cutajar 2017) | P deep_random_fourier_features | 🔬 🔁 | dd:mc#7 |

C.C — Functional priors

| # | Tutorial | Source | Scope | Refs / Notes |
|---|---|---|---|---|
| C.6 | Functional priors — BNNs that match a target GP prior | — | 🔬 | GAP |
| C.7 | Prior predictive checks for BNNs — sample-and-visualise | — | 🌉 | GAP |

C.D — Pathwise BNN sampling

| # | Tutorial | Source | Scope | Refs / Notes |
|---|---|---|---|---|
| C.8 | Pathwise sampling for BNNs (analogue of Wilson 2020) | — | 🧱 🔁 | GAP — needs gh:gaussx#77, #78; xref:GP#9.1 |

C.E — Shared infrastructure

| # | Tutorial | Source | Scope | Refs / Notes |
|---|---|---|---|---|
| C.9 | Shared pyrox._basis — VFF (GP) + HSGP (NN) sharing Laplacian eigenfunctions | — | 🧱 | GAP — pyrox .plans/spectral-inducing-features.md |

Part D — Bayesian Inference for Neural Networks

Reorganised around the kind of approximation rather than the layer flavour. Layer-flavour (Edward2, Conv/RNN/Attn) lives in D.VI/D.VIII.

D.I — Point estimates: MLE / MAP / regularisation-as-prior

Key equations / models:

- MLE: $\hat\theta = \arg\max_\theta \prod_i p(y_i\mid x_i,\theta)$
- MAP: $\hat\theta = \arg\max_\theta p(\theta)\prod_i p(y_i\mid x_i,\theta)$
- L2 ridge ↔ Gaussian prior; L1 lasso ↔ Laplace prior; elastic-net = sum
- Weight decay / spectral / Jacobian regularisation as implicit priors

| # | Tutorial | Source | Scope | Refs / Notes |
|---|---|---|---|---|
| D.1 | MLE vs MAP — same architecture, prior-tuning sweep | — | 🧱 | GAP |
| D.2 | Regularisation-as-prior — L2/L1/elastic-net ↔ Gaussian / Laplace / mixture | — | 🧱 | GAP |
| D.3 | Spectral / Jacobian / weight-decay regularisation as implicit prior | — | 🌉 | GAP |

D.II — Gaussian (Laplace) approximations

Key equations / models:

- Laplace: $q(\theta) = \mathcal{N}(\hat\theta, -H^{-1})$, $H = \nabla^2\log p(\theta\mid\mathcal{D})$ evaluated at $\hat\theta$
- GGN: $-H \approx J^\top R J$ (drops 2nd-order); KFAC: block-Kronecker GGN
- Diagonal Laplace: $\mathrm{diag}(H)$ via Hutchinson
- Linearised Laplace: predict via $f_\theta(x)\approx f_{\hat\theta}(x) + J_{\hat\theta}(x)(\theta-\hat\theta)$
- SWAG: low-rank + diagonal Gaussian over SGD iterates
- Moment-matching / unscented predictive

| # | Tutorial | Source | Scope | Refs / Notes |
|---|---|---|---|---|
| D.4 | Laplace approximation — pure mechanics (canonical) | P advanced_gp_laplace | 🧱 🔁 | xref:GP#6.7 |
| D.5 | Gauss–Newton / GGN approximation | P advanced_gp_gauss_newton | 🧱 🔁 | xref:GP#6.8 |
| D.6 | Quasi-Newton / L-BFGS site update | P advanced_gp_qn | 🧱 🔁 | xref:GP#6.9 |
| D.7 | Posterior linearisation (Bayes-Newton) | P advanced_gp_pl | 🧱 🔁 | xref:GP#6.10 |
| D.8 | Hutchinson Hessian / GGN diagonal for BNN Laplace | — | 🧱 🔁 | GAP — api: hutchinson_hessian_diag; xref:GP#6.13 |
| D.9 | KFAC Laplace — block-Kronecker GGN over a full network | — | 🔬 | GAP |
| D.10 | Linearised Laplace predictive (Immer et al.) | — | 🔬 | GAP |
| D.11 | SWAG — stochastic weight averaging Gaussian | — | 🔬 | GAP |
| D.12 | Subspace inference — PCA of SGD trajectory | — | 🔬 | GAP |
| D.13 | Moment-matching predictive — unscented / sigma-point propagation through NN | — | 🌉 🔁 | GAP — xref:GP#10.1 |

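
The Hutchinson route to a diagonal Laplace approximation needs only matrix-vector products with the curvature matrix. The sketch below is a generic NumPy illustration of that estimator, not the `hutchinson_hessian_diag` API referenced in D.8. An explicit symmetric matrix stands in for the Hessian or GGN, and the function name `hutchinson_diag` is a hypothetical choice.

```python
import numpy as np

def hutchinson_diag(matvec, dim, num_samples, rng):
    """Estimate diag(H) as the average of z * (H z) over Rademacher probes z."""
    est = np.zeros(dim)
    for _ in range(num_samples):
        z = rng.choice([-1.0, 1.0], size=dim)    # entries are +1 or -1
        est += z * matvec(z)
    return est / num_samples

rng = np.random.default_rng(3)
dim = 20

# Stand-in curvature matrix (assumed): symmetric positive-definite.
B = rng.normal(size=(dim, dim))
H = B @ B.T / dim + np.eye(dim)

# Only products H @ v are used, as with Hessian-vector products in autodiff.
diag_est = hutchinson_diag(lambda v: H @ v, dim, num_samples=4000, rng=rng)
rel_err = np.linalg.norm(diag_est - np.diag(H)) / np.linalg.norm(np.diag(H))
```

Each probe is unbiased because $\mathbb{E}[z_i z_j] = \delta_{ij}$, so the off-diagonal contributions average out. The variance of entry $i$ scales with $\sum_{j\neq i} H_{ij}^2$, which is why strongly coupled weights need more probes.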
D.III — Variational inference

Key equations / models:

- ELBO: $\log p(y)\geq\mathbb{E}_q[\log p(y,\theta)] - \mathbb{E}_q[\log q(\theta)]$
- Variational families: delta · mean-field diagonal · low-rank ($S = VV^\top + \mathrm{diag}$) · full-rank Cholesky · normalising flow · whitened
- Natural gradient: $\tilde\nabla\mathcal{L} = F^{-1}\nabla\mathcal{L}$
- CVI sites (Khan & Lin 2017); reparameterisation, local-reparam, flipout

| # | Tutorial | Source | Scope | Refs / Notes |
|---|---|---|---|---|
| D.14 | Variational guides — delta / mean-field / low-rank / full-rank / whitened / flow | — | 🧱 🔁 | GAP — dd:features/gp/variational_families.md; xref:GP#6.14 |
| D.15 | Natural gradient VI | G natural_gradient_vi | 🌉 🔁 | xref:GP#6.15 |
| D.16 | Mean-field VI for BNNs (MFVI) — Bayes-by-Backprop | — | 🔬 | GAP — needs gh:gaussx#39 logdet |
| D.17 | Full-rank / low-rank structured VI for BNNs | — | 🔬 | GAP |
| D.18 | Normalising-flow posteriors over weights | — | 🔬 | GAP — bridge to gaussianization list |
| D.19 | Reparameterisation tricks — local reparam, flipout, weight-norm | — | 🧱 | GAP |
| D.20 | Functional VI — variational posterior on $f(\cdot)$ rather than θ | — | 🔬 | GAP |

D.IV — Sampling-based inference

| # | Tutorial | Source | Scope | Refs / Notes |
|---|---|---|---|---|
| D.21 | HMC / NUTS for small BNNs | — | 🔬 | GAP — dd:mc#4 |
| D.22 | SGLD / SG-HMC — stochastic-gradient Langevin & Hamiltonian | — | 🔬 | GAP |
| D.23 | Stein Variational Gradient Descent (SVGD) | — | 🔬 | GAP |
| D.24 | Ensemble-of-MCMC — multi-chain pooling | — | 🔬 | GAP |
| D.25 | MCMC diagnostics for BNNs — $\hat R$, effective sample size, posterior-predictive checks | — | 🔬 | GAP |

D.V — Last-layer & functional posteriors

| # | Tutorial | Source | Scope | Refs / Notes |
|---|---|---|---|---|
| D.26 | Last-layer Bayes via Laplace | — | 🔬 | GAP — api: gauss_newton_precision, ggn_diagonal |
| D.27 | Last-layer Bayes via RFF (BLR on penultimate features) | — | 🔬 | GAP |
| D.28 | RandomFeatureGaussianProcess + LaplaceRandomFeatureCovariance — SNGP output layer | — | 🧱 | GAP — dd:features/nn/edward2_layers.md |
| D.29 | Subnetwork inference — only-some-layers-Bayesian | — | 🔬 | GAP |

D.VI — Stochastic / implicit Bayes

| # | Tutorial | Source | Scope | Refs / Notes |
|---|---|---|---|---|
| D.30 | MC-Dropout as approximate Bayes | — | 🔬 | GAP — dd:mc#3 |
| D.31 | DenseVariationalDropout — learned per-weight dropout rates | — | 🧱 | GAP — dd:features/nn/edward2_layers.md |
| D.32 | DenseDVI — analytic Gaussian moment propagation | — | 🧱 | GAP — dd:features/nn/edward2_layers.md |
| D.33 | DenseRank1 / BatchEnsemble — shared $W$ + per-member rank-1 perturbations | — | 🧱 | GAP — dd:features/nn/edward2_layers.md |
| D.34 | MCSoftmaxDenseFA / MCSigmoidDenseFA — heteroscedastic output (low-rank + diagonal) | — | 🧱 | GAP — dd:features/nn/edward2_layers.md |
| D.35 | DenseHierarchical — horseshoe prior (local + global shrinkage, ARD) | — | 🧱 | GAP — dd:features/nn/edward2_layers.md |
| D.36 | NCPNormalOutput — output-side noise contrastive prior | — | 🧱 | GAP — dd:features/nn/edward2_layers.md |
| D.37 | Conv2DReparameterization — Bayesian 2D conv | — | 🧱 | GAP — dd:features/nn/layers_conv_rnn.md |
| D.38 | Conv2DFlipout — lower-variance Bayesian conv | — | 🧱 | GAP — dd:features/nn/layers_conv_rnn.md |
| D.39 | LSTMCellVariational — Bayesian LSTM | — | 🧱 | GAP — dd:features/nn/layers_conv_rnn.md |
| D.40 | GRUCellVariational — Bayesian GRU (scan-compatible) | — | 🧱 | GAP — dd:features/nn/layers_conv_rnn.md |
| D.41 | MultiHeadAttentionVariational / MultiHeadAttentionBE — Bayesian attention | — | 🧱 | GAP — dd:features/nn/edward2_layers.md, layers_conv_rnn.md |

D.VII — Tempering, prior choice, diagnostics

| # | Tutorial | Source | Scope | Refs / Notes |
|---|---|---|---|---|
| D.42 | Cold posteriors & temperature scaling | — | 🔬 | GAP |
| D.43 | KL annealing / β-tempered ELBO | — | 🧱 | GAP — see also A.32 |
| D.44 | Prior elicitation for BNNs — Gaussian / Laplace / horseshoe / mixture | — | 🌉 | GAP |
| D.45 | Posterior predictive checks & residual diagnostics | — | 🌉 | GAP |
| D.46 | Bayesian model averaging vs marginal likelihood / WAIC / LOO | — | 🔬 | GAP |
| D.47 | Continual / online BNN updates — Laplace propagation, BLR-style refresh | — | 🔬 | GAP — links to A.3 |

D.VIII — Distance-aware uncertainty

| # | Tutorial | Source | Scope | Refs / Notes |
|---|---|---|---|---|
| D.48 | SNGP — Spectral-Normalized GP head | — | 🔬 | GAP — gh:pyrox#42, dd:features/nn/spectral_norm.md |
| D.49 | DUE — Deterministic Uncertainty Estimation (spectral norm + inducing-point GP head) | — | 🔬 | GAP — dd:features/nn/spectral_norm.md |

Part E — Ensembles

E.A — Vanilla ensembles

| # | Tutorial | Source | Scope | Refs / Notes |
|---|---|---|---|---|
| E.1 | Deep ensembles — vanilla | — | 🔬 | GAP |
| E.2 | Ensemble primitives — three ways | P ensemble_primitives_tutorial | 🧱 🔁 | xref:GP#12.1 |
| E.3 | EnsembleMAP & EnsembleVI runners | P ensemble_runner_tutorial | 🧱 🔁 | xref:GP#12.2 |
| E.4 | Ensemble-of-MAP / -of-VI runner via vmap over PRNG keys | — | 🧱 | GAP — gh:pyrox#70 |

E.B — Diversity strategies

| # | Tutorial | Source | Scope | Refs / Notes |
|---|---|---|---|---|
| E.5 | Snapshot / cyclical-LR ensembles | — | 🔬 | GAP |
| E.6 | Hyper-deep ensembles (DenseRank1 substrate) | — | 🔬 | GAP |

E.C — Comparison

| # | Tutorial | Source | Scope | Refs / Notes |
|---|---|---|---|---|
| E.7 | Deep ensembles vs MFVI vs Laplace — calibration shootout | — | 🔬 | GAP |

Part F — Calibration, OOD, Active Learning

| # | Tutorial | Source | Scope | Refs / Notes |
|---|---|---|---|---|
| F.1 | Predictive calibration — ECE, reliability diagrams (regression + classification) | — | 🔬 | GAP |
| F.2 | Temperature scaling & post-hoc calibration | — | 🔬 | GAP |
| F.3 | NLPD / CRPS / coverage diagnostics for BNNs | — | 🔬 🔁 | GAP — xref:GP#15 |
| F.4 | Out-of-distribution detection with BNNs (predictive entropy, mutual info) | — | 🔬 | GAP |
| F.5 | Active learning / Bayesian acquisition functions (BALD, max-entropy) | — | 🔬 | GAP |
| F.6 | Selective prediction / abstention under uncertainty | — | 🔬 | GAP |

Part G — Bayesian Neural Fields

Core (deterministic) neural-fields content lives in ../neural_fields/TUTORIAL_MASTER_LIST.md. This section is the Bayesian layer on top — point estimates and uncertainty for INRs.

| # | Tutorial | Source | Scope | Refs / Notes |
|---|---|---|---|---|
| G.1 | Bayesian INR — probabilistic SIREN with MFVI weights | — | 🔬 | GAP — pairs with xref:NF#B.1 (SIREN) |
| G.2 | Bayesian INR via last-layer Laplace on a SIREN | — | 🔬 | GAP |
| G.3 | Bayesian NeRF — uncertainty in volumetric scenes | — | 🔬 | GAP — pairs with xref:NF#C.1 (vanilla NeRF) |
| G.4 | Functional priors for INRs — match a target spatial GP | — | 🔬 | GAP |
| G.5 | BNF layer family + BNFEstimator / MLE / VI runners | — | 🔬 | GAP — gh:pyrox#72 |
| G.6 | Bayesian neural fields flagship demo (bayesian_neural_fields.ipynb) | — | 🔬 | GAP — gh:pyrox#73 |

Part H — Applied Case Studies (research_notebook/projects/bayesian_nns)

H.A — Bayesian benchmarks

| # | Tutorial | Source | Scope | Refs / Notes |
|---|---|---|---|---|
| H.1 | Last-layer Bayesian NN on UCI regression suite | — | 🔬 | GAP |
| H.2 | Deep RFF on geophysical / climate data | P deep_random_fourier_features (port + extend) | 🔬 🔁 | |
| H.3 | scalable_gp_spectral demo — 5k 1D regression, dense GP vs VFF, ≥10× speedup | — | 🔬 | GAP — pyrox .plans/spectral-inducing-features.md |

H.B — Emulators & PDEs

| # | Tutorial | Source | Scope | Refs / Notes |
|---|---|---|---|---|
| H.4 | BNN emulator for a numerical simulator | — | 🔬 | GAP |
| H.5 | Bayesian PINN — Burgers / heat / shallow-water | — | 🔬 | GAP — pairs with A.28 |
| H.6 | Bayesian operator learning — DeepONet / FNO with weight uncertainty | — | 🔬 | GAP |

H.C — Image / signal regression

| # | Tutorial | Source | Scope | Refs / Notes |
|---|---|---|---|---|
| H.7 | BNN for image regression / denoising | — | 🔬 | GAP |
| H.8 | Probabilistic super-resolution via RFF / INR | — | 🔬 | GAP |

H.D — Capstone progressions

| # | Tutorial | Source | Scope | Refs / Notes |
|---|---|---|---|---|
| H.9 | Full 9-model regression masterclass — single dataset, methodical climb | — | 🔬 | GAP — dd:examples/nn/regression_masterclass_eqx.md (~927 lines); see A.H |
| H.10 | Classification capstone — same dataset, Models 1–9 ported to classification | — | 🔬 | GAP — mirrors A.H for Part B |

Cross-list summary (items shared with GP list)

| Item | GP ID | BNN ID | Suggested canonical home |
|---|---|---|---|
| Spectral kernel models | GP 7.5 | A.7 | pyrox (GP), cross-listed |
| Random Fourier Features intro | GP 7.8 | A.8 | pyrox (canonical), link both |
| RFF as neural networks | — | A.9 | pyrox |
| Whitened SVGP / BLR view | GP 5.7 | A.16 | gaussx (mechanics) |
| BLR updates (blr_*_update) | moved | A.3 | gaussx primitive demo — migrated out of GP list |
| BLR in precision form | GP 11.2 | A.2 | gaussx |
| Three-pattern masterclass | GP 11.3 | A.44 | pyrox |
| Deep kernels | GP 2.6 | C.4 | research_notebook |
| Deep RFF (Cutajar) | — | C.5 / H.2 | research_notebook (BNN) |
| ArcCosine kernel | GP 2.7 | C.3 | pyrox |
| Pathwise sampling | GP 9.1 | C.8 | pyrox |
| Laplace mechanics | GP 6.7 | D.4 | pyrox |
| Gauss–Newton | GP 6.8 | D.5 | pyrox |
| Quasi-Newton sites | GP 6.9 | D.6 | pyrox |
| Posterior linearisation | GP 6.10 | D.7 | pyrox |
| Hutchinson Hessian diag | GP 6.13 | D.8 | gaussx primitive + BNN application |
| VI guides | GP 6.14 | D.14 | pyrox (canonical for both) |
| Natural-gradient VI | GP 6.15 | D.15 | gaussx |
| Moment matching predictive | GP 10.1 | D.13 | gaussx primitive |
| Log-Gaussian Cox Process | moved | A.23 | migrated out of GP list |
| Warped GP (Box–Cox) | moved | A.24 | migrated out of GP list |
| Warped GP w/ NF bijection | moved | A.25 | migrated out of GP list |
| EP for classification | GP 6.17 | B.13 | pyrox |
| Ensemble primitives | GP 12.1 | E.2 | pyrox |
| Ensemble runners | GP 12.2 | E.3 | pyrox |

Proposed final homes

- gaussx/docs/notebooks/ → A.A (BLR primitives), A.E (whitened SVGP / BLR view), D.II primitives (D.8 Hutchinson, D.13 moment matching), D.15 nat-grad VI
- pyrox/docs/notebooks/ → A.B, A.C, A.D, A.F, A.G, A.H, A.I, B.A–B.E, C.* shared infra, D.IV–D.VI library demos, E.A–E.B
- research_notebook/projects/bayesian_nns/notebooks/ → all of D.III–D.IV applied, D.V, D.VII, E.C, F, G, H

In-scope vs aspirational

- In scope today (have library support in pyrox/gaussx): A.2, A.3, A.7, A.8, A.9, A.16, A.44, B.13, C.4, C.5, D.4–D.8, D.15, E.2, E.3, H.2
- In scope with planned features (open issues / .plans/): A.1, A.4–A.6, A.10–A.15, A.17–A.25, A.32, A.33–A.45, B.1–B.12, B.14–B.17, C.3, C.9, D.28, D.31–D.41, D.48, E.4, G.5, G.6, H.3, H.9, H.10
- Aspirational (need new infra or genuine research work): A.26–A.31, B.18–B.20, C.1, C.2, C.6, C.7, C.8, D.1–D.3, D.9–D.14 (applied), D.16–D.20, D.21–D.27 (applied), D.29, D.30, D.42–D.47, D.49, E.1, E.5–E.7, F.*, G.1–G.4, H.1, H.4–H.8