Bayesian Models & Neural Networks Tutorial Master List

A reconciled, exhaustive curriculum for Bayesian linear models, parametric regression, parametric classification, and Bayesian neural networks. The progression mirrors the textbook arc: Bayesian linear regression → basis functions → random / spectral features → shallow MLPs → deep BNNs. Each block lays out its own likelihood / constraint zoo and its own inference zoo so the curriculum can be entered at any depth.

Companion lists:

Cross-listed items (RFF, deep kernels, last-layer-Bayes, BLR, Laplace, VI guides) are flagged 🔁.

Legend — Source columns:

Scope tag:

Refs column: gh:<repo>#N = open GitHub issue (e.g., gh:pyrox#71) · dd:path = pyrox design_docs/pyrox/<path> · mc# = numbered model from examples/nn/regression_masterclass_eqx.md · xref:GP#X.Y = pointer into GP master list.


Curriculum at a glance


Part A — Bayesian Regression

A.A — Bayesian linear regression

Key equations / models:

| # | Tutorial | Source | Scope | Refs / Notes |
| --- | --- | --- | --- | --- |
| A.1 | Bayesian linear regression from scratch (mean-cov form) | | 🧱 | GAP — dd:mc#1 polynomial features + Vandermonde + MCMC |
| A.2 | BLR in precision / natural form | G numpyro_precision | 🧱 🔁 | xref:GP#11.2 |
| A.3 | Sequential / online BLR updates | | 🧱 🔁 | GAP — api: blr_diag_update, blr_full_update; moved from GP#6.19 |
| A.4 | Polynomial basis regression with uncertainty | | 🧱 | GAP — dd:mc#1 |
| A.5 | Empirical Bayes for BLR — type-II MLE for noise + prior scales | | 🧱 | GAP |
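
A.1 to A.3 all revolve around the same conjugate Gaussian update, so a small numerical sketch may help fix ideas. The block below is an illustration in plain NumPy, not the pyrox or gaussx API: the helper name `blr_update`, the toy quadratic dataset, and the noise and prior variances are assumptions made for this example. It computes the weight posterior in precision form and checks that one batch update agrees with two sequential updates.

```python
import numpy as np

def blr_update(mean0, prec0, Phi, y, noise_var):
    """One conjugate update for y = Phi @ w + eps, eps ~ N(0, noise_var * I).

    The prior is N(mean0, inv(prec0)); returns the posterior mean and precision.
    (Hypothetical helper for illustration only.)
    """
    prec = prec0 + Phi.T @ Phi / noise_var
    rhs = prec0 @ mean0 + Phi.T @ y / noise_var
    mean = np.linalg.solve(prec, rhs)
    return mean, prec

# Toy data: quadratic trend with Gaussian noise (assumed values).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=50)
Phi = np.vander(x, N=3, increasing=True)  # Vandermonde columns 1, x, x^2
w_true = np.array([0.5, -1.0, 2.0])
noise_var = 0.01
y = Phi @ w_true + rng.normal(scale=np.sqrt(noise_var), size=x.shape)

# Prior: w ~ N(0, 10 * I), written in precision form.
mean0 = np.zeros(3)
prec0 = np.eye(3) / 10.0

# Batch posterior from all 50 points at once.
m_batch, P_batch = blr_update(mean0, prec0, Phi, y, noise_var)

# Sequential posterior: the first-half posterior becomes the second-half prior.
m_half, P_half = blr_update(mean0, prec0, Phi[:25], y[:25], noise_var)
m_seq, P_seq = blr_update(m_half, P_half, Phi[25:], y[25:], noise_var)

# Predictive variance at the training inputs: phi^T S phi + noise_var.
S = np.linalg.inv(P_batch)
pred_var = np.einsum("nd,de,ne->n", Phi, S, Phi) + noise_var
```

Inverting the returned precision gives the mean-cov form of A.1, and the agreement between `m_batch` and `m_seq` is the property that the online updates in A.3 rely on.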

A.B — Fixed feature maps

Key equations / models:

| # | Tutorial | Source | Scope | Refs / Notes |
| --- | --- | --- | --- | --- |
| A.6 | Fixed feature maps: Fourier, polynomial, wavelet, Gaussian-bump | | 🧱 | GAP |
| A.7 | Spectral kernel models — visual guide | P spectral_kernel_models | 🧱 🔁 | xref:GP#7.5 |

A.C — Random Fourier features

Key equations / models:

| # | Tutorial | Source | Scope | Refs / Notes |
| --- | --- | --- | --- | --- |
| A.8 | Random Fourier Features → SSGP → VSSGP | P random_fourier_features | 🧱 🔁 | dd:mc#5, xref:GP#7.8 |
| A.9 | RFF as a (shallow) neural network — fixed / learned / ensemble | P rff_as_neural_networks | 🌉 🔁 | |
| A.10 | SSGP — Sparse Spectrum GP via RFF + BLR, $O(D^2 N)$ | | 🧱 | GAP — dd:examples/nn/models.md |
| A.11 | Heteroscedastic RFF — dual-head (mean + log-noise) | | 🧱 | GAP — dd:features/nn/random_features.md |
| A.12 | Approximate GP via RFF + hierarchical prior on signal variance | | 🧱 | GAP — dd:mc#6 |
| A.13 | Variational Fourier Features (VSSGP) — learnable posterior over RFF freqs | | 🧱 | GAP — pyrox .plans/spectral-inducing-features.md |
| A.14 | Orthogonal Random Features (ORF) | | 🧱 | GAP — pyrox .plans/spectral-inducing-features.md |
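
The whole of A.C rests on one approximation: inner products of random cosine features approximate a stationary kernel. The sketch below illustrates this for the RBF kernel in plain NumPy. The function name `rff_features`, the lengthscale, and the feature count are assumptions for the example and do not correspond to any pyrox layer.

```python
import numpy as np

def rff_features(X, omega, phase):
    """Random Fourier features z(x) = sqrt(2 / D) * cos(omega^T x + phase).

    X has shape (n, d), omega has shape (d, D), phase has shape (D,).
    (Hypothetical helper for illustration only.)
    """
    D = omega.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ omega + phase)

rng = np.random.default_rng(0)
n, d, D = 20, 2, 5000
lengthscale = 1.0  # assumed RBF lengthscale

X = rng.normal(size=(n, d))
# For the RBF kernel the spectral density is Gaussian with std 1 / lengthscale.
omega = rng.normal(scale=1.0 / lengthscale, size=(d, D))
phase = rng.uniform(0.0, 2.0 * np.pi, size=D)

Z = rff_features(X, omega, phase)
K_approx = Z @ Z.T  # Monte Carlo estimate of the kernel matrix

# Exact RBF kernel for comparison.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_exact = np.exp(-0.5 * sq_dists / lengthscale**2)

max_err = np.abs(K_approx - K_exact).max()
```

Running Bayesian linear regression on `Z` instead of on the raw inputs is the idea behind SSGP in A.10; forming `Z.T @ Z` is where the $O(D^2 N)$ cost comes from.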

A.D — Spectral basis layers (HSGP)

Key equations / models:

| # | Tutorial | Source | Scope | Refs / Notes |
| --- | --- | --- | --- | --- |
| A.15 | HSGP — Hilbert-Space GP layer, deterministic Laplacian basis + spectral-density prior | | 🧱 🔁 | GAP — dd:features/nn/random_features.md; xref:GP#7.12 |

A.E — Bridges to GP

Key equations / models:

| # | Tutorial | Source | Scope | Refs / Notes |
| --- | --- | --- | --- | --- |
| A.16 | Whitened SVGP as Bayesian linear regression | G whitened_svgp | 🌉 🔁 | xref:GP#5.7 |
| A.17 | Kernel ridge ↔ MAP-BLR ↔ exact GP mean — three views, one estimator | | 🧱 | GAP |

A.F — Likelihood zoo (regression heads)

Key equations / models:

| # | Tutorial | Source | Scope | Refs / Notes |
| --- | --- | --- | --- | --- |
| A.18 | Gaussian, Student-t, Laplace likelihoods — robust regression | | 🧱 | GAP |
| A.19 | Heteroscedastic NLL — dual-head (mean + log-noise) | | 🧱 | GAP |
| A.20 | Mixture density networks (MDN) | | 🌉 | GAP |
| A.21 | Quantile / pinball / expectile regression | | 🌉 | GAP |
| A.22 | Censored / Tobit / survival likelihoods | | 🔬 | GAP |
| A.23 | Log-Gaussian Cox Process — spatial point-process intensity | | 🔬 🔁 | GAP; moved from GP#6.20; dd:examples/gp/moments.md |
| A.24 | Warped regression (Box–Cox) — skewed targets | | 🧱 🔁 | GAP; moved from GP#6.21 |
| A.25 | Warped regression with normalizing-flow bijection | | 🔬 🔁 | GAP; moved from GP#6.24; xref to gaussianization list |
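
As a concrete instance of one entry in this zoo, the pinball loss of A.21 can be sketched in a few lines. The block below is an illustration only: the function name `pinball_loss`, the standard-normal sample, and the grid search are assumptions for the example. It checks numerically that the constant minimising the pinball loss at level `tau` sits at the empirical `tau`-quantile.

```python
import numpy as np

def pinball_loss(y, q, tau):
    """Mean pinball (quantile) loss of predicting q for targets y at level tau."""
    diff = y - q
    # Under-prediction is weighted by tau, over-prediction by (1 - tau).
    return np.mean(np.maximum(tau * diff, (tau - 1.0) * diff))

rng = np.random.default_rng(0)
y = rng.normal(size=2000)  # assumed toy targets
tau = 0.9

# Brute-force search over constant predictions on a fine grid.
grid = np.linspace(-3.0, 3.0, 601)
losses = np.array([pinball_loss(y, q, tau) for q in grid])
q_hat = grid[np.argmin(losses)]

# Reference: the empirical quantile of the same sample.
q_emp = np.quantile(y, tau)
```

Replacing the constant `q` with the output of a regression head gives quantile regression; the asymmetric weighting is what steers the fit towards the chosen quantile.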

A.G — Constrained & physics-informed losses

Key equations / models:

| # | Tutorial | Source | Scope | Refs / Notes |
| --- | --- | --- | --- | --- |
| A.26 | Positivity, monotone, convex output constraints via reparameterization | | 🧱 | GAP |
| A.27 | Equality constraints via augmented Lagrangian | | 🌉 | GAP |
| A.28 | Bayesian PINN — PDE residual + data likelihood under prior | | 🔬 | GAP |
| A.29 | Boundary / initial-condition penalties | | 🔬 | GAP |
| A.30 | Conservation / symmetry penalties (divergence-free, equivariant) | | 🔬 | GAP |
| A.31 | Smoothness / TV regularisation as prior on outputs | | 🌉 | GAP |
| A.32 | KL annealing & β-tempered ELBO ablation | | 🧱 | GAP — see also D.VII |

A.H — 9-model regression masterclass (pick-apart)

The pyrox examples/nn/regression_masterclass_eqx.md (~927 lines) is a single monolithic notebook. We break it into nine standalone tutorials, each running on the same dataset so a learner can ablate one block at a time.

Key equations / models: see Models 1–9 below — each row hyperlinks to the corresponding row in A.A–A.C / D.

| # | Tutorial | Source | Scope | Refs / Notes |
| --- | --- | --- | --- | --- |
| A.33 | Model 1 — Bayesian linear regression with polynomial features (NUTS) | | 🧱 | dd:mc#1; pairs with A.1, A.4 |
| A.34 | Model 2 — Neural network MAP — single hidden MLP via SVI + AutoDelta | | 🧱 | dd:mc#2 (deterministic baseline); pairs with D.1 |
| A.35 | Model 3 — MC-Dropout NN — dropout as approximate Bayes | | 🧱 | dd:mc#3; pairs with D.30 |
| A.36 | Model 4 — Bayesian NN via HMC/NUTS — small-scale weight-space inference | | 🧱 | dd:mc#4; pairs with D.21 |
| A.37 | Model 5 — SVR via Random Fourier Features (BLR on $\phi(x)$) | | 🧱 | dd:mc#5; pairs with A.8, A.10 |
| A.38 | Model 6 — Approximate GP via RFF + hierarchical prior on signal variance | | 🧱 | dd:mc#6; pairs with A.12 |
| A.39 | Model 7 — Deep GP via stacked RFF layers (Cutajar 2017) | | 🧱 🔁 | dd:mc#7; pairs with C.5 |
| A.40 | Model 8 — Last-layer Bayes (BLR on penultimate features) | | 🧱 | NEW — dd:mc extension; pairs with D.27 |
| A.41 | Model 9 — Mean-field VI BNN — full weight-space VI on the same MLP | | 🧱 | NEW — dd:mc extension; pairs with D.16 |
| A.42 | Capstone — calibration & predictive-distribution shootout across Models 1–9 | | 🔬 | GAP — leads into Part F |

A.I — NN MAP baseline & library patterns

| # | Tutorial | Source | Scope | Refs / Notes |
| --- | --- | --- | --- | --- |
| A.43 | NN MAP baseline — single hidden MLP via SVI + AutoDelta (standalone) | | 🧱 | GAP — dd:mc#2 |
| A.44 | Three-pattern regression masterclass — tree_at / pyrox_sample / Parameterized | P regression_masterclass_treeat, _pyrox_sample, _parameterized | 🧱 🔁 | xref:GP#11.3 |
| A.45 | sklearn-style EstimatorBase facade for parametric Bayes | | 🧱 | GAP — gh:pyrox#71 |

Part B — Bayesian Classification

B.A — Bayesian logistic regression

Key equations / models:

| # | Tutorial | Source | Scope | Refs / Notes |
| --- | --- | --- | --- | --- |
| B.1 | Bayesian logistic regression from scratch (Bishop §4.5 walkthrough) | | 🧱 | GAP |
| B.2 | Probit vs logit — link-function comparison | | 🧱 | GAP |
| B.3 | Pólya–Gamma augmentation → conjugate Gibbs for logistic | | 🧱 | GAP |
| B.4 | Jaakkola–Jordan variational bound for logistic | | 🧱 | GAP |
| B.5 | Online / sequential logistic update — Laplace + Sherman–Morrison | | 🧱 | GAP |

B.B — Multinomial / softmax classification

Key equations / models:

| # | Tutorial | Source | Scope | Refs / Notes |
| --- | --- | --- | --- | --- |
| B.6 | Bayesian softmax / multinomial logistic regression | | 🧱 | GAP |
| B.7 | Stick-breaking + Pólya–Gamma for multinomial | | 🧱 | GAP |
| B.8 | Ordinal regression — cumulative-link Bayesian model | | 🌉 | GAP |
| B.9 | Multi-label classification — independent vs structured priors | | 🔬 | GAP |

B.C — Inference variants for classification

| # | Tutorial | Source | Scope | Refs / Notes |
| --- | --- | --- | --- | --- |
| B.10 | Laplace approximation for logistic regression | | 🧱 🔁 | GAP — canonical Bishop example; uses D.II machinery |
| B.11 | Variational logistic / softmax — mean-field & full-rank | | 🧱 🔁 | GAP — uses D.III |
| B.12 | HMC / NUTS for small Bayesian classifiers | | 🧱 🔁 | GAP — uses D.IV |
| B.13 | Expectation Propagation for GP classification — re-used here for BLR-classification | | 🧱 🔁 | xref:GP#6.17 |
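
B.10 names the Laplace approximation for logistic regression as the canonical example, so a compact sketch of it may be useful here. The block below is an illustration in plain NumPy under stated assumptions: an isotropic Gaussian prior, undamped Newton iterations, a synthetic one-feature dataset, and hypothetical helper names (`laplace_logistic`, `predictive_prob`). It finds the MAP weights, takes the inverse Hessian as the posterior covariance, and applies the probit-style moderation of the predictive probability described in Bishop §4.5.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def laplace_logistic(X, y, prior_var, n_iter=25):
    """MAP weights and Laplace covariance for logistic regression.

    Assumes w ~ N(0, prior_var * I) and labels y in {0, 1}.
    """
    d = X.shape[1]
    w = np.zeros(d)
    for _ in range(n_iter):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) + w / prior_var
        hess = X.T @ (X * (p * (1.0 - p))[:, None]) + np.eye(d) / prior_var
        w = w - np.linalg.solve(hess, grad)  # undamped Newton step
    # Hessian of the negative log posterior at the mode is the posterior precision.
    p = sigmoid(X @ w)
    hess = X.T @ (X * (p * (1.0 - p))[:, None]) + np.eye(d) / prior_var
    return w, np.linalg.inv(hess)

def predictive_prob(X_star, w_map, cov):
    """Approximate predictive p(y=1 | x) with the logit variance folded in."""
    mu = X_star @ w_map
    var = np.einsum("nd,de,ne->n", X_star, cov, X_star)
    return sigmoid(mu / np.sqrt(1.0 + np.pi * var / 8.0))

# Synthetic data: intercept plus one feature (assumed values).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([np.ones_like(x), x])
w_true = np.array([-0.5, 2.0])
y = (rng.uniform(size=200) < sigmoid(X @ w_true)).astype(float)

prior_var = 10.0
w_map, cov = laplace_logistic(X, y, prior_var)

X_star = np.column_stack([np.ones(5), np.linspace(-4.0, 4.0, 5)])
p_plugin = sigmoid(X_star @ w_map)              # ignores weight uncertainty
p_laplace = predictive_prob(X_star, w_map, cov)  # pulled towards 0.5
```

The Newton loop has no line search, which is adequate for this well-conditioned toy problem but would likely need damping on nearly separable data.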

B.D — Feature-based & last-layer classifiers

| # | Tutorial | Source | Scope | Refs / Notes |
| --- | --- | --- | --- | --- |
| B.14 | RFF + Bayesian logistic regression (kernel classification) | | 🌉 | GAP |
| B.15 | Last-layer Bayesian classifier (Laplace / BLR head on a deterministic MLP) | | 🔬 | GAP |
| B.16 | SNGP for classification — distance-aware uncertainty | | 🔬 | GAP — gh:pyrox#42; pairs with D.48 |
| B.17 | Random-feature GP classifier (LaplaceRandomFeatureCovariance) | | 🧱 | GAP — dd:features/nn/edward2_layers.md |

B.E — Calibration for Bayesian classifiers

| # | Tutorial | Source | Scope | Refs / Notes |
| --- | --- | --- | --- | --- |
| B.18 | Reliability diagrams & ECE for Bayesian classifiers | | 🔬 | GAP — see also F.1 |
| B.19 | Temperature scaling on top of Bayesian posteriors | | 🔬 | GAP |
| B.20 | Predictive entropy / mutual information for classification uncertainty | | 🔬 | GAP |
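
For B.18, the expected calibration error (ECE) is simple enough to sketch directly. The block below uses the common equal-width binning of top-class confidence; the function name, the bin count, and the four-point toy input are assumptions for the example, and other binning schemes exist.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """ECE with equal-width confidence bins.

    probs: (n, k) predicted class probabilities; labels: (n,) integer classes.
    """
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Right-inclusive bins (lo, hi]; interior edges only, so indices run 0..n_bins-1.
    idx = np.digitize(conf, edges[1:-1], right=True)
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            gap = abs(correct[mask].mean() - conf[mask].mean())
            ece += mask.mean() * gap  # weight by the fraction of points in the bin
    return ece

# Four toy predictions, each landing in its own confidence bin.
probs = np.array([[0.95, 0.05],
                  [0.85, 0.15],
                  [0.65, 0.35],
                  [0.25, 0.75]])
labels = np.array([0, 1, 0, 1])
ece = expected_calibration_error(probs, labels)
```

The per-bin pairs of mean confidence and accuracy computed inside the loop are the same quantities a reliability diagram plots.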

Part C — NN ↔ GP Bridges

C.A — Theory

Key equations / models:

| # | Tutorial | Source | Scope | Refs / Notes |
| --- | --- | --- | --- | --- |
| C.1 | Infinite-width NN as a GP (NNGP) | | 🧱 | GAP — Lee et al. 2018 |
| C.2 | Neural Tangent Kernel intro | | 🧱 | GAP |
| C.3 | ArcCosine kernel — NN-correspondence via infinite-width limits | | 🧱 🔁 | GAP — dd:features/gp/gpflow.md; xref:GP#2.7 |

C.B — Deep kernels

| # | Tutorial | Source | Scope | Refs / Notes |
| --- | --- | --- | --- | --- |
| C.4 | Deep kernels — NN-warped GP inputs | R pyroxgp/04_svgp_rff_nn | 🌉 🔁 | xref:GP#2.6 |
| C.5 | Deep RFF / stacked spectral GPs (Cutajar 2017) | P deep_random_fourier_features | 🔬 🔁 | dd:mc#7 |

C.C — Functional priors

| # | Tutorial | Source | Scope | Refs / Notes |
| --- | --- | --- | --- | --- |
| C.6 | Functional priors — BNNs that match a target GP prior | | 🔬 | GAP |
| C.7 | Prior predictive checks for BNNs — sample-and-visualise | | 🌉 | GAP |

C.D — Pathwise BNN sampling

| # | Tutorial | Source | Scope | Refs / Notes |
| --- | --- | --- | --- | --- |
| C.8 | Pathwise sampling for BNNs (analogue of Wilson 2020) | | 🧱 🔁 | GAP — needs gh:gaussx#77, #78; xref:GP#9.1 |

C.E — Shared infrastructure

| # | Tutorial | Source | Scope | Refs / Notes |
| --- | --- | --- | --- | --- |
| C.9 | Shared pyrox._basis — VFF (GP) + HSGP (NN) sharing Laplacian eigenfunctions | | 🧱 | GAP — pyrox .plans/spectral-inducing-features.md |

Part D — Bayesian Inference for Neural Networks

Reorganised around the kind of approximation rather than the layer flavour. Layer-flavour (Edward2, Conv/RNN/Attn) lives in D.VI/D.VIII.

D.I — Point estimates: MLE / MAP / regularisation-as-prior

Key equations / models:

| # | Tutorial | Source | Scope | Refs / Notes |
| --- | --- | --- | --- | --- |
| D.1 | MLE vs MAP — same architecture, prior-tuning sweep | | 🧱 | GAP |
| D.2 | Regularisation-as-prior — L2/L1/elastic-net ↔ Gaussian / Laplace / mixture | | 🧱 | GAP |
| D.3 | Spectral / Jacobian / weight-decay regularisation as implicit prior | | 🌉 | GAP |
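
The correspondence in D.2 between an L2 penalty and a Gaussian prior can be checked numerically on a linear model. The sketch below is an illustration only: the synthetic data and the noise and prior variances are assumptions. It compares the closed-form ridge solution with penalty `noise_var / prior_var` against a MAP estimate obtained by gradient descent on the negative log posterior.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=40)

noise_var, prior_var = 0.09, 4.0  # assumed likelihood and prior variances
lam = noise_var / prior_var       # ridge penalty implied by the Gaussian prior

# Ridge: argmin_w ||y - X w||^2 + lam * ||w||^2, solved in closed form.
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

# MAP: gradient descent on the negative log posterior
#   ||y - X w||^2 / (2 * noise_var) + ||w||^2 / (2 * prior_var).
hess = X.T @ X / noise_var + np.eye(3) / prior_var
step = 1.0 / np.linalg.eigvalsh(hess).max()  # safe step for a quadratic objective
w_map = np.zeros(3)
for _ in range(2000):
    grad = X.T @ (X @ w_map - y) / noise_var + w_map / prior_var
    w_map -= step * grad
```

The two estimates coincide because multiplying the negative log posterior by `2 * noise_var` gives exactly the ridge objective; swapping the Gaussian prior for a Laplace prior would give an L1 penalty by the same argument.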

D.II — Gaussian (Laplace) approximations

Key equations / models:

| # | Tutorial | Source | Scope | Refs / Notes |
| --- | --- | --- | --- | --- |
| D.4 | Laplace approximation — pure mechanics (canonical) | P advanced_gp_laplace | 🧱 🔁 | xref:GP#6.7 |
| D.5 | Gauss–Newton / GGN approximation | P advanced_gp_gauss_newton | 🧱 🔁 | xref:GP#6.8 |
| D.6 | Quasi-Newton / L-BFGS site update | P advanced_gp_qn | 🧱 🔁 | xref:GP#6.9 |
| D.7 | Posterior linearisation (Bayes-Newton) | P advanced_gp_pl | 🧱 🔁 | xref:GP#6.10 |
| D.8 | Hutchinson Hessian / GGN diagonal for BNN Laplace | | 🧱 🔁 | GAP — api: hutchinson_hessian_diag; xref:GP#6.13 |
| D.9 | KFAC Laplace — block-Kronecker GGN over a full network | | 🔬 | GAP |
| D.10 | Linearised Laplace predictive (Immer et al.) | | 🔬 | GAP |
| D.11 | SWAG — stochastic weight averaging Gaussian | | 🔬 | GAP |
| D.12 | Subspace inference — PCA of SGD trajectory | | 🔬 | GAP |
| D.13 | Moment-matching predictive — unscented / sigma-point propagation through NN | | 🌉 🔁 | GAP — xref:GP#10.1 |

D.III — Variational inference

Key equations / models:

| # | Tutorial | Source | Scope | Refs / Notes |
| --- | --- | --- | --- | --- |
| D.14 | Variational guides — delta / mean-field / low-rank / full-rank / whitened / flow | | 🧱 🔁 | GAP — dd:features/gp/variational_families.md; xref:GP#6.14 |
| D.15 | Natural gradient VI | G natural_gradient_vi | 🌉 🔁 | xref:GP#6.15 |
| D.16 | Mean-field VI for BNNs (MFVI) — Bayes-by-Backprop | | 🔬 | GAP — needs gh:gaussx#39 logdet |
| D.17 | Full-rank / low-rank structured VI for BNNs | | 🔬 | GAP |
| D.18 | Normalising-flow posteriors over weights | | 🔬 | GAP — bridge to gaussianization list |
| D.19 | Reparameterisation tricks — local reparam, flipout, weight-norm | | 🧱 | GAP |
| D.20 | Functional VI — variational posterior on $f(\cdot)$ rather than θ | | 🔬 | GAP |

D.IV — Sampling-based inference

| # | Tutorial | Source | Scope | Refs / Notes |
| --- | --- | --- | --- | --- |
| D.21 | HMC / NUTS for small BNNs | | 🔬 | GAP — dd:mc#4 |
| D.22 | SGLD / SG-HMC — stochastic-gradient Langevin & Hamiltonian | | 🔬 | GAP |
| D.23 | Stein Variational Gradient Descent (SVGD) | | 🔬 | GAP |
| D.24 | Ensemble-of-MCMC — multi-chain pooling | | 🔬 | GAP |
| D.25 | MCMC diagnostics for BNNs — $\hat R$, effective sample size, posterior-predictive checks | | 🔬 | GAP |
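
To make the $\hat R$ diagnostic in D.25 concrete, the sketch below implements the classic Gelman–Rubin potential scale reduction factor for a single scalar quantity. It is an illustration only: the function name and the synthetic chains are assumptions, and library implementations typically use a rank-normalised split variant rather than this basic form.

```python
import numpy as np

def gelman_rubin_rhat(chains):
    """Classic R-hat for draws of one scalar; chains has shape (M, N)."""
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    within = chains.var(axis=1, ddof=1).mean()  # W: mean within-chain variance
    between = n * chain_means.var(ddof=1)       # B: between-chain variance
    var_plus = (n - 1) / n * within + between / n
    return np.sqrt(var_plus / within)

rng = np.random.default_rng(0)

# Four chains sampling the same N(0, 1) target: R-hat should be close to 1.
mixed = rng.normal(size=(4, 1000))
rhat_mixed = gelman_rubin_rhat(mixed)

# Four chains stuck around different modes: R-hat should be clearly above 1.
offsets = np.array([0.0, 1.0, 2.0, 3.0])[:, None]
stuck = rng.normal(size=(4, 1000)) + offsets
rhat_stuck = gelman_rubin_rhat(stuck)
```

For a BNN the statistic is usually more informative on function-space quantities such as predictions at held-out inputs than on individual weights, since weight-space symmetries can inflate it even when predictions agree.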

D.V — Last-layer & functional posteriors

| # | Tutorial | Source | Scope | Refs / Notes |
| --- | --- | --- | --- | --- |
| D.26 | Last-layer Bayes via Laplace | | 🔬 | GAP — api: gauss_newton_precision, ggn_diagonal |
| D.27 | Last-layer Bayes via RFF (BLR on penultimate features) | | 🔬 | GAP |
| D.28 | RandomFeatureGaussianProcess + LaplaceRandomFeatureCovariance — SNGP output layer | | 🧱 | GAP — dd:features/nn/edward2_layers.md |
| D.29 | Subnetwork inference — only-some-layers-Bayesian | | 🔬 | GAP |

D.VI — Stochastic / implicit Bayes

| # | Tutorial | Source | Scope | Refs / Notes |
| --- | --- | --- | --- | --- |
| D.30 | MC-Dropout as approximate Bayes | | 🔬 | GAP — dd:mc#3 |
| D.31 | DenseVariationalDropout — learned per-weight dropout rates | | 🧱 | GAP — dd:features/nn/edward2_layers.md |
| D.32 | DenseDVI — analytic Gaussian moment propagation | | 🧱 | GAP — dd:features/nn/edward2_layers.md |
| D.33 | DenseRank1 / BatchEnsemble — shared $W$ + per-member rank-1 perturbations | | 🧱 | GAP — dd:features/nn/edward2_layers.md |
| D.34 | MCSoftmaxDenseFA / MCSigmoidDenseFA — heteroscedastic output (low-rank + diagonal) | | 🧱 | GAP — dd:features/nn/edward2_layers.md |
| D.35 | DenseHierarchical — horseshoe prior (local + global shrinkage, ARD) | | 🧱 | GAP — dd:features/nn/edward2_layers.md |
| D.36 | NCPNormalOutput — output-side noise contrastive prior | | 🧱 | GAP — dd:features/nn/edward2_layers.md |
| D.37 | Conv2DReparameterization — Bayesian 2D conv | | 🧱 | GAP — dd:features/nn/layers_conv_rnn.md |
| D.38 | Conv2DFlipout — lower-variance Bayesian conv | | 🧱 | GAP — dd:features/nn/layers_conv_rnn.md |
| D.39 | LSTMCellVariational — Bayesian LSTM | | 🧱 | GAP — dd:features/nn/layers_conv_rnn.md |
| D.40 | GRUCellVariational — Bayesian GRU (scan-compatible) | | 🧱 | GAP — dd:features/nn/layers_conv_rnn.md |
| D.41 | MultiHeadAttentionVariational / MultiHeadAttentionBE — Bayesian attention | | 🧱 | GAP — dd:features/nn/edward2_layers.md, layers_conv_rnn.md |

D.VII — Tempering, prior choice, diagnostics

| # | Tutorial | Source | Scope | Refs / Notes |
| --- | --- | --- | --- | --- |
| D.42 | Cold posteriors & temperature scaling | | 🔬 | GAP |
| D.43 | KL annealing / β-tempered ELBO | | 🧱 | GAP — see also A.32 |
| D.44 | Prior elicitation for BNNs — Gaussian / Laplace / horseshoe / mixture | | 🌉 | GAP |
| D.45 | Posterior predictive checks & residual diagnostics | | 🌉 | GAP |
| D.46 | Bayesian model averaging vs marginal likelihood / WAIC / LOO | | 🔬 | GAP |
| D.47 | Continual / online BNN updates — Laplace propagation, BLR-style refresh | | 🔬 | GAP — links to A.3 |

D.VIII — Distance-aware uncertainty

| # | Tutorial | Source | Scope | Refs / Notes |
| --- | --- | --- | --- | --- |
| D.48 | SNGP — Spectral-Normalized GP head | | 🔬 | GAP — gh:pyrox#42, dd:features/nn/spectral_norm.md |
| D.49 | DUE — Deterministic Uncertainty Estimation (spectral norm + inducing-point GP head) | | 🔬 | GAP — dd:features/nn/spectral_norm.md |

Part E — Ensembles

E.A — Vanilla ensembles

| # | Tutorial | Source | Scope | Refs / Notes |
| --- | --- | --- | --- | --- |
| E.1 | Deep ensembles — vanilla | | 🔬 | GAP |
| E.2 | Ensemble primitives — three ways | P ensemble_primitives_tutorial | 🧱 🔁 | xref:GP#12.1 |
| E.3 | EnsembleMAP & EnsembleVI runners | P ensemble_runner_tutorial | 🧱 🔁 | xref:GP#12.2 |
| E.4 | Ensemble-of-MAP / -of-VI runner via vmap over PRNG keys | | 🧱 | GAP — gh:pyrox#70 |

E.B — Diversity strategies

| # | Tutorial | Source | Scope | Refs / Notes |
| --- | --- | --- | --- | --- |
| E.5 | Snapshot / cyclical-LR ensembles | | 🔬 | GAP |
| E.6 | Hyper-deep ensembles (DenseRank1 substrate) | | 🔬 | GAP |

E.C — Comparison

| # | Tutorial | Source | Scope | Refs / Notes |
| --- | --- | --- | --- | --- |
| E.7 | Deep ensembles vs MFVI vs Laplace — calibration shootout | | 🔬 | GAP |

Part F — Calibration, OOD, Active Learning

| # | Tutorial | Source | Scope | Refs / Notes |
| --- | --- | --- | --- | --- |
| F.1 | Predictive calibration — ECE, reliability diagrams (regression + classification) | | 🔬 | GAP |
| F.2 | Temperature scaling & post-hoc calibration | | 🔬 | GAP |
| F.3 | NLPD / CRPS / coverage diagnostics for BNNs | | 🔬 🔁 | GAP — xref:GP#15 |
| F.4 | Out-of-distribution detection with BNNs (predictive entropy, mutual info) | | 🔬 | GAP |
| F.5 | Active learning / Bayesian acquisition functions (BALD, max-entropy) | | 🔬 | GAP |
| F.6 | Selective prediction / abstention under uncertainty | | 🔬 | GAP |
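
The predictive entropy and mutual information used in F.4 and F.5 can be computed from a stack of posterior-sample predictions. The sketch below is an illustration in plain NumPy: the function names and the two-sample toy input are assumptions, and the mutual information shown is the quantity that BALD maximises.

```python
import numpy as np

def entropy(p, axis=-1):
    """Shannon entropy in nats; zero probabilities contribute zero."""
    return -np.sum(p * np.log(np.clip(p, 1e-12, 1.0)), axis=axis)

def predictive_uncertainty(probs):
    """Split predictive uncertainty from posterior samples.

    probs: (S, N, K) class probabilities from S posterior samples.
    Returns (predictive entropy, mutual information), each of shape (N,).
    """
    mean_probs = probs.mean(axis=0)
    total = entropy(mean_probs)                    # entropy of the averaged prediction
    expected = entropy(probs).mean(axis=0)         # average entropy of each sample
    return total, total - expected

# Two posterior samples, two inputs, two classes.
# Input 0: both samples are unsure      -> high entropy, zero mutual information.
# Input 1: samples confidently disagree -> high entropy, high mutual information.
probs = np.array([
    [[0.5, 0.5], [1.0, 0.0]],
    [[0.5, 0.5], [0.0, 1.0]],
])
total, mutual_info = predictive_uncertainty(probs)
```

Both inputs have the same predictive entropy, but only the second has non-zero mutual information, which is why the latter is the usual score for disagreement-driven acquisition and out-of-distribution detection.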

Part G — Bayesian Neural Fields

Core (deterministic) neural-fields content lives in ../neural_fields/TUTORIAL_MASTER_LIST.md. This section is the Bayesian layer on top — point estimates and uncertainty for INRs.

| # | Tutorial | Source | Scope | Refs / Notes |
| --- | --- | --- | --- | --- |
| G.1 | Bayesian INR — probabilistic SIREN with MFVI weights | | 🔬 | GAP — pairs with xref:NF#B.1 (SIREN) |
| G.2 | Bayesian INR via last-layer Laplace on a SIREN | | 🔬 | GAP |
| G.3 | Bayesian NeRF — uncertainty in volumetric scenes | | 🔬 | GAP — pairs with xref:NF#C.1 (vanilla NeRF) |
| G.4 | Functional priors for INRs — match a target spatial GP | | 🔬 | GAP |
| G.5 | BNF layer family + BNFEstimator / MLE / VI runners | | 🔬 | GAP — gh:pyrox#72 |
| G.6 | Bayesian neural fields flagship demo (bayesian_neural_fields.ipynb) | | 🔬 | GAP — gh:pyrox#73 |

Part H — Applied Case Studies (research_notebook/projects/bayesian_nns)

H.A — Bayesian benchmarks

| # | Tutorial | Source | Scope | Refs / Notes |
| --- | --- | --- | --- | --- |
| H.1 | Last-layer Bayesian NN on UCI regression suite | | 🔬 | GAP |
| H.2 | Deep RFF on geophysical / climate data | P deep_random_fourier_features (port + extend) | 🔬 🔁 | |
| H.3 | scalable_gp_spectral demo — 5k 1D regression, dense GP vs VFF, ≥10× speedup | | 🔬 | GAP — pyrox .plans/spectral-inducing-features.md |

H.B — Emulators & PDEs

| # | Tutorial | Source | Scope | Refs / Notes |
| --- | --- | --- | --- | --- |
| H.4 | BNN emulator for a numerical simulator | | 🔬 | GAP |
| H.5 | Bayesian PINN — Burgers / heat / shallow-water | | 🔬 | GAP — pairs with A.28 |
| H.6 | Bayesian operator learning — DeepONet / FNO with weight uncertainty | | 🔬 | GAP |

H.C — Image / signal regression

| # | Tutorial | Source | Scope | Refs / Notes |
| --- | --- | --- | --- | --- |
| H.7 | BNN for image regression / denoising | | 🔬 | GAP |
| H.8 | Probabilistic super-resolution via RFF / INR | | 🔬 | GAP |

H.D — Capstone progressions

| # | Tutorial | Source | Scope | Refs / Notes |
| --- | --- | --- | --- | --- |
| H.9 | Full 9-model regression masterclass — single dataset, methodical climb | | 🔬 | GAP — dd:examples/nn/regression_masterclass_eqx.md (~927 lines); see A.H |
| H.10 | Classification capstone — same dataset, Models 1–9 ported to classification | | 🔬 | GAP — mirrors A.H for Part B |

Cross-list summary (items shared with GP list)

| Item | GP ID | BNN ID | Suggested canonical home |
| --- | --- | --- | --- |
| Spectral kernel models | GP 7.5 | A.7 | pyrox (GP), cross-listed |
| Random Fourier Features intro | GP 7.8 | A.8 | pyrox (canonical), link both |
| RFF as neural networks | | A.9 | pyrox |
| Whitened SVGP / BLR view | GP 5.7 | A.16 | gaussx (mechanics) |
| BLR updates (blr_*_update) | moved | A.3 | gaussx primitive demo — migrated out of GP list |
| BLR in precision form | GP 11.2 | A.2 | gaussx |
| Three-pattern masterclass | GP 11.3 | A.44 | pyrox |
| Deep kernels | GP 2.6 | C.4 | research_notebook |
| Deep RFF (Cutajar) | | C.5 / H.2 | research_notebook (BNN) |
| ArcCosine kernel | GP 2.7 | C.3 | pyrox |
| Pathwise sampling | GP 9.1 | C.8 | pyrox |
| Laplace mechanics | GP 6.7 | D.4 | pyrox |
| Gauss–Newton | GP 6.8 | D.5 | pyrox |
| Quasi-Newton sites | GP 6.9 | D.6 | pyrox |
| Posterior linearisation | GP 6.10 | D.7 | pyrox |
| Hutchinson Hessian diag | GP 6.13 | D.8 | gaussx primitive + BNN application |
| VI guides | GP 6.14 | D.14 | pyrox (canonical for both) |
| Natural-gradient VI | GP 6.15 | D.15 | gaussx |
| Moment matching predictive | GP 10.1 | D.13 | gaussx primitive |
| Log-Gaussian Cox Process | moved | A.23 | migrated out of GP list |
| Warped GP (Box–Cox) | moved | A.24 | migrated out of GP list |
| Warped GP w/ NF bijection | moved | A.25 | migrated out of GP list |
| EP for classification | GP 6.17 | B.13 | pyrox |
| Ensemble primitives | GP 12.1 | E.2 | pyrox |
| Ensemble runners | GP 12.2 | E.3 | pyrox |

Proposed final homes

In-scope vs aspirational