1st Order Optimization¶
These are the most common optimizers in deep learning, such as SGD, Adam, and AdamW. They rely only on gradient (first-order) information, so each update is cheap to compute.
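As a minimal sketch of what such a first-order loop looks like in practice, here is a small PyTorch example using AdamW; the model, data, and hyperparameters are placeholders, not recommendations from this document.

```python
# Minimal first-order training loop with PyTorch's AdamW.
# The model and data are dummies for illustration only.
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)

x = torch.randn(32, 10)   # dummy inputs
y = torch.randn(32, 1)    # dummy targets

for _ in range(100):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()       # gradients only: first-order information
    optimizer.step()      # AdamW update from gradients and running moments
```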
2nd Order Optimization¶
These optimizers are less common, but they typically converge in far fewer iterations than 1st order methods. Most of them stem from the Gauss-Newton approximation of the Hessian. However, the naive approach of forming and solving with the full curvature matrix is usually too expensive, so practical methods approximate it in different ways (a tiny worked example of the naive step is sketched after the list below):
- Low Rank Approximations - BFGS, L-BFGS
- Iterative Methods - Hessian-Free Optimization
- Structured Approximations - K-FAC methods
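To make the naive approach concrete, here is one damped Gauss-Newton step on a small nonlinear least-squares problem in NumPy. The names (`gauss_newton_step`, `residual_fn`, `jacobian_fn`, `damping`) are illustrative only; the approximations listed above exist precisely because the explicit solve below does not scale to models with millions of parameters.

```python
# One damped Gauss-Newton step for a nonlinear least-squares problem.
import numpy as np

def gauss_newton_step(params, residual_fn, jacobian_fn, damping=1e-3):
    """Return updated parameters after one damped Gauss-Newton step.

    params:      current parameter vector, shape (p,)
    residual_fn: maps params -> residual vector r, shape (m,)
    jacobian_fn: maps params -> Jacobian dr/dparams, shape (m, p)
    damping:     Levenberg-Marquardt style damping added to J^T J
    """
    r = residual_fn(params)
    J = jacobian_fn(params)
    # Solve (J^T J + damping * I) d = -J^T r for the step d.
    A = J.T @ J + damping * np.eye(params.size)
    g = J.T @ r
    d = np.linalg.solve(A, -g)
    return params + d

# Toy usage: fit y = exp(a * x) to data by repeatedly updating a.
x = np.linspace(0.0, 1.0, 20)
y = np.exp(0.7 * x)
residual_fn = lambda p: np.exp(p[0] * x) - y
jacobian_fn = lambda p: (x * np.exp(p[0] * x))[:, None]
p = np.array([0.0])
for _ in range(10):
    p = gauss_newton_step(p, residual_fn, jacobian_fn)
print(p)  # approaches [0.7]
```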
Gauss-Newton Dual Form¶
There is some recent work that tries to generalize these higher-order schemes under a single umbrella, which the authors frame as dual Gauss-Newton directions [Roulet & Blondel, 2023].
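To give a flavour of what a dual view of Gauss-Newton looks like, here is the standard Fenchel-dual rewriting of a regularized Gauss-Newton subproblem; the notation is a generic sketch, not necessarily the exact criterion used in the paper. With $f_i(w)$ the network output on example $i$ and $J_i(w)$ its Jacobian with respect to the parameters $w$, the (primal) Gauss-Newton direction solves

$$
d^\star \;=\; \arg\min_{d} \;\sum_{i=1}^{n} \ell\big(f_i(w) + J_i(w)\,d,\; y_i\big) \;+\; \frac{\lambda}{2}\,\lVert d\rVert^2 .
$$

When each loss $\ell(\cdot, y_i)$ is convex, Fenchel duality turns this into a maximization over variables $\alpha_i$ that live in the output space rather than the parameter space:

$$
\max_{\alpha_1,\dots,\alpha_n} \;\sum_{i=1}^{n}\Big(\alpha_i^{\top} f_i(w) - \ell^{*}(\alpha_i;\, y_i)\Big) \;-\; \frac{1}{2\lambda}\,\Big\lVert \sum_{i=1}^{n} J_i(w)^{\top}\alpha_i \Big\rVert^2 ,
\qquad
d^\star \;=\; -\frac{1}{\lambda}\sum_{i=1}^{n} J_i(w)^{\top}\alpha_i^\star ,
$$

where $\ell^{*}$ is the convex conjugate of $\ell$ in its first argument. The appeal is that the dual variables have the dimension of the model outputs (times the batch size), which is typically far smaller than the number of parameters.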
- Roulet, V., & Blondel, M. (2023). Dual Gauss-Newton Directions for Deep Learning. arXiv. 10.48550/ARXIV.2308.08886