1st Order Optimization¶
These are the most common optimizers in deep learning, such as SGD, Adam, and AdamW. They rely only on gradient (first-order) information, so each update is cheap to compute.
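As a minimal sketch of what such a first-order loop looks like in practice, here is a small PyTorch example using AdamW; the model, data, and hyperparameters are placeholders, not recommendations from this document.

```python
# Minimal first-order training loop with PyTorch's AdamW.
# The model and data are dummies for illustration only.
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)

x = torch.randn(32, 10)   # dummy inputs
y = torch.randn(32, 1)    # dummy targets

for _ in range(100):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()       # gradients only: first-order information
    optimizer.step()      # AdamW update from gradients and running moments
```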
2nd Order Optimization¶
These optimizers are less common, but they typically converge in far fewer iterations than 1st order methods. Most of them stem from the Gauss-Newton approximation of the Hessian. However, the naive approach of forming and solving with the full curvature matrix is usually too expensive, so practical methods approximate it in different ways (a tiny worked example of the naive step is sketched after the list below):
- Low Rank Approximations - BFGS, L-BFGS
- Iterative Methods - Hessian-Free Optimization
- Structured Approximations - K-FAC methods
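To make the naive approach concrete, here is one damped Gauss-Newton step on a small nonlinear least-squares problem in NumPy. The names (`gauss_newton_step`, `residual_fn`, `jacobian_fn`, `damping`) are illustrative only; the approximations listed above exist precisely because the explicit solve below does not scale to models with millions of parameters.

```python
# One damped Gauss-Newton step for a nonlinear least-squares problem.
import numpy as np

def gauss_newton_step(params, residual_fn, jacobian_fn, damping=1e-3):
    """Return updated parameters after one damped Gauss-Newton step.

    params:      current parameter vector, shape (p,)
    residual_fn: maps params -> residual vector r, shape (m,)
    jacobian_fn: maps params -> Jacobian dr/dparams, shape (m, p)
    damping:     Levenberg-Marquardt style damping added to J^T J
    """
    r = residual_fn(params)
    J = jacobian_fn(params)
    # Solve (J^T J + damping * I) d = -J^T r for the step d.
    A = J.T @ J + damping * np.eye(params.size)
    g = J.T @ r
    d = np.linalg.solve(A, -g)
    return params + d

# Toy usage: fit y = exp(a * x) to data by repeatedly updating a.
x = np.linspace(0.0, 1.0, 20)
y = np.exp(0.7 * x)
residual_fn = lambda p: np.exp(p[0] * x) - y
jacobian_fn = lambda p: (x * np.exp(p[0] * x))[:, None]
p = np.array([0.0])
for _ in range(10):
    p = gauss_newton_step(p, residual_fn, jacobian_fn)
print(p)  # approaches [0.7]
```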
Gauss-Newton Dual Form¶
There is some recent work that tries to generalize these higher-order schemes under a single umbrella, which the authors frame as dual Gauss-Newton directions [Roulet & Blondel, 2023].
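To give a flavour of what a dual view of Gauss-Newton looks like, here is the standard Fenchel-dual rewriting of a regularized Gauss-Newton subproblem; the notation is a generic sketch, not necessarily the exact criterion used in the paper. With $f_i(w)$ the network output on example $i$ and $J_i(w)$ its Jacobian with respect to the parameters $w$, the (primal) Gauss-Newton direction solves

$$
d^\star \;=\; \arg\min_{d} \;\sum_{i=1}^{n} \ell\big(f_i(w) + J_i(w)\,d,\; y_i\big) \;+\; \frac{\lambda}{2}\,\lVert d\rVert^2 .
$$

When each loss $\ell(\cdot, y_i)$ is convex, Fenchel duality turns this into a maximization over variables $\alpha_i$ that live in the output space rather than the parameter space:

$$
\max_{\alpha_1,\dots,\alpha_n} \;\sum_{i=1}^{n}\Big(\alpha_i^{\top} f_i(w) - \ell^{*}(\alpha_i;\, y_i)\Big) \;-\; \frac{1}{2\lambda}\,\Big\lVert \sum_{i=1}^{n} J_i(w)^{\top}\alpha_i \Big\rVert^2 ,
\qquad
d^\star \;=\; -\frac{1}{\lambda}\sum_{i=1}^{n} J_i(w)^{\top}\alpha_i^\star ,
$$

where $\ell^{*}$ is the convex conjugate of $\ell$ in its first argument. The appeal is that the dual variables have the dimension of the model outputs (times the batch size), which is typically far smaller than the number of parameters.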
- Roulet, V., & Blondel, M. (2023). Dual Gauss-Newton Directions for Deep Learning. arXiv. 10.48550/ARXIV.2308.08886