Optimization

How do we find the best solution to our problem?


1st Order Optimization

These are the most commonly used optimizers in deep learning, e.g. SGD, Adam, or AdamW. They rely only on first-derivative (gradient) information, so each step is cheap.
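As a minimal sketch of what a first-order update loop looks like (assuming the optax library; the toy least-squares data, `loss_fn`, and hyper-parameters below are purely illustrative):

```python
import jax
import jax.numpy as jnp
import optax

# Toy least-squares problem; data, model, and learning rate are illustrative.
X = jnp.arange(12.0).reshape(4, 3)
y = jnp.ones(4)
params = {"w": jnp.zeros(3), "b": jnp.zeros(())}


def loss_fn(params, X, y):
    pred = X @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)


optimizer = optax.adamw(learning_rate=1e-3)
opt_state = optimizer.init(params)


@jax.jit
def step(params, opt_state, X, y):
    # A first-order step only ever touches the gradient of the loss.
    loss, grads = jax.value_and_grad(loss_fn)(params, X, y)
    updates, opt_state = optimizer.update(grads, opt_state, params)
    params = optax.apply_updates(params, updates)
    return params, opt_state, loss


for _ in range(100):
    params, opt_state, loss = step(params, opt_state, X, y)
```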

2nd Order Optimization

These optimizers are less common, but they typically converge in far fewer iterations than 1st order methods because they also exploit curvature (second-derivative) information. Most of them stem from the Gauss-Newton approximation to the Hessian. However, forming and inverting this matrix naively is too expensive for large models, so practical methods rely on cheaper approximations:

  • Low Rank Approximations - BFGS, L-BFGS
  • Iterative Methods - Hessian-Free Optimization (a matrix-free sketch follows this list)
  • Structured Approximations - K-FAC methods
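To illustrate the Hessian-free idea, here is a common JAX idiom (not any particular method from the literature; the quadratic `loss_fn` and the damping value are illustrative assumptions): the Hessian-vector product can be formed without ever materializing the Hessian, and then handed to a matrix-free solver such as conjugate gradients.

```python
import jax
import jax.numpy as jnp


def loss_fn(params):
    # Toy loss, purely illustrative.
    return jnp.sum((params - 1.0) ** 2) + jnp.sum(params ** 4)


def hvp(f, params, v):
    # Hessian-vector product via forward-over-reverse differentiation:
    # the full Hessian is never built.
    return jax.jvp(jax.grad(f), (params,), (v,))[1]


def hessian_free_step(f, params, damping=1e-3):
    # Truncated-Newton step: solve (H + damping * I) d = -g with matrix-free CG.
    g = jax.grad(f)(params)
    matvec = lambda u: hvp(f, params, u) + damping * u
    d, _ = jax.scipy.sparse.linalg.cg(matvec, -g, maxiter=20)
    return d


params = jnp.zeros(5)
print(hessian_free_step(loss_fn, params))
```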

Gauss-Newton Dual Form

Recent work aims to unify these higher-order schemes under a single framework by computing Gauss-Newton directions through a dual formulation, which roughly casts the Gauss-Newton subproblem in the space of model outputs rather than parameters [Roulet & Blondel, 2023].
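The building block behind all of these schemes is a product with the generalized Gauss-Newton matrix G = JᵀHJ. As a hedged sketch of that primal building block only (not the dual algorithm of the paper; `model_fn`, `loss_fn`, and the data are placeholders), the product composes one jvp through the model, one small output-space Hessian product, and one vjp back:

```python
import jax
import jax.numpy as jnp

# Illustrative placeholders: a tiny linear "network" and a mean-squared loss.
X = jnp.arange(12.0).reshape(4, 3)
y = jnp.ones(4)
model_fn = lambda w: X @ w                      # outputs live in a small space
loss_fn = lambda out: 0.5 * jnp.mean((out - y) ** 2)


def ggn_vector_product(model_fn, loss_fn, params, v):
    """Generalized Gauss-Newton product G v = J^T H_out J v, matrix-free.

    J is the Jacobian of the model outputs w.r.t. the parameters and
    H_out is the Hessian of the loss w.r.t. the model outputs."""
    outputs, Jv = jax.jvp(model_fn, (params,), (v,))          # J v
    HJv = jax.jvp(jax.grad(loss_fn), (outputs,), (Jv,))[1]    # H_out (J v)
    _, vjp_fn = jax.vjp(model_fn, params)
    return vjp_fn(HJv)[0]                                     # J^T H_out (J v)


w = jnp.zeros(3)
print(ggn_vector_product(model_fn, loss_fn, w, jnp.ones(3)))
```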

References
  1. Roulet, V., & Blondel, M. (2023). Dual Gauss-Newton Directions for Deep Learning. arXiv:2308.08886. https://doi.org/10.48550/arXiv.2308.08886