Learning to Learn
How can we use nested learning schemes to speed up optimization?
General Formulation
Whirlwind Tour
Optimization-Based
LSTM Meta-Optimizer
We will pay special attention to the LSTM meta-optimizer.
In many cases, we need to find the best state $\theta^*$ that minimizes an objective $f(\theta)$, given an initial state (and parameters). Most gradient update schemes look like the following, where the update rule is fixed (hand-designed):

$$\theta_{t+1} = \theta_t - \alpha_t \nabla f(\theta_t)$$
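As a point of reference, here is a minimal sketch of such a fixed update rule in plain NumPy; the gradient function `f_grad`, step size `alpha`, and step count are illustrative placeholders, not values from the source.

```python
import numpy as np

def gradient_descent(f_grad, theta0, alpha=0.1, num_steps=100):
    """Fixed update rule: theta_{t+1} = theta_t - alpha * grad f(theta_t)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(num_steps):
        # Hand-designed update: no learned parameters, no hidden state.
        theta = theta - alpha * f_grad(theta)
    return theta

# Example: minimize f(theta) = ||theta||^2, whose gradient is 2 * theta.
theta_star = gradient_descent(lambda th: 2 * th, theta0=[3.0, -2.0])
```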
To find the optimal solution of this problem, we can write the update as:

$$\theta_{t+1} = \theta_t + g_t$$

where $g_t$ is the output of a generalized gradient operator $m$:

$$[g_t, h_{t+1}] = m\left(\nabla f(\theta_t),\, h_t;\, \phi\right)$$

where $t$ is the iteration, $\phi$ are the parameters of the gradient operator, and $h_t$ is the hidden state.
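A minimal sketch of one such step, assuming a PyTorch `LSTMCell` plays the role of the gradient operator $m$; the coordinatewise treatment, hidden size, and all names below are illustrative assumptions rather than the exact architecture from the source.

```python
import torch
import torch.nn as nn

class LSTMMetaOptimizer(nn.Module):
    """Learned update rule: [g_t, h_{t+1}] = m(grad f(theta_t), h_t; phi)."""

    def __init__(self, hidden_size=20):
        super().__init__()
        # phi = parameters of the LSTM cell and the output projection.
        self.cell = nn.LSTMCell(input_size=1, hidden_size=hidden_size)
        self.out = nn.Linear(hidden_size, 1)

    def forward(self, grad, state):
        # Treat each gradient coordinate as a separate batch element, so one
        # small LSTM is shared across all optimizee parameters (an assumption).
        h, c = self.cell(grad.view(-1, 1), state)
        g = self.out(h).view_as(grad)  # proposed update g_t
        return g, (h, c)               # h_{t+1} carried to the next step

# One optimization step of the optimizee parameters theta.
meta_opt = LSTMMetaOptimizer()
theta = torch.randn(5, requires_grad=True)
state = None                            # h_0: zeros by default when None
loss = (theta ** 2).sum()               # f(theta), a toy objective
(grad,) = torch.autograd.grad(loss, theta)
g, state = meta_opt(grad, state)        # [g_t, h_{t+1}] = m(grad, h_t; phi)
theta = theta + g                       # theta_{t+1} = theta_t + g_t
```

Because the update $g_t$ is produced by a parameterized, stateful network rather than a fixed formula, the optimizer itself can be trained, e.g. by backpropagating the optimizee's loss through the unrolled update steps into $\phi$.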