Learning to Learn
How can we use nested learning schemes to speed up optimization?
General Formulation
Whirlwind Tour
Optimization-Based
LSTM Meta-Optimizer
We will pay special attention to the LSTM meta-optimizer.
In many cases, we need to find the best state $\theta^*$ that minimizes an objective $f(\theta)$, given an initial state (and parameters). Most gradient update schemes look like the following, where the update rule is fixed (hand-designed):

$$\theta_{t+1} = \theta_t - \alpha_t \nabla f(\theta_t)$$
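As a point of reference, here is a minimal sketch of such a fixed update rule in plain NumPy; the gradient function `f_grad`, step size `alpha`, and step count are illustrative placeholders, not values from the source.

```python
import numpy as np

def gradient_descent(f_grad, theta0, alpha=0.1, num_steps=100):
    """Fixed update rule: theta_{t+1} = theta_t - alpha * grad f(theta_t)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(num_steps):
        # Hand-designed update: no learned parameters, no hidden state.
        theta = theta - alpha * f_grad(theta)
    return theta

# Example: minimize f(theta) = ||theta||^2, whose gradient is 2 * theta.
theta_star = gradient_descent(lambda th: 2 * th, theta0=[3.0, -2.0])
```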
To find the optimal solution of this problem, we can write the update as:

$$\theta_{t+1} = \theta_t + g_t$$

where $g_t$ is the output of a generalized gradient operator $m$:

$$[g_t, h_{t+1}] = m\left(\nabla f(\theta_t),\, h_t;\, \phi\right)$$

where $t$ is the iteration, $\phi$ are the parameters of the gradient operator, and $h_t$ is the hidden state.
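A minimal sketch of one such step, assuming a PyTorch `LSTMCell` plays the role of the gradient operator $m$; the coordinatewise treatment, hidden size, and all names below are illustrative assumptions rather than the exact architecture from the source.

```python
import torch
import torch.nn as nn

class LSTMMetaOptimizer(nn.Module):
    """Learned update rule: [g_t, h_{t+1}] = m(grad f(theta_t), h_t; phi)."""

    def __init__(self, hidden_size=20):
        super().__init__()
        # phi = parameters of the LSTM cell and the output projection.
        self.cell = nn.LSTMCell(input_size=1, hidden_size=hidden_size)
        self.out = nn.Linear(hidden_size, 1)

    def forward(self, grad, state):
        # Treat each gradient coordinate as a separate batch element, so one
        # small LSTM is shared across all optimizee parameters (an assumption).
        h, c = self.cell(grad.view(-1, 1), state)
        g = self.out(h).view_as(grad)  # proposed update g_t
        return g, (h, c)               # h_{t+1} carried to the next step

# One optimization step of the optimizee parameters theta.
meta_opt = LSTMMetaOptimizer()
theta = torch.randn(5, requires_grad=True)
state = None                            # h_0: zeros by default when None
loss = (theta ** 2).sum()               # f(theta), a toy objective
(grad,) = torch.autograd.grad(loss, theta)
g, state = meta_opt(grad, state)        # [g_t, h_{t+1}] = m(grad, h_t; phi)
theta = theta + g                       # theta_{t+1} = theta_t + g_t
```

Because the update $g_t$ is produced by a parameterized, stateful network rather than a fixed formula, the optimizer itself can be trained, e.g. by backpropagating the optimizee's loss through the unrolled update steps into $\phi$.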