Learning Rate Scheduler
- WarmUp
- Cosine Annealing
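The two schedule components above are often combined into a single rule: a linear warm-up followed by cosine annealing. The following is a minimal sketch of that combination, not a specific library's scheduler. The function name `warmup_cosine_lr` and its parameters (`warmup_steps`, `base_lr`, `min_lr`) are hypothetical names chosen for illustration.

```python
import math


def warmup_cosine_lr(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    """Learning rate at a 0-indexed `step` (hypothetical helper).

    Phase 1: linear warm-up from base_lr / warmup_steps up to base_lr.
    Phase 2: cosine annealing from base_lr down to min_lr.
    """
    if step < warmup_steps:
        # Linear ramp; reaches base_lr on the last warm-up step.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the annealing phase completed, clipped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Example: 100 training steps, of which the first 10 are warm-up.
schedule = [warmup_cosine_lr(s, 100, 10, base_lr=1e-3) for s in range(100)]
```

The warm-up length and the minimum learning rate are tuning choices; the values in the example are placeholders only.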
Large Batch-Size
- 16, 32, 64, 128, 512, 1024, 2048
Fine-Tuning
- Generic training -> Adam
- Fine-tuning -> L-BFGS
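The two-stage recipe above (Adam first, then L-BFGS to polish) can be sketched in plain Python on a toy problem. Everything here is an illustrative assumption: the Rosenbrock function stands in for a training loss, the helper names (`loss_and_grad`, `adam`, `lbfgs`) are hypothetical, and the L-BFGS routine is a simplified version with a backtracking line search. In practice one would more likely use library optimizers such as `torch.optim.Adam` and `torch.optim.LBFGS`.

```python
import math


def loss_and_grad(p):
    """Rosenbrock function and gradient (hypothetical stand-in for a loss)."""
    x, y = p
    r = y - x * x
    f = (1 - x) * (1 - x) + 100 * r * r
    return f, [-2 * (1 - x) - 400 * x * r, 200 * r]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def adam(p, steps, lr=1e-2, b1=0.9, b2=0.999, eps=1e-8):
    """Stage 1: plain Adam with bias-corrected moment estimates."""
    p, m, v = list(p), [0.0] * len(p), [0.0] * len(p)
    for t in range(1, steps + 1):
        _, g = loss_and_grad(p)
        for i in range(len(p)):
            m[i] = b1 * m[i] + (1 - b1) * g[i]
            v[i] = b2 * v[i] + (1 - b2) * g[i] * g[i]
            m_hat, v_hat = m[i] / (1 - b1 ** t), v[i] / (1 - b2 ** t)
            p[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return p


def lbfgs(p, steps, history=10):
    """Stage 2: simplified L-BFGS (two-loop recursion + backtracking)."""
    p = list(p)
    f, g = loss_and_grad(p)
    pairs = []  # (s, y) curvature pairs, oldest first
    for _ in range(steps):
        if dot(g, g) < 1e-20:
            break  # gradient is numerically zero
        # Two-loop recursion: q approximates (inverse Hessian) @ g.
        q, alphas = list(g), []
        for s, y in reversed(pairs):
            a = dot(s, q) / dot(y, s)
            alphas.append(a)
            q = [qi - a * yi for qi, yi in zip(q, y)]
        if pairs:
            s, y = pairs[-1]
            gamma = dot(s, y) / dot(y, y)
            q = [gamma * qi for qi in q]
        for (s, y), a in zip(pairs, reversed(alphas)):
            b = dot(y, q) / dot(y, s)
            q = [qi + (a - b) * si for qi, si in zip(q, s)]
        d = [-qi for qi in q]
        # Backtracking line search; only accepts steps that do not raise f.
        slope, step, accepted = min(dot(g, d), 0.0), 1.0, False
        while step > 1e-12:
            new_p = [pi + step * di for pi, di in zip(p, d)]
            new_f, new_g = loss_and_grad(new_p)
            if new_f <= f + 1e-4 * step * slope:
                accepted = True
                break
            step *= 0.5
        if not accepted:
            break
        s = [a - b for a, b in zip(new_p, p)]
        y = [a - b for a, b in zip(new_g, g)]
        if dot(s, y) > 1e-12:  # keep only pairs with positive curvature
            pairs = (pairs + [(s, y)])[-history:]
        p, f, g = new_p, new_f, new_g
    return p


after_adam = adam([-1.5, 2.0], steps=500)
after_lbfgs = lbfgs(after_adam, steps=100)
```

The design idea is that the first-order stage moves the parameters into a reasonable region cheaply, and the quasi-Newton stage then exploits curvature information for the final refinement.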
Transfer Learning
Temporal Causality
Architecture
- Temporal Encoders: Tanh, Sine+Cosine, Fourier
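As a rough illustration of the encoder options listed above, the sketch below maps a scalar time `t` to a feature vector in three ways. The exact form of each encoder is an assumption, since the notes only name them: the function names, the `period` and `num_frequencies` parameters, and the fixed `weights`/`biases` of the tanh variant (which would normally be learned) are all hypothetical.

```python
import math


def tanh_encoding(t, weights, biases):
    """Tanh features tanh(w * t + b); weights and biases are assumed given."""
    return [math.tanh(w * t + b) for w, b in zip(weights, biases)]


def sincos_encoding(t, period=1.0):
    """Single-frequency sine + cosine pair, periodic in t with `period`."""
    angle = 2.0 * math.pi * t / period
    return [math.sin(angle), math.cos(angle)]


def fourier_encoding(t, num_frequencies=4, period=1.0):
    """Sine/cosine pairs at integer multiples of the base frequency."""
    features = []
    for k in range(1, num_frequencies + 1):
        angle = 2.0 * math.pi * k * t / period
        features.extend([math.sin(angle), math.cos(angle)])
    return features


# Example: encode t = 0.3 with each variant.
t = 0.3
encoded = {
    "tanh": tanh_encoding(t, weights=[0.5, 1.0, 2.0], biases=[0.0, 0.1, -0.1]),
    "sincos": sincos_encoding(t),
    "fourier": fourier_encoding(t, num_frequencies=4),
}
```

The sine/cosine and Fourier variants are periodic by construction, while the tanh variant is bounded but not periodic, which is one reason to choose between them based on the expected time behaviour.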
Training Perspective
- Sequence-to-Sequence Training
- Causal Learning Loss
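The notes do not define the causal learning loss, so the sketch below assumes one common formulation: the time axis is split into ordered segments, and the loss of each segment is down-weighted by the exponential of the accumulated loss of all earlier segments. Later times then contribute only once earlier times are fitted well. The names `causal_weights`, `causal_loss`, and the `epsilon` sharpness parameter are hypothetical.

```python
import math


def causal_weights(segment_losses, epsilon=1.0):
    """Weight for segment i is exp(-epsilon * sum of losses of segments < i).

    `segment_losses` must be ordered from earliest to latest time.
    """
    weights = []
    cumulative = 0.0
    for loss in segment_losses:
        weights.append(math.exp(-epsilon * cumulative))
        cumulative += loss
    return weights


def causal_loss(segment_losses, epsilon=1.0):
    """Weighted mean of the per-segment losses.

    In an autodiff framework the weights would typically be treated as
    constants (no gradient flows through them); that is assumed here.
    """
    weights = causal_weights(segment_losses, epsilon)
    total = sum(w * l for w, l in zip(weights, segment_losses))
    return total / len(segment_losses)


# Example: the early segment is still poorly fitted, so later ones are damped.
losses = [2.0, 0.5, 0.1]
weights = causal_weights(losses, epsilon=1.0)
total = causal_loss(losses, epsilon=1.0)
```

With `epsilon=0` every weight is 1 and the loss reduces to the plain mean; larger values enforce the temporal ordering more strictly.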