4.4. How to stabilize training?

  • loss spikes: SPAM and Stable-SPAM?

  • precision concerns

  • learning-rate (lr) scheduler

  • muP (maximal update parametrization)
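As a concrete instance of the learning-rate scheduler item, here is a minimal sketch of one common choice: linear warmup followed by cosine decay. The outline does not say which schedule is intended, so the function name `warmup_cosine_lr` and all of its parameters (`peak_lr`, `warmup_steps`, `total_steps`, `min_lr`) are illustrative assumptions, not taken from these notes.

```python
import math


def warmup_cosine_lr(step, *, peak_lr, warmup_steps, total_steps, min_lr=0.0):
    """Learning rate at `step` (0-indexed): linear warmup, then cosine decay.

    Hypothetical helper for illustration only; every name here is an assumption.
    """
    if total_steps <= warmup_steps:
        raise ValueError("total_steps must exceed warmup_steps")
    if step < warmup_steps:
        # Linear ramp: peak_lr / warmup_steps at step 0, peak_lr at the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to 1 once training runs past total_steps.
    progress = min(1.0, (step - warmup_steps) / (total_steps - warmup_steps))
    # Cosine interpolation from peak_lr (progress = 0) down to min_lr (progress = 1).
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (0, 9, 10, 55, 100):
        lr = warmup_cosine_lr(s, peak_lr=1e-3, warmup_steps=10, total_steps=100, min_lr=1e-5)
        print(f"step {s:3d}: lr = {lr:.6f}")
```

The warmup phase keeps early updates small while the optimizer statistics are still noisy, which is one of the usual motivations for pairing a scheduler with the other stabilization techniques in this list.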

Lemma 4.1 (Descent lemma)

This is a descent lemma.
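For reference, one standard form of the descent lemma is sketched here. It assumes $f$ is differentiable with an $L$-Lipschitz gradient; the symbols $f$, $L$, and the step size $\eta$ are notation introduced for this sketch and may differ from the statement these notes ultimately adopt.

```latex
% Assumption: \nabla f is L-Lipschitz, i.e. \|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|.
f(y) \;\le\; f(x) + \langle \nabla f(x),\, y - x \rangle + \frac{L}{2}\,\|y - x\|^2
\qquad \text{for all } x, y.

% Gradient step y = x - \eta \nabla f(x) with 0 < \eta \le 1/L:
f\bigl(x - \eta \nabla f(x)\bigr)
\;\le\; f(x) - \eta\Bigl(1 - \frac{L\eta}{2}\Bigr)\|\nabla f(x)\|^2
\;\le\; f(x) - \frac{\eta}{2}\,\|\nabla f(x)\|^2 .
```

The middle expression shows that this bound only guarantees a decrease when $\eta < 2/L$, which is one way to see why overly large learning rates are associated with unstable training.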

See Lemma 4.1.