4.4. How to stabilize training?

loss spikes, SPAM and stable-SPAM?
precision concerns
lr scheduler
muP

Lemma 4.1 (Descent lemma)
this is a descent lemma

check Lemma 4.1
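
The section does not spell out the body of Lemma 4.1. As a hedged sketch of what it presumably refers to, the statement usually called the descent lemma is given below. The symbols f, L, x, y and the step size \eta are assumptions introduced here for illustration, not notation taken from this section.

```latex
% Assumption (not from the source): f : R^d -> R is differentiable and its
% gradient is L-Lipschitz, i.e. f is L-smooth.
\[
f(y) \;\le\; f(x) + \langle \nabla f(x),\, y - x \rangle + \frac{L}{2}\,\lVert y - x \rVert^{2}
\qquad \text{for all } x, y \in \mathbb{R}^{d}.
\]
% Substituting one gradient step, y = x - \eta \nabla f(x):
\[
f\bigl(x - \eta \nabla f(x)\bigr) \;\le\; f(x) - \eta\Bigl(1 - \frac{L\eta}{2}\Bigr)\lVert \nabla f(x) \rVert^{2}.
\]
```

Under this reading, the bound guarantees a decrease only for 0 < \eta < 2/L, which may be one way to connect the lemma to the loss-spike and lr-scheduler items listed in this section.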