Optimization for ML & LLMs
1. Introduction to the Course
1.1. Course Outline
1.2. Notations
2. Classical Optimization Theory
2.1. Unconstrained Optimization
2.2. Gradient Methods
2.3. Constrained Optimization
2.4. Extension and Application of Gradient Descent Based Algorithms
2.5. Duality, Lagrangian Multiplier Theorem, and KKT Conditions
3. Introduction to Large Language Models
4. Stochastic Optimization for Large Language Models
4.1. Stochastic Gradient Descent
4.2. Adaptive algorithms
4.3. Discussions on modern adaptive methods
4.4. Learning rate scheduler and muP
5. References
Index