
5. References

[BN24]

Jeremy Bernstein and Laker Newhouse. Old optimizer, new norm: an anthology. arXiv preprint arXiv:2409.20325, 2024.

[Ber97]

Dimitri P. Bertsekas. Nonlinear programming. Journal of the Operational Research Society, 48(3):334–334, 1997.

[BV04]

Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

[B+15]

Sébastien Bubeck and others. Convex optimization: algorithms and complexity. Foundations and Trends® in Machine Learning, 8(3-4):231–357, 2015.

[CLSH19]

Xiangyi Chen, Sijia Liu, Ruoyu Sun, and Mingyi Hong. On the convergence of a class of Adam-type algorithms for non-convex optimization. In International Conference on Learning Representations. 2019. URL: https://openreview.net/forum?id=H1x-x309tm.

[DHS11]

John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12:2121–2159, 2011.

[DefossezBBU22]

Alexandre Défossez, Léon Bottou, Francis Bach, and Nicolas Usunier. A simple convergence proof of Adam and Adagrad. Transactions on Machine Learning Research, 2022. URL: https://openreview.net/forum?id=ZPQhzTSWA7.

[HSS12]

Geoffrey Hinton, Nitish Srivastava, and Kevin Swersky. Neural networks for machine learning, lecture 6a: overview of mini-batch gradient descent. Coursera lecture slides, 2012.

[KB15]

Diederik P. Kingma and Jimmy Ba. Adam: a method for stochastic optimization. In International Conference on Learning Representations. 2015.

[Lan20]

Guanghui Lan. First-order and stochastic optimization methods for machine learning. Volume 1. Springer, 2020.

[LH19]

Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations. 2019. URL: https://openreview.net/forum?id=Bkg6RiCqY7.

[Nes18]

Yurii Nesterov. Lectures on convex optimization. Volume 137. Springer, 2018.

[RKK18]

Sashank J. Reddi, Satyen Kale, and Sanjiv Kumar. On the convergence of Adam and beyond. In International Conference on Learning Representations. 2018. URL: https://openreview.net/forum?id=ryQu7f-RZ.

[ZCL22]

Yushun Zhang, Congliang Chen, and Zhi-Quan Luo. Does Adam converge and when? In ICLR Blog Track. 2022. URL: https://iclr-blog-track.github.io/2022/03/25/does-adam/.

[ZCS+22]

Yushun Zhang, Congliang Chen, Naichen Shi, Ruoyu Sun, and Zhi-Quan Luo. Adam can converge without any modification on update rules. Advances in Neural Information Processing Systems, 35:28386–28399, 2022.