5. References
Jeremy Bernstein and Laker Newhouse. Old optimizer, new norm: an anthology. arXiv preprint arXiv:2409.20325, 2024.
Dimitri P. Bertsekas. Nonlinear Programming. Athena Scientific, 2nd edition, 1999.
Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
Sébastien Bubeck. Convex optimization: algorithms and complexity. Foundations and Trends® in Machine Learning, 8(3–4):231–357, 2015.
Xiangyi Chen, Sijia Liu, Ruoyu Sun, and Mingyi Hong. On the convergence of a class of Adam-type algorithms for non-convex optimization. In International Conference on Learning Representations, 2019. URL: https://openreview.net/forum?id=H1x-x309tm.
John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12:2121–2159, 2011.
Alexandre Défossez, Léon Bottou, Francis Bach, and Nicolas Usunier. A simple convergence proof of Adam and Adagrad. Transactions on Machine Learning Research, 2022. URL: https://openreview.net/forum?id=ZPQhzTSWA7.
Geoffrey Hinton, Nitish Srivastava, and Kevin Swersky. Neural networks for machine learning, lecture 6a: overview of mini-batch gradient descent. Coursera lecture slides, 2012.
Diederik Kingma and Jimmy Ba. Adam: a method for stochastic optimization. In International Conference on Learning Representations, 2015.
Guanghui Lan. First-Order and Stochastic Optimization Methods for Machine Learning. Volume 1. Springer, 2020.
Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2019. URL: https://openreview.net/forum?id=Bkg6RiCqY7.
Yurii Nesterov. Lectures on Convex Optimization. Volume 137. Springer, 2018.
Sashank J. Reddi, Satyen Kale, and Sanjiv Kumar. On the convergence of Adam and beyond. In International Conference on Learning Representations, 2018. URL: https://openreview.net/forum?id=ryQu7f-RZ.
Yushun Zhang, Congliang Chen, and Zhi-Quan Luo. Does Adam converge and when? In ICLR Blog Track, 2022. URL: https://iclr-blog-track.github.io/2022/03/25/does-adam/.
Yushun Zhang, Congliang Chen, Naichen Shi, Ruoyu Sun, and Zhi-Quan Luo. Adam can converge without any modification on update rules. Advances in Neural Information Processing Systems, 35:28386–28399, 2022.