
Convergence analysis of online learning algorithm with two-stage step size

Weilin Nie and Cheng Wang
Published/Copyright: August 30, 2021

Abstract

Online learning is a classical algorithm for optimization problems. Owing to its low computational cost, it is widely used throughout machine learning and statistical learning, and its convergence performance depends heavily on the step size. In this paper, a two-stage step size is proposed for the unregularized online learning algorithm based on reproducing kernels. Theoretically, we prove that such an algorithm can achieve a nearly minimax convergence rate, up to a logarithmic term, without any capacity condition.
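To make the setting concrete, the sketch below implements unregularized online gradient descent for least-squares regression in a reproducing kernel Hilbert space, with a two-stage step size: a constant step for an initial stage followed by a polynomially decaying step. The Gaussian kernel, the least-squares loss, and the schedule constants (T1, eta1, theta) are illustrative assumptions for this sketch, not the exact schedule analyzed in the paper.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) reproducing kernel (an illustrative choice)."""
    return np.exp(-np.sum((np.asarray(x) - np.asarray(y)) ** 2) / (2.0 * sigma ** 2))

def online_kernel_regression(stream, T1, eta1, theta=0.5, sigma=1.0):
    """Unregularized online gradient descent in an RKHS for least squares,
    with a two-stage step size: constant eta1 for the first T1 samples,
    then a polynomially decaying step eta1 / (t - T1)**theta afterwards.

    `stream` yields (x_t, y_t); the iterate f_t is stored as kernel
    coefficients over the inputs seen so far.
    """
    xs, coefs = [], []
    for t, (x, y) in enumerate(stream, start=1):
        # Current prediction f_t(x_t) = sum_i c_i K(x_i, x_t).
        pred = sum(c * gaussian_kernel(xi, x, sigma) for xi, c in zip(xs, coefs))
        # Two-stage step size (illustrative schedule, not the paper's exact one).
        eta = eta1 if t <= T1 else eta1 / (t - T1) ** theta
        # Gradient step: f_{t+1} = f_t - eta_t * (f_t(x_t) - y_t) * K(x_t, .).
        xs.append(x)
        coefs.append(-eta * (pred - y))
    return xs, coefs

# Example usage on a synthetic regression stream.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = [(x, np.sin(3 * x[0]) + 0.1 * rng.standard_normal())
            for x in rng.uniform(-1, 1, size=(200, 1))]
    xs, coefs = online_kernel_regression(iter(data), T1=50, eta1=0.5)
    f = lambda x: sum(c * gaussian_kernel(xi, x) for xi, c in zip(xs, coefs))
    print("f(0) =", f(np.array([0.0])))
```

The intuition behind a two-stage schedule is that the constant first stage moves the iterate quickly toward the regression function, while the decaying second stage suppresses the sampling noise in the later updates.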

Mathematics Subject Classification 2010: 68T05; 68Q32; 62J02; 41A46

Corresponding author: Cheng Wang, School of Mathematics and Statistics, Huizhou University, Huizhou 516007, P. R. China, E-mail: wangch@hzu.edu.cn

Funding source: Special Research Project on COVID-19 Prevention and Control in Colleges and Universities in Guangdong Province

Award Identifier / Grant number: 2020KZDZX1195

Funding source: Science and Technology Plan Project in Huizhou

Award Identifier / Grant number: 2020SD0402030

Funding source: Indigenous Innovation’s Capability Development Program of Huizhou University

Award Identifier / Grant number: HZU202003

Award Identifier / Grant number: HZU202020

  1. Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.

  2. Research funding: This work was supported in part by the Science and Technology Plan Project in Huizhou (No. 2020SD0402030), the Indigenous Innovation’s Capability Development Program of Huizhou University (project numbers HZU202003 and HZU202020), and the Special Research Project on COVID-19 Prevention and Control in Colleges and Universities in Guangdong Province (No. 2020KZDZX1195).

  3. Conflict of interest statement: The authors declare no conflicts of interest regarding this article.

Received: 2020-07-10
Accepted: 2021-07-20
Published Online: 2021-08-30
Published in Print: 2023-02-23

© 2021 Walter de Gruyter GmbH, Berlin/Boston
