Decentralized and parallel primal and dual accelerated methods for stochastic convex programming problems

Darina Dvinskikh and Alexander Gasnikov
Published/Copyright: January 22, 2021

Abstract

We introduce primal and dual stochastic gradient oracle methods for decentralized convex optimization problems. For both the primal and the dual oracle, the proposed methods are optimal in terms of the number of communication steps. However, for all classes of the objective, optimality in terms of the number of oracle calls per node holds only up to a logarithmic factor and up to the notion of smoothness. By using a mini-batching technique, we show that the proposed methods with a stochastic oracle can be additionally parallelized at each node. The considered algorithms can be applied to many data science problems and to inverse problems.
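To make the parallelization idea from the abstract concrete, the following is a minimal illustrative sketch (not the algorithm proposed in the paper): each node of a network holds a local least-squares loss, averages a mini-batch of stochastic gradients (work that can be parallelized at the node), and performs one gossip (consensus) averaging step per communication round. The ring topology, Metropolis gossip weights, step size, and synthetic data are assumptions made only for this illustration.

```python
# Illustrative sketch of decentralized mini-batch stochastic gradient descent
# (synthetic example, not the authors' method).
import numpy as np

rng = np.random.default_rng(0)
n_nodes, dim, batch = 5, 10, 8          # network size, variable dimension, mini-batch size
A = [rng.normal(size=(100, dim)) for _ in range(n_nodes)]               # local data matrices
b = [A_i @ np.ones(dim) + 0.1 * rng.normal(size=100) for A_i in A]      # local targets

# Symmetric doubly stochastic gossip matrix for a ring graph (Metropolis weights).
W = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    for j in ((i - 1) % n_nodes, (i + 1) % n_nodes):
        W[i, j] = 1.0 / 3.0
    W[i, i] = 1.0 - W[i].sum()

def stoch_grad(i, x):
    """Mini-batch stochastic gradient of the local least-squares loss at node i."""
    idx = rng.choice(A[i].shape[0], size=batch, replace=False)
    Ai, bi = A[i][idx], b[i][idx]
    return Ai.T @ (Ai @ x - bi) / batch

X = np.zeros((n_nodes, dim))            # one local copy of the decision variable per node
lr = 1e-2
for t in range(500):
    grads = np.stack([stoch_grad(i, X[i]) for i in range(n_nodes)])  # parallelizable per node
    X = W @ X - lr * grads              # one consensus (communication) step + local SGD step

print("disagreement across nodes:", np.linalg.norm(X - X.mean(axis=0)))
print("distance to x* = 1:", np.linalg.norm(X.mean(axis=0) - np.ones(dim)))
```

In this sketch the single multiplication by the gossip matrix W per iteration plays the role of one communication step, while the mini-batch size controls how much stochastic gradient work each node can parallelize between communications.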

MSC 2010: 90C25; 90C06; 90C15

Award Identifier / Grant number: 18-71-10108

Award Identifier / Grant number: 19-31-51001

Award Identifier / Grant number: 075-00337-20-03

Award Identifier / Grant number: 0714-2020-0005

Funding statement: The work of D. Dvinskikh was supported (Sections 1 and 5) by RFBR 19-31-51001 and funded (Section 6) by the Russian Science Foundation (Project 18-71-10108). The work of A. Gasnikov was supported by the Ministry of Science and Higher Education of the Russian Federation (Goszadaniye) No. 075-00337-20-03, Project No. 0714-2020-0005.

Acknowledgements

We would like to thank F. Bach, P. Dvurechensky, E. Gorbunov, A. Koloskova, A. Kulunchakov, J. Mairal, A. Nemirovski, A. Olshevsky, S. Parsegov, B. Polyak, N. Srebro, A. Taylor and C. Uribe for useful discussions.

Received: 2020-11-19
Accepted: 2020-12-10
Published Online: 2021-01-22
Published in Print: 2021-06-01

© 2021 Walter de Gruyter GmbH, Berlin/Boston
