Abstract
In this paper, we revisit linear neural networks (LNNs) with single-output neurons that perform linear operations. The study focuses on constructing an optimal regularized weight matrix Q from training pairs
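The abstract is cut off here in the available text, so the paper's exact construction of Q is not reproduced. As a minimal sketch of the general idea, a standard choice consistent with this setup is the Tikhonov (ridge) regularized least-squares weight matrix obtained from the training pairs; in the illustration below, the function name `regularized_weights`, the synthetic data, and the regularization parameter `lam` are assumptions for exposition, not the paper's method.

```python
import numpy as np

def regularized_weights(X, Y, lam=1e-2):
    """Tikhonov/ridge-regularized least-squares estimate of a linear
    layer's weight matrix Q, so that X @ Q approximates Y.

    X   : (n_samples, n_features) training inputs
    Y   : (n_samples, n_outputs) training targets
    lam : regularization parameter (illustrative default)
    """
    n_features = X.shape[1]
    # Solve the regularized normal equations (X^T X + lam * I) Q = X^T Y.
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ Y)

# Usage on synthetic training pairs.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
Q_true = rng.standard_normal((5, 2))
Y = X @ Q_true + 0.01 * rng.standard_normal((100, 2))
Q = regularized_weights(X, Y, lam=1e-3)
print(np.round(Q - Q_true, 2))  # small residuals indicate good recovery
```

The shift by lam * I keeps the linear solve well conditioned even when X^T X is singular or nearly so, which is precisely the ill-posedness that motivates viewing linear machine learning through the lens of inverse problems.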
Funding source: National Key Research and Development Program of China
Award Identifier / Grant number: 2022YFC3310300
Funding source: National Natural Science Foundation of China
Award Identifier / Grant number: 12171036
Funding statement: This work was supported by the National Key Research and Development Program of China (Grant No. 2022YFC3310300), the Beijing Natural Science Foundation (Grant No. Z210001), the Shenzhen Sci-Tech Fund (Grant No. RCJC20231211090030059) and the National Natural Science Foundation of China (Grant No. 12171036) for Zhang Ye, and by the China Scholarship Council (Grant No. 202008230141) for Liu Shuang. The authors also acknowledge the support of the state assignment of the Sobolev Institute of Mathematics (Theme 2024-FWNF-2024-0001, “Theory, methods, and applications of inverse problems and statistical analysis”) for Sergey Kabanikhin, and a grant for research centers in the field of artificial intelligence (Grant No. 000000D730321P5Q0002) for Sergei Strijhak.
© 2025 Walter de Gruyter GmbH, Berlin/Boston
Articles in this issue
- Frontmatter
- Local convergence of the error-reduction algorithm for real-valued objects
- A rotation total variation regularization for full waveform inversion
- Stochastic data-driven Bouligand–Landweber method for solving non-smooth inverse problems
- On determining the fractional exponent of the subdiffusion equation
- Two-parameter quasi-boundary value method for a backward abstract time-degenerate fractional parabolic problem
- Determining both leading coefficient and source in a nonlocal elliptic equation
- Discrete dynamical systems: Inverse problems and related topics
- Stability estimates for an inverse problem of determining time-dependent coefficients in a system of parabolic equations
- Numerical solutions to inverse nodal problems for the Sturm–Liouville operator and their applications
- Revisiting linear machine learning through the perspective of inverse problems