
Revisiting linear machine learning through the perspective of inverse problems

Shuang Liu, Sergey Kabanikhin, Sergei Strijhak, Ying-Ao Wang and Ye Zhang

Abstract

In this paper, we revisit Linear Neural Networks (LNNs) with single-output neurons performing linear operations. The study focuses on constructing an optimal regularized weight matrix Q from training pairs {G, H}, reformulating the LNN framework as a system of matrix equations and addressing it as a linear inverse problem. The ill-posedness of linear machine learning problems is analyzed through the lens of inverse problems. Furthermore, classical and modern regularization techniques from both the machine learning and inverse problems communities are reviewed. The effectiveness of LNNs is demonstrated through a real-world application in blood test classification, highlighting their practical value.
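
To make the abstract's setup concrete, here is a minimal sketch of constructing a regularized weight matrix Q from training pairs {G, H}. It assumes the matrix equation takes the conventional form QG ≈ H and uses Tikhonov (ridge) regularization, one of the classical techniques reviewed in the paper; the function name, the parameter alpha, and the closed-form solve are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def tikhonov_weight_matrix(G: np.ndarray, H: np.ndarray, alpha: float) -> np.ndarray:
    """Minimize ||Q G - H||_F^2 + alpha * ||Q||_F^2 over Q.

    Closed form: Q = H G^T (G G^T + alpha I)^{-1}. For alpha > 0 the
    shifted matrix is well-conditioned even when G G^T is nearly
    singular, which is the ill-posedness the paper analyzes.
    """
    m = G.shape[0]
    A = G @ G.T + alpha * np.eye(m)  # symmetric positive definite for alpha > 0
    # Solve A Q^T = G H^T rather than forming an explicit inverse.
    return np.linalg.solve(A, G @ H.T).T

# Illustrative usage on synthetic training pairs {G, H}.
rng = np.random.default_rng(0)
G = rng.standard_normal((50, 200))            # 50 features, 200 training samples
Q_true = rng.standard_normal((3, 50))         # hypothetical ground-truth weights
H = Q_true @ G + 0.01 * rng.standard_normal((3, 200))  # noisy targets
Q = tikhonov_weight_matrix(G, H, alpha=1e-2)
print(np.linalg.norm(Q - Q_true) / np.linalg.norm(Q_true))  # relative error
```

The choice of alpha trades data fidelity against stability; the parameter-choice rules and iterative alternatives discussed in the paper address exactly this trade-off.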


Funding statement: This work was supported by the National Key Research and Development Program of China (Grant No. 2022YFC3310300), the Beijing Natural Science Foundation (Grant No. Z210001), the Shenzhen Sci-Tech Fund (Grant No. RCJC20231211090030059) and the National Natural Science Foundation of China (Grant No. 12171036) for Ye Zhang, and by the China Scholarship Council (Grant No. 202008230141) for Shuang Liu. The authors also acknowledge the support of the state assignment of the Sobolev Institute of Mathematics (Theme 2024-FWNF-2024-0001, "Theory, methods, and applications of inverse problems and statistical analysis") for Sergey Kabanikhin, and a grant for research centers in the field of artificial intelligence (Grant No. 000000D730321P5Q0002) for Sergei Strijhak.


Received: 2025-02-02
Revised: 2025-02-12
Accepted: 2025-02-14
Published Online: 2025-03-28
Published in Print: 2025-04-01

© 2025 Walter de Gruyter GmbH, Berlin/Boston
