Estimating adsorption isotherm parameters in chromatography via a virtual injection promoting double feed-forward neural network

Chen Xu and Ye Zhang

Published/Copyright: January 5, 2022

Abstract

Obtaining adsorption isotherms is a fundamental open problem in competitive chromatography. A modern technique for estimating adsorption isotherms is to solve a nonlinear inverse problem for a partial differential equation so that the simulated batch separation coincides with actual experimental results. However, this identification process is usually ill-posed in the sense that the uniqueness of the adsorption isotherms cannot be guaranteed and, moreover, small noise in the measured response can lead to large fluctuations in the traditional estimate of the adsorption isotherms. The conventional mathematical approach to this problem is variational regularization, which is formulated as a non-convex minimization problem with a regularized objective functional. However, in this method, the choice of the regularization parameter and the design of a convergent solution algorithm are quite difficult in practice. Moreover, due to the restricted number of injection profiles in experiments, the types of measured data are extremely limited, which may lead to a biased estimation. To overcome these difficulties, in this paper we develop a new inversion method – the virtual injection promoting double feed-forward neural network (VIP-DFNN). In this approach, the training data contain various types of artificial injections and synthetic noisy measurements at the outlet, generated by a conventional physics model – a time-dependent convection-diffusion system. Numerical experiments with both artificial data and real data from laboratory experiments show that the proposed VIP-DFNN is an efficient and robust algorithm.

MSC 2010: 65L09; 35R30; 76R50

Award Identifier / Grant number: 12171036

Award Identifier / Grant number: Z210001

Funding statement: The work of C. Xu is supported by the Shenzhen Stable Support Fund for College Researchers (No. 20200829143245001), while the work of Y. Zhang is supported by the National Natural Science Foundation of China (No. 12171036), the Beijing Natural Science Foundation (key project No. Z210001), the Guangdong Fundamental and Applied Research Fund (No. 2019A1515110971) and the Shenzhen Stable Support Fund for College Researchers (No. 20200827173701001).

Appendix A. Structure of the feed-forward neural network

This appendix introduces what a feed-forward neural network (FNN) is used for, describes its model structure, and explains how the model is trained and validated.

Figure 7: An FNN example. The feed-forward neural network in this example has two hidden layers; the first hidden layer has two nodes and the second has three nodes. The superscript denotes the hidden layer number.

An FNN contains an input layer that takes in the values of the independent variables (also called features), an output layer that produces the model's predictions, and hidden layers (if any) whose structure is controlled by hyper-parameters set by the user. Figure 7 shows an example of an FNN for a regression problem with two independent variables $\{X_0, X_1\}$ and two dependent variables $\{Y_0, Y_1\}$, where the following holds.

  • The value pair $\{x_0, x_1\}$ is a realization of $\{X_0, X_1\}$ and is also the input to the model.

  • The $b$ terms (called bias terms) and the weights $\{w_{ij}\}$ are parameters. The node values (except the bias nodes and the ones in the input layer) are transformations of the weighted sums of the outputs from the previous layer, i.e.

    $$a_{ij} := \sigma\Big( w_{0i}\, b_0^{(j-1)} + \sum_{k=1}^{n_{j-1}} w_{ki}^{(j)}\, a_{k,j-1} \Big), \qquad j = 1, 2, 3,$$

    where $\sigma$ is the pre-selected activation function and $n_{j-1}$ denotes the number of nodes in the $(j-1)$th layer. Further, $a_{ij} = x_i$ if $j = 0$, and $a_{ij} = o_i$ if $j = 3$. Normally, the activation function is the same for every node except the ones in the output layer. For classification problems, the activation function for the output layer nodes may be the softmax function, which converts real numbers to class probabilities, while for regression problems it may simply be the identity function. (A minimal code sketch of this forward pass is given after the list.)

  • The $o$ terms are the outputs of the neural network. They are supposed to match the data for the dependent variables.

  • Note that this example has only one bias term in each layer (except the input layer). Other neural networks may have one bias associated with each node.
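To make the forward pass above concrete, the following is a minimal Python/NumPy sketch (not part of the original appendix): the layer widths are taken from the Figure 7 example, and sigmoid hidden activations with an identity output layer and random illustrative parameter values are assumed.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fnn_forward(x, weights, biases):
    """Forward pass of a fully connected feed-forward network.

    weights[j] has shape (n_j, n_{j+1}) and biases[j] has shape (n_{j+1},),
    so layer j+1 receives a weighted sum of layer j's outputs plus a bias.
    Hidden layers use the sigmoid activation; the output layer is the
    identity, as is usual for regression problems.
    """
    a = np.asarray(x, dtype=float)
    for j, (W, b) in enumerate(zip(weights, biases)):
        z = a @ W + b                                   # weighted sum plus bias
        a = z if j == len(weights) - 1 else sigmoid(z)  # identity on the output layer
    return a


# Layer widths of the Figure 7 example: 2 inputs, hidden layers with 2 and 3
# nodes, 2 outputs.  The random parameter values are purely illustrative.
rng = np.random.default_rng(0)
sizes = [2, 2, 3, 2]
weights = [rng.standard_normal((m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(n) for n in sizes[1:]]

o = fnn_forward([0.5, -1.0], weights, biases)  # model outputs (o_0, o_1)
print(o)
```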

The role of the activation function is to control how much the corresponding node contributes to the model's output, and there are different choices for this function. Widely used ones are the sigmoid, ReLU and tanh functions, each with its own advantages and disadvantages. For example, ReLU is computationally cheap and helps to avoid the vanishing gradient problem, but it may lead to the problem of "dead neurons": during training with the ReLU activation, if a neuron stops being activated at some step, it typically never becomes active again in the following steps, even if it should be activated in the true model.
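As a small illustration of the "dead neuron" remark (an assumed NumPy sketch, not code from the paper): the derivative of ReLU is zero for non-positive pre-activations, so a neuron whose weighted input stays negative receives no gradient through this factor and its incoming weights stop being updated.

```python
import numpy as np


def relu(z):
    return np.maximum(0.0, z)


def relu_grad(z):
    # Derivative of ReLU: 1 for z > 0 and 0 otherwise.  A neuron whose
    # pre-activation stays non-positive therefore receives no gradient
    # through this factor, which is the "dead neuron" effect.
    return (z > 0).astype(float)


z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))       # [0.  0.  0.  0.5 2. ]
print(relu_grad(z))  # [0. 0. 0. 1. 1.]
```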

To find values of the parameters (the weights and bias terms) such that the model's outputs are close to the data of the dependent variables, a process called model training is carried out through the following steps (a minimal code sketch of this procedure is given after the list).

  1. Pre-select hyper-parameters such as the loss function and the hidden layer structure.

  2. Assign initial values to the parameters (weights and bias terms).

  3. Compute the partial derivative of the loss with respect to each weight and bias term through the back-propagation algorithm.

  4. Based on the partial derivatives, update the parameter values to decrease the loss value.

  5. Repeat steps (3)–(4) until the loss converges (i.e. it no longer decreases) or reaches some prescribed threshold.
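The following is a minimal sketch of steps (2)–(5) for a small regression FNN, assuming a mean squared-error loss, sigmoid hidden activations, an identity output layer and plain gradient descent (the appendix does not prescribe these particular choices); the gradients in step (3) are obtained by back-propagation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_fnn(X, Y, sizes, lr=0.1, n_steps=5000, seed=0):
    """Gradient-descent training of a small regression FNN (steps (2)-(5)).

    Hidden layers use sigmoid, the output layer is the identity, the loss is
    the mean squared error, and the gradients are obtained by
    back-propagation (step (3)).
    """
    rng = np.random.default_rng(seed)
    # Step (2): assign initial values to the weights and bias terms.
    W = [0.5 * rng.standard_normal((m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
    b = [np.zeros(n) for n in sizes[1:]]

    for _ in range(n_steps):
        # Forward pass, keeping each layer's output for back-propagation.
        A = [X]
        for j in range(len(W)):
            Z = A[-1] @ W[j] + b[j]
            A.append(Z if j == len(W) - 1 else sigmoid(Z))

        # Step (3): backward pass; delta holds dLoss/dZ of the current layer.
        delta = 2.0 * (A[-1] - Y) / X.shape[0]           # identity output, MSE loss
        for j in reversed(range(len(W))):
            gW = A[j].T @ delta                          # dLoss/dW[j]
            gb = delta.sum(axis=0)                       # dLoss/db[j]
            if j > 0:                                    # propagate through the sigmoid layer
                delta = (delta @ W[j].T) * A[j] * (1.0 - A[j])
            # Step (4): move the parameters against the gradient.
            W[j] -= lr * gW
            b[j] -= lr * gb
    # Step (5) is simplified here to a fixed number of iterations.
    return W, b


# Fit the Figure 7 architecture (2-2-3-2) to an illustrative toy data set.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
Y = np.column_stack([X[:, 0] * X[:, 1], X[:, 0] + X[:, 1]])
W, b = train_fnn(X, Y, sizes=[2, 2, 3, 2])
```

In practice, the update in step (4) is often performed with a more elaborate optimizer (e.g. stochastic gradient descent with momentum or Adam), and the stopping rule in step (5) is usually monitored on validation data, but the structure of the loop stays the same.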

The numbers of hidden layers and of nodes per layer are treated as hyper-parameters, which can be adjusted according to the model's performance on the validation data. Refer to [19, Chapter 11] for more details on neural networks.

References

[1] J. Adler and O. Öktem, Solving ill-posed inverse problems using iterative deep neural networks, Inverse Problems 33 (2017), no. 12, Article ID 124007. 10.1088/1361-6420/aa9581

[2] E. Alfaro, N. Garcia, M. Gamez and D. Elizondo, Bankruptcy forecasting: An empirical comparison of AdaBoost and neural networks, Decis. Support Syst. 45 (2008), no. 1, 110–122. 10.1016/j.dss.2007.12.002

[3] S. Arridge, P. Maass, O. Öktem and C.-B. Schönlieb, Solving inverse problems using data-driven models, Acta Numer. 28 (2019), 1–174. 10.1017/S0962492919000059

[4] C. M. Bishop, Pattern Recognition and Machine Learning, Springer, New York, 2006.

[5] T. A. Bubba, G. Kutyniok, M. Lassas, M. März, W. Samek, S. Siltanen and V. Srinivasan, Learning the invisible: A hybrid deep learning–shearlet framework for limited angle computed tomography, Inverse Problems 35 (2019), no. 6, Article ID 064002. 10.1088/1361-6420/ab10ca

[6] X. Cheng, G. Lin, Y. Zhang, R. Gong and M. R. Gulliksson, A modified coupled complex boundary method for an inverse chromatography problem, J. Inverse Ill-Posed Probl. 26 (2018), no. 1, 33–49. 10.1515/jiip-2016-0057

[7] E. De Vito, L. Rosasco, A. Caponnetto, U. De Giovannini and F. Odone, Learning from examples as an inverse problem, J. Mach. Learn. Res. 6 (2005), 883–904.

[8] J. Devlin, M. Chang, K. Lee and K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Vol. 1, Association for Computational Linguistics, Stroudsburg (2019), 4171–4186.

[9] E. V. Dose, S. Jacobson and G. Guiochon, Determination of isotherms from chromatographic peak shapes, Anal. Chem. 63 (1991), no. 8, 833–839. 10.1021/ac00008a020

[10] A. Felinger, D. Zhou and G. Guiochon, Determination of the single component and competitive adsorption isotherms of the 1-indanol enantiomers by the inverse method, J. Chromatography A 1005 (2003), no. 1–2, 35–49. 10.1016/S0021-9673(03)00889-6

[11] D. Fletcher and E. Goss, Forecasting with neural networks, Inform. Manag. 24 (1993), no. 3, 159–167. 10.1016/0378-7206(93)90064-Z

[12] P. Forssén, R. Arnell and T. Fornstedt, An improved algorithm for solving inverse problems in liquid chromatography, Comput. Chem. Eng. 30 (2006), no. 9, 1381–1391. 10.1016/j.compchemeng.2006.03.004

[13] P. Forssén and T. Fornstedt, A model free method for estimation of complicated adsorption isotherms in liquid chromatography, J. Chromatography A 1409 (2015), 108–115. 10.1016/j.chroma.2015.07.030

[14] J. Freyberger, A. Neuhierl, M. Weber and A. Karolyi, Dissecting characteristics nonparametrically, Rev. Financial Stud. 33 (2020), no. 5, 2326–2377. 10.3386/w23227

[15] S. Gu, B. Kelly and D. Xiu, Empirical asset pricing via machine learning, Rev. Financial Stud. 33 (2020), no. 5, 2223–2273. 10.3386/w25398

[16] G. Guiochon and B. Lin, Modeling for Preparative Chromatography, Academic Press, New York, 2003.

[17] G. Guiochon, G. Shirazi and M. Katti, Fundamentals of Preparative and Nonlinear Chromatography, 2nd ed., Elsevier, Amsterdam, 2006. 10.1016/B978-012370537-2/50030-8

[18] J. Han, A. Jentzen and E. Weinan, Solving high-dimensional partial differential equations using deep learning, Proc. Natl. Acad. Sci. USA 115 (2018), no. 34, 8505–8510. 10.1073/pnas.1718942115

[19] T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, New York, 2009. 10.1007/978-0-387-84858-7

[20] J. He, L. Li, J. Xu and C. Zheng, ReLU deep neural networks and linear finite elements, J. Comput. Math. 38 (2020), no. 3, 502–527. 10.4208/jcm.1901-m2018-0160

[21] J. He and J. Xu, MgNet: A unified framework of multigrid and convolutional neural network, Sci. China Math. 62 (2019), no. 7, 1331–1354. 10.1007/s11425-019-9547-2

[22] K. He, X. Zhang, S. Ren and J. Sun, Deep residual learning for image recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition, IEEE Press, Piscataway (2016), 770–778. 10.1109/CVPR.2016.90

[23] K. Hornik, M. Stinchcombe and H. White, Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks, Neural Netw. 3 (1990), no. 5, 551–560. 10.1016/0893-6080(90)90005-6

[24] F. James and M. Sepúlveda, Parameter identification for a model of chromatographic column, Inverse Problems 10 (1994), no. 6, 1299–1314. 10.1088/0266-5611/10/6/008

[25] G. James, D. Witten, T. Hastie and R. Tibshirani, An Introduction to Statistical Learning: With Applications in R, Springer, New York, 2017.

[26] K. H. Jin, M. T. McCann, E. Froustey and M. Unser, Deep convolutional neural network for inverse problems in imaging, IEEE Trans. Image Process. 26 (2017), no. 9, 4509–4522. 10.1109/TIP.2017.2713099

[27] G. Kapoor and W. Zhou, Detecting evolutionary financial statement fraud, Decis. Support Syst. 50 (2011), no. 3, 570–575. 10.1016/j.dss.2010.08.007

[28] A. S. Leonov, Regularizing functionals of general form for solving ill-posed problems in Lebesgue spaces, Sib. Math. J. 44 (2003), no. 6, 1015–1026. 10.1023/B:SIMJ.0000007477.31754.b6

[29] A. S. Leonov, A posteriori accuracy estimations of solutions to ill-posed inverse problems and extra-optimal regularizing algorithms for their solution, Numer. Anal. Appl. 5 (2012), no. 1, 68–83. 10.1134/S1995423912010077

[30] M. Leshno, V. Lin, A. Pinkus and S. Schocken, Multilayer feedforward networks with a nonpolynomial activation function can approximate any function, Neural Netw. 6 (1993), 861–867. 10.1016/S0893-6080(05)80131-5

[31] H. Li, J. Schwab, S. Antholzer and M. Haltmeier, NETT: Solving inverse problems with deep neural networks, Inverse Problems 36 (2020), no. 6, Article ID 065005. 10.1088/1361-6420/ab6d57

[32] Q. Li, L. Chen, C. Tai and W. E, Maximum principle based algorithms for deep learning, J. Mach. Learn. Res. 18 (2017), Paper No. 165.

[33] G. Lin, Y. Zhang, X. Cheng, M. Gulliksson, P. Forssén and T. Fornstedt, A regularizing Kohn–Vogelius formulation for the model-free adsorption isotherm estimation problem in chromatography, Appl. Anal. 97 (2018), no. 1, 13–40. 10.1080/00036811.2017.1284311

[34] O. Lisec, P. Hugo and A. Seidel-Morgenstern, Frontal analysis method to determine competitive adsorption isotherms, J. Chromatography A 908 (2001), no. 1–2, 19–34. 10.1016/S0021-9673(00)00966-3

[35] A. Lucas, M. Iliadis, R. Molina and A. K. Katsaggelos, Using deep neural networks for inverse problems in imaging: Beyond analytical methods, IEEE Signal Proc. Mag. 35 (2018), no. 1, 20–36. 10.1109/MSP.2017.2760358

[36] D. Lukyanenko, T. Yeleskina, I. Prigorniy, T. Isaev, A. Borzunov and M. Shishlenin, Inverse problem of recovering the initial condition for a nonlinear equation of the reaction-diffusion-advection type by data given on the position of a reaction front with a time delay, Mathematics 9 (2021), Article ID 342. 10.3390/math9040342

[37] J. Morshed and J. J. Kaluarachchi, Parameter estimation using artificial neural network and genetic algorithm for free-product migration and recovery, Water Resources Res. 34 (1998), no. 5, 1101–1113. 10.1029/98WR00006

[38] A. Pinkus, Approximation theory of the MLP model in neural networks, Acta Numer. 8 (1999), 143–195. 10.1017/S0962492900002919

[39] M. Raissi, P. Perdikaris and G. E. Karniadakis, Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, J. Comput. Phys. 378 (2019), 686–707. 10.1016/j.jcp.2018.10.045

[40] J. Schwab, S. Antholzer and M. Haltmeier, Deep null space learning for inverse problems: Convergence analysis and rates, Inverse Problems 35 (2019), no. 2, Article ID 025008. 10.1088/1361-6420/aaf14a

[41] H. C. Shin, H. R. Roth, M. Gao, L. Lu, Z. Xu, I. Nogues, J. Yao, D. Mollura and R. M. Summers, Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning, IEEE Trans. Medical Imag. 35 (2016), no. 5, 1285–1298. 10.1109/TMI.2016.2528162

[42] D. Svozil, V. Kvasnicka and J. Pospichal, Introduction to multi-layer feed-forward neural network, Chemometrics Intell. Lab. Syst. 39 (1997), no. 1, 43–62. 10.1016/S0169-7439(97)00061-0

[43] A. N. Tikhonov, A. S. Leonov and A. G. Yagola, Nonlinear Ill-Posed Problems. Vol. 1, 2, Chapman & Hall, London, 1998. 10.1007/978-94-017-5167-4_1

[44] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser and I. Polosukhin, Attention is all you need, Proceedings of the 31st International Conference on Neural Information Processing Systems, ACM, New York (2017), 6000–6010.

[45] Y. Zhang, P. Forssén, T. Fornstedt, M. Gulliksson and X. Dai, An adaptive regularization algorithm for recovering the rate constant distribution from biosensor data, Inverse Probl. Sci. Eng. 26 (2018), no. 10, 1464–1489. 10.1080/17415977.2017.1411912

[46] Y. Zhang, G. Lin, M. Gulliksson, P. Forssén, T. Fornstedt and X. Cheng, An adjoint method in inverse problems of chromatography, Inverse Probl. Sci. Eng. 25 (2017), no. 8, 1112–1137. 10.1080/17415977.2016.1222528

[47] Y. Zhang, G.-L. Lin, P. Forssén, M. R. Gulliksson, T. Fornstedt and X.-L. Cheng, A regularization method for the reconstruction of adsorption isotherms in liquid chromatography, Inverse Problems 32 (2016), no. 10, Article ID 105005. 10.1088/0266-5611/32/10/105005

Received: 2020-08-28
Revised: 2021-04-02
Accepted: 2021-11-16
Published Online: 2022-01-05
Published in Print: 2022-10-01

© 2022 Walter de Gruyter GmbH, Berlin/Boston
