Information content in data sets: A review of methods for interrogation and model comparison

H. Thomas Banks and Michele L. Joyner
Published/Copyright: January 24, 2018

Abstract

In this review we discuss methodology to ascertain the amount of information in given data sets with respect to determination of model parameters with desired levels of uncertainty. We do this in the context of least squares (ordinary, weighted, iterative reweighted weighted or “generalized”, etc.) based inverse problem formulations. The ideas are illustrated with several examples of interest in the biological and environmental sciences.

Funding statement: This research was supported in part by the National Institute on Alcohol Abuse and Alcoholism under grant number 1R01AA022714-01A1, and in part by the Air Force Office of Scientific Research under grant number AFOSR FA9550-15-1-0298.

Received: 2017-10-2
Accepted: 2018-1-14
Published Online: 2018-1-24
Published in Print: 2018-6-1

© 2018 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on December 10, 2025 from https://www.degruyterbrill.com/document/doi/10.1515/jiip-2017-0096/pdf