Accounting for undetected compounds in statistical analyses of mass spectrometry ‘omic studies

  • Sandra L. Taylor, Gary S. Leiserowitz and Kyoungmi Kim
Published/Copyright: October 12, 2013

Abstract

Mass spectrometry is an important high-throughput technique for profiling small molecular compounds in biological samples and is widely used to identify potential diagnostic and prognostic compounds associated with disease. Data generated by mass spectrometry commonly have many missing values, which arise when a compound is absent from a sample or is present at a concentration below the detection limit. Several strategies are available for statistically analyzing data with missing values. The accelerated failure time (AFT) model assumes that all missing values result from censoring below a detection limit. Under a mixture model, missing values can result from a combination of censoring and the absence of a compound. We compare the power and estimation accuracy of a mixture model with those of an AFT model. Based on simulated data, we found the AFT model to have greater power to detect differences in means and point mass proportions between groups. However, the AFT model yielded biased estimates, with the bias increasing as the proportion of observations in the point mass increased, whereas estimates from the mixture model were unbiased except when all missing observations came from censoring. These findings suggest using the AFT model for hypothesis testing and the mixture model for estimation. We demonstrated this approach through application to glycomics data of serum samples from women with ovarian cancer and matched controls.
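The difference between the two models can be illustrated with a minimal sketch of their log-likelihoods for a single group. This is not the authors' implementation: it assumes that detected log-intensities are normally distributed, that there is one known detection limit, and that the function and parameter names (`mixture_loglik`, `aft_loglik`, `p`, `limit`) are hypothetical. Under these assumptions the AFT model is the special case of the mixture model in which the point mass proportion `p` is zero.

```python
import math


def norm_logpdf(x, mu, sigma):
    """Log density of a normal distribution with mean mu and SD sigma."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def norm_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def mixture_loglik(observed, n_missing, limit, p, mu, sigma):
    """Log-likelihood of a point-mass mixture with left-censoring.

    observed  : detected log-intensities (all above the detection limit)
    n_missing : number of samples with no detected value
    limit     : detection limit on the log scale (assumed known)
    p         : probability that the compound is truly absent (point mass)
    mu, sigma : mean and SD of the log-intensity when the compound is present
    """
    # A detected value means the compound is present (1 - p) at intensity y.
    ll = sum(math.log1p(-p) + norm_logpdf(y, mu, sigma) for y in observed)
    if n_missing > 0:
        # A missing value is either a true absence (p) or a present
        # compound whose intensity fell below the detection limit.
        prob_missing = p + (1.0 - p) * norm_cdf(limit, mu, sigma)
        ll += n_missing * math.log(prob_missing)
    return ll


def aft_loglik(observed, n_missing, limit, mu, sigma):
    """Censoring-only model: every missing value is assumed to be censored."""
    return mixture_loglik(observed, n_missing, limit, 0.0, mu, sigma)


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    detected = [1.2, 0.8, 1.5, 2.1]
    print(aft_loglik(detected, 3, 0.0, 1.0, 1.0))
    print(mixture_loglik(detected, 3, 0.0, 0.3, 1.0, 1.0))
```

In this sketch, forcing `p = 0` makes the censoring-only model attribute every missing value to the lower tail of the intensity distribution, which is consistent with the bias reported above when a sizeable share of missing values are true absences.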


Corresponding author: Sandra L. Taylor, Division of Biostatistics, Department of Public Health Sciences, University of California School of Medicine, Davis, CA 95616, USA, e-mail:

We wish to thank Drs. Renee Ruhaak and Carlito Lebrilla for their dedicated work in generating the ovarian cancer glycomics data. The project described was supported by the National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), through grant #UL1 TR000002. This work was supported by NIH/NIA grant P01AG025532 and the Ovarian Cancer Research Fund.

Published Online: 2013-10-12
Published in Print: 2013-12-01

©2013 by Walter de Gruyter Berlin Boston

Downloaded on November 16, 2025 from https://www.degruyterbrill.com/document/doi/10.1515/sagmb-2013-0021/pdf