A boosting approach for adapting the sparsity of risk prediction signatures based on different molecular levels

Murat Sariyar; Martin Schumacher; Harald Binder

doi:10.1515/sagmb-2013-0050

Published/Copyright: March 14, 2014

From the journal Statistical Applications in Genetics and Molecular Biology, Volume 13, Issue 3

Abstract

Risk prediction models can link high-dimensional molecular measurements, such as DNA methylation, to clinical endpoints. For biological interpretation, a sparse fit is often desirable. Different molecular aggregation levels, such as DNA methylation at the CpG, gene, or chromosome level, may demand different degrees of sparsity. Hence, model building and estimation techniques should be able to adapt their sparsity to the setting. In addition, underestimation of coefficients, a typical problem of sparse techniques, should be addressed. We propose a comprehensive approach, based on a boosting technique, that allows flexible adaptation of model sparsity and addresses these problems in an integrative way. The main motivation is automatic sparsity adaptation. In a simulation study, we show that this approach reduces underestimation in sparse settings and selects more adequate model sizes than the corresponding non-adaptive boosting technique in non-sparse settings. Using different aggregation levels of DNA methylation data from a study of kidney carcinoma patients, we illustrate how automatically selected values of the sparsity tuning parameter can reflect the underlying structure of the data. In addition, prediction performance and variable selection stability are compared with those of the non-adaptive boosting approach.
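To make the notions of sparsity and coefficient underestimation concrete, the following is a minimal sketch of plain, non-adaptive componentwise boosting. It is not the method proposed in the article: it assumes a squared-error loss instead of the Cox model, a fixed step size `nu`, and simulated data, and the function name `componentwise_l2_boost` is hypothetical. Each step updates only one coefficient, so the number of steps bounds the model size, and early stopping leaves the selected coefficients shrunken toward zero.

```python
import numpy as np


def componentwise_l2_boost(X, y, n_steps, nu=0.1):
    """Componentwise L2 boosting (illustrative assumption, not the article's algorithm).

    Assumes the columns of X are centered and none is constant.
    """
    n, p = X.shape
    beta = np.zeros(p)
    resid = y - y.mean()            # current residual of the centered response
    col_ss = (X ** 2).sum(axis=0)   # sum of squares of each covariate
    for _ in range(n_steps):
        # Univariate least-squares coefficient of each covariate on the residual.
        b = X.T @ resid / col_ss
        # Reduction in residual sum of squares achieved by each candidate.
        gain = b ** 2 * col_ss
        j = int(np.argmax(gain))
        # Update only the best covariate, damped by the step size nu.
        beta[j] += nu * b[j]
        resid -= nu * b[j] * X[:, j]
    return beta


# Simulated sparse setting: 3 informative covariates out of 200 (assumed values).
rng = np.random.default_rng(0)
n, p = 100, 200
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)
true_beta = np.zeros(p)
true_beta[:3] = 2.0
y = X @ true_beta + rng.standard_normal(n)

beta_few = componentwise_l2_boost(X, y, n_steps=20)
beta_many = componentwise_l2_boost(X, y, n_steps=500)

print("nonzero coefficients after 20 steps: ", np.count_nonzero(beta_few))
print("nonzero coefficients after 500 steps:", np.count_nonzero(beta_many))
print("first three coefficients after 20 steps:", beta_few[:3])
```

In this sketch the number of boosting steps is the only handle on sparsity, which ties model size and shrinkage together. An adaptive scheme of the kind described in the abstract would aim to loosen that coupling.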

Keywords: componentwise boosting; Cox model; CpG methylation; regularization; sparse models

Corresponding author: Murat Sariyar, Institute of Medical Biostatistics, Epidemiology and Informatics, Medical Center of the Johannes Gutenberg University, 55131 Mainz, Germany; and Institute of Pathology, Charite – University Medicine Berlin, Campus Benjamin Franklin, Berlin 12200, Germany, e-mail: murat.sariyar@charite.de

Published Online: 2014-3-14

Published in Print: 2014-6-1
