False discovery control for penalized variable selections with high-dimensional covariates

Kevin He; Xiang Zhou; Hui Jiang; Xiaoquan Wen; Yi Li

doi:10.1515/sagmb-2018-0038

Article

Published/Copyright: 15 December 2018

Published by De Gruyter Brill

From the journal Statistical Applications in Genetics and Molecular Biology, Volume 17, Issue 6

Abstract

Modern biotechnologies have produced vast amounts of high-throughput data in which the number of predictors far exceeds the sample size. Penalized variable selection has emerged as a powerful and efficient dimension reduction tool. However, controlling false discoveries (i.e., the inclusion of irrelevant variables) in penalized high-dimensional variable selection presents serious challenges. To control the fraction of false discoveries in penalized variable selection, we propose a false discovery controlling procedure. The proposed method is general and flexible: it works with a broad class of variable selection algorithms and applies not only to linear regression but also to generalized linear models and survival analysis.
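To make the quantity being controlled concrete, here is a minimal sketch of the false discovery proportion (the fraction of selected variables that are irrelevant), assuming a simulation setting where the true support is known. It uses the standard convention FDP = V / max(R, 1), with V the number of irrelevant variables selected and R the total number selected. This is only the target quantity, not the authors' controlling procedure, and the function name `false_discovery_proportion` and the example indices are hypothetical.

```python
def false_discovery_proportion(selected, true_support):
    """Fraction of selected variables that are irrelevant (false discoveries).

    selected: indices chosen by any variable selection method (e.g. a lasso fit).
    true_support: indices of truly relevant variables, known only in simulations.
    Returns 0.0 when nothing is selected, via the convention V / max(R, 1).
    """
    selected = set(selected)
    true_support = set(true_support)
    # V: irrelevant variables that were included in the selection
    false_discoveries = selected - true_support
    # R = len(selected); guard against division by zero when R == 0
    return len(false_discoveries) / max(len(selected), 1)


# Hypothetical example: 5 variables selected, 2 of them (7 and 42) irrelevant.
selected = [0, 1, 2, 7, 42]
true_support = [0, 1, 2, 3, 4]
print(false_discovery_proportion(selected, true_support))  # 0.4
```

Averaging this proportion over simulation replicates estimates the false discovery rate, E[V / max(R, 1)], which a false discovery controlling procedure aims to keep at or below a chosen target level.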

Keywords: dimension reduction; false discovery; penalized regression; variable selection

Funding source: Chinese Natural Science Foundation

Award Identifier / Grant number: 11528102

Funding statement: The authors thank Dr. Kirsten Herold at the UM-SPH Writing lab for her helpful suggestions. Chinese Natural Science Foundation, Grant Number: 11528102.

Supplementary Material

The online version of this article offers supplementary material (DOI: https://doi.org/10.1515/sagmb-2018-0038).
