Modeling gene-covariate interactions in sparse regression with group structure for genome-wide association studies

Yun Li; George T. O’Connor; Josée Dupuis; Eric Kolaczyk

doi:10.1515/sagmb-2014-0073


Published/Copyright: May 1, 2015


From the journal Statistical Applications in Genetics and Molecular Biology, Volume 14, Issue 3

Abstract

In genome-wide association studies (GWAS), it is of interest to identify genetic variants associated with phenotypes. For a given phenotype, the associated genetic variants are usually a sparse subset of all possible variants, so traditional Lasso-type estimation methods can be used to detect important genes. However, the relationship between the genotype at a variant and a phenotype may be influenced by other variables, such as sex and lifestyle. Hence it is important to be able to incorporate gene-covariate interactions into the sparse regression model. In addition, because there is biological knowledge about how genes work together in structured groups, it is desirable to incorporate this information as well. In this paper, we present a novel sparse regression methodology for gene-covariate models in association studies that not only allows such interactions but also accounts for biological group structure. Simulation results show that our method substantially outperforms another method in which interaction is considered but group structure is ignored. Application to data on total plasma immunoglobulin E (IgE) concentrations in the Framingham Heart Study (FHS), using sex and smoking status as covariates, yields several potentially interesting gene-covariate interactions.

Keywords: gene-environment/covariate interaction; genome-wide association studies; sparse regression
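The abstract does not state the penalty the authors use, so the following is only a minimal sketch of the general setup it describes, under stated assumptions: SNP main effects are stacked with SNP-by-covariate products, each column inherits the gene label of its SNP, and a plain group lasso proximal step (a generic stand-in, not the paper's method) shrinks whole gene groups to zero. The function names `build_interaction_design`, `column_groups` and `group_soft_threshold` are hypothetical and chosen for illustration.

```python
import numpy as np


def build_interaction_design(G, E):
    """Stack SNP main effects with SNP-by-covariate products.

    G: (n, p) genotype matrix; E: (n, q) covariate matrix (e.g. sex, smoking).
    Returns an (n, p * (1 + q)) matrix whose columns are ordered SNP by SNP:
    [g_j, g_j * e_1, ..., g_j * e_q].
    """
    blocks = []
    for j in range(G.shape[1]):
        g = G[:, [j]]                         # (n, 1) column for SNP j
        blocks.append(np.hstack([g, g * E]))  # main effect, then interactions
    return np.hstack(blocks)


def column_groups(snp_to_gene, q):
    """Give every design column the gene (group) label of its SNP."""
    return np.repeat(np.asarray(snp_to_gene), 1 + q)


def group_soft_threshold(beta, groups, lam):
    """Proximal operator of lam * sum_g ||beta_g||_2.

    Assumption: an unweighted group lasso penalty, used here only as a
    generic example of a group-structured sparsity penalty.
    """
    out = np.zeros_like(beta, dtype=float)
    for g in np.unique(groups):
        idx = groups == g
        norm = np.linalg.norm(beta[idx])
        if norm > lam:                        # otherwise the group stays at zero
            out[idx] = (1.0 - lam / norm) * beta[idx]
    return out
```

With `p` SNPs and `q` covariates the design has `p * (1 + q)` columns, and a proximal gradient loop would alternate a gradient step on the squared-error loss with `group_soft_threshold`. Keeping each SNP's interaction columns in the same group as its main effect is one simple design choice; other groupings are possible.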

Corresponding author: Yun Li, Department of Mathematics and Statistics, Boston University, MA 02215, USA; and Department of Biostatistics, Boston University School of Public Health, MA 02118, USA, e-mail: yrlee@bu.edu

Acknowledgments

This research is supported by National Institutes of Health grants ES020827, DK078616, N01 HC25195 and P01 AI050516 (in part). A portion of this research was conducted using the Linux Clusters for Genetic Analysis (LinGA) computing resources at Boston University Medical Campus.


Published Online: 2015-5-1

Published in Print: 2015-6-1


https://doi.org/10.1515/sagmb-2014-0073
