Abstract
In genome-wide association studies (GWAS), the goal is to identify genetic variants associated with phenotypes. For a given phenotype, the associated variants are usually a sparse subset of all possible variants, so traditional Lasso-type estimation methods can be used to detect important genes. However, the relationship between the genotype at a variant and a phenotype may be influenced by other variables, such as sex and lifestyle. It is therefore important to be able to incorporate gene-covariate interactions into the sparse regression model. In addition, because biological knowledge exists about how genes work together in structured groups, it is desirable to incorporate this information as well. In this paper, we present a novel sparse regression methodology for gene-covariate models in association studies that not only allows such interactions but also accounts for biological group structure. Simulation results show that our method substantially outperforms a competing method that models interactions but ignores group structure. Application to data on total plasma immunoglobulin E (IgE) concentrations in the Framingham Heart Study (FHS), using sex and smoking status as covariates, yields several potentially interesting gene-covariate interactions.
Acknowledgments
This research is supported by National Institutes of Health grants ES020827, DK078616, N01 HC25195 and P01 AI050516 (in part). A portion of this research was conducted using the Linux Clusters for Genetic Analysis (LinGA) computing resources at Boston University Medical Campus.
©2015 by De Gruyter