Multiple comparisons in genetic association studies: a hierarchical modeling approach

Nengjun Yi; Shizhong Xu; Xiang-Yang Lou; Himel Mallick

doi:10.1515/sagmb-2012-0040


Published/Copyright: November 20, 2013


From the journal Statistical Applications in Genetics and Molecular Biology Volume 13 Issue 1

Abstract

Multiple comparisons, or multiple testing, has been viewed as a thorny issue in genetic association studies that aim to detect disease-associated genetic variants among a large number of genotyped variants. We alleviate the problem of multiple comparisons by proposing a hierarchical modeling approach that is fundamentally different from existing methods. The proposed hierarchical models simultaneously fit as many variables as possible and shrink unimportant effects towards zero. Thus, the hierarchical models yield more efficient parameter estimates than traditional methods that analyze genetic variants separately, and they also coherently address the multiple comparisons problem by substantially reducing both the effective number of genetic effects and the number of statistically “significant” effects. We develop a method for computing the effective number of genetic effects in hierarchical generalized linear models, and propose a new adjustment for multiple comparisons, the hierarchical Bonferroni correction, based on the effective number of genetic effects. Our approach not only increases the power to detect disease-associated variants but also controls the Type I error. We illustrate and evaluate our method with real and simulated data sets from genetic association studies. The method has been implemented in our freely available R package BhGLM (http://www.ssg.uab.edu/bhglm/).

Keywords: Bayesian inference; effective number of parameters; effective number of hypothesis tests; generalized linear models; genetic association studies; hierarchical modeling; hierarchical Bonferroni correction; multiple comparisons

Corresponding author: Nengjun Yi, Department of Biostatistics, Section on Statistical Genetics, University of Alabama at Birmingham, Birmingham, AL 35294, USA, e-mail: nyi@ms.soph.uab.edu

We would like to thank Drs. Jonathan Cohen and Helen Hobbs for access to the Dallas Heart Study dataset, and Drs. Virginia G. Kaklamani and Boris Pasche for access to the Colorectal Cancer Case-Control dataset. This work was supported in part by the research grants: NIH 5R01GM069430-08 and NIH 5R01DA025095.

Appendix A

The hierarchical prior distributions and the EM-IWLS algorithm

A variety of prior distributions have been proposed for coefficients in high-dimensional models (Park and Casella 2008; Yi and Xu 2008; Armagan et al. 2010; Kyung et al. 2010; Yi and Ma 2012). Most of these priors can be expressed as a mixture of normal distributions, with variances following certain hyper-prior distributions. Although our method can be applied with various priors for the variances, we describe our algorithm for the hierarchical exponential distribution with group-specific hyperparameters:

where the subscript k[j] indexes the group k that the j-th predictor belongs to. The hyperparameter s_k controls the amount of shrinkage in the variance estimate; a large value of s_k forces the variance closer to zero. This prior distribution includes group-specific parameters s_k and variable-specific variances.

We further treat the hyperparameters s_k as unknown parameters with the Gamma hyper-prior distributions:

As a typical default specification for the hyperparameters, one can let a=b=1, which induces the standard double Pareto distributions for the coefficients and usually works well in high-dimensional settings (Armagan et al. 2010).
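For concreteness, the hierarchical prior described above can be written out as follows. This is a sketch under stated assumptions, not a quotation of the original display: the symbol \tau_j^2 for the variable-specific variance, the rate s_{k[j]}^2/2 of the exponential distribution, and the shape-rate form of the Gamma distribution are assumptions, chosen so that a=b=1 reproduces the standard double Pareto prior of Armagan et al. (2010).

```latex
\begin{aligned}
\beta_j \mid \tau_j^2 &\sim N\!\left(0,\ \tau_j^2\right), \\
\tau_j^2 \mid s_{k[j]} &\sim \mathrm{Expon}\!\left(s_{k[j]}^2 / 2\right), \\
s_k &\sim \mathrm{Gamma}(a,\ b).
\end{aligned}
```

Under this assumed parameterization, integrating out \tau_j^2 gives a double-exponential prior with density p(\beta_j | s_{k[j]}) = (s_{k[j]}/2) exp(-s_{k[j]} |\beta_j|), which shows why a larger s_k implies stronger shrinkage of the coefficients in group k.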

We fit the generalized linear models with the hierarchical priors by estimating the marginal posterior modes of the parameters (β, ϕ). We modify the usual iterative weighted least squares (IWLS) for fitting classical GLMs and incorporate an EM algorithm into the modified IWLS procedure. The EM-IWLS algorithm increases the marginal posterior density of the parameters (β, ϕ) at each step and thus converges to a local mode. Our EM algorithm treats the unknown variances and the hyperparameters s_k[j] as missing data and estimates the parameters (β, ϕ) by averaging over these missing values. At each step of the iteration, we replace the terms involving the parameters (β, ϕ) and the missing values by their conditional expectations, and then update the parameters (β, ϕ) by maximizing the expected value of the joint log-posterior density.

For the E-step of the algorithm, we take the expectation of the above joint log-posterior density with respect to the conditional posterior distributions of the variances and the hyperparameters. The conditional posterior distributions are

Therefore, we have the conditional expectations
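As an illustration of the E-step, the expectations can be worked out under one assumed parameterization, which is not necessarily the authors': \beta_j | \tau_j^2 ~ N(0, \tau_j^2), \tau_j^2 | s_{k[j]} ~ Expon(s_{k[j]}^2/2), and s_k ~ Gamma(a, b) in shape-rate form, with J_k denoting the number of predictors in group k. The results below are standard for this normal-exponential-Gamma hierarchy. In this sketch the variances are integrated out before s_k is conditioned on \beta, which may differ from the conditioning used in the original derivation.

```latex
\begin{aligned}
\tau_j^{-2} \mid \beta_j, s_{k[j]} &\sim \text{Inverse-Gaussian}\!\left(\text{mean} = \frac{s_{k[j]}}{|\beta_j|},\ \text{shape} = s_{k[j]}^2\right)
&&\Rightarrow\quad E\!\left(\tau_j^{-2} \mid \beta_j, s_{k[j]}\right) = \frac{s_{k[j]}}{|\beta_j|}, \\
s_k \mid \beta &\sim \mathrm{Gamma}\!\left(a + J_k,\ b + \sum_{j:\,k[j]=k} |\beta_j|\right)
&&\Rightarrow\quad E\!\left(s_k \mid \beta\right) = \frac{a + J_k}{b + \sum_{j:\,k[j]=k} |\beta_j|}.
\end{aligned}
```

Combining the two lines by the law of iterated expectations gives E(\tau_j^{-2} | \beta) = E(s_{k[j]} | \beta) / |\beta_j|, so coefficients near zero receive a large prior precision and are shrunk strongly.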

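In the M-step, we update (β, ϕ) by maximizing the expected joint log-posterior density, in which the terms involving the variances are replaced by their conditional expectations for j = 1, …, J. This is equivalent to fitting the generalized linear model y_i ~ p(y_i | X_iβ, ϕ) with normal priors on the coefficients whose variances are fixed at the values implied by the E-step. Thus, the parameters (β, ϕ) can be updated using the modified IWLS algorithm as described in the main text.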
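The alternation between the E-step and the IWLS update can be sketched in Python for the logistic case. This is a minimal illustration and not the BhGLM implementation: the function name `em_iwls_logistic` is hypothetical, the dispersion ϕ is fixed at 1 as in logistic regression, each EM iteration takes a single IWLS update, and the E-step assumes the parameterization \tau_j^2 | s_{k[j]} ~ Expon(s_{k[j]}^2/2) with s_k ~ Gamma(a, b) in shape-rate form, so that E(\tau_j^{-2} | \beta) = E(s_{k[j]} | \beta) / |\beta_j|.

```python
import numpy as np


def em_iwls_logistic(X, y, groups, a=1.0, b=1.0, n_iter=500, tol=1e-8, eps=1e-8):
    """Illustrative EM-IWLS for logistic regression (hypothetical helper, not BhGLM).

    X: (n, J) predictor matrix without an intercept column.
    y: (n,) array of 0/1 responses.
    groups: (J,) integer group labels 0..K-1 (k[j] in the text).
    Returns (intercept, beta).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups, dtype=int)
    n, J = X.shape
    Z = np.column_stack([np.ones(n), X])   # the intercept is left unpenalized
    coef = np.zeros(J + 1)
    inv_tau2 = np.ones(J)                  # starting value for E(1 / tau_j^2)
    group_size = np.bincount(groups)       # J_k, number of predictors per group

    for _ in range(n_iter):
        # M-step (one IWLS update): weighted least squares on the working
        # response, with normal priors N(0, 1 / inv_tau2) acting as a ridge term.
        eta = np.clip(Z @ coef, -30.0, 30.0)
        mu = 1.0 / (1.0 + np.exp(-eta))
        w = np.maximum(mu * (1.0 - mu), 1e-10)
        z = eta + (y - mu) / w
        penalty = np.diag(np.concatenate(([0.0], inv_tau2)))
        new_coef = np.linalg.solve(Z.T @ (w[:, None] * Z) + penalty, Z.T @ (w * z))

        # E-step (assumed parameterization): E(s_k | beta) from a Gamma
        # posterior, then E(1 / tau_j^2 | beta) = E(s_k[j] | beta) / |beta_j|.
        abs_beta = np.maximum(np.abs(new_coef[1:]), eps)
        group_abs = np.bincount(groups, weights=abs_beta, minlength=group_size.size)
        s_hat = (a + group_size) / (b + group_abs)
        inv_tau2 = s_hat[groups] / abs_beta

        converged = np.max(np.abs(new_coef - coef)) < tol
        coef = new_coef
        if converged:
            break
    return coef[0], coef[1:]


# Usage on simulated case-control data: two true effects, eight null effects.
rng = np.random.default_rng(0)
X_sim = rng.standard_normal((1000, 10))
beta_true = np.zeros(10)
beta_true[:2] = [1.5, -1.5]
prob = 1.0 / (1.0 + np.exp(-(X_sim @ beta_true)))
y_sim = (rng.random(1000) < prob).astype(float)
intercept_hat, beta_hat = em_iwls_logistic(X_sim, y_sim, groups=np.repeat([0, 1], 5))
print(np.round(beta_hat, 2))
```

The floor `eps` on |β_j| keeps the prior precision finite when a coefficient is shrunk to essentially zero, and the initial value of 1 for every prior precision avoids starting at β = 0, which would otherwise be a fixed point of the iteration.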

References

Armagan, A., D. Dunson and J. Lee (2010): “Bayesian generalized double Pareto shrinkage,” Biometrika. arXiv preprint arXiv:1104.0861.

Balding, D. J. (2006): “A tutorial on statistical methods for population association studies,” Nat. Rev. Genet., 7, 781–791.

Benjamini, Y. and Y. Hochberg (1995): “Controlling the false discovery rate: a practical and powerful approach to multiple testing,” J. R. Stat. Soc. Ser. B, 57, 289–300.

Benjamini, Y. and D. Yekutieli (2001): “The control of the false discovery rate in multiple testing under dependency,” Ann. Stat., 29, 1165–1188.

Benjamini, Y. and D. Yekutieli (2005): “Quantitative trait loci analysis using the false discovery rate,” Genetics, 171, 783–790. doi:10.1534/genetics.104.036699.

Galwey, N. W. (2009): “A new measure of the effective number of tests, a practical tool for comparing families of non-independent significance tests,” Genet. Epidemiol., 33, 559–568. doi:10.1002/gepi.20408.

Gao, X., L. C. Becker, D. M. Becker, J. D. Starmer and M. A. Province (2010): “Avoiding the high Bonferroni penalty in genome-wide association studies,” Genet. Epidemiol., 34, 100–105.

Gao, X., J. Starmer and E. R. Martin (2008): “A multiple testing correction method for genetic association studies using correlated single nucleotide polymorphisms,” Genet. Epidemiol., 32, 361–369.

Gelman, A. and J. Hill (2007): Data analysis using regression and multilevel/hierarchical models, New York: Cambridge University Press. doi:10.1017/CBO9780511790942.

Gelman, A., J. Carlin, H. Stern and D. Rubin (2003): Bayesian data analysis, London: Chapman and Hall. doi:10.1201/9780429258480.

Gelman, A., J. Hill and M. Yajima (2012): “Why we (usually) don’t have to worry about multiple comparisons,” J. Res. Educ. Eff., 5, 189–211.

Gelman, A., A. Jakulin, M. G. Pittau and Y. S. Su (2008): “A weakly informative default prior distribution for logistic and other regression models,” Ann. Appl. Stat., 2, 1360–1383.

Hochberg, Y. (1988): “A sharper Bonferroni procedure for multiple tests of significance,” Biometrika, 75, 800–803. doi:10.1093/biomet/75.4.800.

Hoffmann, T. J., N. J. Marini and J. S. Witte (2010): “Comprehensive approach to analyzing rare genetic variants,” PLoS One, 5, e13584. doi:10.1371/journal.pone.0013584.

Holm, S. (1979): “A simple sequentially rejective multiple test procedure,” Scand. J. Stat., 6, 65–70.

Hommel, G. (1988): “A stagewise rejective multiple test procedure based on a modified Bonferroni test,” Biometrika, 75, 383–386. doi:10.1093/biomet/75.2.383.

Hsu, J. C. (1996): Multiple comparisons: theory and methods, London: Chapman and Hall. doi:10.1201/b15074.

Hung, R., P. Brennan, C. Malaveille, S. Porru, F. Donato, P. Boffetta and J. S. Witte (2004): “Using hierarchical modeling in genetic association studies with multiple markers: application to a case-control study of bladder cancer,” Cancer Epidem. Biomar. Prev., 13, 1013–1021.

Kaklamani, V. G., K. B. Wisinski, M. Sadim, C. Gulden, A. Do, K. Offit, J. A. Baron, H. Ahsan, C. Mantzoros and B. Pasche (2008): “Variants of the adiponectin (ADIPOQ) and adiponectin receptor 1 (ADIPOR1) genes and colorectal cancer risk,” J. Am. Med. Assoc., 300, 1523–1531.

Kang, G., K. Ye, N. Liu, D. B. Allison and G. Gao (2009): “Weighted multiple hypothesis testing procedures,” Stat. Appl. Genet. Mol. Biol., 8, 1–21.

King, C. R., P. J. Rathouz and D. L. Nicolae (2010): “An evolutionary framework for association testing in resequencing studies,” PLoS Genet., 6, e1001202.

Kyung, M., J. Gill, M. Ghosh and G. Casella (2010): “Penalized regression, standard errors, and Bayesian lassos,” Bayesian Anal., 5, 369–412.

Lu, H., J. S. Hodges and B. P. Carlin (2007): “Measuring the complexity of generalized linear hierarchical models,” Can. J. Stat., 35, 69–87.

Madsen, B. E. and S. R. Browning (2009): “A groupwise association test for rare mutations using a weighted sum statistic,” PLoS Genet., 5, e1000384.

McCullagh, P. and J. A. Nelder (1989): Generalized linear models, London: Chapman and Hall. doi:10.1007/978-1-4899-3242-6.

Park, T. and G. Casella (2008): “The Bayesian lasso,” J. Am. Stat. Assoc., 103, 681–686.

Price, A. L., G. V. Kryukov, P. I. de Bakker, S. M. Purcell, J. Staples, L. J. Wei and S. R. Sunyaev (2010): “Pooled association tests for rare variants in exon-resequencing studies,” Am. J. Hum. Genet., 86, 832–838.

Pritchard, J. K. (2001): “Are rare variants responsible for susceptibility to complex diseases?” Am. J. Hum. Genet., 69, 124–137.

Pritchard, J. K. and N. J. Cox (2002): “The allelic architecture of human disease genes: common disease-common variant…or not?” Hum. Mol. Genet., 11, 2417–2423.

Rice, T. K., N. J. Schork and D. C. Rao (2008): “Methods for handling multiple testing,” Adv. Genet., 60, 293–308.

Roeder, K., B. Devlin and L. Wasserman (2007): “Improving power in genome-wide association studies: weights tip the scale,” Genet. Epidemiol., 31, 741–747.

Romeo, S., L. A. Pennacchio, Y. Fu, E. Boerwinkle, A. Tybjaerg-Hansen, H. H. Hobbs and J. C. Cohen (2007): “Population-based resequencing of ANGPTL4 uncovers variations that reduce triglycerides and increase HDL,” Nat. Genet., 39, 513–516.

Romeo, S., W. Yin, J. Kozlitina, L. A. Pennacchio and E. Boerwinkle (2009): “Rare loss-of-function mutations in ANGPTL family members contribute to plasma triglyceride levels in humans,” J. Clin. Invest., 119, 70–79.

Sabatti, C., S. Service and N. Freimer (2003): “False discovery rate in linkage and association genome screens for complex disorders,” Genetics, 164, 829–833. doi:10.1093/genetics/164.2.829.

Schaid, D. J., J. P. Sinnwell, G. D. Jenkins, S. K. McDonnell and J. N. Ingle (2011): “Using the gene ontology to scan multilevel gene sets for associations in genome wide association studies,” Genet. Epidemiol., 36, 3–16.

Spiegelhalter, D. J., N. G. Best, B. P. Carlin and A. v. d. Linde (2002): “Bayesian measures of model complexity and fit (with discussion),” J. R. Stat. Soc. Ser. B, 64, 583–639.

Thomas, D. C., D. V. Conti, J. Baurley, F. Nijhout and M. Reed (2009): “Use of pathway information in molecular epidemiology,” Hum. Genom., 4, 21–42.

Wang, K., M. Li and H. Hakonarson (2010): “Analysing biological pathways in genome-wide association studies,” Nat. Rev. Genet., 11, 843–854.

Yi, N. and S. Banerjee (2009): “Hierarchical generalized linear models for multiple quantitative trait locus mapping,” Genetics, 181, 1101–1113. doi:10.1534/genetics.108.099556.

Yi, N. and S. Xu (2008): “Bayesian LASSO for quantitative trait loci mapping,” Genetics, 179, 1045–1055. doi:10.1534/genetics.107.085589.

Yi, N. and D. Zhi (2011): “Bayesian analysis of rare variants in genetic association studies,” Genet. Epidemiol., 35, 57–69.

Yi, N. and S. Ma (2012): “Hierarchical shrinkage priors and model fitting algorithms for high-dimensional generalized linear models,” Stat. Appl. Genet. Mol. Biol., 11, 1544–6115.

Yi, N., V. G. Kaklamani and B. Pasche (2011a): “Bayesian analysis of genetic interactions in case-control studies, with application to adiponectin genes and colorectal cancer risk,” Ann. Hum. Genet., 75, 90–104. doi:10.1111/j.1469-1809.2010.00605.x.

Yi, N., N. Liu, D. Zhi and J. Li (2011b): “Hierarchical generalized linear models for multiple groups of rare and common variants: jointly estimating group and individual-variant effects,” PLoS Genet., 7, e1002382. doi:10.1371/journal.pgen.1002382.

Published Online: 2013-11-20

Published in Print: 2014-02-01
