Abstract
Multiple comparisons, or multiple testing, has been viewed as a thorny issue in genetic association studies that aim to detect disease-associated genetic variants among a large number of genotyped variants. We alleviate the problem of multiple comparisons by proposing a hierarchical modeling approach that is fundamentally different from existing methods. The proposed hierarchical models simultaneously fit as many variables as possible and shrink unimportant effects towards zero. The hierarchical models therefore yield more efficient parameter estimates than traditional methods that analyze genetic variants separately, and they coherently address the multiple comparisons problem by substantially reducing both the effective number of genetic effects and the number of statistically “significant” effects. We develop a method for computing the effective number of genetic effects in hierarchical generalized linear models, and propose a new adjustment for multiple comparisons, the hierarchical Bonferroni correction, based on the effective number of genetic effects. Our approach not only increases the power to detect disease-associated variants but also controls the Type I error. We illustrate and evaluate our method with real and simulated data sets from genetic association studies. The method has been implemented in our freely available R package BhGLM (http://www.ssg.uab.edu/bhglm/).
We would like to thank Drs. Jonathan Cohen and Helen Hobbs for access to the Dallas Heart Study dataset, and Drs. Virginia G. Kaklamani and Boris Pasche for access to the Colorectal Cancer Case-Control dataset. This work was supported in part by the research grants: NIH 5R01GM069430-08 and NIH 5R01DA025095.
Appendix A
The hierarchical prior distributions and the EM-IWLS algorithm
A variety of prior distributions have been proposed for coefficients in high-dimensional models (Park and Casella 2008; Yi and Xu 2008; Armagan et al. 2010; Kyung et al. 2010; Yi and Ma 2012). Most of these priors can be expressed as a mixture of normal distributions,
$$\beta_j \mid \tau_j^2 \sim N(0, \tau_j^2),$$
with variances $\tau_j^2$ following certain hyper-prior distributions. Although our method can be applied to various priors for the variances $\tau_j^2$, we describe our algorithm for the hierarchical exponential distribution with group-specific hyperparameters:
where the subscript $k[j]$ indexes the group $k$ that the $j$-th predictor belongs to. The hyperparameter $s_k$ controls the amount of shrinkage in the variance estimate; a large value of $s_k$ forces the variance $\tau_j^2$ closer to zero. This prior distribution includes group-specific parameters $s_k$ and variable-specific parameters $\tau_j^2$.
We further treat the hyperparameters $s_k$ as unknown parameters with Gamma hyper-prior distributions:
$$s_k \sim \mathrm{Gamma}(a, b).$$
As a typical default specification for the hyperparameters, one can let a=b=1, which induces the standard double Pareto distributions for the coefficients and usually works well in high-dimensional settings (Armagan et al. 2010).
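For orientation, one parameterization that matches the description above is the normal scale-mixture construction of the generalized double Pareto prior in Armagan et al. (2010). The exponential rate $s_{k[j]}^2/2$ and the shape-rate reading of $\mathrm{Gamma}(a, b)$ are assumptions made for this sketch; the displayed formulas of the original appendix may differ.

```latex
\begin{align*}
\beta_j \mid \tau_j^2 &\sim N(0, \tau_j^2), \\
\tau_j^2 \mid s_{k[j]} &\sim \mathrm{Expon}\!\left(s_{k[j]}^2 / 2\right), \\
s_k &\sim \mathrm{Gamma}(a, b).
\end{align*}
% Integrating out \tau_j^2 under these assumptions gives a double-exponential prior:
\begin{equation*}
p(\beta_j \mid s_{k[j]}) = \frac{s_{k[j]}}{2} \exp\!\left(-s_{k[j]} \lvert \beta_j \rvert\right).
\end{equation*}
```

Under this reading, a larger $s_{k[j]}$ concentrates more prior mass near zero, which agrees with the statement that large values of the hyperparameter force the variance towards zero.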
We fit the generalized linear models with the hierarchical priors by estimating the marginal posterior modes of the parameters (β, ϕ). We modify the usual iteratively weighted least squares (IWLS) algorithm for fitting classical GLMs and incorporate an EM algorithm into the modified IWLS procedure. The EM-IWLS algorithm increases the marginal posterior density of the parameters (β, ϕ) at each step and thus converges to a local mode. Our EM algorithm treats the unknown variances $\tau_j^2$ and the hyperparameters $s_{k[j]}$ as missing data and estimates the parameters (β, ϕ) by averaging over these missing values. At each step of the iteration, we replace the terms involving the parameters (β, ϕ) and the missing values $(\tau_j^2, s_{k[j]})$ by their conditional expectations, and then update the parameters (β, ϕ) by maximizing the expected value of the joint log-posterior density,
For the E-step of the algorithm, we take the expectation of the above joint log-posterior density with respect to the conditional posterior distributions of the variances and the hyperparameters. The conditional posterior distributions are
Therefore, we have the conditional expectations
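As a worked illustration of what such an E-step can look like, suppose (as an assumption, not necessarily the form used in the original appendix) that $\beta_j \mid \tau_j^2 \sim N(0, \tau_j^2)$, $\tau_j^2 \mid s_{k[j]} \sim \mathrm{Expon}(s_{k[j]}^2/2)$ and $s_k \sim \mathrm{Gamma}(a, b)$ in the shape-rate form. The symbol $J_k$, the number of predictors in group $k$, is introduced here only for this sketch.

```latex
\begin{align*}
\tau_j^{-2} \mid \beta_j, s_{k[j]}
  &\sim \text{Inverse-Gaussian}\!\left(\text{mean} = \frac{s_{k[j]}}{\lvert \beta_j \rvert},\;
        \text{shape} = s_{k[j]}^2\right), \\
s_k \mid \beta
  &\sim \mathrm{Gamma}\!\left(a + J_k,\; b + \sum_{j:\,k[j]=k} \lvert \beta_j \rvert\right), \\[4pt]
E\!\left(\tau_j^{-2} \mid \beta_j, s_{k[j]}\right) &= \frac{s_{k[j]}}{\lvert \beta_j \rvert}, \qquad
E\!\left(s_k \mid \beta\right) = \frac{a + J_k}{b + \sum_{j:\,k[j]=k} \lvert \beta_j \rvert}.
\end{align*}
```

The Gamma form for $s_k$ follows after integrating out the variances, which leaves a double-exponential density $(s_{k[j]}/2)\exp(-s_{k[j]}\lvert\beta_j\rvert)$ for each coefficient in the group. Small coefficients receive large expected inverse variances and are therefore shrunk more strongly in the next update.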
In the M-step, we update (β, ϕ) by maximizing
where
and
for $j = 1, \dots, J$. This is equivalent to solving the generalized linear model $y_i \sim p(y_i \mid X_i\beta, \phi)$ with the normal priors
Thus, the parameters (β, ϕ) can be updated using the modified IWLS algorithm as described in the main text.
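The alternation between the E-step and the IWLS-based M-step can be sketched in Python for the logistic case. This is a minimal illustration under stated assumptions, not the BhGLM implementation: the function name `em_iwls_logistic` is hypothetical, the E-step closed forms assume an exponential prior with rate $s_{k[j]}^2/2$ on each variance and a shape-rate Gamma(a, b) hyper-prior, the intercept is left unshrunk, and the dispersion ϕ is fixed at 1 as in ordinary logistic regression.

```python
import numpy as np

def em_iwls_logistic(X, y, groups, a=1.0, b=1.0, n_iter=100, tol=1e-6, eps=1e-8):
    """Sketch of an EM-IWLS fit for logistic regression with shrinkage priors.

    X: (n, J) predictors; y: (n,) array of 0/1; groups: (J,) integer labels k[j].
    Returns (intercept, beta, s), where s maps each group label to its E-step value.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    n, J = X.shape
    X1 = np.column_stack([np.ones(n), X])    # intercept column, not shrunk (assumption)
    coef = np.zeros(J + 1)
    inv_var = np.ones(J)                     # starting values for E[1 / tau_j^2]
    s = {}
    for _ in range(n_iter):
        # M-step: one IWLS update under the normal priors beta_j ~ N(0, 1 / inv_var[j]).
        eta = X1 @ coef
        mu = 1.0 / (1.0 + np.exp(-np.clip(eta, -30.0, 30.0)))
        w = np.clip(mu * (1.0 - mu), 1e-10, None)    # IWLS weights
        z = eta + (y - mu) / w                       # working response
        penalty = np.diag(np.concatenate([[0.0], inv_var]))
        new_coef = np.linalg.solve(X1.T @ (w[:, None] * X1) + penalty,
                                   X1.T @ (w * z))
        converged = np.max(np.abs(new_coef - coef)) < tol
        coef = new_coef
        # E-step: assumed closed-form conditional expectations (see the lead-in).
        abs_beta = np.abs(coef[1:])
        for k in np.unique(groups):
            idx = groups == k
            s[k] = (a + idx.sum()) / (b + abs_beta[idx].sum())
            inv_var[idx] = s[k] / np.maximum(abs_beta[idx], eps)
        if converged:
            break
    return coef[0], coef[1:], s
```

The prior enters the M-step only through the diagonal penalty added to the weighted least squares normal equations, which is why each update can reuse standard IWLS machinery. Starting from unit prior variances, rather than from the E-step at β = 0, avoids an infinite penalty in the first iteration.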
References
Armagan, A., D. Dunson and J. Lee (2010): “Bayesian generalized double Pareto shrinkage,” Biometrika. arXiv preprint arXiv:1104.0861.
Balding, D. J. (2006): “A tutorial on statistical methods for population association studies,” Nat. Rev. Genet., 7, 781–791.
Benjamini, Y. and Y. Hochberg (1995): “Controlling the false discovery rate: a practical and powerful approach to multiple testing,” J. R. Stat. Soc. Ser. B, 57, 289–300.
Benjamini, Y. and D. Yekutieli (2001): “The control of the false discovery rate in multiple testing under dependency,” Ann. Stat., 29, 1165–1188.
Benjamini, Y. and D. Yekutieli (2005): “Quantitative trait loci analysis using the false discovery rate,” Genetics, 171, 783–790. doi:10.1534/genetics.104.036699
Galwey, N. W. (2009): “A new measure of the effective number of tests, a practical tool for comparing families of non-independent significance tests,” Genet. Epidemiol., 33, 559–568. doi:10.1002/gepi.20408
Gao, X., L. C. Becker, D. M. Becker, J. D. Starmer and M. A. Province (2010): “Avoiding the high Bonferroni penalty in genome-wide association studies,” Genet. Epidemiol., 34, 100–105.
Gao, X., J. Starmer and E. R. Martin (2008): “A multiple testing correction method for genetic association studies using correlated single nucleotide polymorphisms,” Genet. Epidemiol., 32, 361–369.
Gelman, A. and J. Hill (2007): Data analysis using regression and multilevel/hierarchical models, New York: Cambridge University Press. doi:10.1017/CBO9780511790942
Gelman, A., J. Carlin, H. Stern and D. Rubin (2003): Bayesian data analysis, London: Chapman and Hall. doi:10.1201/9780429258480
Gelman, A., J. Hill and M. Yajima (2012): “Why we (usually) don’t have to worry about multiple comparisons,” J. Res. Educ. Eff., 5, 189–211.
Gelman, A., A. Jakulin, M. G. Pittau and Y. S. Su (2008): “A weakly informative default prior distribution for logistic and other regression models,” Ann. Appl. Stat., 2, 1360–1383.
Hochberg, Y. (1988): “A sharper Bonferroni procedure for multiple tests of significance,” Biometrika, 75, 800–803. doi:10.1093/biomet/75.4.800
Hoffmann, T. J., N. J. Marini and J. S. Witte (2010): “Comprehensive approach to analyzing rare genetic variants,” PLoS One, 5, e13584. doi:10.1371/journal.pone.0013584
Holm, S. (1979): “A simple sequentially rejective multiple test procedure,” Scand. J. Stat., 6, 65–70.
Hommel, G. (1988): “A stagewise rejective multiple test procedure based on a modified Bonferroni test,” Biometrika, 75, 383–386. doi:10.1093/biomet/75.2.383
Hsu, J. C. (1996): Multiple comparisons: theory and methods, London: Chapman and Hall. doi:10.1201/b15074
Hung, R., P. Brennan, C. Malaveille, S. Porru, F. Donato, P. Boffetta and J. S. Witte (2004): “Using hierarchical modeling in genetic association studies with multiple markers: application to a case-control study of bladder cancer,” Cancer Epidem. Biomar. Prev., 13, 1013–1021.
Kaklamani, V. G., K. B. Wisinski, M. Sadim, C. Gulden, A. Do, K. Offit, J. A. Baron, H. Ahsan, C. Mantzoros and B. Pasche (2008): “Variants of the adiponectin (ADIPOQ) and adiponectin receptor 1 (ADIPOR1) genes and colorectal cancer risk,” J. Am. Med. Assoc., 300, 1523–1531.
Kang, G., K. Ye, N. Liu, D. B. Allison and G. Gao (2009): “Weighted multiple hypothesis testing procedures,” Stat. Appl. Genet. Mol. Biol., 8, 1–21.
King, C. R., P. J. Rathouz and D. L. Nicolae (2010): “An evolutionary framework for association testing in resequencing studies,” PLoS Genet., 6, e1001202.
Kyung, M., J. Gill, M. Ghosh and G. Casella (2010): “Penalized regression, standard errors, and Bayesian lassos,” Bayesian Anal., 5, 369–412.
Lu, H., J. S. Hodges and B. P. Carlin (2007): “Measuring the complexity of generalized linear hierarchical models,” Can. J. Stat., 35, 69–87.
Madsen, B. E. and S. R. Browning (2009): “A groupwise association test for rare mutations using a weighted sum statistic,” PLoS Genet., 5, e1000384.
McCullagh, P. and J. A. Nelder (1989): Generalized linear models, London: Chapman and Hall. doi:10.1007/978-1-4899-3242-6
Park, T. and G. Casella (2008): “The Bayesian lasso,” J. Am. Stat. Assoc., 103, 681–686.
Price, A. L., G. V. Kryukov, P. I. de Bakker, S. M. Purcell, J. Staples, L. J. Wei and S. R. Sunyaev (2010): “Pooled association tests for rare variants in exon-resequencing studies,” Am. J. Hum. Genet., 86, 832–838.
Pritchard, J. K. (2001): “Are rare variants responsible for susceptibility to complex diseases?” Am. J. Hum. Genet., 69, 124–137.
Pritchard, J. K. and N. J. Cox (2002): “The allelic architecture of human disease genes: common disease-common variant…or not?” Hum. Mol. Genet., 11, 2417–2423.
Rice, T. K., N. J. Schork and D. C. Rao (2008): “Methods for handling multiple testing,” Adv. Genet., 60, 293–308.
Roeder, K., B. Devlin and L. Wasserman (2007): “Improving power in genome-wide association studies: weights tip the scale,” Genet. Epidemiol., 31, 741–747.
Romeo, S., L. A. Pennacchio, Y. Fu, E. Boerwinkle, A. Tybjaerg-Hansen, H. H. Hobbs and J. C. Cohen (2007): “Population-based resequencing of ANGPTL4 uncovers variations that reduce triglycerides and increase HDL,” Nat. Genet., 39, 513–516.
Romeo, S., W. Yin, J. Kozlitina, L. A. Pennacchio and E. Boerwinkle (2009): “Rare loss-of-function mutations in ANGPTL family members contribute to plasma triglyceride levels in humans,” J. Clin. Invest., 119, 70–79.
Sabatti, C., S. Service and N. Freimer (2003): “False discovery rate in linkage and association genome screens for complex disorders,” Genetics, 164, 829–833. doi:10.1093/genetics/164.2.829
Schaid, D. J., J. P. Sinnwell, G. D. Jenkins, S. K. McDonnell and J. N. Ingle (2011): “Using the gene ontology to scan multilevel gene sets for associations in genome wide association studies,” Genet. Epidemiol., 36, 3–16.
Spiegelhalter, D. J., N. G. Best, B. P. Carlin and A. v. d. Linde (2002): “Bayesian measures of model complexity and fit (with discussion),” J. R. Stat. Soc. Ser. B, 64, 583–639.
Thomas, D. C., D. V. Conti, J. Baurley, F. Nijhout and M. Reed (2009): “Use of pathway information in molecular epidemiology,” Hum. Genom., 4, 21–42.
Wang, K., M. Li and H. Hakonarson (2010): “Analysing biological pathways in genome-wide association studies,” Nat. Rev. Genet., 11, 843–854.
Yi, N. and S. Banerjee (2009): “Hierarchical generalized linear models for multiple quantitative trait locus mapping,” Genetics, 181, 1101–1113. doi:10.1534/genetics.108.099556
Yi, N. and S. Xu (2008): “Bayesian LASSO for quantitative trait loci mapping,” Genetics, 179, 1045–1055. doi:10.1534/genetics.107.085589
Yi, N. and D. Zhi (2011): “Bayesian analysis of rare variants in genetic association studies,” Genet. Epidemiol., 35, 57–69.
Yi, N. and S. Ma (2012): “Hierarchical shrinkage priors and model fitting algorithms for high-dimensional generalized linear models,” Stat. Appl. Genet. Mol. Biol., 11, 1544–6115.
Yi, N., V. G. Kaklamani and B. Pasche (2011a): “Bayesian analysis of genetic interactions in case-control studies, with application to adiponectin genes and colorectal cancer risk,” Ann. Hum. Genet., 75, 90–104. doi:10.1111/j.1469-1809.2010.00605.x
Yi, N., N. Liu, D. Zhi and J. Li (2011b): “Hierarchical generalized linear models for multiple groups of rare and common variants: jointly estimating group and individual-variant effects,” PLoS Genet., 7, e1002382. doi:10.1371/journal.pgen.1002382
©2014 by Walter de Gruyter Berlin Boston