Abstract
Genetic association studies lead to simultaneous categorical data analysis. The sample for every genetic locus consists of a contingency table containing the numbers of observed genotype-phenotype combinations. Under a case-control design, the row counts of every table are identical and fixed, while the column counts are random. The aim of the statistical analysis is to test independence of phenotype and genotype at every locus. We present an objective Bayesian methodology for these association tests, which relies on the conjugacy of the Dirichlet and multinomial distributions. Because they are based on the likelihood principle, the Bayesian tests avoid looping over all tables with given marginals. Using data generated by the Wellcome Trust Case Control Consortium (WTCCC), we illustrate that the ordering of the Bayes factors agrees well with that of the frequentist p-values. Furthermore, we address the specification of prior probabilities for the validity of the null hypotheses by taking the linkage disequilibrium structure into account and exploiting the concept of effective numbers of tests. Applying a Bayesian decision theoretic multiple test procedure to the WTCCC data illustrates the proposed methodology. Finally, we discuss two methods for reconciling frequentist and Bayesian approaches to the multiple association test problem.
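To make the role of Dirichlet-multinomial conjugacy concrete, the following is a minimal sketch of a Bayes factor for one case-control table. It is an illustration under stated assumptions, not necessarily the procedure of the paper: each row (cases, controls) is taken to be multinomial with fixed row total, the null hypothesis is that both rows share one genotype probability vector, and every probability vector receives a symmetric Dirichlet prior with a hypothetical concentration parameter `alpha` (default 1). The function names `log_mv_beta` and `log_bf10` are introduced here for illustration only.

```python
from math import lgamma


def log_mv_beta(a):
    """Log of the multivariate beta function B(a) = prod Gamma(a_j) / Gamma(sum a_j)."""
    return sum(lgamma(x) for x in a) - lgamma(sum(a))


def log_bf10(table, alpha=1.0):
    """Log Bayes factor of association (H1) versus independence (H0).

    table: list of rows (e.g. [cases, controls]), each a list of genotype counts.
    alpha: assumed symmetric Dirichlet concentration (an illustrative choice).
    """
    k = len(table[0])
    prior = [alpha] * k
    # Under H0 all rows share one probability vector, so counts are pooled by column.
    pooled = [sum(col) for col in zip(*table)]
    # Under H1 each row has its own probability vector with an independent prior.
    log_m1 = sum(
        log_mv_beta([alpha + n for n in row]) - log_mv_beta(prior) for row in table
    )
    log_m0 = log_mv_beta([alpha + n for n in pooled]) - log_mv_beta(prior)
    # The multinomial coefficients are identical under H0 and H1 and cancel.
    return log_m1 - log_m0


if __name__ == "__main__":
    similar = [[50, 30, 20], [50, 30, 20]]    # same genotype proportions in both rows
    different = [[80, 15, 5], [20, 30, 50]]   # clearly different proportions
    print(log_bf10(similar), log_bf10(different))
```

Because conjugacy gives both marginal likelihoods in closed form, the computation needs only the observed table and never enumerates other tables with the same marginals. Positive values of the log Bayes factor favour association, negative values favour independence.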
Acknowledgments
This work makes use of data generated by the Wellcome Trust Case Control Consortium. A full list of the investigators who contributed to the generation of the data is available from http://www.wtccc.org.uk. Funding for the Wellcome Trust Case Control Consortium project was provided by the Wellcome Trust under award 076113. The author is grateful to two anonymous referees for their careful reading of the manuscript and constructive comments which have improved the presentation.
References
Agresti, A. (2002): Categorical data analysis, 2nd edition, Wiley Series in Probability and Mathematical Statistics. Applied Probability and Statistics. Chichester: Wiley. doi:10.1002/0471249688
Agresti, A. and D. B. Hitchcock (2005): “Bayesian inference for categorical data analysis,” Stat. Methods Appl., 14, 297–330.
Bakke, Ø. and M. Langaas (2012): “The number of 2×c tables with given margins,” Statistics Preprint No. 11/2012, Trondheim: Norwegian University of Science and Technology.
Cheverud, J. M. (2001): “A simple correction for multiple comparisons in interval mapping genome scans,” Heredity, 87, 52–58. doi:10.1046/j.1365-2540.2001.00901.x
Crook, J. and I. Good (1980): “On the application of symmetric Dirichlet distributions and their mixtures to contingency tables. II,” Ann. Stat., 8, 1198–1218.
Dawid, A. P. (1987): “The difficulty about conjunction,” J. Roy. Stat. Soc. D-Stat., 2/3, 91–97. doi:10.2307/2348501
Dickhaus, T. (2014): Simultaneous statistical inference with applications in the life sciences, Berlin Heidelberg: Springer-Verlag. doi:10.1007/978-3-642-45182-9
Dickhaus, T. and J. Stange (2013): “Multiple point hypothesis test problems and effective numbers of tests for control of the family-wise error rate,” Calcutta Stat. Assoc. Bull., 65, 123–144.
Dickhaus, T., K. Strassburger, D. Schunk, C. Morcillo-Suarez, T. Illig and A. Navarro (2012): “How to analyze many contingency tables simultaneously in genetic association studies,” Stat. Appl. Genet. Mol. Biol., 11, Article 12.
Do, K.-A., P. Müller and F. Tang (2005): “A Bayesian mixture model for differential gene expression,” J. R. Stat. Soc., Ser. C, Appl. Stat., 54, 627–644.
Efron, B. (2010): Large-scale inference. Empirical Bayes methods for estimation, testing, and prediction, Cambridge: Cambridge University Press. doi:10.1017/CBO9780511761362
Finner, H., K. Straßburger, I. M. Heid, C. Herder, W. Rathmann, G. Giani, T. Dickhaus, P. Lichtner, T. Meitinger, H.-E. Wichmann, T. Illig and C. Gieger (2010): “How to link call rate and p-values for Hardy-Weinberg equilibrium as measures of genome-wide SNP data quality,” Stat. Med., 29, 2347–2358. doi:10.1002/sim.4004
Fisher, R. A. (1922): “On the interpretation of χ² from contingency tables, and the calculation of p,” J. Roy. Stat. Soc., 85, 87–94.
Gao, X., J. Starmer and E. R. Martin (2008): “A multiple testing correction method for genetic association studies using correlated single nucleotide polymorphisms,” Genet. Epidemiol., 32, 361–369.
Geisser, S. (1984): “On prior distributions for binary trials,” Am. Stat., 38, 244–251.
Gómez-Villegas, M. and B. González-Pérez (2010): “r×s tables from a Bayesian viewpoint,” Rev. Mat. Complut., 23, 19–35.
Good, I. (1976): “On the application of symmetric Dirichlet distributions and their mixtures to contingency tables,” Ann. Stat., 4, 1159–1189.
Guan, Y. and M. Stephens (2011): “Bayesian variable selection regression for genome-wide association studies and other large-scale problems,” Ann. Appl. Stat., 5, 1780–1815.
Habiger, J. D. and E. A. Peña (2011): “Randomised P-values and nonparametric procedures in multiple testing,” J. Nonparametr. Stat., 23, 583–604.
Langaas, M. and Ø. Bakke (2014): “Robust methods to detect disease-genotype association in genetic association studies: calculate p-values using exact conditional enumeration instead of simulated permutations or asymptotic approximations,” Stat. Appl. Genet. Mol. Biol., 13, 675–692. doi:10.1515/sagmb-2013-0084
León-Novelo, L. G., P. Müller, W. Arap, J. Sun, R. Pasqualini and K.-A. Do (2013): “Bayesian decision theoretic multiple comparison procedures: an application to phage display data,” Biom. J., 55, 478–489.
Lewontin, R. C. and K. I. Kojima (1960): “The evolutionary dynamics of complex polymorphisms,” Evolution, 14, 458–472. doi:10.1111/j.1558-5646.1960.tb03113.x
Lydersen, S., M. W. Fagerland and P. Laake (2009): “Recommended tests for association in 2×2 tables,” Stat. Med., 28, 1159–1175.
Malovini, A., N. Barbarini, R. Bellazzi and F. de Michelis (2012): “Hierarchical Naive Bayes for genetic association studies,” BMC Bioinformatics, 13(Suppl 14), S6. doi:10.1186/1471-2105-13-S14-S6
McCarroll, S. A., F. G. Kuruvilla, J. M. Korn, S. Cawley, J. Nemesh, A. Wysoker, M. H. Shapero, P. I. de Bakker, J. B. Maller, A. Kirby, A. L. Elliott, M. Parkin, E. Hubbell, T. Webster, R. Mei, J. Veitch, P. J. Collins, R. Handsaker, S. Lincoln, M. Nizzari, J. Blume, K. W. Jones, R. Rava, M. J. Daly, S. B. Gabriel and D. Altshuler (2008): “Integrated detection and population-genetic analysis of SNPs and copy number variation,” Nat. Genet., 40, 1166–1174.
Moskvina, V. and K. M. Schmidt (2008): “On multiple-testing correction in genome-wide association studies,” Genet. Epidemiol., 32, 567–573.
Müller, P., G. Parmigiani, C. Robert and J. Rousseau (2004): “Optimal sample size for multiple testing: the case of gene expression microarrays,” J. Am. Stat. Assoc., 99, 990–1001.
Müller, P., G. Parmigiani and K. Rice (2007): “FDR and Bayesian multiple comparisons rules.” In: Bernardo, J. M., Bayarri, M. J., Berger, J. O., Dawid, A. P., Heckerman, D., Smith, A. F. M. and West, M. (Eds.), Bayesian Statistics 8 – Proc. ISBA 8th World Meeting on Bayesian Statistics, Oxford: Oxford University Press, pp. 349–370.
Ng, K. W., G.-L. Tian and M.-L. Tang (2011): Dirichlet and related distributions: theory, methods and applications, Hoboken, NJ: John Wiley & Sons. doi:10.1002/9781119995784
Nyholt, D. R. (2004): “A simple correction for multiple testing for SNPs in linkage disequilibrium with each other,” Am. J. Hum. Genet., 74, 765–769.
Patefield, W. (1981): “An efficient method of generating random R×C tables with given row and column totals. (Algorithm AS 159.),” J. R. Stat. Soc., Ser. C, 30, 91–97.
Pearson, K. (1900): “On the criterion, that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling,” Phil. Mag., 5(50), 157–175. doi:10.1080/14786440009463897
Scott, J. G. and J. O. Berger (2010): “Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem,” Ann. Stat., 38, 2587–2619.
Sellke, T., M. Bayarri and J. O. Berger (2001): “Calibration of p values for testing precise null hypotheses,” Am. Stat., 55, 62–71.
The 1000 Genomes Consortium (2010): “A map of human genome variation from population-scale sequencing,” Nature, 467, 1061–1073. doi:10.1038/nature09534
The International HapMap Consortium (2005): “A haplotype map of the human genome,” Nature, 437, 1299–1320. doi:10.1038/nature04226
The Wellcome Trust Case Control Consortium (2007): “Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls,” Nature, 447, 661–678. doi:10.1038/nature05911
Wakefield, J. (2009): “Bayes factors for genome-wide association studies: comparison with P-values,” Genet. Epidemiol., 33, 79–86.
Westfall, P. H., W. O. Johnson and J. M. Utts (1997): “A Bayesian perspective on the Bonferroni adjustment,” Biometrika, 84, 419–427. doi:10.1093/biomet/84.2.419
Yekutieli, D. (2014): “Bayesian tests for composite alternative hypotheses in crosstabulated data,” TEST, 24, 287–301. doi:10.1007/s11749-014-0407-1
©2015 by De Gruyter