Simultaneous Bayesian analysis of contingency tables in genetic association studies

Thorsten Dickhaus

doi:10.1515/sagmb-2014-0052

Published/Copyright: July 28, 2015

From the journal Statistical Applications in Genetics and Molecular Biology Volume 14 Issue 4

Abstract

Genetic association studies lead to simultaneous categorical data analysis. The sample for every genetic locus consists of a contingency table containing the numbers of observed genotype-phenotype combinations. Under a case-control design, the row counts of every table are identical and fixed, while the column counts are random. The aim of the statistical analysis is to test independence of the phenotype and the genotype at every locus. We present an objective Bayesian methodology for these association tests, which relies on the conjugacy of Dirichlet and multinomial distributions. Being based on the likelihood principle, the Bayesian tests avoid looping over all tables with given marginals. Making use of data generated by The Wellcome Trust Case Control Consortium (WTCCC), we illustrate that the ordering of the Bayes factors agrees well with that of frequentist p-values. Furthermore, we specify prior probabilities for the validity of the null hypotheses by taking linkage disequilibrium structure into account and exploiting the concept of effective numbers of tests. Application of a Bayesian decision theoretic multiple test procedure to the WTCCC data illustrates the proposed methodology. Finally, we discuss two methods for reconciling frequentist and Bayesian approaches to the multiple association test problem.
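To make the role of Dirichlet-multinomial conjugacy concrete, the following is a minimal sketch of a Bayes factor for independence in a single table with fixed row totals. It is not taken from the article: the function names, the symmetric Dirichlet prior, and the default alpha = 1 are assumptions chosen for illustration, and the article's own prior specification (for example, Dirichlet mixtures) may differ. Because the marginal likelihoods are available in closed form, no enumeration of tables with given marginals is needed.

```python
import numpy as np
from scipy.special import gammaln


def log_multivariate_beta(alpha):
    """Log of the multivariate beta function B(alpha), the Dirichlet normalizer."""
    alpha = np.asarray(alpha, dtype=float)
    return gammaln(alpha).sum() - gammaln(alpha.sum())


def log_bf_independence(table, alpha=1.0):
    """Log Bayes factor of H0 (same genotype distribution in every row)
    against H1 (row-specific distributions), with row totals fixed.

    Assumption: a symmetric Dirichlet(alpha) prior on each vector of
    genotype probabilities; alpha=1.0 (uniform) is an illustrative default.
    """
    table = np.asarray(table, dtype=float)
    prior = np.full(table.shape[1], alpha)
    log_b_prior = log_multivariate_beta(prior)
    # H0: one shared probability vector, so only the column totals matter.
    log_m0 = log_multivariate_beta(prior + table.sum(axis=0)) - log_b_prior
    # H1: each row has its own probability vector with an independent prior.
    log_m1 = sum(log_multivariate_beta(prior + row) - log_b_prior for row in table)
    # Multinomial coefficients are identical under both hypotheses and cancel.
    return log_m0 - log_m1


# Hypothetical 2x3 tables (rows: cases and controls, columns: genotypes).
balanced = [[10, 20, 30], [10, 20, 30]]
associated = [[50, 5, 5], [5, 5, 50]]
print(log_bf_independence(balanced))    # positive: data favor independence
print(log_bf_independence(associated))  # negative: data favor association
```

Working on the log scale with `gammaln` avoids overflow for the large counts typical of genome-wide studies; applying the function to every locus yields one log Bayes factor per table, which can then be ranked.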

Keywords: Bayes factors; contingency tables; Dirichlet mixtures; effective number of tests; statistical genetics

Corresponding author: Thorsten Dickhaus, Institute for Statistics, University of Bremen, P.O. Box 330 440, D-28344 Bremen, Germany, e-mail: dickhaus@uni-bremen.de

Acknowledgments

This work makes use of data generated by the Wellcome Trust Case Control Consortium. A full list of the investigators who contributed to the generation of the data is available from http://www.wtccc.org.uk. Funding for the Wellcome Trust Case Control Consortium project was provided by the Wellcome Trust under award 076113. The author is grateful to two anonymous referees for their careful reading of the manuscript and constructive comments which have improved the presentation.

Published Online: 2015-7-28

Published in Print: 2015-8-1

https://doi.org/10.1515/sagmb-2014-0052
