Robust methods to detect disease-genotype association in genetic association studies: calculate p-values using exact conditional enumeration instead of simulated permutations or asymptotic approximations

Mette Langaas; Øyvind Bakke

doi:10.1515/sagmb-2013-0084


Published/Copyright: October 13, 2014


From the journal Statistical Applications in Genetics and Molecular Biology Volume 13 Issue 6

Abstract

In genetic association studies, detecting disease-genotype association is a primary goal. We study seven robust test statistics for such association when the underlying genetic model is unknown, for data on disease status (case or control) and genotype (three genotypes of a biallelic genetic marker). In such studies, p-values have predominantly been calculated by asymptotic approximations or by simulated permutations. We consider an exact method, conditional enumeration. When the number of simulated permutations tends to infinity, the permutation p-value approaches the conditional enumeration p-value, but calculating the latter is much more efficient than performing simulated permutations. We have studied case-control sample sizes with 500–5000 cases and 500–15,000 controls, and significance levels from 5×10^-8 to 0.05. Our results are therefore applicable to genetic association studies with only a few genetic markers under study, to intermediate follow-up studies, and to genome-wide association studies. Our main findings are: (i) If all monotone genetic models are of interest, the best performance in the situations under study is achieved by the robust test statistics based on the maximum over a range of Cochran-Armitage trend tests with different scores and by the constrained likelihood ratio test. (ii) For significance levels below 0.05, for the test statistics under study, asymptotic approximations may give a test size up to 20 times the nominal level, and should therefore be used with caution. (iii) Calculating p-values based on exact conditional enumeration is a powerful, valid and computationally feasible approach, and we advocate its use in genetic association studies.
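To make the idea of conditional enumeration concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a 2×3 case-control table and uses the MAX3 statistic (the maximum absolute Cochran-Armitage trend statistic over the scores (0, 0, 1), (0, 0.5, 1) and (0, 1, 1)) as one illustrative member of the "maximum over trend tests" family. The function names `catt_z`, `max3` and `conditional_enumeration_pvalue` are hypothetical. Conditional on both margins, the case counts follow a multivariate hypergeometric distribution, and the p-value is the total probability of all tables whose statistic is at least as large as the observed one.

```python
from math import comb, sqrt


def catt_z(cases, col_totals, scores):
    """Cochran-Armitage trend statistic Z for genotype scores `scores`."""
    n_total = sum(col_totals)
    n_cases = sum(cases)
    n_controls = n_total - n_cases
    sw = sum(w * n for w, n in zip(scores, col_totals))
    sw2 = sum(w * w * n for w, n in zip(scores, col_totals))
    # Null variance of the numerator; it depends only on the fixed margins.
    var = n_cases * n_controls * (n_total * sw2 - sw * sw) / n_total
    if var <= 0:
        return 0.0  # degenerate margins: the statistic carries no information
    numerator = n_total * sum(w * r for w, r in zip(scores, cases)) - n_cases * sw
    return numerator / sqrt(var)


def max3(cases, col_totals):
    """Largest |Z| over recessive, additive and dominant scores (illustrative)."""
    return max(abs(catt_z(cases, col_totals, (0.0, x, 1.0)))
               for x in (0.0, 0.5, 1.0))


def conditional_enumeration_pvalue(cases, controls, statistic=max3):
    """Exact p-value conditional on the row and column totals of a 2x3 table."""
    col_totals = [r + s for r, s in zip(cases, controls)]
    n0, n1, n2 = col_totals
    n_cases = sum(cases)
    observed = statistic(cases, col_totals)
    tol = 1e-9 * max(1.0, abs(observed))  # guard against floating-point ties
    extreme = 0  # exact integer count of weighted "at least as extreme" tables
    for x0 in range(min(n0, n_cases) + 1):
        for x1 in range(min(n1, n_cases - x0) + 1):
            x2 = n_cases - x0 - x1
            if x2 > n2:
                continue  # table not compatible with the column totals
            if statistic((x0, x1, x2), col_totals) >= observed - tol:
                extreme += comb(n0, x0) * comb(n1, x1) * comb(n2, x2)
    return extreme / comb(sum(col_totals), n_cases)


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(conditional_enumeration_pvalue((3, 4, 5), (6, 4, 2)))
```

The sketch uses exact integer binomial coefficients for clarity. For the sample sizes discussed in the abstract, one would more plausibly accumulate log-probabilities and evaluate the statistic incrementally, but that is a performance choice rather than a change to the method.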

Keywords: case-control study; contingency table; exact tests; genetic models; permutation

Corresponding author: Mette Langaas, Department of Mathematical Sciences, Norwegian University of Science and Technology, NO-7491 Trondheim, Norway, e-mail: mette.langaas@math.ntnu.no

Acknowledgments

The authors would like to thank the associate editor and an anonymous referee for useful comments that significantly improved the article.


Published Online: 2014-10-13

Published in Print: 2014-12-1
