Robust methods to detect disease-genotype association in genetic association studies: calculate p-values using exact conditional enumeration instead of simulated permutations or asymptotic approximations
Abstract
In genetic association studies, detecting disease-genotype association is a primary goal. We study seven robust test statistics for such association when the underlying genetic model is unknown, for data on disease status (case or control) and genotype (three genotypes of a biallelic genetic marker). In such studies, p-values have predominantly been calculated by asymptotic approximations or by simulated permutations. We consider an exact method, conditional enumeration. As the number of simulated permutations tends to infinity, the permutation p-value approaches the conditional enumeration p-value, but calculating the latter is much more efficient than performing simulated permutations. We have studied case-control sample sizes with 500–5000 cases and 500–15,000 controls, and significance levels from 5×10⁻⁸ to 0.05. Our results are therefore applicable to genetic association studies with only a few genetic markers under study, to intermediate follow-up studies, and to genome-wide association studies. Our main findings are: (i) If all monotone genetic models are of interest, the best performance in the situations under study is achieved by the robust test statistics based on the maximum over a range of Cochran-Armitage trend tests with different scores, and by the constrained likelihood ratio test. (ii) For significance levels below 0.05, asymptotic approximations for the test statistics under study may give a test size up to 20 times the nominal level, and should therefore be used with caution. (iii) Calculating p-values based on exact conditional enumeration is a powerful, valid and computationally feasible approach, and we advocate its use in genetic association studies.
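To illustrate what conditional enumeration means for a 2×3 case-control table, the following is a minimal sketch and not the authors' implementation. It assumes a single Cochran-Armitage trend statistic with additive scores (0, 1, 2) rather than any of the seven robust statistics studied in the article, and the function names `catt_statistic` and `exact_conditional_pvalue` are hypothetical. Conditional on the row and column margins, the case counts follow a multivariate hypergeometric distribution under the null hypothesis, so the p-value is the total probability of all tables with the same margins whose statistic is at least as large as the observed one.

```python
from fractions import Fraction
from math import comb


def catt_statistic(cases, col_totals, scores=(0, 1, 2)):
    """Absolute unnormalised Cochran-Armitage trend statistic.

    Given the margins, the null variance of the trend statistic is a
    constant, so ranking tables by this value is equivalent to ranking
    them by the two-sided standardised test. Integer scores keep the
    arithmetic exact. (Illustrative helper, not from the article.)
    """
    n_cases = sum(cases)
    n_total = sum(col_totals)
    weighted_cases = sum(x * r for x, r in zip(scores, cases))
    weighted_totals = sum(x * n for x, n in zip(scores, col_totals))
    return abs(n_total * weighted_cases - n_cases * weighted_totals)


def exact_conditional_pvalue(cases, controls, statistic=catt_statistic):
    """Exact p-value by enumerating all 2x3 tables with the observed margins."""
    col_totals = [r + s for r, s in zip(cases, controls)]
    n0, n1, n2 = col_totals
    n_cases = sum(cases)
    observed = statistic(tuple(cases), col_totals)

    extreme = 0  # number of case/control labelings at least as extreme
    # r0 and r1 determine r2; the bounds keep every cell within its margin.
    for r0 in range(max(0, n_cases - n1 - n2), min(n0, n_cases) + 1):
        for r1 in range(max(0, n_cases - r0 - n2), min(n1, n_cases - r0) + 1):
            r2 = n_cases - r0 - r1
            if statistic((r0, r1, r2), col_totals) >= observed:
                # Unnormalised multivariate hypergeometric weight.
                extreme += comb(n0, r0) * comb(n1, r1) * comb(n2, r2)

    # Dividing by the number of ways to choose the cases gives the probability.
    return Fraction(extreme, comb(sum(col_totals), n_cases))


if __name__ == "__main__":
    p = exact_conditional_pvalue(cases=(10, 20, 30), controls=(30, 20, 10))
    print(float(p))
```

A simulated permutation p-value for the same statistic would estimate this quantity by randomly reassigning case and control labels. The enumeration visits each table with the observed margins once, so it carries no Monte Carlo error. The `statistic` argument is passed in so that another ranking of the tables, for example a maximum over several score vectors, could be substituted under the same assumptions.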
Acknowledgments
The authors would like to thank the associate editor and an anonymous referee for useful comments that significantly improved the article.
©2014 by De Gruyter