Abstract
Advances in genome sequencing, and in exome sequencing in particular, have made association studies between diseases and genetic variants feasible. Several powerful and well-known tests have been proposed for testing the association between a group of genes and a disease of interest. However, challenges remain: many factors can affect the power of a test, e.g., the sample size, the numbers of causal and non-causal variants, and the directions of the effects of the causal variants. Recently, a powerful test, called T_REM, was derived from a random effects model. T_REM has the advantage of being less sensitive to the inclusion of non-causal rare variants or low-effect common variants, and to the presence of missing genotypes. However, the power of T_REM can be low when a portion of the causal variants have effects in opposite directions. To overcome this drawback, we propose a novel test, called T_ROB, which retains the advantages of T_REM and is more robust, in that it has adequate power when variants have opposite directions of effect. Simulation results show that T_ROB has a stable type I error rate and outperforms T_REM once the proportion of risk variants falls to a certain level, with its advantage over T_REM growing as the proportion decreases further. Furthermore, T_ROB outperforms several other competing tests in most scenarios. The proposed methodology is illustrated using data from the Shanghai Breast Cancer Study.
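The loss of power under opposite effect directions can be illustrated with a toy calculation. The sketch below is not T_REM or T_ROB; it only contrasts two generic ways of combining per-variant z-scores, under the assumption that the z-scores are independent and standard normal under the null. The function names and the example z-scores are hypothetical and chosen for illustration.

```python
def sum_statistic(z):
    """Squared sum of z-scores, scaled by the number of variants.

    Assuming independent standard-normal z-scores under the null,
    this follows a chi-square distribution with 1 degree of freedom.
    Effects with opposite signs cancel inside the sum.
    """
    m = len(z)
    return sum(z) ** 2 / m


def quadratic_statistic(z):
    """Sum of squared z-scores.

    Under the same assumption this follows a chi-square distribution
    with m degrees of freedom. Signs do not matter, so opposite
    effects do not cancel.
    """
    return sum(v * v for v in z)


# Hypothetical z-scores: four variants with equal effect sizes.
same_direction = [2.0, 2.0, 2.0, 2.0]     # all risk variants
mixed_direction = [2.0, 2.0, -2.0, -2.0]  # half risk, half protective

for label, z in [("same", same_direction), ("mixed", mixed_direction)]:
    print(label, sum_statistic(z), quadratic_statistic(z))
```

In this toy setting the sum-type statistic collapses to zero when half of the effects are protective, while the quadratic statistic is unchanged. A quadratic statistic pays for this with more degrees of freedom, which is one reason a test that stays powerful in both situations is of interest.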
Funding source: Ministry of Science and Technology, Taiwan
Award Identifier / Grant number: MOST 108-2118-M-035-002-
- Author contribution: J.Y. Lee conceived and designed the experiments, analyzed the data, prepared the figures and tables, and authored the initial draft. P.S. Shen reviewed the paper and approved the final draft. K.F. Cheng conceived and designed the experiments and approved the final draft.
- Research funding: This research was supported in part by the Ministry of Science and Technology of Taiwan under Grant MOST 108-2118-M-035-002-.
- Conflict of interest statement: The authors declare that there is no conflict of interest.
© 2022 Walter de Gruyter GmbH, Berlin/Boston