Abstract
Multilocus haplotype analysis of candidate variants with genome-wide association study (GWAS) data may provide evidence of association with disease even when the individual loci do not. Unfortunately, when a large number of candidate variants are investigated, identifying risk haplotypes can be very difficult. A number of approaches have been put forward in recent years to meet this challenge. However, most of them are not directly linked to the disease penetrances of haplotypes and thus may not be efficient. To fill this gap, we propose a mixture model-based approach for detecting risk haplotypes. Under the mixture model, haplotypes are clustered directly according to their estimated disease penetrances. A theoretical justification of the model is provided. Furthermore, we introduce a hypothesis test for the haplotype inheritance patterns that underpin this model. The performance of the proposed approach is evaluated by simulations and real data analysis. The results show that the proposed approach outperforms an existing multiple testing method.
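To make the clustering idea concrete, the following is a minimal sketch and not the model developed in the article. It assumes that each haplotype's count among cases follows a binomial distribution whose success probability takes one of two values (a "neutral" and a "risk" component), and it fits this two-component mixture with a plain EM algorithm. The function name `fit_binomial_mixture`, the two-component restriction, the use of the case proportion as a stand-in for penetrance, and the starting values are all illustrative assumptions.

```python
import math


def fit_binomial_mixture(case_counts, totals, n_iter=500, tol=1e-10):
    """EM for a two-component binomial mixture over haplotypes (illustrative).

    case_counts[h]: copies of haplotype h observed in cases.
    totals[h]:      copies of haplotype h observed in cases and controls.
    Returns (weights, rates, responsibilities); component 1 starts at the
    highest observed case proportion, so it plays the role of the risk cluster.
    """
    eps = 1e-6

    def clamp(v):
        # Keep probabilities away from 0 and 1 so that logs stay finite.
        return min(max(v, eps), 1.0 - eps)

    observed = [x / n for x, n in zip(case_counts, totals)]
    # Crude starting values (an assumption): the extreme observed proportions.
    rates = [clamp(min(observed)), clamp(max(observed))]
    weights = [0.5, 0.5]
    resp = []
    prev_ll = -math.inf

    for _ in range(n_iter):
        # E-step: posterior probability that each haplotype belongs to each
        # component. The binomial coefficient cancels and is omitted.
        resp = []
        ll = 0.0
        for x, n in zip(case_counts, totals):
            logs = [
                math.log(weights[k])
                + x * math.log(rates[k])
                + (n - x) * math.log(1.0 - rates[k])
                for k in range(2)
            ]
            top = max(logs)
            norm = top + math.log(sum(math.exp(v - top) for v in logs))
            resp.append([math.exp(v - norm) for v in logs])
            ll += norm

        # M-step: update the mixing weights and the component case proportions.
        w0 = clamp(sum(r[0] for r in resp) / len(resp))
        weights = [w0, 1.0 - w0]
        for k in range(2):
            num = sum(r[k] * x for r, x in zip(resp, case_counts))
            den = sum(r[k] * n for r, n in zip(resp, totals))
            if den > 0:
                rates[k] = clamp(num / den)

        # Stop once the (constant-free) log-likelihood no longer changes.
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll

    return weights, rates, resp
```

Given per-haplotype counts, haplotypes whose responsibility for the second component exceeds 0.5 would be flagged as candidate risk haplotypes. Because haplotypes are assigned by their fitted component rather than tested one at a time, a sketch of this kind avoids a separate multiple-testing correction per haplotype, which is the general motivation described above.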
Acknowledgement
We thank the WTCCC for sharing their data with us. Part of the project was carried out while the second author was visiting the Chinese Academy of Sciences, Beijing. The second author thanks Professor Guohua Zou for his hospitality. The research of the first author was funded by the Ministry of Higher Education and Scientific Research, Iraq. The authors have no conflicts of interest to declare.
Supplemental Material:
The online version of this article (DOI: https://doi.org/10.1515/sagmb-2016-0022) offers supplementary material, available to authorized users.
©2017 Walter de Gruyter GmbH, Berlin/Boston
Articles in the same Issue
- Frontmatter
- Research Articles
- Regularized estimation in sparse high-dimensional multivariate regression, with application to a DNA methylation study
- Mixture model-based association analysis with case-control data in genome wide association studies
- Genetic association test based on principal component analysis
- Comparing the performance of linear and nonlinear principal components in the context of high-dimensional genomic data integration