A practical approach to adjusting for population stratification in genome-wide association studies: principal components and propensity scores (PCAPS)

Huaqing Zhao; Nandita Mitra; Peter A. Kanetsky; Katherine L. Nathanson; Timothy R. Rebbeck

doi:10.1515/sagmb-2017-0054

Published/Copyright: December 4, 2018

From the journal Statistical Applications in Genetics and Molecular Biology, Volume 17, Issue 6

Abstract

Genome-wide association studies (GWAS) are susceptible to bias due to population stratification (PS). The most widely used method to correct bias due to PS is principal components analysis (PCA), but there is no objective method to guide which principal components (PCs) to include as covariates. Often, the ten PCs with the highest eigenvalues are included to adjust for PS. This selection is arbitrary, and patterns of local linkage disequilibrium may affect PCA corrections. To address these limitations, we estimate genomic propensity scores based on all statistically significant PCs selected by the Tracy-Widom (TW) statistic. We compare a principal components and propensity scores (PCAPS) approach to PCA and EMMAX using simulated GWAS data under no, moderate, and severe PS. PCAPS reduced spurious genetic associations regardless of the degree of PS, resulting in odds ratio (OR) estimates closer to the true OR. We illustrate our PCAPS method using GWAS data from a study of testicular germ cell tumors. PCAPS provided a more conservative adjustment than PCA. Advantages of the PCAPS approach include reduction of bias compared to PCA, consistent selection of propensity scores to adjust for PS, the potential ability to handle outliers, and ease of implementation using existing software packages.

Keywords: bias; principal components analysis; propensity score; testicular germ cell tumors; Tracy-Widom statistic
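
The general idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the data are simulated, the function names (`fit_logistic`, `top_pcs`, `propensity_adjusted_log_or`, `simulate_and_compare`) are hypothetical, the number of PCs `k` is fixed by hand rather than selected with the Tracy-Widom statistic, and the propensity score is taken to be the estimated probability of carrying the test genotype given the PCs, entered as a covariate in a logistic outcome model.

```python
import numpy as np


def fit_logistic(X, y, n_iter=50, ridge=1e-8):
    """Newton-Raphson logistic regression; X must already contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        weights = p * (1.0 - p)
        # Tiny ridge term keeps the Hessian invertible (numerical safeguard only).
        hessian = (X * weights[:, None]).T @ X + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hessian, X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta


def top_pcs(genotypes, k):
    """Leading k principal component scores (unit variance) of a samples x SNPs matrix."""
    centered = genotypes - genotypes.mean(axis=0)
    sd = centered.std(axis=0)
    scaled = centered[:, sd > 0] / sd[sd > 0]  # drop monomorphic SNPs, standardize the rest
    u, _, _ = np.linalg.svd(scaled, full_matrices=False)
    return u[:, :k] * np.sqrt(genotypes.shape[0])


def propensity_adjusted_log_or(carrier, case, pcs):
    """Log odds ratio for the test genotype, adjusted for a PC-based propensity score."""
    design = np.column_stack([np.ones(len(carrier)), pcs])
    # Assumption: propensity score = P(carrier = 1 | PCs), estimated by logistic regression.
    ps = 1.0 / (1.0 + np.exp(-(design @ fit_logistic(design, carrier))))
    outcome_design = np.column_stack([np.ones(len(case)), carrier, ps])
    return fit_logistic(outcome_design, case)[1]


def simulate_and_compare(n_per_pop=1000, n_snps=300, k=2, seed=0):
    """Simulate two subpopulations with no true genotype effect; return (crude, adjusted) log ORs."""
    rng = np.random.default_rng(seed)
    pop = np.repeat([0, 1], n_per_pop)
    ancestral = rng.uniform(0.2, 0.8, size=n_snps)
    freqs = np.clip(ancestral + rng.normal(0.0, 0.1, size=(2, n_snps)), 0.05, 0.95)
    genotypes = rng.binomial(2, freqs[pop])  # background SNPs used only for the PCs
    # Carrier status and disease both depend on subpopulation but not on each other.
    carrier = rng.binomial(1, np.where(pop == 1, 0.6, 0.2)).astype(float)
    case = rng.binomial(1, np.where(pop == 1, 0.6, 0.2)).astype(float)
    crude = fit_logistic(np.column_stack([np.ones(len(case)), carrier]), case)[1]
    adjusted = propensity_adjusted_log_or(carrier, case, top_pcs(genotypes, k))
    return crude, adjusted


crude, adjusted = simulate_and_compare()
print(f"crude log OR: {crude:.3f}, propensity-adjusted log OR: {adjusted:.3f}")
```

In this simulation the true log odds ratio is 0 by construction, so any departure of the crude estimate from 0 reflects stratification alone. A real analysis would replace the fixed `k` with a significance-based selection of PCs, as the abstract describes.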

Supplementary Material

The online version of this article offers supplementary material (DOI: https://doi.org/10.1515/sagmb-2017-0054).

Published Online: 2018-12-04
