Bayesian LASSO for population stratification correction in rare haplotype association studies

  • Zilu Liu, Asuman Seda Turkmen and Shili Lin
Published/Copyright: January 19, 2024

Abstract

Population stratification (PS) is a major source of confounding in both single nucleotide polymorphism (SNP) and haplotype association studies. To address PS, principal component regression (PCR) and the linear mixed model (LMM) are the current standards for SNP association studies and are also commonly borrowed for haplotype studies. However, the underfitting and overfitting problems introduced by PCR and LMM, respectively, have yet to be addressed. Furthermore, only a few theoretical approaches have been proposed to address PS specifically for haplotypes. In this paper, we propose a new method under the Bayesian LASSO framework, QBLstrat, to account for PS in identifying rare and common haplotypes associated with a continuous trait of interest. QBLstrat utilizes a large number of principal components (PCs) with appropriate priors to sufficiently correct for PS, while shrinking the estimates of unassociated haplotypes and PCs. We compare the performance of QBLstrat with the Bayesian counterparts of PCR and LMM and with a current method, haplo.stats. Extensive simulation studies and real data analyses show that QBLstrat is superior in controlling false positives while maintaining competitive power for identifying true positives under PS.


Corresponding author: Shili Lin, Department of Statistics, The Ohio State University, Columbus, OH 43210, USA

Award Identifier / Grant number: R01GM114142

Funding source: National Center for Advancing Translational Sciences

Award Identifier / Grant number: Unassigned

Funding source: National Heart, Lung, and Blood Institute

Award Identifier / Grant number: Unassigned

Acknowledgment

The authors would like to acknowledge the Hoffman Family Center in Genetics and Epidemiology and the National Center for Advancing Translational Sciences (NCATS) for supporting the DHS study, and the National Heart, Lung, and Blood Institute (NHLBI) for the MESA data collection.

  1. Research ethics: Not applicable.

  2. Author contributions: The authors have accepted responsibility for the entire content of this manuscript and approved its submission.

  3. Competing interests: The authors state no conflict of interest.

  4. Research funding: Research on this project is supported in part by National Institutes of Health (NIH) grant R01GM114142.

  5. Data availability: The raw data can be obtained on request from the corresponding author.

Supplementary Material

This article contains supplementary material (https://doi.org/10.1515/sagmb-2022-0034).


Received: 2022-07-19
Accepted: 2023-12-19
Published Online: 2024-01-19

© 2024 Walter de Gruyter GmbH, Berlin/Boston
