GMEPS: a fast and efficient likelihood approach for genome-wide mediation analysis under extreme phenotype sequencing
Janaka S. S. Liyanage
Abstract
Extreme phenotype sequencing (EPS) offers advantages such as cost savings and higher statistical power for detecting associations between genetic variants and human disorders, and it is a rapidly emerging study design in epidemiological and clinical studies of how genetic variation relates to complex phenotypes. However, investigation of the mediation effects of genetic variants on phenotypes is severely restricted under the EPS design, because existing methods cannot adequately accommodate the non-random extreme-tails sampling that the design incurs. In this paper, we propose a likelihood approach (GMEPS) for testing the mediation effect of genetic variants through continuous and binary mediators on a continuous phenotype under the EPS design. Beyond the EPS design, GMEPS can also be used as a general mediation analysis procedure. Extensive simulations and two real data applications, a genome-wide association study of benign ethnic neutropenia under the EPS design and a candidate-gene study of neurocognitive performance in patients with sickle cell disease under a random sampling design, demonstrate the superiority of GMEPS over widely used mediation analysis procedures under the EPS design and comparable performance under the general random sampling framework.
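To make the sampling problem concrete, the following is a minimal simulation sketch, not the GMEPS method itself. It assumes an additive genotype coded 0/1/2, a linear effect on a continuous phenotype, and an EPS rule that keeps the lowest and highest 10% of phenotype values; the function names, effect size, allele frequency, and tail fraction are all illustrative assumptions that do not come from the paper. It only shows that an ordinary least-squares slope fitted naively to the extreme-tails sample differs markedly from the slope in the full cohort, which is the kind of distortion a method designed for EPS has to account for.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x (single predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, beta=0.5, maf=0.3, tail=0.1, seed=1):
    """Compare the naive slope in the full cohort and in an EPS subsample.

    All parameter values are hypothetical choices for illustration.
    """
    rng = random.Random(seed)
    # Additive genotype: number of minor alleles (0, 1, or 2).
    g = [int(rng.random() < maf) + int(rng.random() < maf) for _ in range(n)]
    # Continuous phenotype with a linear genetic effect plus Gaussian noise.
    y = [beta * gi + rng.gauss(0.0, 1.0) for gi in g]
    # EPS rule assumed here: keep only the lower and upper `tail` fractions of y.
    order = sorted(range(n), key=lambda i: y[i])
    k = int(n * tail)
    keep = order[:k] + order[-k:]
    full_slope = ols_slope(g, y)
    eps_slope = ols_slope([g[i] for i in keep], [y[i] for i in keep])
    return full_slope, eps_slope


if __name__ == "__main__":
    full_slope, eps_slope = simulate()
    print(f"full-cohort slope: {full_slope:.3f}")
    print(f"naive slope in EPS sample: {eps_slope:.3f}")
```

In this sketch the full-cohort slope stays close to the simulated effect, while the naive slope in the extreme-tails sample is substantially inflated because selection depends on the outcome. Mediation estimates assembled from such naive regressions would inherit the same distortion.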
Acknowledgments
We acknowledge dbGaP for approving our use of the benign ethnic neutropenia data. The data were obtained from Matthew Hsieh’s ancillary proposal to the Reasons for Geographic and Racial Differences in Stroke (REGARDS) study. Matthew Hsieh is supported by the intramural research program of NHLBI and NIDDK at NIH. Genotyping services were provided by the Center for Inherited Disease Research (CIDR). CIDR is funded through a federal contract from the National Institutes of Health to The Johns Hopkins University, contract numbers HHSN268200782096C and HHSN268201100011I. This manuscript was also prepared using CSSCD Research Materials obtained from the NHLBI Biologic Specimen and Data Repository Information Coordinating Center and does not necessarily reflect the opinions or views of the CSSCD or the NHLBI. We acknowledge the High Performance Computing Facility (HPCF) at SJCRH for providing shared HPC resources that contributed to the research results reported in this article. We also thank the reviewers, whose suggestions helped improve and clarify this manuscript.
Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: This research is supported by the American Lebanese and Syrian Associated Charities (ALSAC).
Conflict of interest statement: The authors declare no conflicts of interest regarding this article.
Supplementary Material
The online version of this article offers supplementary material (https://doi.org/10.1515/sagmb-2021-0071).
© 2022 Walter de Gruyter GmbH, Berlin/Boston