Abstract
In recent years, meta-analyzing summary results from multiple studies has become common practice in genomic research, substantially improving the power of statistical detection compared with an individual genomic study. Meta-analysis methods that combine statistical estimates across studies are known to be statistically more powerful than those that combine measures of statistical significance. METAL, an approach that combines effect size estimates under a fixed-effects model, has become extremely popular for the former type of meta-analysis. In this article, we discuss a limitation of METAL: it depends on the theoretical null distribution, which leads to incorrect significance testing results. Through various simulation studies and an application to real genomic data, we show how modifying the z-scores in METAL using an empirical null distribution can significantly improve the results, especially in the presence of hidden confounders. To estimate the null distribution, we consider two different approaches, and we highlight the scenarios in which one null estimation approach outperforms the other. This article gives researchers insight into the importance of using an empirical null distribution in fixed-effects meta-analysis, and into choosing the appropriate approach for estimating it.
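The idea of adjusting fixed-effects z-scores with an empirical null can be illustrated with a minimal sketch. It assumes the standard inverse-variance weighted fixed-effects z-score and, as a simple stand-in, estimates the null mean and standard deviation from the median and the median absolute deviation of the z-scores. This stand-in is not necessarily either of the two estimation approaches the article considers, and all function names and simulation settings below are hypothetical.

```python
import math
import random
from statistics import NormalDist, median


def fixed_effects_z(betas, ses):
    """Inverse-variance weighted fixed-effects z-score for one feature."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled_beta / pooled_se


def empirical_null_params(z_scores):
    """Robust null mean and SD (median and scaled MAD).

    Assumption for this sketch: most features are null, so the bulk of
    the z-scores reflects the null distribution.
    """
    mu0 = median(z_scores)
    mad = median([abs(z - mu0) for z in z_scores])
    sigma0 = 1.4826 * mad  # scaling makes the MAD consistent for a normal SD
    return mu0, sigma0


def two_sided_p(z):
    """Two-sided p-value under the standard normal N(0, 1)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical simulation: 2000 null features, 3 studies each, with a
# hidden per-feature confounder shared by all studies.
random.seed(0)
n_features, n_studies, se = 2000, 3, 0.1
z_raw = []
for _ in range(n_features):
    confounder = random.gauss(0.0, 0.1)  # shared across studies
    betas = [confounder + random.gauss(0.0, se) for _ in range(n_studies)]
    z_raw.append(fixed_effects_z(betas, [se] * n_studies))

# Re-center and re-scale the z-scores by the estimated empirical null.
mu0, sigma0 = empirical_null_params(z_raw)
z_adj = [(z - mu0) / sigma0 for z in z_raw]

# Fraction of (truly null) features declared significant at level 0.05.
raw_rate = sum(two_sided_p(z) < 0.05 for z in z_raw) / n_features
adj_rate = sum(two_sided_p(z) < 0.05 for z in z_adj) / n_features
print(f"estimated null: mean={mu0:.2f}, sd={sigma0:.2f}")
print(f"false positive rate, theoretical null: {raw_rate:.3f}")
print(f"false positive rate, empirical null:   {adj_rate:.3f}")
```

In this toy setting every feature is null, so any rejection is a false positive. The shared confounder widens the spread of the z-scores beyond N(0, 1), and comparing them against the estimated null rather than the theoretical one brings the rejection rate back toward the nominal level.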
- Research ethics: Not applicable.
- Author contributions: The authors have accepted responsibility for the entire content of this manuscript and approved its submission.
- Use of Large Language Models, AI and Machine Learning Tools: None declared.
- Conflict of interest: The authors state no conflict of interest.
- Research funding: None declared.
- Data availability: Not applicable.
Supplementary Material
The online version of this article offers supplementary material (https://doi.org/10.1515/sagmb-2023-0041).
© 2024 Walter de Gruyter GmbH, Berlin/Boston
Articles in the same Issue
- Frontmatter
- Research Articles
- Empirically adjusted fixed-effects meta-analysis methods in genomic studies
- A CNN-CBAM-BIGRU model for protein function prediction
- A heavy-tailed model for analyzing miRNA-seq raw read counts
- Flexible model-based non-negative matrix factorization with application to mutational signatures
- Choice of baseline hazards in joint modeling of longitudinal and time-to-event cancer survival data
- Assessing the feasibility of statistical inference using synthetic antibody-antigen datasets
- A global test of hybrid ancestry from genome-scale data
- Integrative pathway analysis with gene expression, miRNA, methylation and copy number variation for breast cancer subtypes
- Bayesian LASSO for population stratification correction in rare haplotype association studies