pwrBRIDGE: a user-friendly web application for power and sample size estimation in batch-confounded microarray studies with dependent samples

  • Qing Xia, Jeffrey A. Thompson, and Devin C. Koestler
Published/Copyright: October 10, 2022

Abstract

Batch effect Reduction of mIcroarray data with Dependent samples usinG Empirical Bayes (BRIDGE) is a recently developed statistical method for batch effect correction in batch-confounded microarray studies with dependent samples. The key component of the BRIDGE methodology is the use of samples run as technical replicates in two or more batches, “bridging samples”, to inform batch effect correction/attenuation. Previously published results indicate a relationship between the number of bridging samples, M, and the statistical power of downstream testing on the batch-corrected data. However, there is as yet no formal statistical framework or user-friendly software for estimating the M needed to achieve a specified statistical power for hypothesis tests conducted on the batch-corrected data. To fill this gap, we developed pwrBRIDGE, a simulation-based approach to estimate the bridging sample size, M, in batch-confounded longitudinal microarray studies. To illustrate the use of pwrBRIDGE, we consider a hypothetical, longitudinal, batch-confounded study whose goal is to identify genes in human blood associated with progression from amnestic mild cognitive impairment (aMCI) to Alzheimer’s disease (AD) after a 5-year follow-up. pwrBRIDGE helps researchers design and plan batch-confounded microarray studies with dependent samples so as to avoid over- or under-powered studies.
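To make the idea of simulation-based sample size estimation concrete, the following is a minimal, generic sketch in Python. It is not the pwrBRIDGE implementation and does not reproduce the BRIDGE model or its batch correction; it only illustrates the usual loop of simulating data, testing, counting rejections, and then searching a grid of candidate sample sizes. The function names, the paired-difference data model, the effect size, the standard deviation, and the normal approximation to the test statistic are all assumptions made for illustration.

```python
import random
from statistics import NormalDist, mean, stdev


def simulated_power(n_pairs, effect, sd_diff, alpha=0.05, n_sim=2000, seed=0):
    """Monte Carlo power of a two-sided test on paired differences.

    Hypothetical data model: each paired difference ~ Normal(effect, sd_diff).
    The t-type statistic is compared with a normal critical value, which is
    an approximation chosen to keep the sketch dependency-free.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sim):
        # Simulate one study with n_pairs dependent (paired) samples.
        diffs = [rng.gauss(effect, sd_diff) for _ in range(n_pairs)]
        stat = mean(diffs) / (stdev(diffs) / n_pairs ** 0.5)
        if abs(stat) > z_crit:
            rejections += 1
    # Estimated power is the fraction of simulated studies that reject.
    return rejections / n_sim


def smallest_n_for_power(target, effect, sd_diff, candidates, **kwargs):
    """Return the first candidate sample size whose simulated power
    reaches the target, or None if no candidate does."""
    for n in candidates:
        if simulated_power(n, effect, sd_diff, **kwargs) >= target:
            return n
    return None


if __name__ == "__main__":
    n = smallest_n_for_power(0.8, effect=0.5, sd_diff=1.0,
                             candidates=range(5, 101, 5))
    print("Smallest candidate sample size reaching 80% power:", n)
```

In a setting like the one described above, the quantity being searched would be the number of bridging samples, M, and each simulated study would additionally include batch effects and the batch-correction step before testing; those parts are omitted here.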


Corresponding author: Devin C. Koestler, Department of Biostatistics & Data Science, University of Kansas Medical Center, 3901 Rainbow Blvd., Kansas City, KS 66106, USA

Award Identifier / Grant number: P20 GM103418

Award Identifier / Grant number: P20 GM130423

Award Identifier / Grant number: P30 CA168524

Funding source: University of Kansas

Award Identifier / Grant number: Unassigned

Acknowledgments

We thank the members of the Statistical ‘Omics Working Group in the Department of Biostatistics & Data Science at the University of Kansas Medical Center for their constructive feedback on this manuscript. In particular, we thank Lisa Neums, Shachi Patel, Shelby Bell-Glenn, Whitney Shae, Bo Zhang, Samuel Boyd, Emily Nissen, Jonah Amponsah, Dr. Prabhakar Chalise, Dr. Jinxiang Hu, Dr. Lynn Chollet-Hinton, Dr. Nanda Yellapu, Dr. Dong Pei, and Dr. Mihaela Sardiu.

  1. Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.

  2. Research funding: The research reported here was supported by the National Cancer Institute (NCI) Cancer Center Support Grant P30 CA168524; the Kansas IDeA Network of Biomedical Research Excellence Bioinformatics Core, supported by the National Institute of General Medical Sciences award P20 GM103418; and the Kansas Institute for Precision Medicine COBRE, supported by the National Institute of General Medical Sciences award P20 GM130423.

  3. Conflict of interest statement: The authors declare no conflicts of interest regarding this article.

Supplementary Material

The online version of this article offers supplementary material (https://doi.org/10.1515/sagmb-2022-0003).


Received: 2022-01-19
Revised: 2022-07-21
Accepted: 2022-09-16
Published Online: 2022-10-10

© 2022 Walter de Gruyter GmbH, Berlin/Boston
