P-value calibration for multiple testing problems in genomics

  • John P. Ferguson and Dean Palejev
Published/Copyright: October 18, 2014

Abstract

Conservative statistical tests are often used in complex multiple testing settings in which computing the type I error may be difficult. In such tests, the reported p-value for a hypothesis can understate the evidence against the null hypothesis and consequently statistical power may be lost. False Discovery Rate adjustments, used in multiple comparison settings, can worsen the unfavorable effect. We present a computationally efficient and test-agnostic calibration technique that can substantially reduce the conservativeness of such tests. As a consequence, a lower sample size might be sufficient to reject the null hypothesis for true alternatives, and experimental costs can be lowered. We apply the calibration technique to the results of DESeq, a popular method for detecting differentially expressed genes from RNA sequencing data. The increase in power may be particularly high in small sample size experiments, often used in preliminary experiments and funding applications.
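The interaction between conservative p-values and a False Discovery Rate adjustment can be illustrated with a small Python sketch. It is not the calibration technique of the article and does not use DESeq. It only applies the Benjamini-Hochberg step-up adjustment to a set of exact p-values and to a conservative version of the same p-values, and counts the rejections in each case. The toy data, the square-root inflation used to mimic a conservative test, the helper names and the 0.1 threshold are all assumptions made for illustration.

```python
import math
import random


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_rejections(pvals, alpha=0.1):
    """Count hypotheses whose adjusted p-value is at most alpha."""
    return sum(q <= alpha for q in benjamini_hochberg(pvals))


rng = random.Random(0)

# Toy data (assumption): 900 true nulls with uniform p-values and
# 100 alternatives whose p-values are concentrated near zero.
exact = [rng.random() for _ in range(900)]
exact += [rng.random() ** 8 for _ in range(100)]

# Hypothetical conservative test: each reported p-value is the square
# root of the exact one, so it is never smaller than the exact value.
conservative = [math.sqrt(p) for p in exact]

print("rejections with exact p-values:       ", count_rejections(exact))
print("rejections with conservative p-values:", count_rejections(conservative))
```

Because every conservative p-value is at least as large as its exact counterpart, the adjusted values can only grow, so the conservative run never rejects more hypotheses than the exact run.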


Corresponding author: John P. Ferguson, Department of Nephrology, Graduate Entry Medical School, University of Limerick, Clinical Academic Liaison Building, St Nessans Road, Limerick, Ireland

Acknowledgments

The numerical simulations shown in this article were performed on computational resources provided under EU FP7 project EGI-InSPIRE (contract number RI-261323) and described in Atanassov et al. (forthcoming). The authors would like to thank the Associate Editor and anonymous reviewers for their thoughtful suggestions and comments, which resulted in a significant improvement of this work.

References

Anders, S. and W. Huber (2010): “Differential expression analysis for sequence count data,” Genome Biol., 11, R106+. DOI: 10.1186/gb-2010-11-10-r106.

Anders, S. and W. Huber (2013): “Differential expression of RNA-Seq data at the gene level – the DESeq package,” http://www.bioconductor.org/packages/2.12/bioc/html/DESeq.html.

Atanassov, E., T. Gurov, A. Karaivanova, S. Ivanovska, M. Durchova, D. Georgiev and D. Dimitrov (forthcoming): “Tuning for scalability on hybrid HPC cluster,” Math. Industry.

Benjamini, Y. and Y. Hochberg (1995): “Controlling the false discovery rate: a practical and powerful approach to multiple testing,” J. Roy. Stat. Soc. B Met., 57, 289–300.

Bolouri, H. and W. L. Ruzzo (2012): “Integration of 198 ChIP-seq datasets reveals human cis-regulatory regions,” J. Comput. Biol., 19, 989–997.

Bottomly, D., N. A. Walter, J. E. E. Hunter, P. Darakjian, S. Kawane, K. J. Buck, R. P. Searles, M. Mooney, S. K. McWeeney and R. Hitzemann (2011): “Evaluating gene expression in C57BL/6J and DBA/2J mouse striatum using RNA-Seq and microarrays,” PLoS One, 6, e17820+. DOI: 10.1371/journal.pone.0017820.

Brooks, A. N., L. Yang, M. O. Duff, K. D. Hansen, J. W. Park, S. Dudoit, S. E. Brenner and B. R. Graveley (2011): “Conservation of an RNA regulatory map between Drosophila and mammals,” Genome Res., 21, 193–202.

Burgess, D. (2013): “Genetic screens: RNA-seq into the toolkit,” Nat. Rev. Genet., 14, 154–155. DOI: 10.1038/nrg3432.

Cantor, R. M., K. Lange and J. S. Sinsheimer (2010): “Prioritizing GWAS results: a review of statistical methods and recommendations for their application,” Am. J. Hum. Genet., 86, 6–22.

Casella, G. and R. L. Berger (2002): Statistical inference, Thomson Learning, 2nd edition, Duxbury Press: Belmont, CA.

Devlin, B. and K. Roeder (1999): “Genomic control for association studies,” Biometrics, 55, 997–1004. DOI: 10.1111/j.0006-341X.1999.00997.x.

Efron, B. (2008): “Microarrays, empirical Bayes and the two-groups model,” Stat. Sci., 23, 1–22.

Fodor, A. A., T. L. Tickle and C. Richardson (2007): “Towards the uniform distribution of null p values on Affymetrix microarrays,” Genome Biol., 8, R69.

Frazee, A., B. Langmead and J. Leek (2011): “ReCount: A multi-experiment resource of analysis-ready RNA-seq gene count datasets,” BMC Bioinformatics, 12, 449+. DOI: 10.1186/1471-2105-12-449.

Furey, T. S. (2012): “ChIP–seq and beyond: new and improved methodologies to detect and characterize protein–DNA interactions,” Nat. Rev. Genet., 13, 840–852.

Hindorff, L. A., P. Sethupathy, H. A. Junkins, E. M. Ramos, J. P. Mehta, F. S. Collins and T. A. Manolio (2009): “Potential etiologic and functional implications of genome-wide association loci for human diseases and traits,” Proc. Natl. Acad. Sci., 106, 9362–9367.

Hubbard, R. and M. Bayarri (2003): “Confusion over measures of evidence (p’s) versus errors (α’s) in classical statistical testing,” Am. Stat., 57, 171–178.

Kvam, V. M., P. Liu and Y. Si (2012): “A comparison of statistical methods for detecting differentially expressed genes from RNA-seq data,” Am. J. Botany, 99, 248–256. DOI: 10.3732/ajb.1100340.

Lehmann, E. L. and J. P. Romano (2005): Testing statistical hypotheses (Springer Texts in Statistics), Springer: New York, NY.

Mardis, E. R. (2007): “ChIP-seq: welcome to the new frontier,” Nat. Methods, 4, 613–613.

McCarthy, M. I., G. R. Abecasis, L. R. Cardon, D. B. Goldstein, J. Little, J. P. Ioannidis and J. N. Hirschhorn (2008): “Genome-wide association studies for complex traits: consensus, uncertainty and challenges,” Nat. Rev. Genet., 9, 356–369.

Mortazavi, A., B. Williams, K. McCue, L. Schaeffer and B. Wold (2008): “Mapping and quantifying mammalian transcriptomes by RNA-Seq,” Nat. Methods, 5, 621–628.

Muralidharan, O., G. Natsoulis, J. Bell, H. Ji and N. R. Zhang (2012): “Detecting mutations in mixed sample sequencing data using empirical Bayes,” Ann. Appl. Stat., 6, 1047–1067.

Price, A. L., N. J. Patterson, R. M. Plenge, M. E. Weinblatt, N. A. Shadick and D. Reich (2006): “Principal components analysis corrects for stratification in genome-wide association studies,” Nat. Genet., 38, 904–909.

R Core Team (2013): “R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria,” http://www.R-project.org/.

Sellke, T., M. J. Bayarri and J. O. Berger (2001): “Calibration of p values for testing precise null hypotheses,” Am. Stat., 55, 62–71.

Skol, A. D., L. J. Scott, G. R. Abecasis and M. Boehnke (2006): “Joint analysis is more efficient than replication-based analysis for two-stage genome-wide association studies,” Nat. Genet., 38, 209–213.

Soneson, C. and M. Delorenzi (2013): “A comparison of methods for differential expression analysis of RNA-seq data,” BMC Bioinformatics, 14, 91+. DOI: 10.1186/1471-2105-14-91.

Wang, W. Y., B. J. Barratt, D. G. Clayton and J. A. Todd (2005): “Genome-wide association studies: theoretical and practical concerns,” Nat. Rev. Genet., 6, 109–118.

Wang, Z., M. Gerstein and M. Snyder (2009): “RNA-Seq: a revolutionary tool for transcriptomics,” Nat. Rev. Genet., 10, 57–63.

Wasserman, L. (2006): All of nonparametric statistics, Springer: New York, NY.


Supplemental Material

The online version of this article (DOI: 10.1515/sagmb-2013-0074) offers supplementary material, available to authorized users.


Published Online: 2014-10-18
Published in Print: 2014-12-01

©2014 by De Gruyter
