A novel method to accurately calculate statistical significance of local similarity analysis for high-throughput time series

Fang Zhang, Ang Shan and Yihui Luan
Published/Copyright: 17 November 2018

Abstract

In recent years, molecular biological studies, especially in metagenomics, have produced large amounts of microbial community time series data. Among statistical methods for time series, local similarity analysis is used in a wide range of environments to capture potential local and time-shifted associations that traditional correlation analysis cannot detect. The permutation test was initially the most widely used way to obtain the statistical significance of local similarity scores, and more recently a theoretical approximation has also been developed for this purpose. However, both approaches assume that the time series are independent and identically distributed (i.i.d.). In this paper, we propose a new approach based on the moving block bootstrap to approximate the statistical significance of local similarity scores for dependent time series. Simulations show that our method controls the type I error rate reasonably well, whereas the theoretical approximation and the permutation test perform less well. Finally, we apply our method to human and marine microbial community datasets and show that it can identify potential relationships among operational taxonomic units (OTUs) while significantly decreasing the rate of false positives.
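The abstract contrasts two ways of attaching a p-value to a local similarity (LS) score. The Python sketch below is illustrative only and is not the authors' implementation. It computes an LS score with a dynamic program that finds the strongest positive or negative run of products `x[i] * y[j]` along one diagonal whose time shift is at most `max_delay`, divided by the series length. It then builds a null distribution either by permuting the series, which treats observations as i.i.d., or by a moving block bootstrap, which resamples overlapping blocks and so keeps short-range autocorrelation. The rank-based normal-score normalization, the default block length of 5, resampling both series independently, and every function name are assumptions made for this example; the abstract does not say how the authors choose the block length or construct the bootstrap null.

```python
import math
import random
from functools import partial
from statistics import NormalDist, fmean, pstdev


def normal_scores(x):
    """Assumed normalization: rank-based normal scores, rescaled to mean 0, variance 1."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])  # stable sort: ties ranked by position
    nd = NormalDist()
    z = [0.0] * n
    for rank, i in enumerate(order, start=1):
        z[i] = nd.inv_cdf(rank / (n + 1))
    mu, sd = fmean(z), pstdev(z)
    return [(v - mu) / sd for v in z]


def local_similarity(x, y, max_delay):
    """Signed LS score: largest run sum of x[i]*y[j] on one diagonal, |i - j| <= max_delay, over n."""
    n = len(x)
    pos = [[0.0] * (n + 1) for _ in range(n + 1)]  # best positive run ending at (i, j)
    neg = [[0.0] * (n + 1) for _ in range(n + 1)]  # best negative run ending at (i, j)
    best_pos = best_neg = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - max_delay), min(n, i + max_delay) + 1):
            prod = x[i - 1] * y[j - 1]
            pos[i][j] = max(0.0, pos[i - 1][j - 1] + prod)
            neg[i][j] = max(0.0, neg[i - 1][j - 1] - prod)
            best_pos = max(best_pos, pos[i][j])
            best_neg = max(best_neg, neg[i][j])
    score = max(best_pos, best_neg) / n
    return score if best_pos >= best_neg else -score


def permute(x, rng):
    """i.i.d. null: a random permutation destroys all temporal structure."""
    out = list(x)
    rng.shuffle(out)
    return out


def moving_blocks(x, rng, block_len=5):
    """Moving block bootstrap; block_len=5 is an arbitrary illustrative default."""
    n = len(x)
    out = []
    while len(out) < n:
        start = rng.randrange(n - block_len + 1)  # any overlapping block may be drawn
        out.extend(x[start:start + block_len])
    return out[:n]  # trim the last block so the resample has length n


def resampled_pvalue(x, y, resample, max_delay=3, n_rep=199, seed=0):
    """Monte Carlo p-value of |LS| against a null built by resampling x and y."""
    rng = random.Random(seed)
    observed = abs(local_similarity(normal_scores(x), normal_scores(y), max_delay))
    exceed = 0
    for _ in range(n_rep):
        # Assumption: each series is resampled independently, removing cross-association.
        bx = normal_scores(resample(x, rng))
        by = normal_scores(resample(y, rng))
        if abs(local_similarity(bx, by, max_delay)) >= observed:
            exceed += 1
    return (exceed + 1) / (n_rep + 1)


if __name__ == "__main__":
    rng = random.Random(7)
    x = [math.sin(t / 3.0) + rng.gauss(0.0, 0.3) for t in range(40)]
    y = [0.0, 0.0] + x[:-2]  # hypothetical pair: y follows x with a 2-step delay
    print("LS =", local_similarity(normal_scores(x), normal_scores(y), max_delay=3))
    print("permutation p =", resampled_pvalue(x, y, permute))
    print("block bootstrap p =", resampled_pvalue(x, y, moving_blocks))
    longer = partial(moving_blocks, block_len=8)
    print("block bootstrap p (block_len=8) =", resampled_pvalue(x, y, longer))
```

Swapping `permute` for `moving_blocks` changes only how the null series are generated; the LS dynamic program and the Monte Carlo p-value `(exceed + 1) / (n_rep + 1)` stay the same. Longer blocks preserve more of the autocorrelation but make the resamples less varied, so the block length would need to be chosen with care for real data.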

Award Identifier / Grant number: 11371227, 61432010, 11626247

Funding statement: This research was supported by the Natural Science Foundation of China (Funder ID: 10.13039/501100001809; grant numbers 11371227, 61432010 and 11626247).

Appendix A. Supplementary Materials

The type I error rates of the three models under different time delays are shown in the Supplementary Materials.
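A type I error rate of the kind reported in the Supplementary Materials is usually estimated by simulating many series pairs under the null hypothesis of no association and recording the fraction that a test declares significant. The Python sketch below illustrates that generic procedure only; it does not reproduce the authors' three models. It draws pairs of independent AR(1) series (a hypothetical null model with autocorrelation `phi`) and applies a permutation test, using lag-0 Pearson correlation as a simple stand-in for the local similarity score to keep the example short. All function names and parameter values are assumptions for this example.

```python
import math
import random


def ar1(n, phi, rng):
    """Hypothetical null model: stationary Gaussian AR(1), x_t = phi * x_{t-1} + e_t."""
    x = [rng.gauss(0.0, 1.0) / math.sqrt(1.0 - phi * phi)]  # start at stationary variance
    while len(x) < n:
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def corr(x, y):
    """Lag-0 Pearson correlation, a stand-in for the LS score in this sketch."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_pvalue(x, y, n_perm, rng):
    """Two-sided permutation p-value; valid only if observations are exchangeable (i.i.d.)."""
    observed = abs(corr(x, y))
    y = list(y)  # shuffle a copy, not the caller's list
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y)
        if abs(corr(x, y)) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


def empirical_type1_error(phi, n=40, n_sim=100, n_perm=99, alpha=0.05, seed=0):
    """Fraction of independent (null) pairs declared significant at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        x, y = ar1(n, phi, rng), ar1(n, phi, rng)  # independent, so the null holds
        if permutation_pvalue(x, y, n_perm, rng) <= alpha:
            rejections += 1  # every rejection here is a false positive
    return rejections / n_sim


if __name__ == "__main__":
    for phi in (0.0, 0.5, 0.9):
        print(f"phi={phi}: empirical type I error = {empirical_type1_error(phi):.2f}")
```

With `phi = 0.0` the series are i.i.d., so the estimate should fall near `alpha` up to Monte Carlo error. As `phi` grows, the permutation test ignores the autocorrelation and the rejection rate typically climbs above `alpha`, which is the kind of inflation the abstract attributes to methods that assume i.i.d. data.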

Supplementary Material

The online version of this article offers supplementary material (DOI: https://doi.org/10.1515/sagmb-2018-0019).


Published Online: 2018-11-17

©2018 Walter de Gruyter GmbH, Berlin/Boston
