
Comparison of statistical methods for finding network motifs

  • Vanna Albieri and Vanessa Didelez
Published/Copyright: June 14, 2014

Abstract

There has been much recent interest in systems biology in investigating the structure of gene regulatory systems. Such networks are often formed of specific patterns, or network motifs, that are interesting from a biological point of view. Our aim in the present paper is to compare statistical methods specifically with regard to how well they can detect such motifs. One popular approach is network analysis with Gaussian graphical models (GGMs), which are statistical models associated with undirected graphs, where vertices of the graph represent genes and edges indicate regulatory interactions. Gene expression microarray data allow us to observe the amount of mRNA simultaneously for a large number of genes p under n different experimental conditions, where p is usually much larger than n, prohibiting the use of standard methods. We therefore compare the performance of a number of procedures that have been specifically designed to address this large p-small n issue: G-Lasso estimation, neighbourhood selection, shrinkage estimation using empirical Bayes for model selection, and the PC-algorithm. We found that all approaches performed poorly on the benchmark E. coli network. Hence we systematically studied their ability to detect specific network motifs (pairs, hubs and cascades) in extensive simulations. We conclude that all methods have difficulty detecting hubs, but the PC-algorithm is the most promising.
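To illustrate how a GGM encodes edges, the following is a minimal sketch and not the analysis carried out in the paper. For jointly Gaussian variables, an edge between two vertices is absent exactly when their partial correlation given the remaining variables is zero. The example uses a three-gene cascade X1 -> X2 -> X3 with hypothetical correlation values; the function name `partial_corr` and the tolerance are assumptions made for illustration.

```python
import math

def partial_corr(r_ij, r_ik, r_jk):
    """Partial correlation of X_i and X_j given X_k, from pairwise correlations."""
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))

# Hypothetical cascade X1 -> X2 -> X3: X1 and X3 are related only through X2,
# so their marginal correlation is the product of the two direct correlations.
r12, r23 = 0.8, 0.5
r13 = r12 * r23

# Partial correlation of each pair given the remaining third variable.
pcor = {
    (1, 2): partial_corr(r12, r13, r23),  # X1, X2 given X3
    (2, 3): partial_corr(r23, r12, r13),  # X2, X3 given X1
    (1, 3): partial_corr(r13, r12, r23),  # X1, X3 given X2
}

# Keep an edge only where the partial correlation is non-zero
# (the tolerance is an illustrative choice for exact, noise-free inputs).
graph = {pair for pair, r in pcor.items() if abs(r) > 1e-12}

for pair, r in sorted(pcor.items()):
    print(pair, round(r, 3))
print("edges:", sorted(graph))
```

Although X1 and X3 are marginally correlated, their partial correlation given X2 vanishes, so the undirected graph contains only the edges 1-2 and 2-3. With real data, where p is much larger than n, these partial correlations cannot be obtained from the sample covariance matrix directly, which is the difficulty the compared procedures are designed to address.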


Corresponding author: Vanessa Didelez, School of Mathematics, University of Bristol, University Walk, Bristol BS8 1TW, UK

Acknowledgments

We would like to thank Alberto Roverato, Sofia Massa, Arnoldo Frigessi, and Edmund Jones for helpful comments, and Markus Kalisch for help with the “pcAlgo” package. Financial support from the Leverhulme Trust (Research Fellowship RF-2011-320) is gratefully acknowledged.


Supplemental Material

The online version of this article (DOI 10.1515/sagmb-2013-0017) offers supplementary material, available to authorized users.


Published Online: 2014-06-14
Published in Print: 2014-08-01

© 2014 by De Gruyter
