Comparison of statistical methods for finding network motifs

Vanna Albieri and Vanessa Didelez
Published/Copyright: June 14, 2014

Abstract

There has been much recent interest in systems biology in investigating the structure of gene regulatory systems. Such networks are often formed of specific patterns, or network motifs, that are interesting from a biological point of view. Our aim in the present paper is to compare statistical methods specifically with regard to how well they can detect such motifs. One popular approach is network analysis with Gaussian graphical models (GGMs), which are statistical models associated with undirected graphs, where the vertices of the graph represent genes and the edges indicate regulatory interactions. Gene expression microarray data allow us to observe the amount of mRNA simultaneously for a large number p of genes under n different experimental conditions, where p is usually much larger than n, which prohibits the use of standard methods. We therefore compare the performance of a number of procedures that have been specifically designed to address this large p, small n issue: G-Lasso estimation, neighbourhood selection, shrinkage estimation using empirical Bayes for model selection, and the PC-algorithm. We found that all approaches performed poorly on the benchmark E. coli network. Hence we systematically studied, in extensive simulations, their ability to detect specific network motifs, namely pairs, hubs and cascades. We conclude that all methods have difficulty detecting hubs, but that the PC-algorithm is the most promising.
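The modelling idea behind GGMs is that two genes are joined by an edge exactly when their partial correlation given all other genes is non-zero, i.e. when the corresponding entry of the precision matrix K = Σ^{-1} is non-zero. The sketch below illustrates this at the population level for two of the motifs studied, a cascade and a hub, so it deliberately sidesteps the estimation problem that the compared methods address. It assumes the motifs are generated by a linear Gaussian structural equation model with unit error variances, and it takes a hub to be one regulator with several targets; that model, the coefficient values and all function names (`invert`, `sem_covariance`, `ggm_edges`, `correlation`) are hypothetical choices for illustration, not the paper's simulation design. Only the Python standard library is used, so matrix inversion is a plain Gauss-Jordan routine.

```python
import math


def invert(m):
    """Invert a square matrix (list of lists) by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    # Augment m with the identity matrix: [m | I].
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def sem_covariance(p, edges):
    """Covariance of X = B X + e with e ~ N(0, I) (hypothetical data-generating model).

    edges maps (parent, child) -> regression coefficient B[child][parent].
    """
    i_minus_b = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for (parent, child), coef in edges.items():
        i_minus_b[child][parent] -= coef
    a = invert(i_minus_b)  # X = (I - B)^{-1} e, hence Sigma = A A^T
    return [[sum(a[i][k] * a[j][k] for k in range(p)) for j in range(p)] for i in range(p)]


def ggm_edges(sigma, tol=1e-9):
    """Undirected GGM edges: pairs (i, j) with non-zero partial correlation."""
    k = invert(sigma)  # precision (concentration) matrix K
    p = len(k)
    edges = set()
    for i in range(p):
        for j in range(i + 1, p):
            rho = -k[i][j] / math.sqrt(k[i][i] * k[j][j])  # partial correlation
            if abs(rho) > tol:
                edges.add((i, j))
    return edges


def correlation(sigma, i, j):
    """Ordinary (marginal) correlation between variables i and j."""
    return sigma[i][j] / math.sqrt(sigma[i][i] * sigma[j][j])


# Cascade motif 0 -> 1 -> 2 (coefficients are arbitrary illustrative values).
cascade = sem_covariance(3, {(0, 1): 0.8, (1, 2): 0.7})
print(round(correlation(cascade, 0, 2), 3))  # 0.417: genes 0 and 2 are marginally correlated
print(sorted(ggm_edges(cascade)))            # [(0, 1), (1, 2)]: no 0-2 edge in the GGM

# Hub motif: gene 0 regulates genes 1, 2 and 3.
hub = sem_covariance(4, {(0, 1): 0.9, (0, 2): -0.6, (0, 3): 0.5})
print(sorted(ggm_edges(hub)))                # [(0, 1), (0, 2), (0, 3)]
```

In the cascade, genes 0 and 2 are marginally correlated but conditionally independent given gene 1, so the GGM contains no 0-2 edge; in the hub, the targets are conditionally independent given the regulator, so only regulator-target edges remain. With real data, Σ has to be estimated from n samples with n much smaller than p, in which case the sample covariance matrix is singular and cannot be inverted directly; this is why regularised procedures such as those compared in the paper are needed.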


Corresponding author: Vanessa Didelez, School of Mathematics, University of Bristol, University Walk, Bristol BS8 1TW, UK, e-mail:

Acknowledgments

We would like to thank Alberto Roverato, Sofia Massa, Arnoldo Frigessi, and Edmund Jones for helpful comments, and Markus Kalisch for help with the “pcAlgo” package. Financial support from the Leverhulme Trust (Research Fellowship RF-2011-320) is gratefully acknowledged.

Supplemental Material

The online version of this article (DOI 10.1515/sagmb-2013-0017) offers supplementary material, available to authorized users.


Published Online: 2014-6-14
Published in Print: 2014-8-1

© 2014 by De Gruyter