Abstract
Inference on gene regulatory networks from high-throughput expression data is one of the main current challenges in systems biology. Such networks can provide deep insight into the interactions between genes. Because gene-gene interactions are often viewed as joint contributions to known biological mechanisms, the inferred dependence among gene expressions is expected to be consistent, to some extent, with the functional characterization of genes derived from ontologies (GO, KEGG, …). The present paper introduces a sparse factor model as a general framework either to account for prior knowledge about the joint contributions of gene modules to latent biological processes or to infer the corresponding co-expression network. We propose an ℓ1-regularized EM algorithm to fit a sparse factor model for correlation. We demonstrate how it helps extract modules of genes and, more generally, improves gene clustering performance. In a simulation study, the method is compared to alternative estimation procedures for sparse factor models of relevance networks. The integration of biological knowledge based on the Gene Ontology (GO) is also illustrated on liver expression data generated to understand adiposity variability in chickens.
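To make the idea of an ℓ1-regularized EM for a sparse factor model more concrete, here is a minimal sketch, not the authors' implementation. It assumes the gene expressions are standardized so that S is their p × p correlation matrix, modelled as Σ = BB′ + diag(ψ) with q latent factors. The E-step is the classical EM for maximum-likelihood factor analysis; the M-step replaces the least-squares loading update with a lasso coordinate descent, penalizing λ‖B‖₁ on the average log-likelihood scale. The function names `sparse_factor_em` and `gene_modules` and the choices of eigen-decomposition initialization, 20 inner sweeps, and a floor on ψ are all hypothetical.

```python
import numpy as np


def soft_threshold(x, t):
    """Lasso soft-thresholding operator: sign(x) * max(|x| - t, 0)."""
    return np.sign(x) * max(abs(x) - t, 0.0)


def sparse_factor_em(S, q, lam, n_iter=200, tol=1e-6, psi_min=1e-3):
    """Hypothetical sketch of an l1-penalized EM for Sigma = B B' + diag(psi).

    S   : p x p correlation matrix of the (standardized) gene expressions
    q   : number of latent factors, assumed known here
    lam : l1 penalty on the loadings, on the average log-likelihood scale
    """
    p = S.shape[0]
    # Assumed initialization: loadings from the q leading eigenpairs of S.
    vals, vecs = np.linalg.eigh(S)
    top = np.argsort(vals)[::-1][:q]
    B = vecs[:, top] * np.sqrt(np.maximum(vals[top], 1e-8))
    psi = np.maximum(np.diag(S) - np.sum(B ** 2, axis=1), psi_min)

    for _ in range(n_iter):
        B_old = B.copy()
        # E-step: moments of the factors given the data. The Woodbury identity
        # keeps every inverse q x q instead of p x p.
        Bt_psi_inv = B.T / psi                          # q x p
        G = np.linalg.inv(np.eye(q) + Bt_psi_inv @ B)   # Var(z | x)
        beta = G @ Bt_psi_inv                           # E(z | x) = beta x
        C_xz = S @ beta.T                               # p x q, average x E(z|x)'
        C_zz = G + beta @ S @ beta.T                    # q x q, average E(zz' | x)

        # M-step, gene by gene: lasso coordinate descent on the loadings
        # (threshold lam * psi_j), then the closed-form specific variance.
        for j in range(p):
            b = B[j].copy()
            for _ in range(20):
                for k in range(q):
                    # partial residual excluding coordinate k
                    r = C_xz[j, k] - C_zz[k] @ b + C_zz[k, k] * b[k]
                    b[k] = soft_threshold(r, lam * psi[j]) / C_zz[k, k]
            B[j] = b
            psi[j] = max(S[j, j] - 2.0 * b @ C_xz[j] + b @ C_zz @ b, psi_min)

        if np.max(np.abs(B - B_old)) < tol:
            break
    return B, psi


def gene_modules(B):
    """Assign each gene to the factor with its largest |loading|; -1 if all are zero."""
    absB = np.abs(B)
    labels = absB.argmax(axis=1)
    labels[absB.max(axis=1) == 0] = -1
    return labels
```

In this sketch the soft-thresholding sets most cross-loadings exactly to zero. Genes that share a non-zero loading on the same factor can then be read off as a module, and BB′ + diag(ψ) gives a regularized estimate of the correlation matrix underlying the co-expression network.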
Funding source: Agence Nationale de la Recherche
Award Identifier / Grant number: FatInteger ANR-11-BSV7-0004
Funding statement: Agence Nationale de la Recherche (Grant/Award Number: FatInteger ANR-11-BSV7-0004).
©2016 by De Gruyter
Articles in the same Issue
- Frontmatter
- Research Articles
- Model selection for factorial Gaussian graphical models with an application to dynamic regulatory networks
- Testing differentially expressed genes in dose-response studies and with ordinal phenotypes
- Differential methylation tests of regulatory regions
- Sparse factor model for co-expression networks with an application using prior biological knowledge