Abstract
There has been much interest in reconstructing bi-directional regulatory networks linking the circadian clock to metabolism in plants, and a variety of reverse engineering methods from machine learning and computational statistics have been proposed and evaluated. The present paper focuses on combining models in an ensemble to boost network reconstruction accuracy, and on exploring model combination strategies that maximize this improvement. Our results demonstrate that a rich ensemble of predictors outperforms the best individual model, even if the ensemble includes poor predictors with inferior individual reconstruction accuracy. For our application to metabolomic and transcriptomic time series from various mutant plants grown under different light-dark cycles, we also show how to determine the optimal time lag of the interactions, and we identify significant interactions with a randomization test. Our study predicts new statistically significant interactions between circadian clock genes and metabolites in Arabidopsis thaliana. It thus provides independent statistical evidence that the regulation of metabolism by the circadian clock is not unidirectional, but that there is a statistically significant feedback mechanism pointing from metabolism back to the circadian clock.
Acknowledgments
The work described in the present article is part of the TiMet project on linking the circadian clock to metabolism in plants. TiMet is a collaborative project (Grant Agreement 245143) funded by the European Commission FP7, in response to call FP7-KBBE-2009-3. A.A. is supported by the TiMet project. Parts of the work were done while M.G. was supported by the German Research Foundation (DFG), research grant GR3853/1-1. We would like to thank Catherine Higham for helpful discussions.
Appendix
Marginal likelihood for time delays
In the hierarchical Bayesian regression (HBR) model, described in Subsection 2.5 of Aderhold et al. (2014), the target observations yg are modelled independently for each target g (g=1, …, G):
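A sketch of the HBR model for a single target g in the notation of this appendix, assuming the standard conjugate formulation: a Gaussian likelihood, a zero-mean Gaussian prior on wg whose covariance is scaled by δg and by the noise variance σg², and inverse Gamma priors on σg² and δg. Only the inverse Gamma prior on δg is stated explicitly in the text; the design matrix X_{πg} (one column per regulator in πg), the number of observations T and the hyperparameters Aσ, Bσ, Aδ, Bδ are notation assumed here.

```latex
\begin{align}
  \mathbf{y}_g \mid \mathbf{w}_g, \sigma_g^2 &\sim \mathcal{N}\!\left(\mathbf{X}_{\pi_g}\mathbf{w}_g,\; \sigma_g^2 \mathbf{I}_T\right), \\
  \mathbf{w}_g \mid \sigma_g^2, \delta_g &\sim \mathcal{N}\!\left(\mathbf{0},\; \delta_g \sigma_g^2 \mathbf{I}\right), \\
  \sigma_g^2 &\sim \mathcal{IG}(A_\sigma, B_\sigma), \qquad \delta_g \sim \mathcal{IG}(A_\delta, B_\delta).
\end{align}
```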
where wg are the regression parameter vectors, determined by the sets of regulators (covariates) πg,
where δg is the target-specific SNR-hyperparameter. The noise variance parameters
As the targets g=1, …, G are modelled independently and the overall network structure ℳ is determined by the individual regulator sets, symbolically ℳ=(π1, …, πG), the joint marginal likelihood has a modular form:
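In symbols, and keeping the SNR hyperparameters δ = (δ1, …, δG) fixed (the same conditioning as in the marginalization described next; the vector notation δ is assumed here), the modular form is a product with one factor per target:

```latex
p(\mathcal{D} \mid \mathcal{M}, \boldsymbol{\delta})
  = \prod_{g=1}^{G} p(\mathbf{y}_g \mid \pi_g, \delta_g)
```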
For a given regulator set πg and a fixed SNR hyperparameter δg, marginalizing the HBR likelihood over the regression parameters and the noise variances yields a closed-form solution; see eq. (15) in Aderhold et al. (2014).[6] It then follows from eqns. (15–16):
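As a minimal sketch, assume the conjugate priors wg | σg², δg ~ N(0, δg σg² I) and σg² ~ IG(Aσ, Bσ); this is an assumption about the exact form of eq. (15) in Aderhold et al. (2014), not a transcription of it. Integrating out wg and σg² then turns yg into a multivariate Student-t vector, and log p(yg | πg, δg) can be computed as below. The function name, the arguments a_sigma and b_sigma, and the design-matrix layout X (one column per regulator in πg, no intercept) are hypothetical.

```python
import math

import numpy as np


def log_marginal_given_delta(y, X, delta, a_sigma, b_sigma):
    """Hypothetical helper: log p(y | pi, delta) with w and sigma^2 integrated out.

    Assumed model: y = X w + eps, eps ~ N(0, sigma2 I),
    w ~ N(0, delta * sigma2 I), sigma2 ~ InvGamma(a_sigma, b_sigma).
    X has shape (T, k): one column per regulator in pi (k may be 0).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    T = y.shape[0]
    # Marginal covariance of y divided by sigma2: C = I + delta * X X^T
    C = np.eye(T) + delta * (X @ X.T)
    _, logdet = np.linalg.slogdet(C)
    quad = float(y @ np.linalg.solve(C, y))  # y^T C^{-1} y
    a_post = a_sigma + 0.5 * T
    # log of (2 pi)^{-T/2} |C|^{-1/2} b^a Gamma(a + T/2) / Gamma(a) (b + quad/2)^{-(a + T/2)}
    return (-0.5 * T * math.log(2.0 * math.pi) - 0.5 * logdet
            + a_sigma * math.log(b_sigma) - math.lgamma(a_sigma)
            + math.lgamma(a_post) - a_post * math.log(b_sigma + 0.5 * quad))


# Usage: a target with 8 observations and two candidate regulators.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(8, 2))
y_demo = X_demo @ np.array([0.8, -0.5]) + 0.1 * rng.normal(size=8)
print(log_marginal_given_delta(y_demo, X_demo, delta=1.0, a_sigma=2.0, b_sigma=1.0))
```

Using slogdet and solve instead of an explicit matrix inverse keeps the computation numerically stable when C is poorly conditioned.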
The prior p(πg) on the regulator sets πg in the HBR model is assumed to be a uniform distribution over all regulator sets whose cardinality does not exceed a maximal value ℱ [see Grzegorczyk and Husmeier (2012) and Aderhold et al. (2014)]. We thus obtain:
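Spelled out, the uniform prior p(πg) = 1/Tg turns the sum over regulator sets into an average for each fixed δg:

```latex
p(\mathbf{y}_g \mid \delta_g)
  = \sum_{\pi_g} p(\mathbf{y}_g \mid \pi_g, \delta_g)\, p(\pi_g)
  = \frac{1}{T_g} \sum_{\pi_g:\, |\pi_g| \le \mathcal{F}} p(\mathbf{y}_g \mid \pi_g, \delta_g)
```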
where Tg is the number of valid regulator sets. If there are N potential regulators for a target g, the number of valid parent sets Tg is given by:
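Counting the valid sets of each size k = 0, …, ℱ gives the expression below; including the empty set (k = 0) is an assumption not stated explicitly in the text. For example, N = 10 and ℱ = 3 give 1 + 10 + 45 + 120 = 176 sets.

```latex
T_g = \sum_{k=0}^{\mathcal{F}} \binom{N}{k}
```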
so that, for fixed ℱ, Tg grows polynomially in N, with degree ℱ.[7] If the inner sums over the regulator sets can be computed by full enumeration, the marginal likelihood can be estimated by repeatedly sampling the δg (g=1, …, G) from their inverse Gamma prior distributions and computing the following Monte-Carlo estimator:
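A minimal sketch of such an estimator for a single target g, computed in log space over K sampled values of δg. It assumes that, for each target separately, p(yg | δg) is averaged over the sampled δg, and that the network-level log marginal likelihood is the sum of the per-target results (which is valid when the δg are a priori independent and the likelihood is modular). The callable log_lik, standing in for log p(yg | πg, δg), and all function and parameter names are hypothetical.

```python
import math
from itertools import combinations

import numpy as np


def log_mean_exp(values):
    """Numerically stable log of the arithmetic mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def valid_parent_sets(n_regulators, max_fan_in):
    """All regulator sets with at most max_fan_in members (empty set included)."""
    return [pi for k in range(max_fan_in + 1)
            for pi in combinations(range(n_regulators), k)]


def mc_log_marginal_target(log_lik, n_regulators, max_fan_in,
                           n_samples, a_delta, b_delta, rng):
    """Hypothetical Monte-Carlo estimate of log p(y_g) for one target g.

    log_lik(pi, delta) must return log p(y_g | pi, delta).
    delta ~ InvGamma(a_delta, b_delta), drawn as 1 / Gamma(shape=a_delta, rate=b_delta).
    """
    parent_sets = valid_parent_sets(n_regulators, max_fan_in)  # the T_g valid sets
    per_sample = []
    for _ in range(n_samples):
        # numpy's gamma sampler is parameterised by scale = 1 / rate
        delta = 1.0 / rng.gamma(a_delta, 1.0 / b_delta)
        # log p(y_g | delta) = log[(1 / T_g) * sum_pi p(y_g | pi, delta)]
        per_sample.append(log_mean_exp([log_lik(pi, delta) for pi in parent_sets]))
    # log[(1 / K) * sum_k p(y_g | delta_k)]
    return log_mean_exp(per_sample)


def toy_log_lik(pi, delta):
    # Stand-in only: penalises larger regulator sets and, mildly, large delta
    return -len(pi) - 0.1 * delta


rng = np.random.default_rng(0)
print(mc_log_marginal_target(toy_log_lik, n_regulators=4, max_fan_in=2,
                             n_samples=200, a_delta=3.0, b_delta=2.0, rng=rng))
```

Working in log space avoids underflow, since individual likelihoods can be extremely small; in practice log_lik would evaluate the closed-form marginal likelihood of eq. (15) in Aderhold et al. (2014) for the data of target g.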
To get an idea of the approximation error of our marginal likelihood estimator, we consider J independent Monte-Carlo estimators.
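A minimal sketch of how the spread of the J estimates could be summarised, assuming (this is not stated in the text) that the J log-estimates are reduced to their sample mean and sample standard deviation; the function name and the example values are hypothetical.

```python
import statistics


def summarise_estimates(log_estimates):
    """Mean and sample standard deviation of J independent log-estimates."""
    if len(log_estimates) < 2:
        raise ValueError("need at least two independent estimates")
    return statistics.mean(log_estimates), statistics.stdev(log_estimates)


# Usage: J = 3 hypothetical log marginal likelihood estimates for one lag.
print(summarise_estimates([-120.4, -120.1, -120.7]))
```

If this standard deviation is small compared with the differences between the estimates for different lags, the ranking of the lags is unlikely to be a Monte-Carlo artefact.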
In our application, the data set D consists of several individual time series. When the network interactions are modelled with a time lag of τ data points, the first τ target observations of each time series have no lagged regulator values and have to be discarded.[9] For a fair comparison among different time lags τ, we therefore first choose a maximal lag τMAX and discard the first τMAX observations of every time series. The remaining target observations yg (g=1, …, G) can then be used for all lags τ∈{0, …, τMAX}. This ensures that the target observations do not change with τ, so that exactly the same target observations are modelled for every lag.[10]
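The alignment can be sketched as follows, assuming each time series is stored as an array with one row per time point and one column per variable, and that the regressors for lag τ are the values observed τ points before each target observation. The function lagged_design is hypothetical.

```python
import numpy as np


def lagged_design(series_list, tau, tau_max):
    """Hypothetical helper: stack targets and lag-tau regressors over several series.

    Each series is an array of shape (n_points, n_variables). The first tau_max
    points of every series are dropped from the targets, so the target rows are
    identical for every tau in 0..tau_max; the regressors lie tau points earlier.
    """
    if not 0 <= tau <= tau_max:
        raise ValueError("tau must lie in 0..tau_max")
    targets, regressors = [], []
    for s in series_list:
        s = np.asarray(s, dtype=float)
        n = s.shape[0]
        if n <= tau_max:
            raise ValueError("each series needs more than tau_max points")
        targets.append(s[tau_max:])                  # same rows for every tau
        regressors.append(s[tau_max - tau:n - tau])  # shifted back by tau
    return np.vstack(targets), np.vstack(regressors)


# Usage: two series (6 and 5 time points, 2 variables), maximal lag 2.
s1 = np.arange(12).reshape(6, 2)
s2 = np.arange(100, 110).reshape(5, 2)
for tau in range(3):
    Y, X = lagged_design([s1, s2], tau, tau_max=2)
    print(tau, Y.shape, X.shape)  # always 7 target rows: (6 - 2) + (5 - 2)
```

Because the target matrix is the same for every τ, marginal likelihoods computed for different lags refer to the same observations and can be compared directly.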
Detailed simulation results
For our performance evaluation on the simulated data described in Section 4.1, we ran hundreds of simulations for a variety of settings, which differed in the observation status of the molecular components (mRNA only versus mRNA and proteins), the method for derivative estimation (described in Section 3.1), the regulatory network structure (shown in Figure 1), and the method applied to learn this structure from data (reviewed in Sections 3.2 and 3.3). The results of these studies are shown in Figures 12–14. They are complex, and patterns are not easily discernible by visual inspection, which motivated the application of the ANOVA scheme described in Section 3.5.

Detailed AUROC scores: coarse gradient.
The boxplots show a subset of the results obtained from the simulated data described in Section 4.1. The AUROC scores were obtained from the coarse response gradients with 2-h intervals. The corresponding results for the fine gradient and for the Gaussian process interpolation are displayed in Figures 13 and 14. Left panel: incomplete data, with mRNA but no protein concentrations. Right panel: complete data, including both protein and mRNA concentrations. Each panel contains six subpanels, representing the six network topologies shown in Figure 1.
References
Aderhold, A., D. Husmeier and M. Grzegorczyk (2014): “Statistical inference of regulatory networks for circadian regulation,” Stat. Appl. Genet. Mol. Biol. (SAGMB), 13, 227–273.
Ahmed, A. and E. P. Xing (2009): “Recovering time-varying networks of dependencies in social and biological studies,” Proc. Natl. Acad. Sci., 106, 11878–11883.
Äijö, T. and H. Lähdesmäki (2009): “Learning gene regulatory networks from gene expression measurements using non-parametric molecular kinetics,” Bioinformatics, 25, 2937–2944. doi:10.1093/bioinformatics/btp511.
Barenco, M., D. Tomescu, D. Brewer, R. Callard, J. Stark and M. Hubank (2006): “Ranked prediction of p53 targets using hidden variable dynamic modeling,” Genome Biol., 7, R25.
Beal, M. (2003): Variational algorithms for approximate Bayesian inference, Ph.D. thesis, Gatsby Computational Neuroscience Unit, University College London, UK.
Beal, M., F. Falciani, Z. Ghahramani, C. Rangel and D. Wild (2005): “A Bayesian approach to reconstructing genetic regulatory networks with hidden factors,” Bioinformatics, 21, 349–356. doi:10.1093/bioinformatics/bti014.
Bläsing, O. E., Y. Gibon, M. Günther, M. Höhne, R. Morcuende, D. Osuna, O. Thimm, B. Usadel, W.-R. Scheible and M. Stitt (2005): “Sugars and circadian regulation make major contributions to the global regulation of diurnal gene expression in Arabidopsis,” The Plant Cell Online, 17, 3257–3281. doi:10.1105/tpc.105.035261.
Brandt, S. (1999): Data analysis: statistical and computational methods for scientists and engineers, Springer, New York, USA.
Chevaleyre, Y., U. Endriss, J. Lang and N. Maudet (2007): A short introduction to computational social choice, Springer. doi:10.1007/978-3-540-69507-3_4.
Chib, S. and I. Jeliazkov (2001): “Marginal likelihood from the Metropolis-Hastings output,” J. Am. Stat. Assoc., 96, 270–281.
Ciocchetta, F. and J. Hillston (2009): “Bio-PEPA: A framework for the modelling and analysis of biological systems,” Theor. Comput. Sci., 410, 3065–3084.
Dalchau, N., S. J. Baek, H. M. Briggs, F. C. Robertson, A. N. Dodd, M. J. Gardner, M. A. Stancombe, M. J. Haydon, G.-B. Stan, J. M. Gonçalves and A. A. Webb (2011): “The circadian oscillator gene GIGANTEA mediates a long-term response of the Arabidopsis thaliana circadian clock to sucrose,” Proc. Natl. Acad. Sci., 108, 5104–5109.
Davis, J. and M. Goadrich (2006): “The relationship between Precision-Recall and ROC curves,” Proc. 23rd Int. Conf. Mach. Learn., 233–240.
Feugier, F. and A. Satake (2012): “Dynamical feedback between circadian clock and sucrose availability explains adaptive response of starch metabolism to various photoperiods,” Frontiers in Plant Science, 3. doi:10.3389/fpls.2012.00305.
Flis, A., P. Fernandez, T. Zielinski, R. Sulpice, A. Pokhilko, H. G. McWatters, A. J. Millar, M. Stitt and K. J. Halliday (2013): “Biological regulation identified by sharing timeseries data outside the ’omics,” Submitted.
Friedman, N., M. Linial, I. Nachman and D. Pe’er (2000): “Using Bayesian networks to analyze expression data,” J. Comput. Biol., 7, 601–620.
Fukushima, A., M. Kusano, N. Nakamichi, M. Kobayashi, N. Hayashi, H. Sakakibara, T. Mizuno and K. Saito (2009): “Impact of clock-associated Arabidopsis pseudo-response regulators in metabolic coordination,” Proc. Natl. Acad. Sci., 106, 7251–7256.
Geigenberger, P. (2011): “Regulation of starch biosynthesis in response to a fluctuating environment,” Plant Physiol., 155, 1566–1577.
Geiger, D. and D. Heckerman (1994): “Learning Gaussian networks,” In: International Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers, pp. 235–243.
Gille, S., K. Cheng, M. E. Skinner, A. H. Liepman, C. G. Wilkerson and M. Pauly (2011): “Deep sequencing of voodoo lily (Amorphophallus konjac): an approach to identify relevant genes involved in the synthesis of the hemicellulose glucomannan,” Planta, 234, 515–526. doi:10.1007/s00425-011-1422-z.
Graf, A., A. Schlereth, M. Stitt and A. M. Smith (2010): “Circadian control of carbohydrate availability for growth in Arabidopsis plants at night,” Proc. Natl. Acad. Sci., 107, 9458–9463.
Grzegorczyk, M. and D. Husmeier (2012): “A non-homogeneous dynamic Bayesian network with sequentially coupled interaction parameters for applications in systems and synthetic biology,” Stat. Appl. Genet. Mol. Biol. (SAGMB), 11, article 7. doi:10.1515/1544-6115.1761.
Grzegorczyk, M. and D. Husmeier (2013): “Regularization of non-homogeneous dynamic Bayesian networks with global information-coupling based on hierarchical Bayesian models,” Mach. Learn., 1–50.
Guerriero, M., A. Pokhilko, A. Fernández, K. Halliday, A. Millar and J. Hillston (2012): “Stochastic properties of the plant circadian clock,” J. R. Soc. Interf., 9, 744–756.
Hanley, J. A. and B. J. McNeil (1982): “The meaning and use of the area under a receiver operating characteristic (ROC) curve,” Radiology, 143, 29–36. doi:10.1148/radiology.143.1.7063747.
Hastie, T., R. Tibshirani and J. H. Friedman (2001): The elements of statistical learning, volume 1, Springer, New York. doi:10.1007/978-0-387-21606-5_1.
Haydon, M. J., O. Mielczarek, F. C. Robertson, K. E. Hubbard and A. A. Webb (2013): “Photosynthetic entrainment of the Arabidopsis thaliana circadian clock,” Nature, 502, 689–692. doi:10.1038/nature12603.
Herrero, E., E. Kolmos, N. Bujdoso, Y. Yuan, M. Wang, M. C. Berns, H. Uhlworm, G. Coupland, R. Saini, M. Jaskolski, A. Webb, J. Gonçalves and S. J. Davis (2012): “EARLY FLOWERING4 recruitment of EARLY FLOWERING3 in the nucleus sustains the Arabidopsis circadian clock,” Plant Cell Online, 24, 428–443. doi:10.1105/tpc.111.093807.
Kalaitzis, A. A., A. Honkela, P. Gao and N. D. Lawrence (2013): gptk: Gaussian processes tool-kit, URL http://CRAN.R-project.org/package=gptk, R package version 1.06.
Kikis, E. A., R. Khanna and P. H. Quail (2005): “ELF4 is a phytochrome-regulated component of a negative-feedback loop involving the central oscillator components CCA1 and LHY,” The Plant Journal, 44, 300–313. doi:10.1111/j.1365-313X.2005.02531.x.
Ko, Y., C. Zhai and S. Rodriguez-Zas (2009): “Inference of gene pathways using mixture Bayesian networks,” BMC Systems Biol., 3, 54.
Kolmos, E., M. Nowak, M. Werner, K. Fischer, G. Schwarz, S. Mathews, H. Schoof, F. Nagy, J. M. Bujnicki and S. J. Davis (2009): “Integrating ELF4 into the circadian system through combined structural and functional studies,” HFSP J., 3, 350–366.
Kuncheva, L. I. (2004): Combining pattern classifiers: methods and algorithms, John Wiley & Sons, Inc., Hoboken, NJ. doi:10.1002/0471660264.
Lawrence, N. D., M. Girolami, M. Rattray and G. Sanguinetti (2010): Learning and inference in computational systems biology, MIT Press, Cambridge, MA.
Locke, J. C. W., L. Kozma-Bognár, P. D. Gould, B. Fehér, E. Kevei, F. Nagy, M. S. Turner, A. Hall and A. J. Millar (2006): “Experimental validation of a predicted feedback loop in the multi-oscillator clock of Arabidopsis thaliana,” Mol. Systems Biol., 2, 59.
Marbach, D., J. C. Costello, R. Küffner, N. M. Vega, R. J. Prill, D. M. Camacho, K. R. Allison, M. Kellis, J. J. Collins, G. Stolovitzky, et al. (2012): “Wisdom of crowds for robust gene network inference,” Nat. Methods, 9, 796–804.
Margolin, A. A., I. Nemenman, K. Basso, C. Wiggins, G. Stolovitzky, R. Dalla-Favera and A. Califano (2006): “ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context,” BMC Bioinformatics, 7 (Suppl. 1). doi:10.1186/1471-2105-7-S1-S7.
Morrissey, E. R., M. A. Juárez, K. J. Denby and N. J. Burroughs (2011): “Inferring the time-invariant topology of a nonlinear sparse gene regulatory network using fully Bayesian spline autoregression,” Biostatistics, 12, 682–694. doi:10.1093/biostatistics/kxr009.
Pokhilko, A., S. Hodge, K. Stratford, K. Knox, K. Edwards, A. Thomson, T. Mizuno and A. Millar (2010): “Data assimilation constrains new connections and components in a complex, eukaryotic circadian clock model,” Mol. Systems Biol., 6, 416.
Pokhilko, A., A. Fernández, K. Edwards, M. Southern, K. Halliday and A. Millar (2012): “The clock gene circuit in Arabidopsis includes a repressilator with additional feedback loops,” Mol. Systems Biol., 8, 574.
Pokhilko, A., P. Mas and A. J. Millar (2013): “Modelling the widespread effects of TOC1 signalling on the plant circadian clock and its outputs,” BMC Systems Biol., 7, 1–12.
Polikar, R. (2006): “Ensemble based systems in decision making,” Circuits and Systems Magazine, IEEE, 6, 21–45. doi:10.1109/MCAS.2006.1688199.
Rasmussen, C. E. (1996): Evaluation of Gaussian processes and other methods for non-linear regression, Ph.D. thesis, Graduate Department of Computer Science, University of Toronto, Canada.
Rasmussen, C. E., R. M. Neal, G. E. Hinton, D. van Camp, M. Revow, Z. Ghahramani, R. Kustra and R. Tibshirani (1996): “The DELVE manual,” URL http://www.cs.toronto.edu/delve.
Rogers, S. and M. Girolami (2005): “A Bayesian regression approach to the inference of regulatory networks from gene expression data,” Bioinformatics, 21, 3131–3137. doi:10.1093/bioinformatics/bti487.
Schäfer, J. and K. Strimmer (2005): “A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics,” Stat. Appl. Genet. Mol. Biol., 4(1), Article 32.
Solak, E., R. Murray-Smith, W. E. Leithead, D. J. Leith and C. E. Rasmussen (2003): “Derivative observations in Gaussian process models of dynamic systems,” Advances in Neural Information Processing Systems, 15, (Becker, S., Thrun, S., and Obermayer, K. Eds.) MIT Press, 1033–1040.
Tibshirani, R. (1995): “Regression shrinkage and selection via the Lasso,” J. R. Stat. Soc. Series B (Methodological), 58, 267–288. doi:10.1111/j.2517-6161.1996.tb02080.x.
Wang, R., K. Guegler, S. T. LaBrie and N. M. Crawford (2000): “Genomic analysis of a nutrient response in Arabidopsis reveals diverse expression patterns and novel metabolic and potential regulatory genes induced by nitrate,” The Plant Cell Online, 12, 1491–1509. doi:10.1105/tpc.12.8.1491.
©2015 by De Gruyter
Articles in this issue
- Frontmatter
- Research Articles
- Study of triplet periodicity differences inside and between genomes
- H-CLAP: hierarchical clustering within a linear array with an application in genetics
- Inferring bi-directional interactions between circadian clock genes and metabolism with model ensembles
- Bayesian inference for Markov jump processes with informative observations
- Likelihood free inference for Markov processes: a comparison
- Spatio-temporal model for multiple ChIP-seq experiments
- Software and Application Note
- GenePEN: analysis of network activity alterations in complex diseases via the pairwise elastic net
- Corrigendum
- Corrigendum to: Simple estimators of false discovery rates given as few as one or two p-values without strong parametric assumptions