Inference of genetic regulatory networks with regulatory hubs using vector autoregressions and automatic relevance determination with model selections

Chi-Kan Chen

doi:10.1515/sagmb-2020-0054

Artikel

Inference of genetic regulatory networks with regulatory hubs using vector autoregressions and automatic relevance determination with model selections

Chi-Kan Chen

Veröffentlicht/Copyright: 28. Dezember 2021

Veröffentlicht von

Veröffentlichen auch Sie bei De Gruyter Brill

Manuskript einreichen Informationen für Autor*innen

Aus der Zeitschrift Statistical Applications in Genetics and Molecular Biology Band 20 Heft 4-6

Abstract

The inference of genetic regulatory networks (GRNs) reveals how genes interact with each other. A few genes can regulate many genes as targets to control cell functions. We present new methods based on the order-1 vector autoregression (VAR1) for inferring GRNs from gene expression time series. The methods use the automatic relevance determination (ARD) to incorporate the regulatory hub structure into the estimation of VAR1 in a Bayesian framework. Several sparse approximation schemes are applied to the estimated regression weights or VAR1 model to generate the sparse weighted adjacency matrices representing the inferred GRNs. We apply the proposed and several widespread reference methods to infer GRNs with up to 100 genes using simulated, DREAM4 in silico and experimental E. coli gene expression time series. We show that the proposed methods are efficient on simulated hub GRNs and scale-free GRNs using short time series simulated by VAR1s and outperform reference methods on small-scale DREAM4 in silico GRNs and E. coli GRNs. They can utilize the known major regulatory hubs to improve the performance on larger DREAM4 in silico GRNs and E. coli GRNs. The impact of nonlinear time series data on the performance of proposed methods is discussed.

Keywords: ARD; gene expression time series; GRN inference; model selection; regulatory hub; VAR1

Corresponding author: Chi-Kan Chen, Department of Applied Mathematics, National Chung Hsing University, 145 Xingda Rd., South District, Taichung City, Taiwan, ROC, E-mail: cchen@nchu.edu.tw

Author contribution: The author has accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: None declared.
Conflict of interest statement: The author declare no conflicts of interest regarding this article.

Appendix

A.1 Estimation of MR by the ARD

We first present the conditional posterior probability density function and marginal log-likelihood function of parameters. From the expressions in Eqs. (2) and (3),

(A1) L W , β | Y , Z π W | α = ( 2 π ) − N p + p 2 2 β N p 2 D α p 2 exp − 1 2 t r β ( Y − Z W ) ( Y − Z W ) T − 1 2 t r D α W W T .

By completing the squares of terms in the exponent of exponential form in Eq. (A1), the conditional posterior over W is a matrix Gaussian distribution with the probability density function

(A2) π W | Y , Z , α , β = ( 2 π ) − p 2 2 β Z T Z + D α p 2 exp − 1 2 t r β Z T Z + D α ( W − W ̄ ) ( W − W ̄ ) T ,

where W ̄ = Z T Z + D α / β − 1 Z T Y is the mean/mode matrix, and the marginal likelihood function of α, β is expressed as

(A3) L α , β | Y , Z = ( 2 π ) − N p 2 β N p 2 D α p 2 ⋅ β Z T Z + D α − p 2 exp − 1 2 t r ( β ( Y − Z W ) ( Y − Z W ) T ) − 1 2 t r D α W ̄ W ̄ T .

The partial derivatives of ln ⁡ L α , β | Y , Z with respect to parameters are expressed as

(A4) ∂ ∂ α k ′ ln ⁡ L α , β | Y , Z = p 2 α k ′ − p 2 β Z T Z + D α − 1 [ k ′ , k ′ ] − 1 2 ∑ k = 1 p W ̄ [ k ′ , k ] 2 ,

(A5) ∂ ∂ β ln ⁡ L α , β | Y , Z = N p 2 β − p 2 t r ( β Z T Z + D α − 1 Z T Z ) − 1 2 t r ( ( Y − Z W ̄ ) ( Y − Z W ̄ ) T ) .

The partial derivatives of ln π(α), ln π(β) with respect to parameters are expressed as

(A6) ∂ ∂ α k ′ ln π α = s α k ′ − 1 α k ′ − r α k ′ , ∂ ∂ β ln π β = s β − 1 β − r β ,

where π(α), π(β) are in Eq. (4).

The update rules of α, β of iterative process for solving the maximization problem in Eq. (6) are obtained by setting the partial derivatives of Λ(α, β) with respect to parameters in α, β equal 0, respectively. W ̄ in L α , β | Y , Z is recomputed with α, β given by the updated values. Assume the tuning parameter γ = (s _α, r _α, s _β, r _β) is specified. Assume (α ₀, β ₀) is the initial value of (α, β) and δ > 0 is the stopping tolerance. Let S = Z ^T Z, C = Z ^T Y. The pseudocode of Algorithm 1 for solving the maximization problem in Eq. (7) is as follows.

Algorithm 1:

Estimation of MR by the ARD

Initial values:

δ′ ← −∞

(α, β) ← (α ₀, β ₀)

W ̄ ← ( S + D α / β ) − 1 C

δ ^” ←Λ(α, β)

Iterative updates:

while δ ^” − δ′ > δ do

δ′ ← δ″

α k ′ ← p + 2 s α k ′ − 2 p β S + D α − 1 k ′ , k ′ + ∑ k = 1 p W ̄ k ′ , k 2 + 2 r α k ′ , k ′ = 1 , … , p

β ← N p + 2 s β − 2 p t r β S + D α − 1 S + t r Y − Z W ̄ Y − Z W ̄ T + 2 r β

W ̄ ← ( S + D α / β ) − 1 C

δ″ ←Λ(α, β)

end while

return ( α ̃ , β ̃ , W ̃ ) ← ( α , β , W ̄ )

A.2 Estimation of GEN

The minimization problem in Eq. (10) is equivalent to

(A7) ( W ̃ λ , V ̃ ) = argmin W , V ∈ R p × p Y − Z W 2,2 2 + D λ W 2,2 2 + 2 λ V 1,1 s u b j e c t t o W − V = 0 ,

where λ = ( α ̃ / β ̃ ) 1 / 2 ∈ R + p × 1 . The scaled augmented Lagrangian associated with the constrained minimization problem is given by

(A8) L ρ W , V , V ′ = Y − Z W 2,2 2 + D λ W 2,2 2 + 2 λ V 1,1 + ρ W − V + V ′ 2,2 2 − ρ V ′ 2,2 2 ,

where ρ > 0 is the penalty constant and V ′ ∈ R p × p contains the scaled dual variables. The ADMM updates the values of W, V, V′ iteratively from the initial values to find W ̃ λ , V ̃ . The update rules of W, V are derived by setting the partial derivatives of L ρ W , V , V ′ to 0 matrices. The update rule of V′ is the one-step gradient ascent. Assume W 0 , V 0 , V 0 ′ is the initial value of W , V , V ′ , δ > 0 the stopping tolerance and W 0 − V 0 2,2 > δ . The pseudocode of Algorithm 2 for solving the constrained minimization problem in Eq. (A7) is as follows.

Algorithm 2:

Estimation of GEN

Initial value:

W , V , V ′ ← ( W 0 , V 0 , V 0 ′ )

Iterative updates:

while W − V 2,2 > δ do

W ← ( S + D λ + ρ I p × p ) − 1 ( C + ρ V − ρ V ′ )

V ← S 2 λ / ρ W + V ′

V′ ←V′ + η ⋅ (W − V)

end while

return W ̃ λ , V ̃ , V ′ ̃ ← ( W , V , V ′ )

Here, η > 0 is the step size, S _d(⋅) is entry-wise soft thresholding function with the threshold d > 0. The soft thresholding function of w ∈ R with the threshold d is given by

(A9) s d w = w − d i f d < w 0 i f − d ≤ w ≤ d ⋅ w + d i f w < − d

A.3 Numerical implementations of GRN inference

To generate the candidate solutions for (α, β, W), the tuning parameter space Γ contains the 256 combinations of s _α = 2.5, 5, 7.5, 10 × 1_p×1, r _α = 10⁻³, 10⁻², 10⁻¹, 1 × 1_p×1, s _β = 0.5, 1, 1.5, 2, r _β = 10⁻³, 10⁻², 10⁻¹, 1. The initial value and stopping tolerance of Algorithm I are α 0 , β 0 = ( 1 p × 1 , 1 ) , δ = 10⁻³, respectively. To generate sparse weighted adjacency matrices using α ̃ , β ̃ , W ̃ , the AVE performs Algorithm II with η = 0.1, ρ = 1, the initial value W 0 , V 0 , V 0 ′ = ( 0 p × p , 1 p × p , 0 p × p ) and stopping tolerance δ = 10⁻³ to solve for W ̃ λ in Eq. (12). The AVT uses R package igraph (Csardi and Nepusz) to solve for A ̃ t in Eq. (13). The AVM generates Π from W ̃ with τ′ = 6 and uses R package minet (Meyer, Lafitte, and Bontemp 2008) implementing the MRMR method to generate T ̃ m from T with entries defined in Eq. (15). The values of ɛ, λ controlling the levels of sparsity of W ̃ ε , W ̃ λ generated by AVH, AVE respectively, and the numbers of nonzero entries in W ̃ t ̂ , W ̃ m ̂ generated by AVT, AVM, respectively, are set in numerical experiments.

References

Aalto, A., Viitasaari, L., Ilmonen, P., Mombaerts, L., and Goncalves, J. (2020). Gene regulatory network inference from sparsely sampled noisy data. Nat. Commun. 11: 3493. https://doi.org/10.1038/s41467-020-17217-1.Suche in Google Scholar PubMed PubMed Central

Albert, R. (2005). Scale-free networks in cell biology. J. Cell Sci. 118: 4947–4957. https://doi.org/10.1242/jcs.02714.Suche in Google Scholar PubMed

Alexa, A., Rahnenfuhrer, J., and Lengauer, T. (2006). Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics 22: 1600–1607. https://doi.org/10.1093/bioinformatics/btl140.Suche in Google Scholar PubMed

Bansal, M., Belcastro, V., Ambesi-Impiombato, A., and di Bernardo, D. (2007). How to infer gene networks from expression profiles. Mol. Syst. Biol. 3: 78. https://doi.org/10.1038/msb4100120.Suche in Google Scholar PubMed PubMed Central

Barabasi, A.-L. and Oltvai, Z.N. (2004). Network biology: understanding the cell’s functional organization. Nat. Rev. Genet. 5: 101–113. https://doi.org/10.1038/nrg1272.Suche in Google Scholar PubMed

Bock, M., Ogishima, S., Tanaka, H., Kramer, S., and Kaderali, L. (2012). Hub-centered gene network reconstruction using automatic relevance determination. PLoS One 7: e35077. https://doi.org/10.1371/journal.pone.0035077.Suche in Google Scholar PubMed PubMed Central

Boyd, S., Parikh, N., Chu, E., Peleato, B., and Eckstein, J. (2010). Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3: 1–122. https://doi.org/10.1561/2200000016.Suche in Google Scholar

Butte, A.J. and Kohane, I.S. (2000). Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements. Pac. Symp. Biocomput. 418–429. https://doi.org/10.1142/9789814447331_0040.Suche in Google Scholar PubMed

Charbonnier, C., Chiquet, J., and Ambroise, C. (2010). Weighted-LASSO for structured network inference from time course data. Stat. Appl. Genet. Mol. Biol. 9: 15. https://doi.org/10.2202/1544-6115.1519.Suche in Google Scholar PubMed

Che, D., Guo, S., Jiang, Q., and Chen, L. (2020). PFBNet: a priori-fused boosting method for gene regulatory network inference. BMC Bioinf. 21: 308. https://doi.org/10.1186/s12859-020-03639-7.Suche in Google Scholar PubMed PubMed Central

Chen, G., Larsen, P., Almasri, E., and Dai, Y. (2008). Rank-based edge reconstruction for scale-free genetic regulatory networks. BMC Bioinf. 9: 75. https://doi.org/10.1186/1471-2105-9-75.Suche in Google Scholar

Cornish, A.J. and Markowetz, F. (2014). SANTA: quantifying the functional content of molecular networks. PLoS Comput. Biol. 10: e1003808. https://doi.org/10.1371/journal.pcbi.1003808.Suche in Google Scholar

Csardi, G. and Nepusz, T. (2006). The igraph software package for complex network research. Int. J. Complex Syst.: 1695.Suche in Google Scholar

D’haeseleer, P., Wen, X., Fuhrman, S., and Somogyi, R. (1998). Mining the gene expression matrix: Inferring gene relationships from large scale gene expression data. Boston, MA: Springer.10.1007/978-1-4615-5345-8_22Suche in Google Scholar

Edgar, R., Domrachev, M., and Lash, A.E. (2002). Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 30: 207–210. https://doi.org/10.1093/nar/30.1.207.Suche in Google Scholar

Emmert-Streib, F., Dehmer, M., and Haibe-Kains, B. (2014). Gene regulatory networks and their applications: understanding biological and medical problems in terms of networks. Front. Cell Dev. Biol. 2: 38. https://doi.org/10.3389/fcell.2014.00038.Suche in Google Scholar

Freeman, L.C. (1979). Centrality in social networks conceptual clarification. Soc. Network. 1: 215–239.10.1016/0378-8733(78)90021-7Suche in Google Scholar

Friedman, N., Linial, M., Nachman, I., and Pe’er, D. (2000). Using Bayesian networks to analyze expression data. Comput. Biol. Med. 7: 601–620. https://doi.org/10.1089/106652700750050961.Suche in Google Scholar PubMed

Fujita, A., Sato, J.R., Garay-Malpartida, H.M., Yamaguchi, R., Miyano, S., Sogayar, M.C., and Ferreira, C.E. (2007). Modeling gene expression regulatory networks with the sparse vector autoregressive model. BMC Syst. Biol. 1: 39. https://doi.org/10.1186/1752-0509-1-39.Suche in Google Scholar PubMed PubMed Central

Gama-Castro, S., Salgado, H., Santos-Zavaleta, A., Ledezma-Tejeida, D., Muniz-Rascado, L., Garcia-Sotelo, J.S., Alquicira-Hernandez, K., Martinez-Flores, I., Pannier, L., Castro-Mondragon, J.A., et al.. (2016). RegulonDB version 9.0: high-level integration of gene regulation, coexpression, motif clustering and beyond. Nucleic Acids Res. 44: D133–D143. https://doi.org/10.1093/nar/gkv1156.Suche in Google Scholar PubMed PubMed Central

Granger, C.W.J. (1969). Investigating causal relations by econometric models and cross-spectral methods. Econometrica 37: 424–438. https://doi.org/10.2307/1912791.Suche in Google Scholar

Hecker, M., Lambeck, S., Toepfer, S., van Someren, E., and Guthke, R. (2009). Gene regulatory network inference: data integration in dynamic models-a review. BioSystems 96: 86–103. https://doi.org/10.1016/j.biosystems.2008.12.004.Suche in Google Scholar PubMed

Hube, W., Carey, V.J., Gentleman, R., and Anders, S. (2015). Orchestrating high-throughput genomic analysis with Bioconductor. Nat. Methods 12: 115–121. https://doi.org/10.1038/nmeth.3252.Suche in Google Scholar PubMed PubMed Central

Huynh-Thu, V.A. and Geurts, P. (2018). dynGENIE3: dynamical GENIE3 for the inference of gene networks from time series expression data. Sci. Rep. 8: 3384. https://doi.org/10.1038/s41598-018-21715-0.Suche in Google Scholar PubMed PubMed Central

Huynh-Thu, V.A., Irrthum, A., Wehenkel, L., and Geurts, P. (2010). Inferring regulatory networks from expression data using tree-based methods. PLoS One 5: e12776. https://doi.org/10.1371/journal.pone.0012776.Suche in Google Scholar PubMed PubMed Central

Kaderali, L., Zander, T., Faigle, U., Wolf, J., Schultze, J.L., and Schrader, R. (2006). CASPAR: a hierarchical Bayesian approach to predict survival times in cancer from gene expression data. Bioinformatics 22: 1495–1502. https://doi.org/10.1093/bioinformatics/btl103.Suche in Google Scholar PubMed

Kramer, N., Schafer, J., and Boulesteix, A.L. (2009). Regularized estimation of large-scale gene association networks using graphical Gaussian models. BMC Bioinf. 10: 384. https://doi.org/10.1186/1471-2105-10-384.Suche in Google Scholar PubMed PubMed Central

Langfelder, P. and Horvath, S. (2007). Eigengene networks for studying the relationships between co-expression modules. BMC Syst. Biol. 1: 54. https://doi.org/10.1186/1752-0509-1-54.Suche in Google Scholar PubMed PubMed Central

Langfelder, P. and Horvath, S. (2008). WGCNA: an R package for weighted correlation network analysis. BMC Bioinf. 9: 559. https://doi.org/10.1186/1471-2105-9-559.Suche in Google Scholar PubMed PubMed Central

Lo, K., Raftery, A.E., Dombek, K.M., Zhu, J., Schadt, E.E., Bumgarner, R.E., and Yeung, K.Y. (2012). Integrating external biological knowledge in the construction of regulatory networks from time-series expression data. BMC Syst. Biol. 6: 101. https://doi.org/10.1186/1752-0509-6-101.Suche in Google Scholar PubMed PubMed Central

Marbach, D., Schaffter, T., Mattiussi, C., and Floreano, D. (2009). Generating realistic in silico gene networks for performance assessment of reverse engineering methods. J. Comput. Biol. 16: 229–239. https://doi.org/10.1089/cmb.2008.09tt.Suche in Google Scholar PubMed

Margolin, A.A., Nemenman, I., Basso, K., Wiggins, C., Stolovitzky, G., Dalla Favera, R., and Califano, A. (2006). ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinf. 7: S7. https://doi.org/10.1186/1471-2105-7-S1-S7.Suche in Google Scholar PubMed PubMed Central

Meyer, P.E., Kontos, K., Lafitte, F., and Bontempi, G. (2007). Information-theoretic inference of large transcriptional regulatory networks. EURASIP J. Bioinf. Syst. Biol. 79879. https://doi.org/10.1155/2007/79879.Suche in Google Scholar PubMed PubMed Central

Meyer, P.E., Lafitte, F., and Bontemp, G. (2008). MINET: an open source R/Bioconductor package for mutual information based network inference. BMC Bioinf. 9: 461. https://doi.org/10.1186/1471-2105-9-461.Suche in Google Scholar PubMed PubMed Central

Michailidis, G. and d’Alche-Buc, F. (2013). Autoregressive models for gene regulatory network inference: sparsity, stability and causality issues. Math. Biosci. 246: 326–334. https://doi.org/10.1016/j.mbs.2013.10.003.Suche in Google Scholar PubMed

Nagarajan, R., Scutari, M., and Lebre, S. (2013). Bayesian networks in R with applications in systems biology. New York: Springer.10.1007/978-1-4614-6446-4Suche in Google Scholar

Nepomuceno-Chamorro, I.A., Aguilar-Ruiz, J.S., and Riquelme, J.C. (2010). Inferring gene regression networks with model trees. BMC Bioinf. 11: 517. https://doi.org/10.1186/1471-2105-11-517.Suche in Google Scholar PubMed PubMed Central

Nonaka, G., Blankschien, M., Herman, C., Gross, C.A., and Rhodius, V.A. (2006). Regulon and promoter analysis of the E. coli heat-shock factor, sigma32, reveals a multifaceted cellular response to heat stress. Genes Dev. 20: 1776–1789. https://doi.org/10.1101/gad.1428206.Suche in Google Scholar PubMed PubMed Central

Omony, J. (2014). Biological Network Inference: a review of methods and assessment of tools and techniques. Annu. Res. Rev. Biol. 4: 577–601. https://doi.org/10.9734/arrb/2014/5718.Suche in Google Scholar

Penfold, C.A. and Wild, D.L. (2011). How to infer gene networks from expression profiles, revisited. Interface Focus 1: 857–870. https://doi.org/10.1098/rsfs.2011.0053.Suche in Google Scholar PubMed PubMed Central

Prim, R.C. (1957). Shortest connection networks and some generalizations. Bell Syst. Tech. J. 36: 1389–1401. https://doi.org/10.1002/j.1538-7305.1957.tb01515.x.Suche in Google Scholar

R Core Team (2021). R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.Suche in Google Scholar

Schaefer, J., Opgen-Rhein, R., and Strimmer, K. (2015). GeneNet: modeling and inferring gene networks. R package version 1.2.13.Suche in Google Scholar

Schäfer, J. and Strimmer, K. (2005). A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Stat. Appl. Genet. Mol. Biol. 4: 32. https://doi.org/10.2202/1544-6115.1175.Suche in Google Scholar PubMed

Scutari, M. (2010). Learning Bayesian networks with the bnlearn R package. J. Stat. Software 35: 1–22. https://doi.org/10.18637/jss.v035.i03.Suche in Google Scholar

Shannon, P. (2021). DREAM4: synthetic expression data for gene regulatory network inference from the 2009 DREAM4 challenge. R package version 1.30.0.Suche in Google Scholar

Shimamura, T., Imoto, S., Yamaguchi, R., Fujita, A., Nagasaki, M., and Miyano, S. (2009). Recursive regularization for inferring gene networks from time-course gene expression profiles. BMC Syst. Biol. 3: 41.https://doi.org/10.1186/1752-0509-3-41.Suche in Google Scholar PubMed PubMed Central

Sulaimanov, N., Kumar, S., Burdet, F., Ibberson, M., Pagni, M., and Koeppl, H. (2019). Inferring gene expression networks with hubs using a degree weighted Lasso approach. Bioinformatics 35: 987–994. https://doi.org/10.1093/bioinformatics/bty716.Suche in Google Scholar PubMed

Tan, K.M., London, P., Mohan, K., Lee, S.I., Fazel, M., and Witten, D. (2014). Learning graphical models with hubs. J. Mach. Learn. Res. 15: 3297–3331.Suche in Google Scholar

Tipping, M.E. (2001). Sparse Bayesian learning and the relevance vector machine. J. Mach. Learn. Res. 1: 211–244.Suche in Google Scholar

Tsai, M.J., Wang, J.R., Ho, S.J., Shu, L.S., Huang, W.L., and Ho, S.Y. (2020). GREMA: modelling of emulated gene regulatory networks with confidence levels based on evolutionary intelligence to cope with the underdetermined problem. Bioinformatics 36: 3833–3840. https://doi.org/10.1093/bioinformatics/btaa267.Suche in Google Scholar PubMed

Werhli, A.V. and Husmeier, D. (2007). Reconstructing gene regulatory networks with Bayesian networks by combining expression data with multiple sources of prior knowledge. Stat. Appl. Genet. Mol. Biol. 6: 15. https://doi.org/10.2202/1544-6115.1282.Suche in Google Scholar PubMed

Zhang, B. and Horvath, S. (2005). A general framework for weighted gene co-expression network analysis. Stat. Appl. Genet. Mol. Biol. 4: 17. https://doi.org/10.2202/1544-6115.1128.Suche in Google Scholar PubMed

Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. R. Stat. Soc. Series B Stat. Methodol. 67: 301–320. https://doi.org/10.1111/j.1467-9868.2005.00503.x.Suche in Google Scholar

Received: 2020-09-04

Accepted: 2021-11-15

Published Online: 2021-12-28

Sie haben derzeit keinen Zugang zu diesem Inhalt.

Artikel in diesem Heft

https://doi.org/10.1515/sagmb-2020-0054

Schlagwörter für diesen Artikel

ARD; gene expression time series; GRN inference; model selection; regulatory hub; VAR1