Bayesian hierarchical graph-structured model for pathway analysis using gene expression data

Hui Zhou; Tian Zheng

doi:10.1515/sagmb-2013-0011

Published/Copyright: May 15, 2013

From the journal Statistical Applications in Genetics and Molecular Biology, Volume 12, Issue 3

Abstract

In genomic analysis, there is growing interest in network structures that represent biochemical interactions. Graph-structured or graph-constrained inference takes advantage of a known relational structure among variables to introduce smoothness and reduce complexity in modeling, especially for high-dimensional genomic data, and it has attracted considerable interest for model regularization and selection. However, prior knowledge of the graphical structure among the variables can be limited and partial. Empirical data may suggest variations and modifications to such a graph, which could lead to new and interesting biological findings. In this paper, we propose a Bayesian random graph-constrained model, rGrace, an extension of the Grace model, that combines a priori network information with empirical evidence for applications such as pathway analysis. Using both simulations and real data examples, we show that the new method, while improving predictive performance, can identify discrepancies between the data and a graph structure known a priori and suggest modifications and updates.

Keywords: gene expression; network analysis; Bayesian analysis

Corresponding author: Tian Zheng, Department of Statistics, Columbia University, New York, NY 10027, USA

Appendix A Derivation of Sampling Scheme for β|σ²

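According to Andrews and Mallows (1974), for a>0, a scale mixture of normal distributions representation of the Laplace distribution is

$$\frac{a}{2}\, e^{-a|z|} = \int_0^\infty \frac{1}{\sqrt{2\pi s}}\, e^{-z^2/(2s)}\, \frac{a^2}{2}\, e^{-a^2 s/2}\, ds.$$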
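The scale-mixture representation can be checked numerically. The following is a minimal sketch, assuming the standard form of the identity: a normal density with mean zero and variance s, mixed over s with an exponential density of rate a²/2, gives the Laplace density (a/2)exp(-a|z|). The function names, integration limits, and grid size are illustrative choices and are not part of the original derivation.

```python
import math


def laplace_density(z, a):
    """Laplace density with rate a: (a / 2) * exp(-a * |z|)."""
    return 0.5 * a * math.exp(-a * abs(z))


def mixture_density(z, a, t_min=-40.0, t_max=40.0, n=4000):
    """Integrate N(z | 0, s) * Exp(s | rate = a^2 / 2) over s > 0.

    Uses the substitution s = exp(t) and the trapezoid rule on
    [t_min, t_max]; both limits are assumed wide enough that the
    truncated tails are negligible for z != 0.
    """
    h = (t_max - t_min) / n
    total = 0.0
    for i in range(n + 1):
        t = t_min + i * h
        s = math.exp(t)
        # Normal density of z with mean 0 and variance s.
        normal = math.exp(-z * z / (2.0 * s)) / math.sqrt(2.0 * math.pi * s)
        # Exponential mixing density of s with rate a^2 / 2.
        mixing = 0.5 * a * a * math.exp(-0.5 * a * a * s)
        # Trapezoid end points get half weight.
        weight = 0.5 if i in (0, n) else 1.0
        # The extra factor s is the Jacobian of s = exp(t), ds = s dt.
        total += weight * normal * mixing * s
    return total * h


if __name__ == "__main__":
    for a in (0.5, 2.0):
        for z in (0.3, 1.0, -2.5):
            print(a, z, mixture_density(z, a), laplace_density(z, a))
```

The two printed columns should agree closely for each pair (a, z), which is the property the derivation relies on when the Laplace-type prior is replaced by a normal distribution conditional on latent scale variables.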

Let and Z=||β_j||₂. Then

Therefore,

Hence

where D_s is a block diagonal matrix with J blocks in the diagonal, and the j th block D_{s_j} is Note that, assuming a block structure for

where each O_j is an orthogonal matrix and is a diagonal matrix, that is With the block diagonal structure assumption of (2) can be written as:

As a result, we can treat β|σ² alternatively as:

This research was supported in part by NSF grants DMS-0714669 and SES-1023176, NIH grant R01 GM070789, and a 2010 Google research award. We would like to thank two anonymous reviewers for their constructive comments.

References

Andrews, D. F. and C. L. Mallows (1974): “Scale mixtures of normal distributions,” J. Roy. Stat. Soc. B, 36, 99–102.

Bae, K. and B. K. Mallick (2004): “Gene selection using a two-level hierarchical Bayesian model,” Bioinformatics, 20, 3423–3430. doi:10.1093/bioinformatics/bth419

Bar-Joseph, Z., G. K. Gerber, T. I. Lee, N. J. Rinaldi, J. Y. Yoo, F. Robert, D. B. Gordon, E. Fraenkel, T. S. Jaakkola, R. A. Young and D. K. Gifford (2003): “Computational discovery of gene modules and regulatory networks,” Nat. Biotechnol., 21, 1337–1342.

Baranzini, S. E., N. W. Galwey, J. Wang, P. Khankhanian, R. Lindberg, D. Pelletier, W. Wu, B. M. Uitdehaag, L. Kappos, C. H. Polman and GeneMSA Consortium (2009): “Pathway and network-based analysis of genome-wide association studies in multiple sclerosis,” Hum. Mol. Genet., 18, 2078–2090.

Carlin, B. P. and S. Chib (1995): “Bayesian model choice via Markov chain Monte Carlo methods,” J. Roy. Stat. Soc. B, 57, 473–484.

Chung, F. (1997): Spectral graph theory, Vol. 92 of CBMS Regional Conference Series. American Mathematical Society, Providence. doi:10.1090/cbms/092

De Jong, S., M. Boks, T. Fuller, E. Strengman, E. Janson, C. De Kovel, A. Ori, N. Vi, F. Mulder, J. Blom, B. Glenthoj, C. Schbart, W. Cahn, R. Kahn, S. Horvath and R. Ophoff (2012): “A gene co-expression network in whole blood of schizophrenia patients is independent of antipsychotic-use and enriched for brain-expressed genes,” PLoS One, 7, e39498. doi:10.1371/journal.pone.0039498

Dellaportas, P., J. J. Forster and I. Ntzoufras (2002): “On Bayesian model and variable selection using MCMC,” Stat. Comput., 12, 27–36.

Elbers, C. C., K. R. van Eijk, L. Franke, F. Mulder, Y. T. van der Schouw, C. Wijmenga and N. C. Onland-Moret (2009): “Using genome-wide pathway analysis to unravel the etiology of complex diseases,” Genet. Epidemiol., 33, 419–431.

Emily, M., T. Mailund, J. Hein, L. Schauser and M. H. Schierup (2009): “Using biological networks to search for interacting loci in genome-wide association studies,” Eur. J. Hum. Genet., 17, 1231–1240.

Franke, L., H. van Bakel, L. Fokkens, E. D. de Jong, M. Egmont-Peterson and C. Wijmenga (2006): “Reconstruction of a functional human gene network with an application for prioritizing positional candidate genes,” Am. J. Hum. Genet., 78, 1011–1025.

Friedman, J., T. Hastie and R. Tibshirani (2010): “A note on the group lasso and a sparse group lasso,” arXiv, arXiv:1001.0736.

Griffin, J. E. and P. J. Brown (2007): “Bayesian adaptive lasso with non-convex penalization,” Aust. NZ. J. Stat., 53, 423–442.

Han, C. and B. P. Carlin (2001): “Markov chain Monte Carlo methods for computing Bayes factors: a comparative review,” J. Am. Stat. Assoc., 96, 1122–1132.

Holmes, C. C. and L. Held (2006): “Bayesian auxiliary variable models for binary and multinomial regression,” Bayesian Analysis, 1, 145–168. doi:10.1214/06-BA105

Hunter, S., R. Apweiler, T. K. Attwood, A. Bairoch, A. Bateman, D. Binns, P. Bork, U. Das, L. Daugherty, L. Duquenne, R. D. Finn, J. Gough, D. Haft, N. Hulo, D. Kahn, E. Kelly, A. Laugraud, I. Letunic, D. Lonsdale, R. Lopez, M. Madera, J. Maslen, C. McAnulla, J. McDowall, J. Mistry, A. Mitchell, N. Mulder, D. Natale, C. Orengo, A. F. Quinn, J. D. Selengut, C. J. Sigrist, M. Thimma, P. D. Thomas, F. Valentin, D. Wilson, C. H. Wu and C. Yeats (2009): “InterPro: the integrative protein signature database,” Nucl. Acids Res., 37, D211–D215.

Ishwaran, H. and J. S. Rao (2005): “Spike and slab variable selection: frequentist and Bayesian strategies,” Ann. Stat., 33, 730–773.

Kang, J. and J. Guo (2009): Self-adaptive lasso and its Bayesian estimation. Technical report, University of Michigan. Available at http://www.stat.lsa.umich.edu/~guojian/publications/manuscript_bayesso_arxiv.pdf.

Kim, M., H. Shin, T. S. Chung, J.-G. Joung and J. H. Kim (2011): “Extracting regulatory modules from gene expression data by sequential pattern mining,” BMC Genomics, 12(Suppl 3), S5. doi:10.1186/1471-2164-12-S3-S5

Kyung, M., J. Gill, M. Ghosh and G. Casella (2010): “Penalized regression, standard errors, and Bayesian lassos,” Bayesian Analysis, 5, 369–412. doi:10.1214/10-BA607

Langfelder, P. and S. Horvath (2008): “WGCNA: an R package for weighted correlation network analysis,” BMC Bioinformatics, 9, 559. doi:10.1186/1471-2105-9-559

Li, C. and H. Li (2008): “Network-constrained regularization and variable selection for analysis of genomic data,” Bioinformatics, 24, 1175–1182. doi:10.1093/bioinformatics/btn081

Li, C. and H. Li (2010): “Variable selection and regression analysis for graph-structured covariates with an application to genomics,” Ann. Appl. Stat., 4, 1498–1516.

Li, Q. and N. Lin (2010): “The Bayesian elastic net,” Bayesian Analysis, 5, 151–170. doi:10.1214/10-BA506

Li, F. and N. R. Zhang (2010): “Bayesian variable selection in structured high-dimensional covariate spaces with applications in genomics,” J. Am. Stat. Assoc., 105, 1202–1214.

Li, Y., C. Campbell and M. Tipping (2002): “Bayesian automatic relevance determination algorithms for classifying gene expression data,” Bioinformatics, 18, 1332–1339. doi:10.1093/bioinformatics/18.10.1332

Liu, F. and A. C. Lozano (2011): “A Graph Laplacian prior for variable selection and grouping,” Biometrika, 98, 1–31.

Liu, J., J. Huang and S. Ma (2013): “Incorporating network structure in integrative analysis of cancer prognosis data,” Genet. Epidemiol., 37, 173–83.

Lu, T., Y. Pan, S. Kao, C. Li, I. Kohane, J. Chan and B. A. Yankner (2004): “Gene regulation and DNA damage in the ageing human brain,” Nature, 429, 883–891. doi:10.1038/nature02661

Mason, M. J., G. Fan, K. Plath, Q. Zhou and S. Horvath (2009): “Signed weighted gene co-expression network analysis of transcriptional regulation in murine embryonic stem cells,” BMC Genomics, 10, 327. doi:10.1186/1471-2164-10-327

Meier, L., S. van de Geer and P. Buhlmann (2008): “The group lasso for logistic regression,” J. Roy. Stat. Soc. B, 70, 53–71.

Mukherjee, S. and T. P. Speed (2008): “Network inference using informative priors,” Proc. Natl. Acad. Sci., 105, 14313–14318.

Newman, M. E. (2006): “Modularity and community structure in networks,” Proc. Natl. Acad. Sci., 103, 8577–8582.

Pan, W., B. Xie and X. Shen (2010): “Incorporating predictor network in penalized regression with application to microarray data,” Biometrics, 66, 474–484. doi:10.1111/j.1541-0420.2009.01296.x

Park, T. and G. Casella (2008): “The Bayesian lasso,” J. Am. Stat. Assoc., 103, 681–686.

Raman, S., T. J. Fuchs, P. J. Wild, E. Dahl and V. Roth (2009): The Bayesian group-lasso for analyzing contingency tables. Proceedings of the 26th International Conference on Machine Learning, 881–888. doi:10.1145/1553374.1553487

Razick, S., G. Magklaras and I. M. Donaldson (2008): “iRefIndex: a consolidated protein interaction database with provenance,” BMC Bioinformatics, 9, 405. doi:10.1186/1471-2105-9-405

Ravasz, E., A. L. Somera, D. A. Mongru, Z. N. Oltvai and A. L. Barabasi (2002): “Hierarchical organization of modularity in metabolic networks,” Science, 297, 1551–1555. doi:10.1126/science.1073374

Robins, G., P. Pattison, Y. Kalish and D. Lusher (2006): “An introduction to exponential random graph models for social networks,” Social Networks, 29, 173–191. doi:10.1016/j.socnet.2006.08.002

Ruan, J. and W. Zhang (2008): “Identifying network communities with a high resolution,” Phys. Rev. E, 77, 016104.

Rzhetsky, A., T. Zheng and C. Weinreb (2006): “Self-correcting maps of molecular pathways,” PLoS One, 1(1), e61. doi:10.1371/journal.pone.0000061

Schadt, E., S. W. Edwards, D. GuhaThakurta, D. Holder, L. Ying, V. Svetnik, A. Leonardson, K. W. Hart, A. Russell, G. Li, G. Cavet, J. Castle, P. McDonagh, Z. Kan, R. Chen, A. Kasarskis, M. Margarint, R. M. Caceres, J. M. Johnson, C. D. Armour, P. W. Garrett-Engele, N. F. Tsinoremas and D. D. Shoemaker (2004): “A comprehensive transcript index of the human genome generated using microarrays and computational approaches,” Gen. Biol., 5, R73.

Segal, E., M. Shapira, A. Regev, D. Pe’er, D. Botstein, D. Koller and N. Friedman (2003): “Module networks: identifying regulatory modules and their condition specific regulators from gene expression data,” Nat. Genet., 34, 166–176.

Simon, N., J. Friedman, T. Hastie and R. Tibshirani (2012): “The sparse group lasso,” J. Comput. Graph. Stat. doi:10.1080/10618600.2012.681250

Stingo, F. C., Y. A. Chen, M. G. Tadesse and M. Vannucci (2011): “Incorporating biological information into linear models: A Bayesian approach to the selection of pathways and genes,” Ann. Appl. Stat., 5, 1978–2002.

Sun, W., J. G. Ibrahim and F. Zou (2009): Variable selection by Bayesian adaptive lasso and iterative adaptive lasso, with application for genome-wide multiple loci mapping. Technical report, University of North Carolina at Chapel Hill, Department of Biostatistics. Available at http://biostats.bepress.com/uncbiostat/art10/.

Trunova, S. and E. Giniger (2012): “Absence of the cdk5 activator p35 causes adult-onset neurodegeneration in the central brain of Drosophila,” Dis. Model. Mech., 5, 210–219.

Varma, S. and R. Simon (2006): “Bias in error estimation when using cross-validation for model selection,” BMC Bioinformatics, 7, 91. doi:10.1186/1471-2105-7-91

Watkinson, J., X. Wang, T. Zheng and D. Anastassiou (2008): “Identification of gene interactions associated with disease from gene expression data using synergy networks,” BMC Syst. Biol., 2, 10.

Werhli, A. V. and D. Husmeier (2007): “Reconstructing gene regulatory networks with Bayesian networks by combining expression data with multiple sources of prior knowledge,” Stat. Appl. Genet. Mol. Biol., 6, 15.

Wiggs, J. L., J. H. Kang, B. L. Yaspan, D. B. Mirel, C. Laurie, A. Crenshaw, W. Brodeur, S. Gogarten, L. M. Olson, W. Abdrabou, E. DelBono, S. Loomis, J. L. Haines, L. R. Pasquale and GENEVA Consortium (2011): “Common variants near CAV1 and CAV2 are associated with primary open-angle glaucoma in Caucasians from the USA,” Hum. Mol. Genet., 20, 4707–4713.

Yang, A. and X. Song (2010): “Bayesian variable selection for disease classification using gene expression data,” Bioinformatics, 26, 215–222. doi:10.1093/bioinformatics/btp638

Yi, N. and S. Xu (2008): “Bayesian lasso for quantitative trait loci mapping,” Genetics, 179, 1045–1055. doi:10.1534/genetics.107.085589

Yip, A. and S. Horvath (2007): “The generalized topological overlap matrix for detecting modules in gene network,” BMC Bioinformatics, 8, 22. doi:10.1186/1471-2105-8-22

Yuan, M. and Y. Lin (2006): “Model selection and estimation in regression with grouped variables,” J. Roy. Stat. Soc. B, 68, 49–67.

Zhang, B. and S. Horvath (2005): “A general framework for weighted gene co-expression network analysis,” Stat. Appl. Genet. Mol. Biol., 4, 17.

Zou, H. and T. Hastie (2005): “Regularization and variable selection via the elastic net,” J. Roy. Stat. Soc. B, 67, 301–320.

Published Online: 2013-05-15

Published in Print: 2013-06-01
