Abstract
In genomic analysis, there is growing interest in network structures that represent biochemical interactions. Graph-structured or graph-constrained inference takes advantage of a known relational structure among variables to introduce smoothness and reduce model complexity, especially for high-dimensional genomic data. Its application to model regularization and selection has attracted considerable interest. However, prior knowledge of the graphical structure among the variables can be limited and partial. Empirical data may suggest variations of and modifications to such a graph, which could lead to new and interesting biological findings. In this paper, we propose a Bayesian random graph-constrained model, rGrace, an extension of the Grace model, that combines a priori network information with empirical evidence for applications such as pathway analysis. Using both simulations and real data examples, we show that the new method, while improving predictive performance, can identify discrepancies between the data and a previously known graph structure and suggest modifications and updates.
Appendix A Derivation of Sampling Scheme for β|σ²
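According to Andrews and Mallows (1974), for a>0, the Laplace distribution can be represented as a scale mixture of normal distributions:
$$\frac{a}{2}\,e^{-a|z|}=\int_0^\infty \frac{1}{\sqrt{2\pi s}}\,e^{-z^2/(2s)}\,\frac{a^2}{2}\,e^{-a^2 s/2}\,ds.$$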
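The scale-mixture representation of Andrews and Mallows can be checked numerically. The sketch below assumes the standard form of the identity: conditional on s, z is N(0, s), and s has an exponential density with rate a²/2, so that the mixture equals the Laplace density (a/2)exp(-a|z|). The function names, the midpoint rule, and the substitution s = u² (used only to remove the s^(-1/2) singularity at the origin) are illustrative choices and not part of the original derivation.

```python
import math


def laplace_density(z, a):
    """Laplace density (a/2) * exp(-a|z|) with rate a > 0."""
    return 0.5 * a * math.exp(-a * abs(z))


def mixture_density(z, a, n_steps=20000):
    """Normal scale mixture evaluated by midpoint quadrature.

    Integrates N(z; 0, s) * (a^2/2) * exp(-a^2 s / 2) over s > 0.
    After substituting s = u^2 the integrand becomes
    a^2 / sqrt(2*pi) * exp(-z^2 / (2 u^2) - a^2 u^2 / 2),
    which is bounded at u = 0.
    """
    upper = 12.0 / a          # the tail beyond this point is negligible
    h = upper / n_steps
    total = 0.0
    for k in range(n_steps):
        u = (k + 0.5) * h     # midpoints never touch u = 0
        total += math.exp(-z * z / (2.0 * u * u) - 0.5 * a * a * u * u)
    return a * a / math.sqrt(2.0 * math.pi) * total * h


if __name__ == "__main__":
    for a in (0.5, 1.0, 2.0):
        for z in (0.0, 0.3, 1.5):
            print(a, z, mixture_density(z, a), laplace_density(z, a))
```

For each pair (a, z) the two printed densities should agree to several decimal places, which is the property that lets a Laplace-type prior be handled through conditionally normal draws.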
Let
and Z=||βj||2. Then

or

Therefore,

Hence

where Ds is a block-diagonal matrix with J diagonal blocks, and the jth block Dsj is
Note that, assuming a block structure for 

where each Oj is an orthogonal matrix and
is a diagonal matrix, that is
With the block diagonal structure assumption of
(2) can be written as:

As a result, we can treat β|σ² alternatively as:

This research was supported in part by NSF grants DMS-0714669 and SES-1023176, NIH grant R01 GM070789, and a 2010 Google research award. We thank two anonymous reviewers for their constructive comments.
References
Andrews, D. F. and C. L. Mallows (1974): “Scale mixtures of normal distributions,” J. Roy. Stat. Soc. B, 36, 99–102.
Bae, K. and B. K. Mallick (2004): “Gene selection using a two-level hierarchical Bayesian model,” Bioinformatics, 20, 3423–3430. DOI: 10.1093/bioinformatics/bth419
Bar-Joseph, Z., G. K. Gerber, T. I. Lee, N. J. Rinaldi, J. Y. Yoo, F. Robert, D. B. Gordon, E. Fraenkel, T. S. Jaakkola, R. A. Young and D. K. Gifford (2003): “Computational discovery of gene modules and regulatory networks,” Nat. Biotechnol., 21, 1337–1342.
Baranzini, S. E., N. W. Galwey, J. Wang, P. Khankhanian, R. Lindberg, D. Pelletier, W. Wu, B. M. Uitdehaag, L. Kappos, C. H. Polman and GeneMSA Consortium (2009): “Pathway and network-based analysis of genome-wide association studies in multiple sclerosis,” Hum. Mol. Genet., 18, 2078–2090.
Carlin, B. P. and S. Chib (1995): “Bayesian model choice via Markov chain Monte Carlo methods,” J. Roy. Stat. Soc. B, 57, 473–484.
Chung, F. (1997): Spectral graph theory, Vol. 92 of CBMS Regional Conferences Series. American Mathematical Society, Providence. DOI: 10.1090/cbms/092
De Jong, S., M. Boks, T. Fuller, E. Strengman, E. Janson, C. De Kovel, A. Ori, N. Vi, F. Mulder, J. Blom, B. Glenthoj, C. Schbart, W. Cahn, R. Kahn, S. Horvath and R. Ophoff (2012): “A gene co-expression network in whole blood of schizophrenia patients is independent of antipsychotic-use and enriched for brain-expressed genes,” PLoS One, 7, e39498. DOI: 10.1371/journal.pone.0039498
Dellaportas, P., J. J. Forster and I. Ntzoufras (2002): “On Bayesian model and variable selection using MCMC,” Stat. Comput., 12, 27–36.
Elbers, C. C., K. R. van Eijk, L. Franke, F. Mulder, Y. T. van der Schouw, C. Wijmenga and N. C. Onland-Moret (2009): “Using genome-wide pathway analysis to unravel the etiology of complex diseases,” Genet. Epidemiol., 33, 419–431.
Emily, M., T. Mailund, J. Hein, L. Schauser and M. H. Schierup (2009): “Using biological networks to search for interacting loci in genome-wide association studies,” Eur. J. Hum. Genet., 17, 1231–1240.
Franke, L., H. van Bakel, L. Fokkens, E. D. de Jong, M. Egmont-Peterson and C. Wijmenga (2006): “Reconstructing of a functional human gene network with an application for prioritizing positional candidate genes,” Am. J. Hum. Genet., 78, 1011–1025.
Friedman, J., T. Hastie and R. Tibshirani (2010): “A note on the group lasso and a sparse group lasso,” arXiv, arXiv:1001.0736.
Griffin, J. E. and P. J. Brown (2007): “Bayesian adaptive lasso with non-convex penalization,” Aust. NZ. J. Stat., 53, 423–442.
Han, C. and B. P. Carlin (2001): “Markov chain Monte Carlo methods for computing Bayes factors: a comparative review,” J. Am. Stat. Assoc., 96, 1122–1132.
Holmes, C. C. and L. Held (2006): “Bayesian auxiliary variable models for binary and multinomial regression,” Bayesian Analysis, 1, 145–168. DOI: 10.1214/06-BA105
Hunter, S., R. Apweiler, T. K. Attwood, A. Bairoch, A. Bateman, D. Binns, P. Bork, U. Das, L. Daugherty, L. Duquenne, R. D. Finn, J. Gough, D. Haft, N. Hulo, D. Kahn, E. Kelly, A. Laugraud, I. Letunic, D. Lonsdale, R. Lopez, M. Madera, J. Maslen, C. McAnulla, J. McDowall, J. Mistry, A. Mitchell, N. Mulder, D. Natale, C. Orengo, A. F. Quinn, J. D. Selengut, C. J. Sigrist, M. Thimma, P. D. Thomas, F. Valentin, D. Wilson, C. H. Wu and C. Yeats (2009): “InterPro: the integrative protein signature database,” Nucl. Acids Res., 37, D211–D215.
Ishwaran, H. and J. S. Rao (2005): “Spike and slab variable selection: frequentist and Bayesian strategies,” Ann. Stat., 33, 730–773.
Kang, J. and J. Guo (2009): Self-adaptive lasso and its Bayesian estimation. Technical report, University of Michigan. Available at http://www.stat.lsa.umich.edu/~guojian/publications/manuscript_bayesso_arxiv.pdf.
Kim, M., H. Shin, T. S. Chung, J.-G. Joung and J. H. Kim (2011): “Extracting regulatory modules from gene expression data by sequential pattern mining,” BMC Genomics, 12(Suppl 3), S5. DOI: 10.1186/1471-2164-12-S3-S5
Kyung, M., J. Gill, M. Ghosh and G. Casella (2010): “Penalized regression, standard errors, and Bayesian lassos,” Bayesian Analysis, 5, 369–412. DOI: 10.1214/10-BA607
Langfelder, P. and S. Horvath (2008): “WGCNA: an R package for weighted correlation network analysis,” BMC Bioinformatics, 9, 559. DOI: 10.1186/1471-2105-9-559
Li, C. and H. Li (2008): “Network-constrained regularization and variable selection for analysis of genomic data,” Bioinformatics, 24, 1175–118. DOI: 10.1093/bioinformatics/btn081
Li, C. and H. Li (2010): “Variable selection and regression analysis for graph-structured covariates with an application to genomics,” Ann. Appl. Stat., 4, 1498–1516.
Li, Q. and N. Lin (2010): “The Bayesian elastic net,” Bayesian Analysis, 5, 151–170. DOI: 10.1214/10-BA506
Li, F. and N. R. Zhang (2010): “Bayesian variable selection in structured high-dimensional covariate spaces with applications in genomics,” J. Am. Stat. Assoc., 105, 1202–1214.
Li, Y., C. Campbell and M. Tipping (2002): “Bayesian automatic relevance determination algorithms for classifying gene expression data,” Bioinformatics, 18, 1332–1339. DOI: 10.1093/bioinformatics/18.10.1332
Liu, F. and A. C. Lozano (2011): “A Graph Laplacian prior for variable selection and grouping,” Biometrika, 98, 1–31.
Liu, J., J. Huang and S. Ma (2013): “Incorporating network structure in integrative analysis of cancer prognosis data,” Genet. Epidemiol., 37, 173–83.
Lu, T., Y. Pan, S. Kao, C. Li, I. Kohane, J. Chan and B. A. Yankner (2004): “Gene regulation and DNA damage in the ageing human brain,” Nature, 429, 883–891. DOI: 10.1038/nature02661
Mason, M. J., G. Fan, K. Plath, Q. Zhou and S. Horvath (2009): “Signed weighted gene co-expression network analysis of transcriptional regulation in murine embryonic stem cells,” BMC Genomics, 10, 327. DOI: 10.1186/1471-2164-10-327
Meier, L., S. van de Geer and P. Buhlmann (2008): “The group lasso for logistic regression,” J. Roy. Stat. Soc. B, 70, 53–71.
Mukherjee, S. and T. P. Speed (2008): “Network inference using informative priors,” Proc. Natl. Acad. Sci., 105, 14313–14318.
Newman, M. E. (2006): “Modularity and community structure in networks,” Proc. Natl. Acad. Sci., 103, 8577–8582.
Pan, W., B. Xie and X. Shen (2010): “Incorporating predictor network in penalized regression with application to microarray data,” Biometrics, 66, 474–484. DOI: 10.1111/j.1541-0420.2009.01296.x
Park, T. and G. Casella (2008): “The Bayesian lasso,” J. Am. Stat. Assoc., 103, 681–686.
Raman, S., T. J. Fuchs, P. J. Wild, E. Dahl and V. Roth (2009): The Bayesian group-lasso for analyzing contingency tables. Proceedings of the 26th International Conference on Machine Learning, 881–888. DOI: 10.1145/1553374.1553487
Razick, S., G. Magklaras and I. M. Donaldson (2008): “iRefIndex: a consolidated protein interaction database with provenance,” BMC Bioinformatics, 9, 405. DOI: 10.1186/1471-2105-9-405
Ravasz, E., A. L. Somera, D. A. Mongru, Z. N. Oltvai and A. L. Barabasi (2002): “Hierarchical organization of modularity in metabolic networks,” Science, 297, 1551–1555. DOI: 10.1126/science.1073374
Robins, G., P. Pattison, Y. Kalish and D. Lusher (2006): “An introduction to exponential random graph models for social networks,” Social Networks, 29, 173–191. DOI: 10.1016/j.socnet.2006.08.002
Ruan, J. and W. Zhang (2008): “Identifying network communities with a high resolution,” Phys. Rev. E, 77, 016104.
Rzhetsky, A., T. Zheng and C. Weinreb (2006): “Self-correcting maps of molecular pathways,” PLoS One, 1(1), e61. DOI: 10.1371/journal.pone.0000061
Schadt, E., S. W. Edwards, D. GuhaThakurta, D. Holder, L. Ying, V. Svetnik, A. Leonardson, K. W. Hart, A. Russell, G. Li, G. Cavet, J. Castle, P. McDonagh, Z. Kan, R. Chen, A. Kasarskis, M. Margarint, R. M. Caceres, J. M. Johnson, C. D. Armour, P. W. Garrett-Engele, N. F. Tsinoremas and D. D. Shoemaker (2004): “A comprehensive transcript index of the human genome generated using microarrays and computational approaches,” Gen. Biol., 5, R73.
Segal, E., M. Shapira, A. Regev, D. Pe’er, D. Botstein, D. Koller and N. Friedman (2003): “Module networks: identifying regulatory modules and their condition specific regulators from gene expression data,” Nat. Genet., 34, 166–176.
Simon, N., J. Friedman, T. Hastie and R. Tibshirani (2012): “The sparse group lasso,” J. Comput. Graph. Stat., DOI: 10.1080/10618600.2012.681250
Stingo, F. C., Y. A. Chen, M. G. Tadesse and M. Vannucci (2011): “Incorporating biological information into linear models: A Bayesian approach to the selection of pathways and genes,” Ann. Appl. Stat., 5, 1978–002.
Sun, W., J. G. Ibrahim and F. Zou (2009): Variable selection by Bayesian adaptive lasso and iterative adaptive lasso, with application for genome-wide multiple loci mapping. Technical report, University of North Carolina at Chapel Hill, Department of Biostatistics. Available at http://biostats.bepress.com/uncbiostat/art10/.
Trunova, S. and E. Giniger (2012): “Absence of the cdk5 activator p35 causes adult-onset neurodegeneration in the central brain of Drosophila,” Dis. Model. Mech., 5, 210–219.
Varma, S. and R. Simon (2006): “Bias in error estimation when using cross-validation for model selection,” BMC Bioinformatics, 7, 91. DOI: 10.1186/1471-2105-7-91
Watkinson, J., X. Wang, T. Zheng and D. Anastassiou (2008): “Identification of gene interactions associated with disease from gene expression data using synergy networks,” BMC Syst. Biol., 2, 10.
Werhli, A. V. and D. Husmeier (2007): “Reconstructing gene regulatory networks with Bayesian networks by combining expression data with multiple sources of prior knowledge,” Stat. Appl. Genet. Mol. Biol., 6, 15.
Wiggs, J. L., J. H. Kang, B. L. Yaspan, D. B. Mirel, C. Laurie, A. Crenshaw, W. Brodeur, S. Gogarten, L. M. Olson, W. Abdrabou, E. DelBono, S. Loomis, J. L. Haines, L. R. Pasquale and GENEVA Consortium (2011): “Common variants near CAV1 and CAV2 are associated with primary open-angle glaucoma in Caucasians from the USA,” Hum. Mol. Genet., 20, 4707–4713.
Yang, A. and X. Song (2010): “Bayesian variable selection for disease classification using gene expression data,” Bioinformatics, 26, 215–222. DOI: 10.1093/bioinformatics/btp638
Yi, N. and S. Xu (2008): “Bayesian lasso for quantitative trait loci mapping,” Genetics, 179, 1045–1055. DOI: 10.1534/genetics.107.085589
Yip, A. and S. Horvath (2007): “The generalized topological overlap matrix for detecting modules in gene network,” BMC Bioinformatics, 8, 22. DOI: 10.1186/1471-2105-8-22
Yuan, M. and Y. Lin (2006): “Model selection and estimation in regression with grouped variables,” J. Roy. Stat. Soc. B, 68, 49–67.
Zhang, B. and S. Horvath (2005): “A general framework for weighted gene co-expression network analysis,” Stat. Appl. Genet. Mol. Biol., 4, 17.
Zou, H. and T. Hastie (2005): “Regularization and variable selection via the elastic net,” J. Roy. Stat. Soc. B, 67, 301–320.
©2013 by Walter de Gruyter Berlin Boston