Abstract
Pathway topology and the relationships between genes can inform models of the effects of mRNA gene expression on complex traits. For example, researchers may wish to incorporate the prior belief that “hub” genes (genes with many neighbors) are more likely to influence the trait. In this paper, we propose and compare six Bayesian pathway-based prior models that incorporate pathway topology information into association analyses. Including prior information about the relationships among genes in a pathway modestly improved detection rates for genes associated with complex traits. Through an extensive set of simulations, we found that when hub (central) effects are expected, the diagonal degree model is preferred; when spoke (edge) effects are expected, the spatial power model is preferred. When there is no prior knowledge about the location of the effect genes in the pathway (e.g., hub versus spoke), it is worthwhile to apply multiple models, because the model with the best deviance information criterion (DIC) is not always the one with the best detection rate. We also applied the models to pharmacogenomic studies of the drugs gemcitabine and 6-mercaptopurine. The diagonal degree model identified an association between 6-mercaptopurine response and expression of the gene SLC28A3 that the model without pathway information did not detect. These results demonstrate the value of incorporating pathway information into association analyses.
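To make the notion of a hub gene concrete, the following minimal Python sketch counts neighbors in a toy pathway and turns the counts into gene-specific prior variances. The gene names, the edges, the function names, and the rule that scales prior variance by degree are all assumptions made for illustration; the abstract does not specify the six prior models, so this should not be read as the paper's diagonal degree model.

```python
import numpy as np

# Hypothetical toy pathway: gene names and undirected edges are invented.
genes = ["G1", "G2", "G3", "G4", "G5"]
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G4", "G5")]


def adjacency_matrix(genes, edges):
    """Build a symmetric 0/1 adjacency matrix for an undirected pathway graph."""
    index = {g: i for i, g in enumerate(genes)}
    A = np.zeros((len(genes), len(genes)), dtype=int)
    for a, b in edges:
        A[index[a], index[b]] = 1
        A[index[b], index[a]] = 1
    return A


def degree_scaled_prior_variances(A, base_variance=1.0):
    """Return each gene's degree and an illustrative prior variance.

    Assumption (not taken from the paper): the prior variance of a gene's
    effect grows linearly with its degree, so hub genes are shrunk less
    toward zero than spoke genes.
    """
    degrees = A.sum(axis=1)  # number of neighbors per gene
    variances = base_variance * degrees / degrees.max()
    return degrees, variances


A = adjacency_matrix(genes, edges)
degrees, prior_var = degree_scaled_prior_variances(A)
hub = genes[int(degrees.argmax())]

for g, d, v in zip(genes, degrees, prior_var):
    print(f"{g}: degree={d}, prior variance={v:.2f}")
print("hub gene:", hub)
```

In this toy graph G1 has the most neighbors and therefore receives the largest prior variance, which encodes the belief that a hub gene is more likely to have a nonzero effect. A prior that favors spoke genes would need a different weighting.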
©2013 by Walter de Gruyter Berlin Boston
Articles in the same Issue
- Masthead
- Genome-wide association studies with high-dimensional phenotypes
- The mid p-value in exact tests for Hardy-Weinberg equilibrium
- General power and sample size calculations for high-dimensional genomic data
- A graphical model method for integrating multiple sources of genome-scale data
- Highly efficient factorial designs for cDNA microarray experiments: use of approximate theory together with a step-up step-down procedure
- Bayesian genomic models for the incorporation of pathway topology knowledge into association studies
- Improving the efficiency of genomic selection
- Simple estimators of false discovery rates given as few as one or two p-values without strong parametric assumptions