Abstract
The statistical methodology developed in this study was motivated by our interest in studying neurodevelopment using the mouse brain RNA-Seq data set, in which gene expression levels were measured in multiple layers of the somatosensory cortex across time in both female and male samples. We aim to identify genes that are differentially expressed between adjacent time points, which may provide insights into the dynamics of brain development. Because the sample size is extremely small (one male and one female at each time point), simple marginal analysis may be underpowered. We propose a Markov random field (MRF)-based approach that capitalizes on the similarity between layers, the temporal dependency and the similarity between sexes. The model parameters are estimated by an efficient EM algorithm with a mean field-like approximation. Simulation results and real data analysis suggest that the proposed model has greater power to detect differentially expressed genes than simple marginal analysis. Our method also reveals biologically interesting results in the mouse brain RNA-Seq data set.
Acknowledgments
We thank Matthew W. State for the financial support of the first author. All computations except those for Figure 4 were performed at the Yale University Biomedical High Performance Computing Center. This study was supported by the National Institutes of Health [GM59507, CA154295, MH106934 and NS051869 (N.S.)] and the National Science Foundation [DMS 1106738].
Conflict of interest statement: None declared.
Supplemental Material:
The online version of this article (DOI: 10.1515/sagmb-2015-0070) offers supplementary material, available to authorized users.
©2016 by De Gruyter
Articles in the same Issue
- Frontmatter
- Research Articles
- What if we ignore the random effects when analyzing RNA-seq data in a multifactor experiment
- A graph theoretical approach to data fusion
- Resistant multiple sparse canonical correlation
- A Markov random field-based approach for joint estimation of differentially expressed genes in mouse transcriptome data
- AGGrEGATOr: A Gene-based GEne-Gene interActTiOn test for case-control association studies
- Comparing five statistical methods of differential methylation identification using bisulfite sequencing data