Abstract
Correlation analysis is widely used in biological studies to infer molecular relationships within biological networks. Recently, single-cell analysis has drawn tremendous interest for its ability to obtain high-resolution molecular phenotypes. It turns out that the co-expressed genes identified in single-cell level investigations overlap little with those identified in population level investigations. However, the relationship between correlations at the single-cell level and at the population level remains unclear. In this manuscript, we aimed to unveil the origin of the differences between correlation coefficients at the single-cell level and those at the population level, and to bridge the gap between them. By developing formulations that link correlations at the single-cell and the population level, we illustrated that aggregated correlations could be stronger than, weaker than, or equal to the corresponding individual correlations, depending on the variations and the correlations within the population. When the correlation within the population is weaker than the individual correlation, the aggregated correlation is stronger than the corresponding individual correlation. In addition, our data indicated that the aggregated correlation is more likely to be stronger than the corresponding individual correlation, and that gene pairs strongly correlated exclusively at the single-cell level were rare. Using a bottom-up approach to model interactions between molecules in a signaling cascade or in multi-regulator-controlled gene expression, we surprisingly found that an interaction between two components could not be excluded simply because their correlation coefficient was low. This suggests that connectivity within biological networks derived solely from correlation analysis should be reconsidered. We also investigated the impact of random technical measurement errors on correlation coefficients at the single-cell level and the population level. The results indicate that the aggregated correlation is relatively robust and less affected by such errors. Because of the heterogeneity among single cells, correlation coefficients calculated from single-cell level data might differ from those calculated from population level data. Depending on the specific question being asked, proper sampling and normalization procedures should be carried out before any conclusions are drawn.
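The dependence of the aggregated correlation on within-population and between-population variation can be illustrated with a small simulation. The sketch below is a toy model and not the formulation developed in the article: it assumes two genes whose population means and cell-to-cell deviations are both bivariate normal, and all function names and parameter values (`simulate_cells`, `individual_and_aggregated`, `r_between`, `r_within`, `noise_sd`) are hypothetical choices made for illustration. The individual correlation is computed over all pooled cells, and the aggregated correlation over the per-population means.

```python
import numpy as np


def simulate_cells(n_pop=500, n_cells=200, r_between=0.9, r_within=0.1,
                   sd_between=1.0, sd_within=1.0, noise_sd=0.0, seed=0):
    """Return an array of shape (n_pop, n_cells, 2): two genes measured per cell.

    Toy assumption: population means and within-population deviations are
    independent bivariate normals with correlations r_between and r_within.
    """
    rng = np.random.default_rng(seed)
    cov_between = sd_between ** 2 * np.array([[1.0, r_between], [r_between, 1.0]])
    cov_within = sd_within ** 2 * np.array([[1.0, r_within], [r_within, 1.0]])
    # One mean expression vector per population (between-population variation).
    pop_means = rng.multivariate_normal([0.0, 0.0], cov_between, size=n_pop)
    # Cell-to-cell deviations around each population mean (within-population variation).
    deviations = rng.multivariate_normal([0.0, 0.0], cov_within, size=(n_pop, n_cells))
    cells = pop_means[:, None, :] + deviations
    # Independent random measurement error added to every single measurement.
    return cells + rng.normal(0.0, noise_sd, size=cells.shape)


def pearson(x, y):
    """Pearson correlation coefficient of two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


def individual_and_aggregated(cells):
    """Return (individual, aggregated) correlations for a simulated data set."""
    pooled = cells.reshape(-1, 2)   # every cell from every population
    means = cells.mean(axis=1)      # one averaged profile per population
    r_individual = pearson(pooled[:, 0], pooled[:, 1])
    r_aggregated = pearson(means[:, 0], means[:, 1])
    return r_individual, r_aggregated


if __name__ == "__main__":
    scenarios = {
        "weak within, strong between": dict(r_between=0.9, r_within=0.1),
        "strong within, weak between": dict(r_between=0.1, r_within=0.9),
        "weak within, strong between, noisy": dict(r_between=0.9, r_within=0.1,
                                                   noise_sd=2.0),
    }
    for label, params in scenarios.items():
        r_ind, r_agg = individual_and_aggregated(simulate_cells(**params))
        print(f"{label}: individual={r_ind:.2f}, aggregated={r_agg:.2f}")
```

Under these assumptions, averaging over cells shrinks the within-population variance and the measurement-error variance by roughly a factor of `n_cells`, so the aggregated correlation mostly reflects the between-population correlation. This is one simple way to see why it can be stronger or weaker than the individual correlation, and why it is less sensitive to random measurement error.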
Author contribution: GW: Conceptualization, implementation, investigation, writing, editing and revising the manuscript. YL: Investigation, editing and revising the manuscript. All authors read and approved the final manuscript.
Research funding: This study was supported by the Medical Scientific Research Foundation of Guangdong Province of China (A2022182); the Scientific research funding of the First Affiliated Hospital of Guangdong Pharmaceutical University (KYQDJF202016); the National Key Clinical Specialty Construction Project (Clinical Pharmacy) and High Level Clinical Key Specialty (Clinical Pharmacy) in Guangdong Province; the Construction Project of NMPA Key Laboratory for Technology Research and Evaluation of Pharmacovigilance; and the Cultivation Fund of National Natural Science Foundation of China, School of Clinical Pharmacy, Guangdong Pharmaceutical University (SCP2022-07).
Conflict of interest statement: The authors declare no conflicts of interest regarding this article.
Data accessibility statement: The data that support the findings of this study are available from the corresponding author upon reasonable request.
Supplementary Material
The online version of this article offers supplementary material (https://doi.org/10.1515/sagmb-2022-0015).
© 2022 Walter de Gruyter GmbH, Berlin/Boston
Articles in the same Issue
- Review Article
- Challenges for machine learning in RNA-protein interaction prediction
- Research Articles
- Distinct characteristics of correlation analysis at the single-cell and the population level
- pwrBRIDGE: a user-friendly web application for power and sample size estimation in batch-confounded microarray studies with dependent samples
- Use of SVM-based ensemble feature selection method for gene expression data analysis
- A robust association test with multiple genetic variants and covariates
- Estimation of the covariance structure from SNP allele frequencies
- GMEPS: a fast and efficient likelihood approach for genome-wide mediation analysis under extreme phenotype sequencing
- Sparse latent factor regression models for genome-wide and epigenome-wide association studies