Abstract
Objectives
The urgency of screening patients for the potential benefit of immunotherapy has prompted explorations of predictive biomarkers or stratification for colorectal cancer (CRC). This study aims to explore the molecular subtypes associated with immune-related genes (IRGs) in CRC and to further elucidate the underlying molecular mechanisms.
Methods
IRGs were downloaded for non-negative matrix factorization (NMF) clustering. Kaplan-Meier survival analyses were used to assess survival differences. Gene set variation analysis was conducted for the exploration of potential biological mechanisms and pathways. The hub differentially expressed genes (DEGs) were selected via machine learning. Immunohistochemistry was performed to determine the expression of the hub gene and the infiltration of immune components.
Results
A preliminary classification of CRC subclasses was conducted through the NMF consensus clustering. Cluster 1 was found to exhibit worse overall survival and progression-free survival than Cluster 2. The two clusters demonstrated different enriched pathways, tumor immune scores, and differential gene expression. The intersection of the screened characteristic genes comprised two genes, SPP1 and GUCA2A. Immunohistochemistry indicated that SPP1 was overexpressed in CRC tissues. Patients with high SPP1 expression demonstrated a poorer prognosis at both the transcriptional and protein levels. Immunohistochemistry revealed that high SPP1 expression was associated with more neutrophil and less CD8+ T cell infiltration.
Conclusions
The NMF algorithm categorized CRC patients into two immunophenotypically distinct clusters with prognostic implications and a unique profile. SPP1 emerged as a crucial indicator for assessing tumor immune status, providing novel insights to enhance prognostic assessments and therapeutic strategies tailored to individual CRC patients.
Introduction
Representing one of the major contributors to cancer-related mortality worldwide, colorectal cancer (CRC) accounts for over 1.9 million new cases and 0.9 million deaths annually [1]. Surgery is the only curative approach for CRC, and patients with advanced inoperable disease often receive a multidisciplinary, combined treatment model based on chemotherapy. A diverse array of therapeutic options is now available, including radiotherapy, chemotherapy, immunotherapy, and targeted therapy. Immunotherapy has significantly changed the therapeutic pattern for CRC by manipulating the patient’s intrinsic immune system to combat the tumor. Several inhibitors targeting different checkpoints, such as programmed cell death 1 (PD-1), programmed death-ligand 1 (PD-L1), and cytotoxic T lymphocyte antigen 4 (CTLA4), have been approved by the Food and Drug Administration (FDA) for immunotherapy of CRC [2]. However, whether immune checkpoint blockades (ICBs) benefit CRC patients depends on the microsatellite instability (MSI) or DNA mismatch repair deficiency (dMMR) status of the tumor, and CRC with high MSI or dMMR accounts for only about 15 % of patients [3]. These tumors are capable of activating the immune system and responding to ICBs due to their high mutation burden and neoantigen expression [4].
Despite advancements in treatment for CRC, a high percentage of patients still do not experience tumor shrinkage or survival prolongation, with the average 5-year survivorship of advanced CRC far from satisfactory over the past several decades [5], [6]. Amidst this evolving landscape and expanded array of treatment options, a crucial, yet unanswered question pertains to the screening of patients who would benefit from immunotherapy. There are many novel strategies being explored to optimize immunotherapy for CRC, yet a comprehensive and in-depth understanding of the complex and dynamic immune regulatory mechanisms in the CRC tumor microenvironment (TME), as well as a precise characterization of the immune profile of CRC across different subtypes and stages, is still absent. There is a pressing need for effective and reliable prognostic and predictive biomarkers to guide individualized and precise immunotherapy selection. The American Joint Committee on Cancer (AJCC) TNM staging system has long been recognized as the primary clinical determinant of prognosis [7]. However, tumor heterogeneity due to differences in the molecular biology of CRC often leads to dramatically different outcomes in CRC patients with similar TNM characteristics [8], [9]. The current molecular classification systems for CRC are not sufficient to capture the complexity and variability of the immune microenvironment. Hence, the prediction of clinical outcomes and immunotherapeutic efficacy by a novel immune-associated molecular stratification method that can integrate multiple types of data and provide more accurate and comprehensive information remains an unmet need in CRC.
The use of large-scale gene sequencing technologies has improved the understanding of the genetic heterogeneity and genomic complexity of CRC, resulting in numerous molecular risk stratifications and transcriptomic subtype characterizations of CRC [10]. Key genes in CRC have been demonstrated to influence the response to targeted drugs [11], [12]. As the understanding of the genomic and transcriptomic subtypes of CRC advances, the co-development of CRC biomarkers and their corresponding targeted agents is bound to have a transformative impact on the precision medicine of CRC. Since there are no generally accepted clinical risk stratification approaches, it may be of great significance to identify a unique molecular stratification for predicting the prognosis and response to immunotherapy in patients with CRC.
The aim of this study is to establish and validate an immune-associated molecular stratification model based on gene expression data from CRC samples. We will also explore the biological and functional characteristics of different immune-associated molecular subtypes and their associations with immune cell infiltration, immune checkpoint expression, tumor mutation burden, and neoantigen load. This study will provide new insights into the immune-related molecular mechanisms of CRC and facilitate the development of personalized immunotherapy strategies for CRC patients.
Materials and Methods
Data acquisition and preprocessing
Transcriptomic RNA sequencing data of CRC patients and the corresponding clinical data were downloaded from The Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov/). A CRC dataset comprising transcriptomic data and associated prognostic information was obtained from the GSE39582 dataset of the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE39582). The GSE39582 dataset was chosen for its large sample size, rich clinical annotations, and prior validation in CRC transcriptomic research [13], [14], [15]. Patients lacking a survival time of more than 30 days, key clinical parameters, or corresponding transcriptome data were excluded from all cohorts. TCGA CRC and GSE39582 CRC data were merged and consolidated by Rank-In, a tool constructed to analyze consolidated cancer transcriptomics data across microarray and RNA-seq technologies (http://www.badd-cao.net/rank-in/index.html). A total of 763 CRC patients were ultimately included for subsequent analysis.
NMF clustering
Utilizing the “NMF” R package (https://cran.r-project.org/web/packages/NMF/index.html), non-negative matrix factorization (NMF) clustering was applied to 2,483 immune-related genes (IRGs) downloaded from the ImmPort database (https://immport.niaid.nih.gov). This approach aimed to delineate distinct molecular subtypes based on IRG expression [16]. Prior to clustering, a filtering step was performed. A Cox proportional hazards model was applied using the “survival” R package (https://cran.r-project.org/web/packages/survival/index.html) to identify IRGs with a significant prognostic value (p<0.01) for overall survival (OS). The optimal number of clusters was then determined by the cophenetic correlation coefficient, selecting the point at which its magnitude began to decrease [17]. To assess the distinctness of the identified subtypes, differences in gene expression were visualized via principal component analysis (PCA). Kaplan-Meier survival analysis was subsequently performed to compare the survival outcomes between these subtypes.
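The selection of k described above can be sketched in a few lines. The following is a minimal Python illustration, not the R pipeline used in this study: it assumes a plain multiplicative-update NMF, a consensus matrix built from repeated random restarts, and a cophenetic correlation computed from average-linkage clustering of one minus the consensus matrix. The function names, the number of restarts, and the toy expression matrix are all hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import squareform


def nmf(V, k, rng, n_iter=300, eps=1e-9):
    """Rank-k NMF (V ~ W @ H) using multiplicative updates."""
    n_genes, n_samples = V.shape
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    # Rescale so each column of W sums to 1; H then carries the magnitudes.
    scale = W.sum(axis=0)
    return W / scale, H * scale[:, None]


def consensus_matrix(V, k, n_runs=20, seed=0):
    """Fraction of runs in which each pair of samples shares a cluster."""
    rng = np.random.default_rng(seed)
    n_samples = V.shape[1]
    consensus = np.zeros((n_samples, n_samples))
    for _ in range(n_runs):
        _, H = nmf(V, k, rng)
        labels = H.argmax(axis=0)  # dominant factor for each sample
        consensus += labels[:, None] == labels[None, :]
    return consensus / n_runs


def cophenetic_coefficient(consensus):
    """Cophenetic correlation of average-linkage clustering on 1 - consensus."""
    dist = 1.0 - consensus
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)
    coefficient, _ = cophenet(linkage(condensed, method="average"), condensed)
    return float(coefficient)


# Hypothetical toy matrix (genes x samples) with two obvious sample groups.
toy_rng = np.random.default_rng(1)
V = toy_rng.random((20, 12)) * 0.1   # low background expression
V[:10, :6] += 5.0                    # genes 0-9 high in samples 0-5
V[10:, 6:] += 5.0                    # genes 10-19 high in samples 6-11

for k in (2, 3, 4):
    print(k, round(cophenetic_coefficient(consensus_matrix(V, k)), 3))
```

Under the rule stated above, one would scan the printed coefficients over k and keep the k at which the coefficient begins to decrease.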
Selection of characteristic genes via machine learning algorithms
Differentially expressed genes (DEGs) between the subclasses were identified using the “limma” R package (https://www.bioconductor.org/packages/release/bioc/html/limma.html). This analysis was performed on tumor-only samples, with no normal tissues included. Genes with an adjusted p-value<0.05 and an absolute log2 fold change (|log2FC|)>0.5 were considered significant. To identify candidate hub genes, a multi-step approach was employed after the initial filtration of DEGs. This included applying several machine learning algorithms: Least absolute shrinkage and selection operator (Lasso), Random Forest (RF), Support Vector Machine-Recursive Feature Elimination (SVM-RFE) and eXtreme Gradient Boosting (XGBoost) methods. The diagnostic efficacy of the selected genes was then evaluated using receiver operating characteristic (ROC) curves and their corresponding area under the curve (AUC) values.
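The thresholding and intersection logic described above is simple enough to state as code. The sketch below is illustrative Python rather than the limma-based workflow used here; the gene names, statistics, and per-method selections are invented placeholders, and only the two cut-offs (adjusted p<0.05, |log2FC|>0.5) come from the text.

```python
from functools import reduce


def filter_degs(results, max_adj_p=0.05, min_abs_log2fc=0.5):
    """Keep genes passing the adjusted p-value and |log2FC| thresholds.

    `results` maps gene -> (log2 fold change, adjusted p-value).
    """
    return {
        gene
        for gene, (log2fc, adj_p) in results.items()
        if adj_p < max_adj_p and abs(log2fc) > min_abs_log2fc
    }


def intersect_selections(selections):
    """Genes selected by every method in `selections` (method -> set of genes)."""
    return reduce(set.intersection, selections.values())


# Hypothetical differential-expression output.
toy_results = {
    "geneA": (1.20, 0.001),
    "geneB": (-0.80, 0.030),
    "geneC": (0.30, 0.001),   # fails |log2FC| > 0.5
    "geneD": (0.90, 0.200),   # fails adjusted p < 0.05
}
degs = filter_degs(toy_results)

# Hypothetical feature sets returned by the four algorithms.
toy_selections = {
    "lasso": {"geneA", "geneB"},
    "rf": {"geneA", "geneB", "geneC"},
    "svm_rfe": {"geneA"},
    "xgboost": {"geneA", "geneB"},
}
hub_genes = intersect_selections(toy_selections) & degs
print(sorted(degs), sorted(hub_genes))
```

Taking the intersection rather than the union is the conservative choice: a gene survives only if every algorithm independently retains it.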
Gene set variation analysis and tumor immune microenvironment analysis
Gene Set Variation Analysis (GSVA), an unsupervised and nonparametric gene set enrichment method [18], was employed to generate enrichment scores for specific pathways using the “GSVA” R package (https://www.bioconductor.org/packages/release/bioc/html/GSVA.html). Utilizing the “limma” R package (https://www.bioconductor.org/packages/release/bioc/html/limma.html), we then conducted a differential expression analysis on the GSVA scores between the identified clusters. Significant pathways were defined by a |log2FC|>0.1 and an adjusted p-value<0.05. To further characterize the TME differences among these clusters, the ESTIMATE algorithm (Estimation of STromal and Immune cells in MAlignant Tumour tissues using Expression data, https://bioinformatics.mdanderson.org/estimate/rpackage.html) was employed to calculate stromal and immune scores, while the CIBERSORT (Cell-type Identification By Estimating Relative Subsets Of RNA Transcripts) R package [19] was used to quantify the relative abundance of 21 immune cell types within each cluster.
Immunohistochemistry (IHC)
CRC tissues were obtained from the Fifth Affiliated Hospital of Sun Yat-sen University with an informed consent waiver, approved by the Ethics Committee of the Fifth Affiliated Hospital of Sun Yat-sen University (K170-1). Specimens were fixed in 10 % formalin and embedded in paraffin. The blocks were sectioned into 5 µm slices, which were then dewaxed in xylene and rehydrated in graded alcohol. Antigen retrieval was performed by heating the samples in a boiler with citrate buffer (pH 6.0). Subsequently, the samples were blocked with goat serum at room temperature for 30 min and incubated with anti-SPP1 antibody (1:100; 22952-1-AP, Proteintech, Chicago, USA), anti-MPO antibody (1:250; 22225-1-AP, Proteintech, Chicago, USA), or anti-CD8 antibody (1:300; bs-0648R, Bioss, Woburn, USA) at 4 °C overnight. On the second day, the slices were incubated with horseradish peroxidase (HRP)-coupled goat anti-rabbit secondary antibody (ready to use, MaxVision, Fuzhou, China). The chromogenic reaction was performed using 3,3′-diaminobenzidine (DAB), followed by counterstaining with hematoxylin. After dehydration and clearing in xylene, the slices were scanned by a slide scanner system (3DHISTECH, Budapest, Hungary). Immunohistochemistry scores (H-scores) (ranging from 0 to 300) were quantified to reflect staining intensity by 3DHISTECH QuantCenter software (3DHISTECH, Budapest, Hungary) according to previous criteria [20]. Infiltrating immune cells were quantified by averaging the number of positive cells in the three 400× magnification fields with the highest positive rate, identified at 100× magnification.
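For readers unfamiliar with the two IHC readouts, the arithmetic can be sketched as follows. This Python sketch assumes the conventional H-score definition (1 × % weak + 2 × % moderate + 3 × % strong, giving the 0 to 300 range); the exact criteria applied here are those of reference [20] as implemented in QuantCenter and may differ in detail. The function names and example numbers are hypothetical.

```python
def h_score(pct_weak, pct_moderate, pct_strong):
    """Conventional H-score: intensity-weighted sum of percentages (0 to 300)."""
    total = pct_weak + pct_moderate + pct_strong
    if min(pct_weak, pct_moderate, pct_strong) < 0 or total > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    return 1 * pct_weak + 2 * pct_moderate + 3 * pct_strong


def hotspot_mean(counts_per_field, n_fields=3):
    """Mean positive-cell count over the n_fields densest high-power fields."""
    if not counts_per_field:
        raise ValueError("at least one field count is required")
    top = sorted(counts_per_field, reverse=True)[:n_fields]
    return sum(top) / len(top)


# Hypothetical slide: 20 % weak, 30 % moderate, 40 % strong staining.
print(h_score(20, 30, 40))

# Hypothetical positive-cell counts in five 400x fields of one specimen.
print(hotspot_mean([12, 30, 7, 25, 18]))
```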
Statistical analysis
Bioinformatics analyses were carried out using R (version 4.1.1) with appropriate packages. Continuous data were tested using a two-tailed Student’s t-test or the Wilcoxon test, while categorical variables were compared by the chi-square test and the Fisher test. The Kaplan-Meier survival curves were tested for statistical differences with the log-rank test. The independent prognostic factors were identified utilizing univariate and multivariate Cox regression analyses. LASSO regression was performed to limit overfitting during variable selection. Spearman’s rank-order correlation was used for correlation analysis to obtain the correlation coefficient, and p-values were corrected for the false discovery rate (FDR). Unless otherwise stated, all analyses were considered statistically significant at p<0.05.
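The FDR correction mentioned above can be illustrated briefly. The text does not name the procedure, so the sketch below assumes the Benjamini-Hochberg step-up method, which is the most common choice; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four correlation tests.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```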
Results
NMF classifies CRC into two subclasses
A preliminary classification of CRC subclasses was conducted through the NMF consensus clustering, which reasonably stratified patients into two subclasses. The clinicopathological parameters of the two clusters are listed in Supplementary Table S1. To find the optimal k value, cophenetic correlation coefficients were calculated. The consensus matrix heatmap displayed a distinct and well-defined boundary, underscoring the stable and robust clustering of the samples (Figure 1A). The cophenetic correlation coefficient fell sharply when k=2 (Figure 1B). Additionally, PCA (Figure 1C and D) confirmed that CRC patients in the two clusters were distributed in two distinct directions. As shown in Figure 1E and F, Cluster 1 (C1) was found to exhibit a worse OS and progression-free survival (PFS) than Cluster 2 (C2) (p<0.001).

Figure 1: Construction of an NMF subtype based on the differentially expressed immune-related genes in the colorectal cancer (CRC) cohort. (A) CRC subclasses classification through NMF consensus clustering. (B) Cophenetic coefficients as a function of the number of clusters (k=2 to 10). (C, D) Plane and three-dimensional principal components analysis (PCA) between the two clusters. (E, F) Kaplan-Meier survival curves showing the (E) OS and (F) PFS of the two molecular subgroups in the CRC cohort.
Differences in potential biological functions, pathways, and immune infiltration between the two clusters
We conducted a more in-depth exploration of potential biological mechanisms and established pathways between C1 and C2 using GSVA, which enabled us to identify the biological pathways perturbed and to establish the relationships between these disrupted processes. Our findings revealed that genes within C1 were enriched in the chemokine and immune function pathways. We proceeded to calculate the absolute abundance of 10 distinct cell populations between the two clusters, encompassing both the TME score and immune cell types. As illustrated in Figure 2A, the results showed that C1 had a significantly higher enrichment in the KRAS signaling, inflammatory response, IL6/JAK/STAT3 signaling, IL2/STAT5 signaling, TNFA/NFKB signaling, TGFβ signaling, and other immune-related signaling pathways, while C2 had a significantly higher enrichment in the MYC, oxidative phosphorylation, fatty acid metabolism, and peroxisome pathways. To further characterize the immune landscape, we compared the TME scores and the infiltration levels of 21 immune cell types across the two clusters. The two clusters exhibited significant differences in stromal, immune, and ESTIMATE scores, with all differences reaching a high level of statistical significance (p<0.001, Figure 2B). Furthermore, a comparative analysis of immune cell infiltration revealed significant variations in the infiltration levels of key cell populations, including specific CD4 T cell subsets (resting and activated memory), gamma delta T cells, naive and memory B cells, activated NK cells, monocytes, M0, M1, and M2 macrophages, as well as resting and activated dendritic cells and resting mast cells (Figure 2C). C2 exhibited distinct immunophenotypic features, including enrichment in anti-tumor immune pathways, elevated infiltration of activated CD8+ T cells and memory CD4+ T cells, and overall lower stromal scores. These characteristics were associated with improved OS and PFS compared to C1.

Figure 2: Differences in potential biological functions, pathways, and immune infiltration between the two clusters. (A) Heatmap of gene set variation analysis between the molecular subgroups in the colorectal cancer (CRC) cohort. (B) The variations in immune cell infiltration between the molecular subgroups. (C) The abundance of each TME-infiltrating cell type in the two clusters. *p<0.05, **p<0.01, ***p<0.001.
Gene expression difference between the two clusters
A heatmap and a volcano plot were generated to compare DEGs between the two clusters (Figure 3A and B). The NMF classification yielded 56 significant DEGs between the two clusters, including 43 up-regulated and 13 down-regulated genes (Supplementary Table S2). Leveraging the entire dataset, we employed four well-established machine learning approaches (SVM-RFE, Lasso, XGBoost, and RF) to identify significant genes related to CRC subclasses, which yielded 31, 30, 20, and 16 genes, respectively. As illustrated by the Venn diagram in Figure 3C, the intersection of the characteristic genes screened by the four machine learning algorithms and by univariate Cox regression analysis among DEGs comprised two genes, SPP1 and GUCA2A, implicating their potentially essential role in CRC. Interestingly, the AUC of SPP1 was 0.902, higher than that of GUCA2A (Figure 3D and E).

Figure 3: Screening for differentially expressed hub genes by machine learning. (A) Heatmap and (B) volcano plot showing the gene expression differences between the two molecular subgroups in the colorectal cancer (CRC) cohort. (C) Venn diagram depicting the intersection of prognosis-related genes across different approaches. (D, E) Receiver operating characteristic curves of the hub genes to identify CRC subclasses.
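The AUC used to compare the two hub genes has a simple rank interpretation: it is the probability that a randomly chosen sample from one subclass has a higher score than a randomly chosen sample from the other, with ties counted as one half. A minimal Python sketch of that definition follows; the function name, scores, and labels are hypothetical and unrelated to the values reported above.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    # A tie between a positive and a negative score counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical expression values of one gene; label 1 marks one subclass.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.35, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0]
print(roc_auc(toy_scores, toy_labels))
```

A gene that is lower in the subclass labelled 1 produces an AUC below 0.5; in that case the score is usually negated (or the labels swapped) before reporting.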
High expression of hub gene SPP1 was associated with poor CRC prognosis
Our subsequent investigation focused on SPP1. To further validate the expression of SPP1 in CRC patients, tumor specimens derived from patients were collected and used for IHC staining of SPP1. As shown in the representative IHC staining images, SPP1 showed varying degrees of expression in CRC tumors. The overexpression rate of SPP1 was quantified to be 90.62 % (58/64), while 35.94 % (23/64) of the patients showed strong positive expression of SPP1 (Figure 4A). As depicted in Supplementary Table S3, the expression of SPP1 did not demonstrate any significant correlation with clinicopathologic parameters, including age, gender, tumor location, lymph node metastasis, neoadjuvant chemotherapy, AJCC stages, and tumor differentiation (p>0.05). As shown in Supplementary Figure S1, high expression of SPP1 was associated with more advanced staging of CRC. CRC cohorts were divided into high and low expression groups with respect to different SPP1 expression levels. As shown in the Kaplan-Meier survival curves, patients with high SPP1 expression in the tumor tissues demonstrated a poorer prognosis for OS (Figure 4B) and PFS (Figure 4C). As a validation, we also classified CRC patients into different expression groups based on the SPP1 protein expression. As illustrated in Figure 4D, high SPP1 expression was associated with a poor prognosis in CRC (p<0.01). These findings implied that SPP1 may play a crucial role and serve as a prognostic predictor for CRC patients.

Figure 4: High expression of SPP1 was associated with poor colorectal cancer (CRC) prognosis. (A) Representative immunohistochemical staining images of SPP1 in sections derived from CRC patients. Scale bar: below (50 μm); above (200 μm). The bar graph indicates the proportion of patients with different expression levels. (B, C) Kaplan-Meier survival curves for OS and PFS of CRC cohorts with respect to different SPP1 expression levels. (D) Kaplan-Meier survival curves for PFS of CRC patients with respect to different SPP1 protein expression.
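The survival comparison between expression groups rests on the Kaplan-Meier product-limit estimator, which can be sketched in plain Python. The cut-off used to define high and low SPP1 expression is not stated in the text, so a median split is assumed here purely for illustration; the function names and toy follow-up data are hypothetical.

```python
from statistics import median


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    `events` holds 1 for an observed event and 0 for a censored observation.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # events and censored cases both leave the risk set
    return curve


def split_by_median(expression):
    """True for samples strictly above the median (assumed cut-off)."""
    cutoff = median(expression)
    return [value > cutoff for value in expression]


# Hypothetical follow-up times (months) and event indicators.
toy_times = [5, 8, 8, 12, 15]
toy_events = [1, 1, 0, 1, 0]
print(kaplan_meier(toy_times, toy_events))
print(split_by_median([1.2, 3.4, 2.2, 5.0]))
```

In practice each expression group would be passed to `kaplan_meier` separately and the two curves compared with the log-rank test.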
SPP1 expression correlates with CRC immune infiltration
To further investigate the relationship between SPP1 and immune components, we first estimated the immune cell fractions in the CRC cohort. High SPP1 expression was associated with a high InfiltrationScore (Figure 5A). In addition, correlation analysis revealed that SPP1 was positively correlated with neutrophil infiltration, whereas it was negatively correlated with CD8+ T cell infiltration (Figure 5B and C). To further corroborate this finding, IHC was carried out to determine the infiltration of CD8+ T cells and neutrophils in CRC tissues. As indicated by representative immunohistochemical staining, high SPP1 expression was associated with more neutrophil infiltration and less CD8+ T cell infiltration (Figure 5D–G). Taken together, these results suggested an association between SPP1 expression and immune cell fractions.

Figure 5: SPP1 expression correlates with colorectal cancer (CRC) immune infiltration. (A) The abundance of each TME-infiltrating cell type among the two clusters. (B, C) Spearman correlation between SPP1 expression and immune infiltration in CRC. (D, E) Representative images of immunohistochemical staining of (D) neutrophils and (E) CD8+ T cells in tumor specimens from CRC patients with different SPP1 expression. Scale bars: above, 200 μm; below, 20 μm. (F, G) Immunohistochemistry scores (H-scores) quantification of immunohistochemical staining of (F) neutrophils and (G) CD8+ T cells in tumor specimens from CRC patients with different SPP1 expression. *p<0.05, **p<0.01, ***p<0.001.
Discussion
Despite the unprecedented achievements of cancer immunotherapy in recent years, the benefits are not available to every patient, and the mechanisms that lead to non-response have yet to be fully elucidated [21]. The molecular heterogeneity of CRC predicts not only clinical outcomes but also treatment sensitivity, especially for tumor characterization of the TME and immune profile [8], [9]. TNM staging, relying solely on macroscopic information, falls short in capturing the tumor heterogeneity arising from molecular biological variations in CRC. A “one-size-fits-all” approach to treating tumors of identical pathological and histological type and clinical stage is no longer aligned with the demands of precision medicine [22]. It was only with the widespread adoption of sequencing technologies that we gained insight into the variations in genomic, transcriptomic, and epigenetic characteristics among tumors of the same classification [23]. As an illustration, the CRC Subtyping Consortium has systematically categorized CRC into four consensus molecular subtypes (CMS), incorporating diverse pathological features and molecular subtypes [10], [24]. This nuanced classification acknowledges tumor heterogeneity, thereby opening avenues for personalized prognostic and therapeutic strategies. Recognizing molecular distinctions has, in turn, expedited the adoption of targeted therapeutic regimens, leading to improved outcomes across a wide spectrum of tumors. In our study, we calculated CRC immune-related gene index scores to classify CRC patients into two immune phenotypes with different prognoses. We assessed the prognostic significance of the two clusters and their associations with enriched pathways, tumor immune scores, and differential gene expression. Notably, these features demonstrated robust prognostic value and exhibited strong correlations with clinical metrics.
Through effective stratification of CRC patients, key genes selected based on machine learning demonstrated their potential in predicting prognosis and presented a significant promise in advancing the development of targeted drugs and enhancing treatment approaches for CRC patients.
To identify the most crucial genes related to CRC, four supervised machine learning algorithms, SVM-RFE, Lasso, XGBoost, and RF, were utilized here to pinpoint the hub genes associated with CRC. As a proven machine learning algorithm for classification, SVM-RFE facilitates the optimal feature selection through recursive feature elimination [25]. Lasso, as a regression analysis method that concurrently conducts feature selection and regularization, aggregates the relative risks associated with significant variables and maximizes their predictive power. Its efficacy has been demonstrated through successful applications in numerous clinical studies [26], [27], [28]. XGBoost has demonstrated exceptional performance across a spectrum of tasks in recent years, owing to its user-friendly interface, ease of parallelization, and remarkable predictive accuracy [29]. Similarly, RF is a supervised non-parametric machine learning method well-suited for addressing both classification and regression issues, encompassing tasks such as gene screening and disease diagnosis [30]. In this study, using univariate Cox regression, SVM-RFE, Lasso, XGBoost, and RF algorithms, GUCA2A and SPP1 were identified as the potential biomarkers associated with the prognosis of CRC. Each of the four algorithms demonstrates distinct strengths across various computational scenarios. In the context of medical research, particularly oncology, it remains challenging to determine a universally superior method in terms of accuracy and precision. However, integrated machine learning models that combine multiple algorithms have shown promise in generating clinically meaningful predictions and improving prognostic accuracy in cancer studies.
Although formal cross-validation or external test set validation was not implemented in this study, several design strategies were adopted to reduce the risk of overfitting. First, feature selection was based on the intersection of four different machine learning models and univariate Cox analysis, ensuring consistency across multiple independent algorithms. The identification of SPP1 as the most robust gene was further corroborated by its high AUC and biological relevance. Second, SPP1’s prognostic and immunological significance was validated at both transcriptomic and protein levels, providing functional support for its selection. Finally, the integration of transcriptomic data from two independent cohorts (TCGA and GSE39582) across different platforms (RNA-seq and microarray) offers an implicit level of cross-dataset generalizability [31]. Although we integrated TCGA and GSE39582 datasets using the Rank-In method to improve robustness, the clustering was performed on the merged dataset rather than on completely separate cohorts. This is also one of the limitations of our study. Future studies should validate the reproducibility and clinical relevance of these subtypes in independent datasets to confirm their generalizability and potential for clinical translation.
SPP1 (secreted phosphoprotein 1), also known as osteopontin, is implicated in a number of physiological and pathological events, including tissue remodeling, inflammation, carcinogenesis and autoimmunity [32]. A growing body of evidence has demonstrated that SPP1 is overexpressed in a variety of tumors and is associated with poor tumor prognosis [32], [33], [34], [35]. SPP1 has also been found to activate intracellular pathways leading to the modulation of immune cells in the tumor immune microenvironment. In our study, we provided a preliminary dissection of the potential mechanisms underlying the role of SPP1 in the CRC immune microenvironment. The study suggested that SPP1 expression significantly affects the abundance of immune cell infiltration. In particular, our analysis revealed that high SPP1 expression was associated with decreased CD8+ T cell infiltration and increased neutrophil infiltration. One intriguing finding was that macrophage infiltration was significantly associated with SPP1 (Figure 5). This finding supports previous observations that SPP1 macrophages act as tumor-specific macrophages contributing to the poor prognosis of CRC patients through pro-angiogenesis and formation of an immune-excluded desmoplastic structure [36], [37].
Although direct mapping to CMS was not performed in this study, the immune and stromal characteristics observed in our clusters suggest potential correspondence: Cluster 1 resembles features of CMS1/CMS4 (immune-high and mesenchymal), while Cluster 2 aligns more closely with CMS2/CMS3 (immune-low, epithelial) [10]. Future studies incorporating CMS annotations may further validate and contextualize our immune-based classification for clinical translation. Collectively, the infiltration of immune cells in CRC serves as an indicator of the immune status of patients and could, to some extent, potentially explain the disparity in survival outcomes observed between the two groups. All these results revealed the relevance of SPP1 to the elaboration of immune cells in the CRC microenvironment and highlighted the potential value of identifying and developing therapeutic strategies against this target and other molecules involved in its interactions to overcome immunosuppression and increase the response of CRC to immunotherapy.
Although GUCA2A was also identified as a candidate gene, its relatively lower AUC and limited prognostic impact led us to focus on SPP1 for downstream validation. GUCA2A (Guanylate cyclase activator 2A), a peptide hormone secreted by intestinal epithelial cells, plays a key regulatory role in guanylate cyclase 2C (GUCY2C) signaling through both autocrine and paracrine pathways [38], [39], [40], [41]. Loss of GUCA2A expression at the mRNA and protein levels represents one of the most frequently observed molecular alterations in CRC [41]. Silencing of its receptor, GUCY2C, has been associated with tumorigenic processes such as cellular transformation, unchecked proliferation, and genomic instability [42]. Despite its apparent relevance, current research on GUCA2A remains limited, and the underlying mechanisms through which it contributes to CRC pathogenesis have yet to be fully elucidated. Further exploration of GUCA2A’s functional role in CRC immune modulation may be warranted.
Despite the encouraging results obtained, there are several critical issues that require attention in the present study. One of the limitations encountered in our study stems from the extended median survival of CRC patients, coupled with the majority of enrolled patients for immunohistochemical validation being hospitalized in 2019. Consequently, we were unable to assess the long-term survival of these patients, which constitutes a noteworthy limitation. In addition, the data analyzed in our study were sourced from public databases; further validation through multi-center clinical trials is therefore imperative for establishing the robustness and generalizability of our findings. Although only one external dataset (GSE39582) was used, future studies incorporating multiple independent cohorts are warranted to further validate and refine the proposed immune-associated classification. While our study has shed light on the potential role of the key genes in CRC, further investigation through basic experimental exploration is warranted to gain a deeper understanding of their functional significance and mechanisms of action.
Conclusions
We employed the NMF algorithm to categorize CRC patients into two immunophenotypically distinct clusters with prognostic implications and validated the unique profile associated with each. SPP1, a pivotal prognostic gene identified through machine learning, emerged as a crucial indicator for assessing tumor immune status, providing novel insights to enhance prognostic assessments and therapeutic strategies tailored to individual CRC patients.
Research ethics: Approval for this retrospective cohort study was obtained from the Ethics Committee of the Fifth Affiliated Hospital of Sun Yat-sen University (K170-1) with an informed consent waiver.
Informed consent: Given the retrospective nature of the study and the use of de-identified data, the requirement for informed consent was waived by the Institutional Review Board.
Author contributions: Yitai Xiao: Methodology, Validation, Data curation, Writing – original draft; Guixiong Zhang: Conceptualization, Validation, Writing – review & editing; Yingqi Xiao: Methodology, Data curation; Zhihong Li: Methodology, Validation; Hang Liu: Methodology, Validation; Longjun He: Conceptualization, Funding acquisition; Jianjun Li: Conceptualization, Project administration, Writing – review & editing.
Use of Large Language Models, AI and Machine Learning Tools: None declared.
Conflict of interest: The authors declare no competing financial interests.
Research funding: This work was supported by the Postdoctoral Fellowship Program of CPSF (grant number GZC20242103), the China Postdoctoral Science Foundation (grant number 2024M763794), the National Natural Science Foundation of China (NSFC; grant number 82073283), and the Science and Technology Project of Guangdong Province (grant number 2022A1515012149).
Data availability: The raw data can be obtained on request from the corresponding author. The validity of this study has been confirmed through the submission of raw research data and records to the Research Data Deposit (www.researchdata.org.cn), a comprehensive platform facilitating the documentation of research data for medical researchers.
References
1. Bray, F, Laversanne, M, Sung, H, Ferlay, J, Siegel, RL, Soerjomataram, I, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2024;74:229–63. https://doi.org/10.3322/caac.21834.
2. Johdi, NA, Sukor, NF. Colorectal cancer immunotherapy: options and strategies. Front Immunol 2020;11:1624. https://doi.org/10.3389/fimmu.2020.01624.
3. Ganesh, K. Optimizing immunotherapy for colorectal cancer. Nat Rev Gastroenterol Hepatol 2022;19:93–4. https://doi.org/10.1038/s41575-021-00569-4.
4. Zaborowski, AM, Winter, DC, Lynch, L. The therapeutic and prognostic implications of immunobiology in colorectal cancer: a review. Br J Cancer 2021;125:1341–9. https://doi.org/10.1038/s41416-021-01475-x.
5. Shin, AE, Giancotti, FG, Rustgi, AK. Metastatic colorectal cancer: mechanisms and emerging therapeutics. Trends Pharmacol Sci 2023;44:222–36. https://doi.org/10.1016/j.tips.2023.01.003.
6. Singh, M, Morris, VK, Bandey, IN, Hong, DS, Kopetz, S. Advancements in combining targeted therapy and immunotherapy for colorectal cancer. Trends Cancer 2024;10:598–609. https://doi.org/10.1016/j.trecan.2024.05.001.
7. Weiser, MR. AJCC 8th edition: colorectal cancer. Ann Surg Oncol 2018;25:1454–5. https://doi.org/10.1245/s10434-018-6462-1.
8. Buikhuisen, JY, Torang, A, Medema, JP. Exploring and modelling colon cancer inter-tumour heterogeneity: opportunities and challenges. Oncogenesis 2020;9:66. https://doi.org/10.1038/s41389-020-00250-6.
9. Marisa, L, Blum, Y, Taieb, J, Ayadi, M, Pilati, C, Le Malicot, K, et al. Intratumor CMS heterogeneity impacts patient prognosis in localized colon cancer. Clin Cancer Res 2021;27:4768–80. https://doi.org/10.1158/1078-0432.ccr-21-0529.
10. Guinney, J, Dienstmann, R, Wang, X, de Reynies, A, Schlicker, A, Soneson, C, et al. The consensus molecular subtypes of colorectal cancer. Nat Med 2015;21:1350–6. https://doi.org/10.1038/nm.3967.
11. Dienstmann, R, Vermeulen, L, Guinney, J, Kopetz, S, Tejpar, S, Tabernero, J. Consensus molecular subtypes and the evolution of precision medicine in colorectal cancer. Nat Rev Cancer 2017;17:79–92. https://doi.org/10.1038/nrc.2016.126.
12. Kyrochristos, ID, Roukos, DH. Comprehensive intra-individual genomic and transcriptional heterogeneity: evidence-based colorectal cancer precision medicine. Cancer Treat Rev 2019;80:101894. https://doi.org/10.1016/j.ctrv.2019.101894.
13. Li, ZW, Sun, B, Gong, T, Guo, S, Zhang, J, Wang, J, et al. GNAI1 and GNAI3 reduce colitis-associated tumorigenesis in mice by blocking IL6 signaling and down-regulating expression of GNAI2. Gastroenterology 2019;156:2297–312. https://doi.org/10.1053/j.gastro.2019.02.040.
14. Marisa, L, de Reynies, A, Duval, A, Selves, J, Gaub, MP, Vescovo, L, et al. Gene expression classification of colon cancer into molecular subtypes: characterization, validation, and prognostic value. PLoS Med 2013;10:e1001453. https://doi.org/10.1371/journal.pmed.1001453.
15. Torang, A, Kirov, AB, Lammers, V, Cameron, K, Wouters, VM, Jackstadt, RF, et al. Enterocyte-like differentiation defines metabolic gene signatures of CMS3 colorectal cancers and provides therapeutic vulnerability. Nat Commun 2025;16:264. https://doi.org/10.1038/s41467-024-55574-3.
16. Possemato, R, Marks, KM, Shaul, YD, Pacold, ME, Kim, D, Birsoy, K, et al. Functional genomics reveal that the serine synthesis pathway is essential in breast cancer. Nature 2011;476:346–50. https://doi.org/10.1038/nature10350.
17. Brunet, JP, Tamayo, P, Golub, TR, Mesirov, JP. Metagenes and molecular pattern discovery using matrix factorization. Proc Natl Acad Sci U S A 2004;101:4164–9. https://doi.org/10.1073/pnas.0308531101.
18. Hanzelmann, S, Castelo, R, Guinney, J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics 2013;14:7. https://doi.org/10.1186/1471-2105-14-7.
19. Chen, B, Khodadoust, MS, Liu, CL, Newman, AM, Alizadeh, AA. Profiling tumor infiltrating immune cells with CIBERSORT. Methods Mol Biol 2018;1711:243–59. https://doi.org/10.1007/978-1-4939-7493-1_12.
20. Xiao, Y, Zhang, G, Wang, L, Liang, M. Exploration and validation of a combined immune and metabolism gene signature for prognosis prediction of colorectal cancer. Front Endocrinol 2022;13:1069528. https://doi.org/10.3389/fendo.2022.1069528.
21. Ganesh, K, Stadler, ZK, Cercek, A, Mendelsohn, RB, Shia, J, Segal, NH, et al. Immunotherapy in colorectal cancer: rationale, challenges and potential. Nat Rev Gastroenterol Hepatol 2019;16:361–75. https://doi.org/10.1038/s41575-019-0126-x.
22. Zheng, X, Ma, Y, Bai, Y, Huang, T, Lv, X, Deng, J, et al. Identification and validation of immunotherapy for four novel clusters of colorectal cancer based on the tumor microenvironment. Front Immunol 2022;13:984480. https://doi.org/10.3389/fimmu.2022.984480.
23. Palucka, AK, Coussens, LM. The basis of oncoimmunology. Cell 2016;164:1233–47. https://doi.org/10.1016/j.cell.2016.01.049.
24. Dagogo-Jack, I, Shaw, AT. Tumour heterogeneity and resistance to cancer therapies. Nat Rev Clin Oncol 2018;15:81–94. https://doi.org/10.1038/nrclinonc.2017.166.
25. Ding, C, Bao, TY, Huang, HL. Quantum-inspired support vector machine. IEEE Trans Neural Netw Learn Syst 2022;33:7210–22. https://doi.org/10.1109/tnnls.2021.3084467.
26. Reichling, C, Taieb, J, Derangere, V, Klopfenstein, Q, Le Malicot, K, Gornet, JM, et al. Artificial intelligence-guided tissue analysis combined with immune infiltrate assessment predicts stage III colon cancer outcomes in PETACC08 study. Gut 2020;69:681–90. https://doi.org/10.1136/gutjnl-2019-319292.
27. Zhang, G, Xiao, Y, Zhang, X, Fan, W, Zhao, Y, Wu, Y, et al. Dissecting a hypoxia-related angiogenic gene signature for predicting prognosis and immune status in hepatocellular carcinoma. Front Oncol 2022;12:978050. https://doi.org/10.3389/fonc.2022.978050.
28. Zhang, G, Xiao, Y, Tan, J, Liu, H, Fan, W, Li, J. Elevated SLC1A5 associated with poor prognosis and therapeutic resistance to transarterial chemoembolization in hepatocellular carcinoma. J Transl Med 2024;22:543. https://doi.org/10.1186/s12967-024-05298-1.
29. Ma, B, Meng, F, Yan, G, Yan, H, Chai, B, Song, F. Diagnostic classification of cancers using extreme gradient boosting algorithm and multi-omics data. Comput Biol Med 2020;121:103761. https://doi.org/10.1016/j.compbiomed.2020.103761.
30. Wang, H, Yang, F, Luo, Z. An experimental study of the intrinsic stability of random forest variable importance measures. BMC Bioinformatics 2016;17:60. https://doi.org/10.1186/s12859-016-0900-5.
31. Kim, Z, Lee, J, Yoon, YE, Yun, JW. Unveiling prognostic RNA biomarkers through a multi-cohort study in colorectal cancer. Int J Mol Sci 2024;25:3317. https://doi.org/10.3390/ijms25063317.
32. Shi, L, Wang, X. Role of osteopontin in lung cancer evolution and heterogeneity. Semin Cell Dev Biol 2017;64:40–7. https://doi.org/10.1016/j.semcdb.2016.08.032.
33. Zhao, H, Chen, Q, Alam, A, Cui, J, Suen, KC, Soo, AP, et al. The role of osteopontin in the progression of solid organ tumour. Cell Death Dis 2018;9:356. https://doi.org/10.1038/s41419-018-0391-6.
34. Wei, R, Wong, J, Lyu, P, Xi, X, Tong, O, Zhang, SD, et al. In vitro and clinical data analysis of osteopontin as a prognostic indicator in colorectal cancer. J Cell Mol Med 2018;22:4097–105. https://doi.org/10.1111/jcmm.13686.
35. Kahles, F, Findeisen, HM, Bruemmer, D. Osteopontin: a novel regulator at the cross roads of inflammation, obesity and diabetes. Mol Metab 2014;3:384–93. https://doi.org/10.1016/j.molmet.2014.03.004.
36. Qi, J, Sun, H, Zhang, Y, Wang, Z, Xun, Z, Li, Z, et al. Single-cell and spatial analysis reveal interaction of FAP(+) fibroblasts and SPP1(+) macrophages in colorectal cancer. Nat Commun 2022;13:1742. https://doi.org/10.1038/s41467-022-29366-6.
37. Liu, Y, Zhang, Q, Xing, B, Luo, N, Gao, R, Yu, K, et al. Immune phenotypic linkage between colorectal cancer and liver metastasis. Cancer Cell 2022;40:424–37. https://doi.org/10.1016/j.ccell.2022.02.013.
38. Jalali, P, Aliyari, S, Etesami, M, Saeedi, NM, Taher, S, Kavousi, K, et al. GUCA2A dysregulation as a promising biomarker for accurate diagnosis and prognosis of colorectal cancer. Clin Exp Med 2024;24:251. https://doi.org/10.1007/s10238-024-01512-y.
39. Zhang, H, Du, Y, Wang, Z, Lou, R, Wu, J, Feng, J. Integrated analysis of oncogenic networks in colorectal cancer identifies GUCA2A as a molecular marker. Biochem Res Int 2019;2019:6469420–13. https://doi.org/10.1155/2019/6469420.
40. Pattison, AM, Merlino, DJ, Blomain, ES, Waldman, SA. Guanylyl cyclase C signaling axis and colon cancer prevention. World J Gastroenterol 2016;22:8070–7. https://doi.org/10.3748/wjg.v22.i36.8070.
41. Wilson, C, Lin, JE, Li, P, Snook, AE, Gong, J, Sato, T, et al. The paracrine hormone for the GUCY2C tumor suppressor, guanylin, is universally lost in colorectal cancer. Cancer Epidemiol Biomarkers Prev 2014;23:2328–37. https://doi.org/10.1158/1055-9965.epi-14-0440.
42. Li, P, Schulz, S, Bombonati, A, Palazzo, JP, Hyslop, TM, Xu, Y, et al. Guanylyl cyclase C suppresses intestinal tumorigenesis by restricting proliferation and maintaining genomic integrity. Gastroenterology 2007;133:599–607. https://doi.org/10.1053/j.gastro.2007.05.052.
Supplementary Material
This article contains supplementary material (https://doi.org/10.1515/oncologie-2025-0248).
© 2025 the author(s), published by De Gruyter on behalf of Tech Science Press (TSP)
This work is licensed under the Creative Commons Attribution 4.0 International License.