Prediction of muscular-invasive bladder cancer using multi-view fusion self-distillation model based on 3D T2-Weighted images

Yuan Zou; Jie Yu; Lingkai Cai; Chunxiao Chen; Ruoyu Meng; Yueyue Xiao; Xue Fu; Xiao Yang; Peikun Liu; Qiang Lu

doi:10.1515/bmt-2024-0333

Article Open Access

Published/Copyright: November 6, 2024

Published by De Gruyter Brill

From the journal Biomedical Engineering / Biomedizinische Technik Volume 70 Issue 1

Abstract

Objectives

Accurate preoperative differentiation between non-muscle-invasive bladder cancer (NMIBC) and muscle-invasive bladder cancer (MIBC) is crucial for surgical decision-making in bladder cancer (BCa) patients. In clinical practice, MIBC diagnosis relies on the Vesical Imaging-Reporting and Data System (VI-RADS) applied to multi-parametric MRI (mp-MRI). Because some sequences may be missing in practice, this study aims to make the most of the existing T2-weighted imaging (T2WI) sequence to assess MIBC accurately.

Methods

We analyzed T2WI images from 615 BCa patients and developed a multi-view fusion self-distillation (MVSD) model that integrates transverse and sagittal views to classify MIBC and NMIBC. This 3D image classification method leverages z-axis information from the 3D MRI volume, combining information from adjacent slices for comprehensive feature extraction. Multi-view fusion enhances global information by letting information from the transverse and sagittal planes mutually complement and constrain each other. Self-distillation allows shallow classifiers to learn valuable knowledge from deep layers, boosting the feature extraction capability of the backbone and achieving better classification performance.

Results

Compared with classical deep learning methods and state-of-the-art MRI-based BCa classification approaches, the proposed MVSD model achieves the highest area under the curve (AUC, 0.927) and accuracy (Acc, 0.880). DeLong’s test shows that the AUC of MVSD differs significantly from those of VGG16, DenseNet, ResNet50, and the 3D residual network. Furthermore, the Acc of the MVSD model is higher than that of the two urologists.

Conclusions

Our proposed MVSD model performs satisfactorily in distinguishing between MIBC and NMIBC, indicating significant potential to facilitate preoperative BCa diagnosis for urologists.

Keywords: deep learning; bladder cancer; multi-view; self-distillation; T2-weighted image

Introduction

Bladder cancer (BCa) is the most prevalent malignant tumor of the urinary system and the eighth most lethal cancer [1, 2]. According to statistics released by the World Health Organization in 2021, there are approximately 400,000 new cases of BCa worldwide each year, resulting in up to 180,000 deaths. In clinical diagnosis, approximately 75 % of patients are diagnosed with non-muscle-invasive BCa (NMIBC), while the remaining 25 % have muscle-invasive BCa (MIBC) [3]. NMIBC and MIBC patients have distinct therapeutic objectives and strategies [4, 5]. For patients with NMIBC, the primary aim of treatment is to reduce the risk of recurrence and prevent progression to a more advanced stage. For patients with MIBC, the critical therapeutic decision is whether to remove or preserve the bladder, so as to maximize the chance of cure while minimizing the impact on survival [6]. Accurate preoperative distinction between MIBC and NMIBC is therefore pivotal for treatment decisions in BCa patients.

Magnetic resonance imaging (MRI) is a non-invasive diagnostic method that offers superior soft-tissue resolution and holds advantages over computed tomography (CT) in the preoperative staging of BCa [7]. In 2018, the Vesical Imaging-Reporting and Data System (VI-RADS) was proposed to judge muscle invasion of BCa by combining three sequences: T2-weighted imaging (T2WI), diffusion-weighted imaging (DWI), and dynamic contrast-enhanced imaging (DCE). However, VI-RADS has limitations: it relies on the experience of urologists and has fuzzy diagnostic thresholds for muscle-invasive BCa [8, 9]. Furthermore, acquiring the complete multiparametric MRI (mp-MRI) protocol (T2WI, DWI, and DCE) for every patient can be challenging because of the high cost of mp-MRI and potential patient allergies to contrast agents. It is therefore worthwhile to develop a technique that uses the existing T2WI sequence to accurately assess the muscle invasion status of BCa.

Some researchers have used radiomics-based machine learning techniques for BCa prediction [10], [11], [12]. However, the feature extraction process requires radiologists to annotate a large amount of lesion information, and feature selection can be challenging and time-consuming. Deep learning (DL) methods are well known for their strong feature extraction capabilities and have shown great potential in BCa diagnosis, in some cases even surpassing professional urologists. Zhang et al. [13] proposed a DL model based on CT images that exhibited promising performance in both the internal validation cohort (AUC: 0.861) and the external validation cohort (AUC: 0.791). Zhou et al. [14] proposed PENet, a deep neural network that incorporates prior evidence to classify MRI images for BCa staging in line with clinical knowledge. In our previous work, we developed a multi-task BCa muscular invasion prediction (MBMIP) model based on T2WI images [15], which achieved AUC values of 0.876, 0.856, and 0.628 on the retrospective, prospective, and multi-center test sets, respectively.

DL-based BCa prediction relies primarily on 2D models trained on transverse-view slices, owing to the complexity of 3D models and the scarcity of samples. However, a transverse view alone may not provide complete information about the lesion, and urologists also combine coronal and sagittal images in clinical diagnosis. Another limitation of training on 2D slices is that urologists must use their experience to judge whether each slice reflects muscle invasion, which may introduce errors. Although 3D models can outperform 2D models, they incur substantial computational and time costs. Many model compression and acceleration methods have been proposed to address this problem, such as pruning [16] and knowledge distillation [17]. However, knowledge distillation has limitations in knowledge transfer and teacher model selection. Self-distillation, proposed by Zhang et al. [18, 19], yields accurate and compact networks with reduced training overhead.

To address the aforementioned limitations and challenges in existing approaches, this paper proposes a multi-view fusion self-distillation (MVSD) model that integrates the transverse view and sagittal view to accurately assess the invasion of BCa. The main contributions can be summarized as follows:

Using 3D MRI volume data containing multiple slices makes full use of information from adjacent slices and allows the pathology result to serve as the gold-standard label for each patient, without requiring doctors to judge each slice.
Transverse and sagittal views constrain and complement each other, which enforces spatial consistency and provides global spatial information for the fusion process.
Self-distillation is a one-stage training method in which the teacher and student models are deployed on the same network. Knowledge is distilled between different layers of the network to improve performance and enable acceleration.

Methods

Patient cohorts

The retrospective study involved 615 BCa patients recruited from the Hospital between February 2012 and September 2022. The study was approved by the ethics committee of the hospital (ethics review number: 2021-SR-409).

The inclusion criteria were as follows: (i) patients who underwent MRI before transurethral resection of bladder tumor (TURBT) or bladder biopsy; (ii) patients with clear pathological findings to serve as the classification criterion. Patients were excluded if: (i) they received treatment such as neoadjuvant chemotherapy or radical cystectomy before MRI; (ii) no clear transverse and sagittal T2WI slices met the diagnostic requirements of the urologists; or (iii) no distinct tumor was identified.

The 615 eligible BCa patients included 443 with NMIBC and 172 with MIBC. All 615 cases were randomly divided into five groups for five-fold cross-validation. All transverse and sagittal images were jointly reviewed by two experienced urologists, and any disagreement was resolved by consensus. For baseline characteristics, one-way analysis of variance was used to compare age, and chi-square tests were used to compare gender and pathological stage across the five folds. There were no significant differences (p>0.05) in age, gender, or pathological stage among the five folds.
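The random five-fold partition described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a plain (non-stratified) shuffle with an arbitrary seed, and the function names `five_fold_split` and `train_val_folds` are hypothetical. Because 615 is divisible by five, each fold holds 123 patients.

```python
import random

def five_fold_split(patient_ids, n_folds=5, seed=0):
    """Randomly partition patient IDs into n_folds disjoint folds whose sizes
    differ by at most one (plain, non-stratified shuffle)."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    # Fold k takes every n_folds-th ID starting at position k.
    return [ids[k::n_folds] for k in range(n_folds)]

def train_val_folds(folds):
    """Yield (train_ids, val_ids): each fold serves once as the validation set."""
    for k, val in enumerate(folds):
        train = [pid for j, fold in enumerate(folds) if j != k for pid in fold]
        yield train, val

folds = five_fold_split(range(615))
for train, val in train_val_folds(folds):
    print(len(train), len(val))  # 492 123 for every fold
```

A stratified split (shuffling NMIBC and MIBC patients separately before dealing them into folds) would be a natural alternative for keeping the 443:172 class ratio similar across folds; the text reports only that fold characteristics did not differ significantly.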

Proposed model architecture

As shown in Figure 1, the MVSD model consists of a multi-view fusion architecture and a self-distillation framework. The multi-view fusion architecture uses ResNet50 [20] as the backbone and has two branches that receive input from the transverse and sagittal views, respectively. The feature vectors generated by the two branches are concatenated and fed into the final teacher classifier. The self-distillation framework divides the ResNet50 backbone into four parts based on the ResBlock structure, with three student classifiers attached to the shallow layers and one teacher classifier attached to the deepest layer. During training, knowledge is distilled from the teacher classifier to the student classifiers, which improves the final classification accuracy without changing the backbone structure. The teacher classifier outputs not only classification results but also the probability of each category, known as soft labels. In knowledge distillation, the student classifiers learn by minimizing the difference between their outputs and these soft labels. Compared with one-hot encoded hard labels, soft labels contain richer information about the relationships between categories. The remainder of this section describes the proposed MVSD model in detail.

Figure 1:

The overall architecture of the MVSD model.

Multi-view fusion method

The MVSD model uses a multi-view fusion architecture that fuses two inputs generated from the transverse and sagittal views of the MRI data. It consists of two parallel branches, each receiving slice images from a different plane, with an improved 3D ResNet50 as the backbone. Taking the transverse branch as an example, the input size is 2 × 12 × 224 × 224, where 2 is the number of channels, 12 the number of slices, and 224 the height and width of each slice. The input passes through a 3D convolutional layer, a batch normalization layer, a rectified linear unit (ReLU) activation, and a max pooling layer. Four ResBlocks, each consisting of a Conv Block and several Identity Blocks, then complete feature extraction with a total of 49 convolutional layers. The multi-channel feature map output by the last ResBlock is transformed into a 2048-dimensional feature vector by 3D adaptive average pooling. The sagittal branch has the same structure.

The feature vectors generated by the two branches are concatenated and fed to the teacher classifier, which contains only one fully connected layer, to obtain the final prediction. Related studies have shown that in multi-view fusion tasks, concatenation, the method used in this research, outperforms other fusion methods such as mean-pooling and element-wise max-pooling [21, 22], because feature concatenation preserves more information from the individual views. By combining global information from two different views, the views complement and constrain each other, leading to more accurate classification.

To accelerate network convergence and improve performance, the four ResBlocks are initialized with parameters from the MedicalNet [23] pre-trained model via transfer learning and then trained further. In addition, weights are shared between the four ResBlocks of the two branches, allowing the network to extract information from both views without adding backbone parameters.
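To make the data flow of the two branches concrete, the sketch below applies one shared backbone to both views, reduces each output with 3D global average pooling, concatenates the two vectors, and feeds them to a single fully connected teacher classifier. It is a pure-Python toy under stated assumptions: `ToyBackbone` is a hypothetical stand-in for the 3D ResNet50 trunk, feature maps are lists of flattened channels, and the channel count is 4 rather than 2048.

```python
import random

def global_avg_pool3d(feature_map):
    """Adaptive average pooling to 1x1x1: average each channel over all of its
    D*H*W voxels. feature_map is a list of channels, each a flat list of voxels."""
    return [sum(channel) / len(channel) for channel in feature_map]

class ToyBackbone:
    """Hypothetical stand-in for the 3D ResNet50 trunk. It maps a flattened
    volume to `channels` feature channels via ReLU(scale * voxel)."""

    def __init__(self, channels, seed=0):
        rng = random.Random(seed)
        self.scales = [rng.uniform(-1.0, 1.0) for _ in range(channels)]

    def __call__(self, volume):
        return [[max(0.0, s * v) for v in volume] for s in self.scales]

def linear(x, weights, bias):
    """Fully connected layer: one weight row and one bias per output class."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def teacher_logits(tra_volume, sag_volume, backbone, fc_weights, fc_bias):
    """Return (fused feature vector, teacher logits) for one patient."""
    # The same backbone object (same weights) processes both views.
    v_tra = global_avg_pool3d(backbone(tra_volume))
    v_sag = global_avg_pool3d(backbone(sag_volume))
    fused = v_tra + v_sag  # list concatenation: length 2 * channels
    return fused, linear(fused, fc_weights, fc_bias)

backbone = ToyBackbone(channels=4)
fc_weights = [[0.1] * 8, [-0.1] * 8]  # 2 classes x (2 * 4) fused features
fc_bias = [0.0, 0.0]
fused, logits = teacher_logits([0.2, 0.5, 0.9], [0.1, 0.4, 0.7], backbone, fc_weights, fc_bias)
print(len(fused), len(logits))  # 8 2
```

Sharing the backbone means the second view adds no backbone parameters, although each view still needs its own forward pass.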

Self-distillation framework

The proposed self-distillation framework, shown in Figure 2, extracts intermediate feature maps from each shallow ResBlock and feeds them into the corresponding student classifier without modifying the two-branch ResNet50 backbone. In self-distillation, deep classifiers act as teacher models and shallow classifiers act as student models. The teacher model guides the learning of the student models through a Kullback-Leibler (KL) divergence loss.

Figure 2:

The proposed self-distillation framework.

Given $N$ training samples $X=\{(x_1^j, x_2^j)\}_{j=1}^{N}$, where $x_1^j$ and $x_2^j \in \mathbb{R}^{N}$ represent the transverse-view and sagittal-view inputs, respectively, let $F:\mathbb{R}^{N}\mapsto\mathbb{R}^{M}$ be the mapping from $X^j$ to its corresponding predicted value $P^j$, with $F=f_1\cdot f_2\cdot f_3\cdot f_4$, where $f_i$ represents the intermediate mapping between successive sub-nodes of the backbone at which features exit early and are sent to the student classifier $C_i$ ($i=1,2,3$). Letting $\omega_1^i$ and $\omega_2^i$ be the intermediate feature maps at the sub-nodes of the two branches, we have:

$$\omega_1^1 = f_1(x_1), \tag{1}$$

$$\omega_1^2 = \omega_1^1\cdot f_2 = f_1(x_1)\cdot f_2, \tag{2}$$

$$\omega_1^3 = \omega_1^2\cdot f_3 = f_1(x_1)\cdot f_2\cdot f_3. \tag{3}$$

$\omega_2^i$ is computed in the same way as $\omega_1^i$. The structure of the proposed student classifier $C_1$ is shown in the lower-left corner of Figure 1. At each feature extraction stage $i$, $\omega_1^i$ and $\omega_2^i$ from the sub-nodes are respectively passed through three alignment layers composed of 3D convolution, batch normalization, and a ReLU activation, then average-pooled and concatenated. The aligned fusion feature map obtained from $\omega_1^i$ and $\omega_2^i$ by these operations is denoted $F^i$:

$$F^i = g^i(\omega_1^i, \omega_2^i), \tag{4}$$

where $g^i$ is the feature alignment fusion module. At each stage $i$, $g^i$ fuses $\omega_1^i$ and $\omega_2^i$ into $F^i$, whose feature size matches that of the final reference feature $F^4$ sent to the teacher classifier $C_4$. Let $\Omega^i$ denote the output of $F^i$ after the fully connected layer. A softmax layer with temperature is applied to smooth $\Omega^i$:

$$q(\Omega_k^i, T) = \frac{\exp(\Omega_k^i / T)}{\sum_{c}\exp(\Omega_c^i / T)}. \tag{5}$$

Here $q(\Omega_k^i, T)$ is the probability of the $k$-th class from student classifier $C_i$ ($i=1,2,3$). The distillation temperature $T$, generally an integer greater than one, reduces the gap between the predicted values of target and non-target classes [24]; a larger $T$ yields a smoother probability distribution over the predicted labels. Let $p^i$ denote the output of each classifier after the softmax layer, i.e., the probabilities $q(\Omega_k^i, T)$ of all classes. In particular, $p^t$, which is used to supervise $p^i$, denotes the softmax output of the teacher classifier $C_4$.
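Equation (5) and the smoothing effect of $T$ can be checked numerically with the short sketch below. The logits are made-up values for illustration, and subtracting the maximum before exponentiating is a standard numerical-stability step that leaves the probabilities unchanged.

```python
import math

def softmax_with_temperature(logits, T=1.0):
    """Eq. (5): q_k = exp(z_k / T) / sum_c exp(z_c / T).
    Subtracting the largest scaled logit avoids overflow without changing q."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 1.0]  # hypothetical logits for (NMIBC, MIBC)
for T in (1, 3, 10):
    print(T, [round(q, 3) for q in softmax_with_temperature(logits, T)])
# 1 [0.953, 0.047]
# 3 [0.731, 0.269]
# 10 [0.574, 0.426]
```

As $T$ grows, the non-target class receives more probability mass, which is the "softer" signal the students learn from.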

During training, $p^i$ is supervised not only by the label but also by $p^t$, which achieves knowledge distillation. By learning from the predictions of the teacher classifier $C_4$, the student classifiers $C_i$ ($i=1,2,3$) help the shallow feature extractors learn better fused features. At inference, all student classifiers are discarded and only the teacher classifier is kept, so no additional computational cost is incurred.
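The composition in Equations (1) to (3) and the train/inference difference described above can be summarized in a few lines. In this hypothetical sketch, `stages` plays the role of $f_1$ to $f_4$ for one branch; during training the function also returns the intermediate maps $\omega^1$ to $\omega^3$ that feed the student classifiers, while at inference it returns only the final output used by the teacher classifier. The toy arithmetic stages are placeholders for ResBlocks.

```python
def forward_with_exits(x, stages, training=True):
    """Apply stages f1..f4 in order (F = f1·f2·f3·f4, read left to right).

    Training: return (final_output, [omega^1, omega^2, omega^3]), where
    omega^i = f_i(...f_1(x)) is routed to student classifier C_i.
    Inference: return only final_output; the student branches are skipped.
    """
    intermediates = []
    h = x
    for i, f in enumerate(stages, start=1):
        h = f(h)
        if training and i < len(stages):
            intermediates.append(h)
    return (h, intermediates) if training else h

# Hypothetical toy stages standing in for the four ResBlocks.
stages = [lambda h: h + 1, lambda h: h * 2, lambda h: h - 3, lambda h: h ** 2]
print(forward_with_exits(1, stages))                  # (1, [2, 4, 1])
print(forward_with_exits(1, stages, training=False))  # 1
```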

Loss functions

Three losses are used to train the MVSD model. The first is the cross-entropy loss between the softmax output $p^i$ of each classifier and the corresponding label $y$. $L_{CE}$ represents the supervision that the true labels provide to both the teacher classifier and each student classifier:

$$L_{CE} = \sum_{i=1}^{4} \mathrm{CrossEntropy}(p^i, y). \tag{6}$$

The second loss, $L_{KL}$, is the KL divergence distillation loss between each student classifier’s $p^i$ ($i=1,2,3$) and the teacher classifier’s $p^t$. $L_{KL}$ measures the difference between predicted probability distributions; minimizing it narrows the gap between the distributions of $p^i$ and $p^t$ and thereby distills knowledge from the teacher model to the student models:

$$L_{KL} = \sum_{i=1}^{3} \mathrm{KL}(p^i, p^t). \tag{7}$$
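A plain-Python sketch of Equations (6) and (7) follows. It assumes `probs` holds the softmax outputs of $C_1$ to $C_4$ in that order (so `probs[3]` is the teacher output $p^t$) and, because the $\mathrm{KL}(\cdot,\cdot)$ notation leaves the argument order implicit, it assumes the common distillation convention $\mathrm{KL}(p^t \,\|\, p^i)$ with the teacher as the target distribution. Many self-distillation implementations also stop gradients through $p^t$ in this term; the text does not say whether MVSD does.

```python
import math

def cross_entropy(p, y):
    """-log p_y for a probability vector p and integer class label y."""
    return -math.log(p[y])

def kl_divergence(target, pred, eps=1e-12):
    """KL(target || pred) = sum_k target_k * log(target_k / pred_k).
    Terms with target_k == 0 contribute nothing; eps guards against pred_k == 0."""
    return sum(t * math.log(t / max(q, eps)) for t, q in zip(target, pred) if t > 0)

def loss_ce(probs, y):
    """Eq. (6): cross-entropy summed over classifiers C1..C4."""
    return sum(cross_entropy(p, y) for p in probs)

def loss_kl(probs):
    """Eq. (7): divergence of each student C1..C3 from the teacher p^t = probs[3]."""
    p_t = probs[3]
    return sum(kl_divergence(p_t, p_i) for p_i in probs[:3])

# Hypothetical softmax outputs of C1, C2, C3 (students) and C4 (teacher); label 0 = NMIBC.
probs = [[0.6, 0.4], [0.7, 0.3], [0.8, 0.2], [0.9, 0.1]]
print(round(loss_ce(probs, 0), 4), round(loss_kl(probs), 4))
```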

Inspired by the center loss proposed for deep face recognition [25], the MVSD model introduces a third loss, $L_{CEN}$. Cross-entropy loss, commonly used in image classification, enables features to be classified correctly by the classifier. However, when features from different classes lack obvious distinctions, the classifier may struggle to determine a sample’s class: cross-entropy loss only makes the features of different classes separable and does not explicitly make features of the same class compact. $L_{CEN}$ pulls features of the same class toward their class center, reducing intra-class distances; combined with $L_{CE}$, this increases the relative separation between classes and makes the features easier to classify. $L_{CEN}$ is computed on the fused features $F^i$ as follows:

$$L_{CEN} = \sum_{i=1}^{4}\sum_{k=1}^{2} \lVert F_k^i - C_k^i \rVert_2^2, \tag{8}$$

where $F_k^i$ denotes the feature of the $k$-th class for the $i$-th classifier and $C_k^i$ denotes the feature center of the $k$-th class for the $i$-th classifier. The feature center $C_k^i$ is the average of $F_k^i$ within a batch and is updated after each iteration.
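One classifier's term of Equation (8) can be sketched as below, following the text in taking each class center as the batch mean of that class's fused features. Summing the squared distances over all samples of a class is our assumption about how the per-class term aggregates. The original center loss [25] instead tracks centers with a running update and includes a factor of 1/2, so this is a simplified variant; the function names are hypothetical.

```python
def center_loss(features, labels, num_classes=2):
    """One classifier's term of Eq. (8): for each class k, sum the squared L2
    distances between that class's features and their batch-mean center C_k."""
    loss = 0.0
    for k in range(num_classes):
        members = [f for f, y in zip(features, labels) if y == k]
        if not members:
            continue  # class absent from this batch: no center, no contribution
        dim = len(members[0])
        center = [sum(f[d] for f in members) / len(members) for d in range(dim)]
        loss += sum(sum((f[d] - center[d]) ** 2 for d in range(dim)) for f in members)
    return loss

def total_center_loss(features_per_classifier, labels):
    """Outer sum of Eq. (8) over the classifiers (four in the MVSD model)."""
    return sum(center_loss(feats, labels) for feats in features_per_classifier)

# Made-up 2-D fused features for a batch of four samples (0 = NMIBC, 1 = MIBC).
feats = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0]]
labels = [0, 0, 1, 1]
print(center_loss(feats, labels))  # 4.0
```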

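In summary, the total loss of the proposed MVSD model is the weighted sum of the three losses:

$$L_{total} = (1-\alpha)\,L_{CE} + \lambda\,L_{CEN} + \alpha\,L_{KL} = \sum_{i=1}^{4}\Big((1-\alpha)\,\mathrm{CrossEntropy}(p^i, y) + \lambda\sum_{k=1}^{2}\lVert F_k^i - C_k^i\rVert_2^2\Big) + \alpha\sum_{i=1}^{3}\mathrm{KL}(p^i, p^t), \tag{9}$$

where $\alpha$ and $\lambda$ balance the three losses: $\alpha$ weights $L_{KL}$ (with $1-\alpha$ weighting $L_{CE}$), and $\lambda$ weights $L_{CEN}$.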
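As a quick arithmetic check of Equation (9), the helper below combines precomputed loss values using the reported setting α = λ = 0.1. The function name and the input values are hypothetical.

```python
def total_loss(l_ce, l_cen, l_kl, alpha=0.1, lam=0.1):
    """Eq. (9): (1 - alpha) * L_CE + lam * L_CEN + alpha * L_KL."""
    return (1 - alpha) * l_ce + lam * l_cen + alpha * l_kl

# Hypothetical loss values: 0.9 * 1.2 + 0.1 * 4.0 + 0.1 * 0.3 = 1.08 + 0.4 + 0.03
print(round(total_loss(l_ce=1.2, l_cen=4.0, l_kl=0.3), 4))  # 1.51
```

Because α trades off $L_{CE}$ against $L_{KL}$ while λ adds $L_{CEN}$ on top, the three weights sum to 1 + λ rather than to one.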

Experiments and results

Image pre-processing

This study classifies 3D MRI volume data. All transverse and sagittal images containing the tumor were selected after verification by a radiologist. The Cascade Path Augmentation Unet (CPA-Unet) [26] proposed by our research group was used to segment the bladder and tumor, removing redundant information from the MRI images. We retained the rectangular region of the largest bladder area as the first channel and the rectangular region of the largest tumor area as the second channel. All dual-channel blocks were resampled by bilinear interpolation into 2 × 12 × 224 × 224 patches, which also unified the pixel spacing. To mitigate overfitting due to the limited sample size, we tripled the training set through horizontal and vertical flipping.
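The flip-based tripling of the training set can be sketched as follows. Assumptions: each sample is a pair (channels, label) where each channel is a volume indexed as [slice][row][column], both channels (bladder and tumor) receive the same flip, and augmentation is applied only to the training folds so validation data are never augmented. The helper names are hypothetical.

```python
def hflip(volume):
    """Horizontal flip: reverse the column order of every row in every slice."""
    return [[row[::-1] for row in slice_] for slice_ in volume]

def vflip(volume):
    """Vertical flip: reverse the row order within every slice."""
    return [slice_[::-1] for slice_ in volume]

def triple_with_flips(samples):
    """Return each (channels, label) sample plus its horizontal and vertical flips.
    All channels of a sample receive the same flip, and labels are unchanged."""
    augmented = []
    for channels, label in samples:
        augmented.append((channels, label))
        augmented.append(([hflip(c) for c in channels], label))
        augmented.append(([vflip(c) for c in channels], label))
    return augmented

vol = [[[1, 2], [3, 4]]]  # one 2x2 slice (made-up values)
aug = triple_with_flips([([vol, vol], "MIBC")])  # bladder and tumor channels
print(len(aug), aug[1][0][0], aug[2][0][0])  # 3 [[[2, 1], [4, 3]]] [[[3, 4], [1, 2]]]
```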

Experiment settings and evaluation

The MVSD model was implemented in PyTorch. The pretrained 3D ResNet50 model without distillation was fine-tuned on our BCa dataset. All experiments were performed on a computer with an Intel Core i7 2.6 GHz CPU and an RTX 3060 12 GB GPU. The Adam optimizer was used to minimize the total loss, with a mini-batch size of four and an initial learning rate of 0.001. The loss coefficients α and λ were both set to 0.1, as discussed in Section 4.3. Five-fold cross-validation was used to train and evaluate the models in all experiments below. Performance was evaluated using accuracy (Acc), sensitivity (Sens), specificity (Spec), AUC, and F1 score.
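For reference, the threshold-based metrics listed above can be computed from a confusion matrix as in this sketch. It assumes MIBC is the positive class (label 1), so Sens is the fraction of MIBC cases detected and Spec the fraction of NMIBC cases correctly identified; AUC, which needs continuous scores, is not computed here. The example labels are made up.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Acc, Sens, Spec and F1 for binary labels, with `positive` (MIBC) as the
    positive class. Zero denominators yield 0.0 instead of raising."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    acc = (tp + tn) / len(pairs)
    sens = tp / (tp + fn) if tp + fn else 0.0  # recall on MIBC
    spec = tn / (tn + fp) if tn + fp else 0.0  # recall on NMIBC
    prec = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
    return {"acc": acc, "sens": sens, "spec": spec, "f1": f1}

y_true = [1, 1, 1, 0, 0, 0, 0, 0]  # 1 = MIBC, 0 = NMIBC (made-up labels)
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
print({k: round(v, 3) for k, v in binary_metrics(y_true, y_pred).items()})
# {'acc': 0.75, 'sens': 0.667, 'spec': 0.8, 'f1': 0.667}
```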

Comparisons with existing methods

We compared MVSD with four classical DL-based image classification methods: VGG16, MobileNet, DenseNet, and ResNet50. We also compared it with recent MRI-based BCa classification methods, including the 3D residual network [27], the MBMIP model [15], and the CMS model [28]. DeLong’s test was used to compare the AUC values of the models pairwise. As shown in Table 1, our method achieved the highest AUC (0.927), with statistically significant improvements over VGG16, DenseNet, ResNet50, and the 3D residual network (AUC = 0.927 vs. 0.847, 0.873, 0.908, and 0.906; p = 0.003, 0.002, 0.004, and 0.041, respectively); these comparisons are shown in bold in Table 1. Although the AUC differences from MobileNet, MBMIP, and CMS are not statistically significant, Acc improves by 6.2, 4.0, and 1.0 percentage points, respectively, suggesting some clinical value.

Table 1:

Cross-validation results of different BCa classification methods (mean ± STD; p-values are from DeLong’s test comparing each method’s AUC with that of the proposed model).

Method	AUC	Acc	Sens	Spec	p-Value
VGG16	0.847 ± 0.085	0.767 ± 0.081	0.890 ± 0.078	0.720 ± 0.105	0.003
MobileNet	0.883 ± 0.049	0.818 ± 0.044	0.860 ± 0.082	0.801 ± 0.056	0.095
DenseNet	0.873 ± 0.041	0.802 ± 0.060	0.860 ± 0.073	0.799 ± 0.101	0.002
ResNet50	0.908 ± 0.029	0.823 ± 0.064	0.918 ± 0.082	0.785 ± 0.115	0.004
3D residual network	0.906 ± 0.031	0.836 ± 0.049	0.913 ± 0.042	0.806 ± 0.073	0.041
MBMIP model	0.891 ± 0.042	0.840 ± 0.014	0.810 ± 0.071	0.851 ± 0.018	0.072
CMS model	0.925 ± 0.018	0.870 ± 0.021	0.826 ± 0.053	0.887 ± 0.046	0.270
Ours	0.927 ± 0.026	0.880 ± 0.024	0.901 ± 0.068	0.871 ± 0.033	Reference
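The pairwise AUC comparison with DeLong’s test can be sketched in pure Python as below. It follows the standard DeLong formulation for two correlated AUCs computed on the same cases (per-case structural components from the Mann-Whitney kernel, their sample covariances, and a two-sided z-test). It is an illustrative implementation, not the authors’ code; it assumes at least two positive and two negative cases, and how per-fold predictions were pooled for the test is not stated in the text.

```python
import math

def _psi(x, y):
    """Mann-Whitney kernel: 1 if the positive case scores higher, 0.5 on ties."""
    if x > y:
        return 1.0
    return 0.5 if x == y else 0.0

def _components(pos, neg):
    """Structural components: V10 per positive case, V01 per negative case."""
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01

def _cov(a, b):
    """Unbiased sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (n - 1)

def delong_paired_test(y_true, scores_a, scores_b):
    """Two-sided DeLong test for AUC(a) - AUC(b) on the same cases.
    y_true uses 1 for positive (e.g. MIBC) and 0 for negative."""
    pos = [i for i, t in enumerate(y_true) if t == 1]
    neg = [i for i, t in enumerate(y_true) if t == 0]
    m, n = len(pos), len(neg)
    v10a, v01a = _components([scores_a[i] for i in pos], [scores_a[i] for i in neg])
    v10b, v01b = _components([scores_b[i] for i in pos], [scores_b[i] for i in neg])
    auc_a, auc_b = sum(v10a) / m, sum(v10b) / m
    var = ((_cov(v10a, v10a) + _cov(v10b, v10b) - 2 * _cov(v10a, v10b)) / m
           + (_cov(v01a, v01a) + _cov(v01b, v01b) - 2 * _cov(v01a, v01b)) / n)
    if var <= 0:
        # Degenerate case (e.g. identical rankings): the z statistic is undefined.
        return auc_a, auc_b, 0.0, 1.0
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return auc_a, auc_b, z, p

y = [1, 1, 1, 0, 0, 0]                    # made-up labels
model_a = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]  # perfectly separating scores
model_b = [0.9, 0.2, 0.7, 0.3, 0.8, 0.1]
auc_a, auc_b, z, p = delong_paired_test(y, model_a, model_b)
print(round(auc_a, 3), round(auc_b, 3), round(p, 3))
```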

Discussion

This paper proposed a novel 3D deep learning model based on T2WI to differentiate between MIBC and NMIBC, aiming to assist urologists in selecting appropriate treatment strategies for patients with BCa. The MVSD model achieves favorable classification performance by fusing 3D image features from multiple views and applying self-distillation. The following sections evaluate the performance gains contributed by specific components of the MVSD model, the setting of the loss weight coefficients, and the comparison between 2D and 3D classification. We also compare the predictive performance of the MVSD model and urologists using the T2WI sequence alone and mp-MRI on 3D data.

Advantages of multi-view fusion and self-distillation

To further verify the roles of multi-view fusion and self-distillation, we conducted ablation experiments; the results are shown in Table 2. We compared results obtained using only the transverse or only the sagittal plane with those obtained after multi-view fusion. The transverse plane alone achieved a higher AUC than the sagittal plane alone (0.908 vs. 0.901), although its Acc was slightly lower (0.842 vs. 0.847); this is consistent with urologists’ preference for the transverse view in diagnosis. After multi-view fusion (AUC: 0.915, Acc: 0.862), AUC and Acc increased by about 0.7 and 2 percentage points, respectively, over the transverse plane alone, which suggests that information from different planes can mutually constrain and complement each other. Multi-view fusion can therefore extract more complete global information, resulting in more accurate classification.

Table 2:

Ablation study on the pre-trained ResNet50 with/without multi-view fusion and self-distillation.

Method	AUC	Acc	Sens	Spec
Tra-ResNet50	0.908 ± 0.030	0.842 ± 0.064	0.883 ± 0.082	0.826 ± 0.115
Sag-ResNet50	0.901 ± 0.033	0.847 ± 0.042	0.860 ± 0.099	0.842 ± 0.070
MV-ResNet50	0.915 ± 0.019	0.862 ± 0.040	0.867 ± 0.033	0.860 ± 0.056
MVSD	0.927 ± 0.026	0.880 ± 0.024	0.901 ± 0.068	0.871 ± 0.033

tra, transverse; sag, sagittal; MV, multi-view.

Adding self-distillation to the multi-view fusion network increased AUC from 0.915 to 0.927 and Acc from 0.862 to 0.880, with Sens exceeding 90 %. Self-distillation helps the shallow feature extractors learn from the deeper layers, which may help the shallow classifiers generalize better and reduce overfitting, leading to more precise classification.

Effect of center loss and weight coefficient setting

In the global loss function (Equation (9)), the proposed MVSD model introduces $L_{CEN}$ and $L_{KL}$. To demonstrate the effectiveness of each loss, we conducted an ablation study, presented in Table 3. The baseline retains only the cross-entropy loss. Table 3 shows that adding either of the two additional losses improves both AUC and Acc over the baseline. Notably, the MVSD model using all three losses simultaneously achieves the best performance in predicting MIBC.

Table 3:

Ablation study of the loss function.

Loss	Baseline	–	–	MVSD
$L_{CE}$	√	√	√	√
$L_{CEN}$	×	√	×	√
$L_{KL}$	×	×	√	√
AUC	0.907	0.915	0.912	0.927
Acc	0.836	0.862	0.847	0.880

$L_{KL}$ was added to introduce supervision from self-distillation; its coefficient α was set to 0.1 following the related literature [18, 19]. Center loss aims to pull feature vectors of the same class together around a learnable class center. As Figure 3(a) illustrates, adding $L_{CEN}$ markedly promotes feature aggregation and helps $L_{CE}$ achieve inter-class separation and intra-class aggregation, making the model more capable of distinguishing MIBC features from NMIBC features. To explore the influence of the hyper-parameter λ on the performance of the MVSD model, we compared different weight coefficients for $L_{CEN}$. As shown in Figure 3(b), the model achieves the highest AUC and Acc when λ = 0.1; therefore, both α and λ were set to 0.1 in the global loss function of the MVSD model.

Figure 3:

(a) Principal component analysis (PCA) visualization of the feature distribution under the supervision of center loss. Red dots represent NMIBC and blue dots represent MIBC. (b) The influence of the hyper-parameter λ on the performance of the MVSD model.

Comparisons between 2D and 3D classification methods

The MVSD model adopts 3D image classification, which differs from our previously proposed 2D image classification model MBMIP in several aspects, as follows.

Accurate label: In the 3D classification task, all MRI data of a patient are treated as a whole, and the postoperative pathological result is used as the gold-standard label for the model. In the 2D classification task, by contrast, experienced urologists must judge each slice individually, which may introduce errors.
Richer information: 3D MRI volume data provide z-axis information that can improve classification performance. In addition, the MVSD model fuses features from the transverse and sagittal views to identify BCa. Two cases from our previous BCa study were misclassified by the 2D classification model but correctly classified by the 3D MVSD model; Figure 4 shows their heat maps under both models. In Case 1, the 2D classification model erroneously identified the prostate (yellow circle) as a tumor and thereby overlooked the tumor invading the muscle. In contrast, the 3D MVSD model, by combining information from adjacent slices, focused more on the tumor in the anterior wall of the bladder (red circle). In Case 2, the tumor was located on the top wall of the bladder and extended longitudinally; in urologists’ experience, such tumors must be viewed on the sagittal plane to determine the tumor type clearly. The heat maps show that the 3D model, by combining multi-view information, focused more on the tumor site in the sagittal plane and predicted the case correctly. Consistent with these observations, the study by Y. Arita [29] incorporated 3D fast spin-echo (FSE) T2WI acquisitions into the VI-RADS scoring system and further improved the specificity of the T2WI VI-RADS score, suggesting that 3D volumetric data have advantages in diagnosing MIBC.
Larger model and data volume: 3D classification models with 3D convolutions have a large number of parameters, which may cause difficulty in model convergence and overfitting, and require higher computational costs and more data for training. Hence, in the MVSD model, transfer learning was used to reduce training costs, and self-distillation was introduced to prevent model overfitting.

Figure 4:

Comparison of 2D and 3D classification models using heat maps. Case 1 is MIBC and case 2 is NMIBC. Both cases were misclassified in the 2D classification but classified correctly in the 3D classification.

Comparisons between the T2WI and the mp-MRI classification methods in 3D data

In clinical practice, urologists use the VI-RADS scoring system based on T2WI, DCE, and DWI sequences (mp-MRI) to determine the likelihood of muscle-invasive bladder cancer. To compare the predictive performance of the MVSD model and urologists using T2WI alone (single sequence) versus mp-MRI, we evaluated both approaches on 3D data. Two experienced urologists were recruited to evaluate the muscle invasion status of BCa patients in a blinded manner. Inter-rater reliability between the two urologists was assessed with the kappa statistic, which gave κ = 0.467 (p < 0.01, i.e., agreement significantly greater than chance), indicating moderate agreement.
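Assuming the kappa reported here is Cohen’s kappa for two raters, it can be computed as follows. The sketch covers only the point estimate; the significance test behind p < 0.01 is not reproduced, and the ratings below are made up. Under the commonly used Landis and Koch bands, values from 0.41 to 0.60 are read as moderate agreement, consistent with the interpretation of 0.467 above.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e the agreement expected by chance from each rater's label frequencies."""
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in set(freq_a) | set(freq_b)) / (n * n)
    # p_e == 1 only if both raters used one identical label throughout;
    # kappa is undefined there, and 1.0 is returned by convention.
    return (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0

# Made-up ratings for ten patients (1 = MIBC, 0 = NMIBC).
urologist_a = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
urologist_b = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(round(cohens_kappa(urologist_a, urologist_b), 4))  # 0.5238
```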

Since DWI and DCE sequences are clinically available only in the transverse plane, the mp-MRI experiments were conducted using transverse plane images, and the experiments with only the T2WI sequence used the transverse and sagittal plane images. Table 4 presents the results of predicting muscle-invasive bladder cancer based on the single-sequence T2WI and the mp-MRI between the MVSD model and urologists.

Table 4:

Comparisons between the T2WI and mp-MRI classification methods in 3D data.

Methods	Sequence	AUC	Acc	Sens	Spec
MVSD	T2WI	0.927 ± 0.026	0.880 ± 0.024	0.901 ± 0.068	0.871 ± 0.033
MVSD	mp-MRI	0.921 ± 0.035	0.879 ± 0.025	0.853 ± 0.067	0.876 ± 0.025
Urologist A	T2WI	–	0.842 ± 0.036	0.767 ± 0.092	0.871 ± 0.031
Urologist A	mp-MRI	–	0.872 ± 0.016	0.802 ± 0.063	0.898 ± 0.030
Urologist B	T2WI	–	0.779 ± 0.054	0.686 ± 0.112	0.815 ± 0.047
Urologist B	mp-MRI	–	0.854 ± 0.034	0.791 ± 0.052	0.878 ± 0.041

As shown in Table 4, the two urologists achieved higher Acc in diagnosing MIBC with mp-MRI than with the T2WI sequence alone, whereas the MVSD model achieved comparable Acc with single-sequence T2WI (0.880) and mp-MRI (0.879). Its AUC with T2WI is also 0.006 higher than with mp-MRI (0.927 vs. 0.921). This suggests that, for predicting MIBC, extracting features from both the transverse and sagittal planes of the T2WI sequence may be more advantageous for the model than extracting features from only the transverse plane of the mp-MRI sequences.

We recommend using the T2WI sequence alone, with transverse and sagittal planes, to diagnose MIBC, primarily for reasons of practicality and cost. Although the diagnostic roles of DWI and DCE are widely recognized, DCE requires the injection of a contrast agent and both sequences lengthen scanning time, which may limit their use in practice.

Furthermore, while our study focuses on developing a classification model (MVSD) based on the T2WI sequence to predict MIBC, the MVSD model may also be applicable to other problems related to BCa. Another classification task in BCa distinguishes pure urothelial carcinoma (pure UC) from variant histology urothelial carcinoma (variant UC). Variant UCs are classified as high-grade tumors and are more likely to recur than pure UC, which makes accurate preoperative assessment of MIBC in variant UCs necessary [30]. Regarding BCa prognosis, S. Yajima [31] identified the inchworm sign on DWI as an imaging biomarker for predicting tumor recurrence, with the absence of the inchworm sign being a significant indicator of progression in T1-stage bladder cancer. Several studies [32, 33] have highlighted the potential of mp-MRI for preoperatively predicting BCa recurrence. These are fundamentally classification problems, aligned with our research on muscle invasion classification. However, practical limitations such as contrast agent allergies and economic constraints may lead to missing sequences, so the simplicity and cost-effectiveness of the T2WI sequence make it a viable option in resource-limited settings. Since the MVSD model showed promising performance in predicting MIBC from multi-view T2WI, we plan to explore its performance in predicting MIBC in variant UC and in BCa prognosis as labeled data for these tasks become available.

Limitations and future work

Although experiments have shown the effectiveness of our proposed method, certain aspects still need to be improved in future work.

Firstly, the generalizability of the model is limited. The data for this study were from a single center, which may not cover all case types, imaging variations, and clinical scenarios. In the future, collaboration with other medical centers to collect more diverse datasets will enhance the model’s generalizability and robustness. Additionally, promoting and implementing standardized data collection and processing procedures will ensure data comparability across different centers, thereby improving the model’s applicability in various clinical settings.

Secondly, because MVSD is a 3D classification network, it remains difficult to fit and has a large number of parameters and a high computational cost, even though transfer learning is used to reduce training time. In future work, we will focus on making the model more lightweight to unlock more of its potential for disease diagnosis.
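The following sketch puts the parameter and compute burden of 3D convolutions in numbers. It counts the weights and multiply-accumulate operations (MACs) of a single convolution layer in 2D and in 3D. The layer sizes are hypothetical values chosen only for illustration, not measurements of the MVSD backbone: 64 input and output channels, a kernel size of 3, a 56 × 56 in-plane output and 16 slices.

```python
import math
from typing import Sequence


def conv_params(in_ch: int, out_ch: int, kernel: int, dims: int, bias: bool = True) -> int:
    """Learnable parameters of one conv layer with a square (2D) or cubic (3D) kernel."""
    return out_ch * (in_ch * kernel ** dims + (1 if bias else 0))


def conv_macs(in_ch: int, out_ch: int, kernel: int, dims: int, output_shape: Sequence[int]) -> int:
    """Multiply-accumulate operations for one forward pass (bias additions ignored)."""
    positions = math.prod(output_shape)  # output pixels (2D) or voxels (3D)
    return out_ch * in_ch * kernel ** dims * positions


# Hypothetical layer: 64 -> 64 channels, kernel size 3
p2d = conv_params(64, 64, 3, dims=2)  # 36928
p3d = conv_params(64, 64, 3, dims=3)  # 110656
m2d = conv_macs(64, 64, 3, dims=2, output_shape=(56, 56))
m3d = conv_macs(64, 64, 3, dims=3, output_shape=(16, 56, 56))
print(p3d / p2d, m3d / m2d)
```

Per layer, the weight count grows roughly by the kernel size (about 3× here). The compute additionally grows with the number of slices processed (48× in this hypothetical example). This gap is why lightweight 3D designs are an attractive direction.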

Conclusions

This study introduces a novel MVSD model that integrates transverse and sagittal T2WI images for personalized preoperative differential diagnosis of MIBC and NMIBC. The proposed 3D image classification methodology maximizes the utilization of z-axis information present in 3D MRI volume data, while leveraging the complementarity and contextual information exchange between adjacent slices to extract more comprehensive features. The multi-view fusion capitalizes on the distinctive characteristics of transverse and sagittal planes, enabling the acquisition of a more comprehensive understanding of global information. Additionally, the incorporation of self-distillation techniques empowers shallow classifiers to glean valuable insights from deeper layers, thereby enhancing the feature extraction capabilities of the backbone network and yielding improved classification performance. The promising classification results obtained by the proposed model suggest its potential as a valuable tool for clinical diagnostics and prognostics in predicting MIBC.
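The self-distillation objective summarized above can be sketched in the style of the self-distillation work listed as [18]. In that scheme, the deepest classifier is trained with cross-entropy. Each shallow classifier combines cross-entropy with a temperature-softened KL term that pulls it toward the deepest classifier.

This minimal pure-Python illustration is not the authors' MVSD code. The following choices are all assumptions:

- the weighting `alpha` and the `temperature`;
- dropping the feature-level hint loss used in [18];
- fusing the transverse and sagittal branches by averaging their logits (the hypothetical `fuse_views`), with each view having shallow heads at matching depths.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax; temperature > 1 softens the distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Hard-label cross-entropy for a single sample."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def fuse_views(transverse: Sequence[float], sagittal: Sequence[float]) -> List[float]:
    """Hypothetical late fusion: average the class logits of the two view branches."""
    return [(t + s) / 2 for t, s in zip(transverse, sagittal)]


def self_distillation_loss(
    shallow_logits: Sequence[Sequence[float]],
    deep_logits: Sequence[float],
    label: int,
    alpha: float = 0.3,
    temperature: float = 3.0,
) -> float:
    """Deepest head: cross-entropy only. Each shallow head: weighted cross-entropy
    plus a temperature-scaled KL term toward the deepest head (the teacher).
    In a real training loop the teacher distribution would be detached from the graph."""
    loss = cross_entropy(deep_logits, label)
    teacher = softmax(deep_logits, temperature)
    for logits in shallow_logits:
        student = softmax(logits, temperature)
        soft_term = kl_divergence(teacher, student) * temperature ** 2
        loss += (1 - alpha) * cross_entropy(logits, label) + alpha * soft_term
    return loss


# Hypothetical logits for [NMIBC, MIBC]; label 1 = MIBC
deep = fuse_views([0.4, 1.6], [0.2, 1.2])
shallow = [fuse_views([0.1, 0.5], [0.3, 0.7]), fuse_views([0.2, 1.0], [0.1, 0.9])]
print(self_distillation_loss(shallow, deep, label=1))
```

Multiplying the KL term by the squared temperature keeps its gradient scale comparable to the hard-label term, following Hinton et al. [24]. With `alpha` set to 0, each shallow head reduces to an ordinary auxiliary classifier.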

Corresponding author: Chunxiao Chen, Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Sub-box 269 of Main Post Box 159, No. 169 Sheng Tai West Road, Jiang Ning District, [211100], Nanjing, People’s Republic of China, E-mail: ccxbme@nuaa.edu.cn; and Qiang Lu, Department of Urology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, Jiangsu, People’s Republic of China, E-mail: doctorlvqiang@sina.com

Yuan Zou, Jie Yu and Lingkai Cai have contributed equally to this work.

Funding source: National Natural Science Foundation of China

Award Identifier / Grant number: 12071215

Research ethics: The research approval was granted by the ethics committee of First Affiliated Hospital of Nanjing Medical University.
Informed consent: Not applicable.
Author contributions: The authors have accepted responsibility for the entire content of this manuscript and approved its submission.
Use of Large Language Models, AI and Machine Learning Tools: PyTorch was used to build the training network.
Conflict of interest: The authors state no conflict of interest.
Research funding: This work was supported by the National Natural Science Foundation of China [grant number 12071215].
Data availability: Not applicable.

References

1. Siegel, RL, Miller, KD, Fuchs, HE, Jemal, A. Cancer statistics, 2022. Ca Cancer J Clin 2022;72:7–33. https://doi.org/10.3322/caac.21708.

2. Sung, H, Ferlay, J, Siegel, RL, Laversanne, M, Soerjomataram, I, Jemal, A, et al. Global cancer statistics 2020: globocan estimates of incidence and mortality worldwide for 36 cancers in 185 countries. Ca Cancer J Clin 2021;71:209–49. https://doi.org/10.3322/caac.21660.

3. Witjes, JA, Bruins, HM, Cathomas, R, Comperat, EM, Cowan, NC, Gakis, G, et al. European association of urology guidelines on muscle-invasive and metastatic bladder cancer: summary of the 2020 guidelines. Eur Urol 2021;79:82–104. https://doi.org/10.1016/j.eururo.2020.03.055.

4. Miller, KD, Nogueira, L, Devasia, T, Mariotto, AB, Yabroff, KR, Jemal, A, et al. Cancer treatment and survivorship statistics, 2022. Ca Cancer J Clin 2022;72:409–36. https://doi.org/10.3322/caac.21731.

5. Powles, T, Bellmunt, J, Comperat, E, De Santis, M, Huddart, R, Loriot, Y, et al. Bladder cancer: esmo clinical practice guideline for diagnosis, treatment and follow-up. Ann Oncol 2022;33:244–58. https://doi.org/10.1016/j.annonc.2021.11.012.

6. Anonymous, R. European association of urology guidelines on muscle-invasive and metastatic bladder cancer: summary of the 2020 guidelines editorial comment. J Urol 2022;207:1153–4. https://doi.org/10.1097/JU.0000000000002460.

7. Panebianco, V, Narumi, Y, Barchetti, G, Montironi, R, Catto, J. Should we perform multiparametric magnetic resonance imaging of the bladder before transurethral resection of bladder? Time to reconsider the rules. Eur Urol 2019;76:57–8. https://doi.org/10.1016/j.eururo.2019.03.046.

8. Yuan, B, Cai, L, Cao, Q, Wu, Q, Zhuang, J, Sun, X, et al. Role of vesical imaging-reporting and data system in predicting muscle-invasive bladder cancer: a diagnostic meta-analysis. Int J Urol 2022;29:186–95. https://doi.org/10.1111/iju.14748.

9. Feng, Y, Zhong, K, Chen, R, Zhou, W. Diagnostic accuracy of vesical imaging-reporting and data system (vi-rads) for the detection of muscle-invasive bladder cancer: a meta-analysis. Abdom Radiol 2022;47:1396–405. https://doi.org/10.1007/s00261-022-03449-w.

10. Gandi, C, Vaccarella, L, Bientinesi, R, Racioppi, M, Pierconti, F, Sacco, E. Bladder cancer in the time of machine learning: intelligent tools for diagnosis and management. Urologia 2021;88:94–102. https://doi.org/10.1177/0391560320987169.

11. Garapati, SS, Hadjiiski, L, Cha, KH, Chan, HP, Caoili, EM, Cohan, RH, et al. Urinary bladder cancer staging in ct urography using machine learning. Med Phys 2017;44:5814–23. https://doi.org/10.1002/mp.12510.

12. Wang, H, Xu, X, Zhang, X, Liu, Y, Ouyang, L, Du, P, et al. Elaboration of a multisequence mri-based radiomics signature for the preoperative prediction of the muscle-invasive status of bladder cancer: a double-center study. Eur Radiol 2020;30:4816–27. https://doi.org/10.1007/s00330-020-06796-8.

13. Zhang, G, Wu, Z, Xu, L, Zhang, X, Zhang, D, Mao, L, et al. Deep learning on enhanced ct images can predict the muscular invasiveness of bladder cancer. Front Oncol 2021;11:654685. https://doi.org/10.3389/fonc.2021.654685.

14. Zhou, X, Yue, X, Xu, Z, Denoeux, T, Chen, Y. Penet: prior evidence deep neural network for bladder cancer staging. Methods 2022;207:20–8. https://doi.org/10.1016/j.ymeth.2022.08.010.

15. Zou, Y, Cai, L, Chen, C, Shao, Q, Fu, X, Yu, J, et al. Multi-task deep learning based on t2-weighted images for predicting muscular-invasive bladder cancer. Comput Biol Med 2022;151:106219. https://doi.org/10.1016/j.compbiomed.2022.106219.

16. Liu, J, Zhuang, BH, Zhuang, ZW, Guo, Y, Huang, JZ, Zhu, JH, et al. Discrimination-aware network pruning for deep model compression. IEEE Trans Pattern Anal Mach Intell 2022;44:4035–51. https://doi.org/10.1109/TPAMI.2021.3066410.

17. Gou, JP, Yu, BS, Maybank, SJ, Tao, DC. Knowledge distillation: a survey. Int J Comput Vis 2021;129:1789–819. https://doi.org/10.1007/s11263-021-01453-z.

18. Zhang, LF, Song, JB, Gao, AN, Chen, JW, Bao, CL, Ma, KS. Be your own teacher: improve the performance of convolutional neural networks via self distillation. In: 2019 International Conference on Computer Vision (ICCV). Seoul, South Korea: IEEE; 2019:3712–22 pp. https://doi.org/10.1109/ICCV.2019.00381.

19. Zhang, LF, Bao, CL, Ma, KS. Self-distillation: towards efficient and compact neural networks. IEEE Trans Pattern Anal Mach Intell 2022;44:4388–403. https://doi.org/10.1109/TPAMI.2021.3067100.

20. He, K, Zhang, X, Ren, S, Sun, J. Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016:770–8 pp.

21. Wang, C, Pelillo, M, Siddiqi, K. Dominant set clustering and pooling for multi-view 3d object recognition. Arxiv 2019;12. arXiv:1906.01592.

22. Su, H, Maji, S, Kalogerakis, E, Learned-Miller, E. Multi-view convolutional neural networks for 3d shape recognition. In: 2015 International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE; 2015:945–53 pp. https://doi.org/10.1109/ICCV.2015.114.

23. Chen, S, Ma, K, Zheng, Y. Med3d: transfer learning for 3d medical image analysis. Arxiv 2019; arXiv:1904.00625v4.

24. Hinton, G, Vinyals, O, Dean, J. Distilling the knowledge in a neural network. Comput Sci 2015;14:38–9.

25. Wen, YD, Zhang, KP, Li, ZF, Qiao, Y. A discriminative feature learning approach for deep face recognition. In: Leibe, B, Matas, J, Sebe, N, Welling, M, editors. Computer Vision – ECCV 2016, Part VII, 14th European Conference on Computer Vision (ECCV). Amsterdam, the Netherlands. Berlin: Springer; 2016:499–515 pp. https://doi.org/10.1007/978-3-319-46478-7_31.

26. Yu, J, Cai, LK, Chen, CX, Fu, X, Wang, L, Yuan, BR, et al. Cascade path augmentation unet for bladder cancer segmentation in mri. Med Phys 2022;49:4622–31. https://doi.org/10.1002/mp.15646.

27. Li, JP, Cao, KY, Lin, HX, Deng, L, Yang, SQ, Gao, Y, et al. Predicting muscle invasion in bladder cancer by deep learning analysis of mri: comparison with vesical imaging-reporting and data system. Eur Radiol 2023;33:2699–709. https://doi.org/10.1007/s00330-022-09272-7.

28. Yu, J, Cai, L, Chen, C, Zou, Y, Xiao, Y, Fu, X, et al. A novel predict method for muscular invasion of bladder cancer based on 3d mp-mri feature fusion. Phys Med Biol 2024;69. https://doi.org/10.1088/1361-6560/ad25c7.

29. Arita, Y, Shigeta, K, Akita, H, Suzuki, T, Kufukihara, R, Kwee, TC, et al. Clinical utility of the vesical imaging-reporting and data system for muscle-invasive bladder cancer between radiologists and urologists based on multiparametric mri including 3d fse t2-weighted acquisitions. Eur Radiol 2021;31:875–83. https://doi.org/10.1007/s00330-020-07153-5.

30. Arita, Y, Woo, S, Kwee, TC, Shigeta, K, Ueda, R, Nalavenkata, S, et al. Pictorial review of multiparametric mri in bladder urothelial carcinoma with variant histology: pearls and pitfalls. Abdom Radiol 2024;49:2797–811. https://doi.org/10.1007/s00261-024-04397-3.

31. Yajima, S, Yoshida, S, Takahara, T, Arita, Y, Tanaka, H, Waseda, Y, et al. Usefulness of the inchworm sign on dwi for predicting pt1 bladder cancer progression. Eur Radiol 2019;29:3881–8. https://doi.org/10.1007/s00330-019-06119-6.

32. Xu, X, Wang, H, Du, P, Zhang, F, Li, S, Zhang, Z, et al. A predictive nomogram for individualized recurrence stratification of bladder cancer using multiparametric mri and clinical risk factors. J Magn Reson Imag 2019;50:1893–904. https://doi.org/10.1002/jmri.26749.

33. Xu, X, Huang, Y, Liu, Y, Cai, Q, Guo, Y, Wang, H, et al. Multiparametric mri-based vi-rads: can it predict 1- to 5-year recurrence of bladder cancer? Eur Radiol 2024;34:3034–45. https://doi.org/10.1007/s00330-023-10387-8.

Received: 2024-07-03

Accepted: 2024-09-02

Published Online: 2024-11-06

Published in Print: 2025-02-25

This work is licensed under the Creative Commons Attribution 4.0 International License.


https://doi.org/10.1515/bmt-2024-0333

Keywords

deep learning; bladder cancer; multi-view; self-distillation; T2-weighted image
