Abstract
Objectives
Because the quality of low-dose CT (LDCT) images is often severely degraded by noise and artifacts, it is important to maintain high CT image quality while effectively reducing the radiation dose.
Methods
In recent years, diffusion models have attracted wide attention for their ability to produce high-quality images and their stable training. The cold diffusion model extends the classical diffusion model, giving it greater flexibility in application. Inspired by the cold diffusion model, we propose a low-dose CT image denoising method, called CECDM, based on the combination of edge enhancement and a cold diffusion model. The LDCT image is taken as the end point of the forward diffusion process and the starting point of the reverse sampling process. An improved Sobel operator and a Convolutional Block Attention Module (CBAM) are added to the network, and a compound loss function is adopted.
Results
The experimental results show that CECDM can effectively remove noise and artifacts from LDCT images while the inference time of a single image is reduced to 0.41 s.
Conclusions
Compared with existing LDCT image post-processing methods, CECDM achieves a significant improvement on all evaluation metrics.
Introduction
CT imaging is widely used in clinical examination and treatment. However, with the wide application of CT, increasing attention has been paid to its potential threat to the health of the subject. As a form of ionizing radiation, X-rays may damage human genes and induce cancer, and this risk increases with the radiation dose [1].
Low-dose CT (LDCT) reduces harm to the subject by lowering the X-ray radiation intensity, but reducing the radiation dose degrades CT image quality, introducing non-stationary streak artifacts and increased noise [2]. The loss of important details, such as potential lesion information, affects the outcome of clinical diagnosis. The goal of LDCT image denoising is therefore to accurately distinguish noisy regions from fine structural textures at low radiation dose, suppressing the noise efficiently while preserving image structure and texture details as much as possible, so as to obtain CT images of quality comparable to normal-dose CT.
Over the past few decades, researchers have proposed a number of LDCT image denoising methods, which can be roughly divided into three categories: projection-domain pre-processing, image-domain reconstruction, and image-domain post-processing. The development of the first two is limited by the difficulty of obtaining projection data and the high computational cost. Image-domain post-processing operates directly on the reconstructed images, is independent of projection data and scanning equipment, and is convenient, simple, and effective, so it has been widely studied.
Traditional image post-processing methods include NLM [3], BM3D [4] and so on. Ma et al. [5] used the information redundancy in the previous normal dose scan to calculate the non-local weight of LDCT image denoising based on the non-local mean (NLM). Sheng et al. proposed a low dose CT image denoising algorithm based on block matching 3D (BM3D), which enhanced the visual saliency of soft tissue by using the saliency map obtained from residual texture information after BM3D denoising [6].
The LDCT artifact suppression method proposed by Chen et al. [7], based on large-scale neighborhood non-local mean filtering, takes into account the similarity of pixel structures within large-scale windows and achieves a good artifact suppression effect. Wang et al. [8] proposed a novel anisotropic fourth-order diffusion (NAFOD) model, which overcomes the shortcomings of the AFOD model [9], namely blocky artifacts and poor preservation of texture and detail. In addition, there have been many explorations of LDCT noise reduction in transform domains. For example, an LDCT image denoising algorithm based on multi-scale singularity detection [10], the 3D block-matching filtering algorithm [4], which improves the NLM approach with transform-domain filtering, and methods based on sparse representation and dictionary learning [11] have been research hotspots in the LDCT denoising field in recent years. Traditional denoising methods can significantly improve the quality of LDCT images, but they rely heavily on hand-crafted priors that are difficult to adjust flexibly, so their performance is limited: denoised images tend to be over-smoothed or retain residual noise, and the texture and fine details of CT images are not well preserved.
In recent years, with the development of deep learning, LDCT denoising methods based on deep learning have been widely used because of their strong feature-learning ability, wide adaptability, and high execution speed. More and more studies have explored the role of convolutional neural networks (CNNs) in LDCT image processing. Chen et al. [12] proposed the denoising network RED-CNN, which combines the structures of ResNet and the autoencoder and is optimized to denoise LDCT images by minimizing the pixel loss between the denoised and normal-dose CT (NDCT) images. Fan et al. [13] replaced the convolutional and deconvolutional layers with quadratic filters and constructed an encoder-decoder structure with quadratic neurons, called the quadratic autoencoder (QAE), which improves model efficiency for LDCT denoising. In [14], Liang et al. proposed EDCNN based on edge enhancement and dense connectivity; benefiting from the trainable Sobel convolution, the quality of CT images processed by EDCNN is significantly improved. Shan et al. [15] introduced a modularized adaptive processing neural network (MAP-NN) for LDCT imaging based on progressive denoising, which uses the Sobel-filtering difference between the real and estimated images to measure edge incoherence, inducing the denoised image to retain more texture information, reduce noise, and enhance edges. Although these methods have excellent denoising performance, they often over-smooth the image; to mitigate this problem, some methods use generative adversarial networks (GANs) to retain more texture and detail and stay as close as possible to the NDCT image [16]. For example, Yang et al. combined a Wasserstein GAN with perceptual loss (WGAN-VGG) to produce more realistic denoised images [17]. However, due to their adversarial nature, GANs are often difficult to train and require careful optimization and network design to ensure convergence. Li et al. [18] proposed a low-dose CT denoising method based on a multi-scale feature fusion network (MSFLNet), which combines contextual features from multiple scales while maintaining the spatial details of the denoised CT images. Wang et al. [19] proposed the first convolution-free Token2Token dilated vision transformer (CTformer) for LDCT denoising; by dilating and shifting the feature maps it captures longer-range information interaction, and an overlapping inference mechanism is introduced to effectively eliminate the boundary artifacts common in encoder-decoder models. Kang et al. proposed the FSformer model [20], which combines the local information extracted by a CNN with the global information extracted by a Transformer, complementing high- and low-frequency information, better modeling global, regional, and local range dependencies, and effectively reducing LDCT image noise. Yan et al. [21] combined convolutional dictionary learning with convolutional neural networks to propose a transfer-learning densely connected convolutional dictionary learning (TLD-CDL) framework, which improves the quality of CT images with complex noise and artifacts.
In 2020, Ho et al. first proposed denoising diffusion probabilistic models (DDPM) [22], which have received widespread attention for their excellent image generation performance. Diffusion models have great potential for improving LDCT imaging performance. However, the biggest problem of diffusion-model-based techniques is their high computational effort and long inference time, so most previous work has focused on accelerating the reverse process and improving sampling speed. Building on [22], Xia et al. used a fast ordinary differential equation (ODE) solver to accelerate DDPM for LDCT image denoising, requiring only 50 sampling steps [23]. Liu et al. [24] developed Dn-Dp, a low-dose CT image denoising algorithm based on diffusion priors; multiple MAP estimation problems are solved iteratively in the reverse process of the diffusion model, but the network memory consumption is large and it takes 10.11 s to process a single 256 × 256 LDCT image, so the long inference time of diffusion models is not substantially improved. Gao et al. proposed a contextual error-modulated generalized diffusion model, called CoreDiff [25], and designed a one-shot learning framework that allows the trained model to adapt quickly and easily to new dose levels. Unlike traditional diffusion models, whose inference time is too long, CoreDiff requires only 10 sampling steps. Although CoreDiff greatly improves the practical speed of the diffusion model, it still has room for improvement in preserving image edges, textures, and other details.
Inspired by the cold diffusion model [26], we propose a low-dose CT image denoising method, called CECDM, based on the combination of edge enhancement and the cold diffusion model. The NDCT image is taken as the starting point of the diffusion process, the LDCT image is taken as its end point, and edge enhancement is further introduced on top of the network architecture proposed in CoreDiff. While keeping the fast inference of the diffusion model, CECDM focuses on LDCT denoising performance, achieving a better denoising effect while retaining more texture details.
In general, the contributions of this paper can be summarized as follows:
The cold diffusion model is applied to the low-dose CT image denoising task; compared with the classical diffusion model it requires only 10 sampling steps, which greatly increases inference speed.
An improved Sobel operator and a CBAM convolutional attention module are added to the framework of the recovery network, enabling the model to extract edge information more accurately, overcome the over-smoothing of denoised images, enhance texture details, and improve visual quality.
A compound MSE+Charbonnier loss is introduced to train the recovery network, improving the generalization ability of the model, enriching the deep features in the network, and improving overall performance.
The overall structure of this paper can be summarized as follows: in Section “Introduction”, we review the current state of research on LDCT denoising; in Section “Related work”, we describe the basic principles and related work of the diffusion models involved; in Section “The proposed CECDM”, we introduce the CECDM network structure, module design, and the compound loss function used for network optimization in detail; in Section “Experiments and results”, the experimental configuration and the corresponding results and analysis are given; and we make concluding remarks and discuss future directions in Section “Conclusion and discussion”.
Related work
Denoising diffusion probabilistic models (DDPM)
DDPM is a generative model that learns a target data distribution from samples [22]. DDPM consists of two Markov processes: a fixed forward process and a learning-based reverse process, as shown in Figure 1.

Diagram of denoising diffusion probabilistic models. The Markov chain of forward (reverse) diffusion process of generating a sample by slowly adding (removing) noise.
Forward process: Starting from a clean image sample $x_0 \sim q(x_0)$, Gaussian noise is gradually added according to the following probability:

$$q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\right), \tag{1}$$

where $\beta_t \in (0,1)$ is a predefined variance schedule. Letting $\alpha_t = 1-\beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$, the noisy image at any timestep $t$ can be sampled directly from $x_0$:

$$x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon, \quad \epsilon \sim \mathcal{N}(0, \mathbf{I}). \tag{2}$$
Reverse process: Since the inverse transition $q(x_{t-1} \mid x_t)$ depends on the entire data distribution and is intractable, a neural network can be used to learn the parameterized Gaussian probability $p_\theta(x_{t-1} \mid x_t)$, as follows:

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\left(x_{t-1};\ \mu_\theta(x_t, t),\ \sigma_t^2 \mathbf{I}\right), \tag{3}$$

where $\sigma_t^2$ is a fixed variance and the mean is computed from the predicted noise $\epsilon_\theta(x_t, t)$:

$$\mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_\theta(x_t, t)\right), \tag{4}$$

so that one reverse sampling step is

$$x_{t-1} = \mu_\theta(x_t, t) + \sigma_t z, \quad z \sim \mathcal{N}(0, \mathbf{I}). \tag{5}$$
In the reverse process, $x_t$ in equation (5) can be replaced by (2) to obtain a relation for $x_{t-1}$ expressed in terms of $x_0$ and the noise, which can also be interpreted as each reverse sampling step actually predicting the initial image. The noise corresponding to the current state is then linearly superimposed on the predicted image, and the resulting image serves as the input for the next iteration.
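For readers who prefer code, the following minimal PyTorch sketch implements one reverse sampling step corresponding to Eqs. (3)–(5); `eps_model` stands for an assumed noise-prediction network $\epsilon_\theta(x_t, t)$ and is not part of the original paper.

```python
import torch

@torch.no_grad()
def ddpm_reverse_step(eps_model, x_t, t, betas):
    """One reverse step x_{t-1} = mu_theta(x_t, t) + sigma_t * z (Eqs. (4)-(5))."""
    alphas = 1.0 - betas                        # alpha_t = 1 - beta_t
    alpha_bars = torch.cumprod(alphas, dim=0)   # bar(alpha)_t = prod_{s<=t} alpha_s
    beta_t, alpha_t, alpha_bar_t = betas[t], alphas[t], alpha_bars[t]

    eps = eps_model(x_t, t)                     # predicted noise eps_theta(x_t, t)
    mu = (x_t - beta_t / torch.sqrt(1.0 - alpha_bar_t) * eps) / torch.sqrt(alpha_t)

    if t == 0:                                  # no noise is added at the last step
        return mu
    sigma_t = torch.sqrt(beta_t)                # a common choice for the reverse variance
    return mu + sigma_t * torch.randn_like(x_t)
```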
Cold diffusion
The cold diffusion model is a generalized diffusion model that extends the diffusion and sampling of Gaussian noise to arbitrary degradations, such as adding various types of noise, blurring, or downsampling. Specifically, given an image $x_0$ from the training data distribution $Q$ (the cold state), $x_0$ is gradually degraded by a specific degradation operator $D(\cdot)$ into an image $x_T$ sampled from a random initial distribution $P$ (the hot state), such as a Gaussian distribution, where $T$ is the total number of diffusion timesteps. The image $x_t$ at any timestep $t$ of the diffusion process is then defined as $x_t = D(x_0, t)$, where $t$ corresponds to the degree of degradation, and for any $t$ the operator $D(\cdot)$ should be continuous.
In LDCT image denoising, the NDCT image is taken as the starting point $x_0$ and the LDCT image is used as the end point $x_T$ of the diffusion process. The degradation operator $D(\cdot)$ is introduced so that the intermediate image $x_t$ of the diffusion process keeps the same expectation as $x_0$ without introducing an additional CT number shift; it is defined as follows [25]:

$$x_t = D(x_0, x_T, t) = \alpha_t x_0 + (1-\alpha_t)\, x_T, \tag{6}$$

where $x_T$ plays the role of the randomly distributed end point of the generic cold diffusion model (here the LDCT image) and $\alpha_t < \alpha_{t-1}$, $\forall 1 \le t \le T$.
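As a minimal sketch (not the authors' released code), the mean-preserving degradation of Eq. (6) can be written as a single broadcasted interpolation between the paired NDCT and LDCT images; the tensor shapes and the name `alphas` are assumptions.

```python
import torch

def degrade(x0, xT, t, alphas):
    """D(x0, xT, t) = alpha_t * x0 + (1 - alpha_t) * xT (Eq. (6)).

    x0: NDCT batch (B, C, H, W); xT: paired LDCT batch; t: integer timesteps (B,);
    alphas: tensor of length T + 1 with alpha_0 = 1 decreasing towards 0 at t = T.
    """
    a = alphas[t].view(-1, 1, 1, 1)   # broadcast alpha_t over image dimensions
    return a * x0 + (1.0 - a) * xT
```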
In the reverse process, $x_T$ is first sampled from $P$, and the diffusion is then inverted using the recovery operator $R(\cdot)$, which should (approximately) satisfy

$$R(x_t, t) \approx x_0. \tag{7}$$

In practice, $R(\cdot)$ is a neural network parameterized by $\theta$, which can be optimized by the following objective:

$$\min_\theta\ \mathbb{E}_{x_0 \sim Q}\left\| R_\theta\big(D(x_0, x_T, t), t\big) - x_0 \right\|. \tag{8}$$

It is worth noting that, for any $t$, $R_\theta(\cdot)$ can directly generate the recovered image $\hat{x}_0$ in a single step; however, such one-step restoration tends to produce blurred results with lost edge and texture detail.
To address this problem, the cold diffusion model follows the annealed sampling algorithm of the classical diffusion model and adopts a "restoration-redegradation" sampling algorithm, generating the image gradually over $T$ sampling steps. From the predicted value $\hat{x}_0 = R_\theta(x_t, t)$, the sample at the previous timestep is obtained by re-degrading the prediction:

$$x_{t-1} = x_t - D(\hat{x}_0, x_T, t) + D(\hat{x}_0, x_T, t-1). \tag{9}$$

Although this iterative sampling algorithm can produce a sharper image than the single-step prediction, the prediction error between $x_0$ and $\hat{x}_0$ at each step is carried into the re-degradation and accumulates over the sampling trajectory.
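A minimal sketch of this restoration-redegradation loop (Eq. (9)) is shown below; `restore_net` denotes an assumed trained recovery network $R_\theta$, and the LDCT image is used directly as the starting sample $x_T$.

```python
import torch

@torch.no_grad()
def cold_sample(restore_net, x_T, alphas, T=10):
    """Iterative restoration-redegradation sampling, starting from the LDCT image x_T."""
    def D(x0_hat, t):                       # degradation operator of Eq. (6)
        return alphas[t] * x0_hat + (1.0 - alphas[t]) * x_T

    x_t = x_T
    for t in range(T, 0, -1):
        x0_hat = restore_net(x_t, t)        # one-step restoration R_theta(x_t, t)
        x_t = x_t - D(x0_hat, t) + D(x0_hat, t - 1)   # Eq. (9)
    return x_t
```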
Convolutional block attention module
As shown in the lower left corner of Figure 2, CBAM [27] is a convolutional block attention module that combines channel attention and spatial attention. Given an input feature, CBAM derives attention maps along two independent dimensions, the channel dimension and the spatial dimension, and then multiplies the attention maps with the input feature for adaptive feature refinement.

Overall architecture of CECDM network. The lower left corner is the CBAM module, and the lower right corner is timestep adjustment module.
For channel attention (the blue part in Figure 2), the spatial dimension of the input feature map is compressed and then aggregated to compute channel attention. Two descriptors are obtained by average pooling and max pooling, and these descriptors are forwarded to a three-layer shared multilayer perceptron (MLP) to generate attention maps. The two MLP outputs are then summed element-wise, and the channel attention map is obtained with a sigmoid function.
Spatial attention (the green part in Figure 2) then uses spatial feature relationships to complement channel attention. Average pooling and max pooling are applied along the channel dimension to aggregate the channel information of the feature map, and the two resulting maps are concatenated into a single feature descriptor that focuses the attention on informative regions. A convolutional layer is then applied to this concatenated descriptor to generate a two-dimensional spatial attention map, which is encoded to emphasize or suppress spatial locations.
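The following compact PyTorch sketch illustrates the channel-then-spatial attention described above, following the design of Woo et al. [27]; the reduction ratio, kernel size, and the two-layer MLP are typical CBAM defaults and are assumptions rather than the exact configuration used in CECDM.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Channel attention followed by spatial attention, as in [27]."""
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        # Channel attention: shared MLP applied to avg- and max-pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        # Spatial attention: convolution over concatenated avg/max channel maps.
        self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2, bias=False)

    def forward(self, x):
        # --- channel attention ---
        avg = self.mlp(nn.functional.adaptive_avg_pool2d(x, 1))
        mx = self.mlp(nn.functional.adaptive_max_pool2d(x, 1))
        x = x * torch.sigmoid(avg + mx)          # element-wise sum, then sigmoid
        # --- spatial attention ---
        avg_map = x.mean(dim=1, keepdim=True)    # average pooling along channels
        max_map, _ = x.max(dim=1, keepdim=True)  # max pooling along channels
        att = torch.sigmoid(self.spatial(torch.cat([avg_map, max_map], dim=1)))
        return x * att
```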
The proposed CECDM
Network architecture
The overall structure of the CECDM network proposed in this paper is shown in Figure 2. Each timestep in the training process is divided into two stages: the left part of Figure 2 is the first stage and the right part is the second stage, and the two parts share the same network composition. In the first stage, the input image $x_t$ is passed through the recovery network $R_\theta(\cdot)$ to estimate $x_0$, which is equivalent to a one-step prediction of $x_0$. In the second stage, this prediction is re-degraded to $x_{t-1}$ and fed to the recovery network again, together with timestep $t-1$, to produce a refined estimate of $x_0$.
The whole recovery network $R_\theta(\cdot)$ adopts a U-Net framework. The input images are first processed by a 3 × 3 Sobel convolution module, then pass through average-pooling downsampling and the timestep adjustment module (TAM) of the corresponding stage, and each stage output is produced by a combination of 3 × 3 convolution and ReLU; a CBAM module is introduced before upsampling. The channel attention (CA) module uses average pooling and max pooling to compress the spatial dimension of the feature map and retain more texture information of the LDCT image. The spatial attention (SA) module complements the CA mechanism by reducing the number of channels to one through a 7 × 7 convolution kernel, and the SA feature map is obtained through a sigmoid activation function, enriching the information fed to the upsampling path. Features are modulated by the TAM module after each downsampling and upsampling step.
Timestep adjustment module
The timestep embedding feature $\mathrm{Cond}_t$ is obtained by applying sinusoidal position encoding to $t$ followed by an MLP. The output of the module in the first stage is obtained by adding the timestep feature $\mathrm{Cond}_t$ to the module input. The timestep input of the second stage is $t-1$, and the corresponding embedding $\mathrm{Cond}_{t-1}$ is injected into the second-stage features in the same way.
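A minimal sketch of such a timestep adjustment module is given below, assuming a standard sinusoidal embedding followed by a small MLP whose output is broadcast-added to the feature map; the embedding dimension and layer sizes are assumptions.

```python
import math
import torch
import torch.nn as nn

def sinusoidal_embedding(t, dim=128):
    """Standard sinusoidal position encoding of integer timesteps t of shape (B,)."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) *
                      torch.arange(half, dtype=torch.float32, device=t.device) / half)
    angles = t.float().unsqueeze(-1) * freqs                          # (B, half)
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)  # (B, dim)

class TimestepAdjustment(nn.Module):
    """Adds a learned timestep feature Cond_t to the input feature map."""
    def __init__(self, channels, emb_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(emb_dim, channels), nn.SiLU(),
                                 nn.Linear(channels, channels))

    def forward(self, feat, t):
        cond = self.mlp(sinusoidal_embedding(t))                      # Cond_t, shape (B, C)
        return feat + cond.view(feat.shape[0], -1, 1, 1)              # broadcast over H, W
```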
Sobel convolution block
To enrich the input information of the model, a trainable Sobel convolution block is used as the edge-enhancement module of the network; it retains the excellent edge-extraction ability of the Sobel operator while introducing some modifications, as shown in Figure 3.

Learnable sobel operator.
When processing LDCT images, we found that the edge information extracted by a single traditional Sobel operator is incomplete, and in this case the loss of contour information is more serious. Unlike the traditional fixed-value Sobel operator [28], EDCNN [14] proposes a trainable Sobel operator that defines four operators in the horizontal, vertical, and two diagonal directions as a group; this module is also used in DEformer [29]. The parameter values can thus be adjusted adaptively during training to extract edge information of different intensities. To make the CECDM model extract edge information more accurately, we modify this design by changing the initial value of the learnable parameter α in the Sobel kernels from 0 to 0.1, with the other parameters unchanged.
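A sketch of such a trainable Sobel block is shown below, with four fixed directional kernels scaled by a single learnable α initialized to 0.1; the exact kernel layout and whether α is shared across kernels follow the EDCNN-style operator only loosely and should be treated as assumptions.

```python
import torch
import torch.nn as nn

class LearnableSobel(nn.Module):
    """Edge-enhancement block: four directional Sobel kernels scaled by a learnable alpha."""
    def __init__(self, in_channels=1, alpha_init=0.1):
        super().__init__()
        h = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])    # horizontal
        v = h.t()                                                          # vertical
        d1 = torch.tensor([[0., 1., 2.], [-1., 0., 1.], [-2., -1., 0.]])   # diagonal
        d2 = torch.tensor([[-2., -1., 0.], [-1., 0., 1.], [0., 1., 2.]])   # anti-diagonal
        kernels = torch.stack([h, v, d1, d2]).unsqueeze(1)                 # (4, 1, 3, 3)
        self.register_buffer("kernels", kernels.repeat(1, in_channels, 1, 1))
        self.alpha = nn.Parameter(torch.tensor(alpha_init))                # learnable, init 0.1

    def forward(self, x):
        edges = nn.functional.conv2d(x, self.alpha * self.kernels, padding=1)
        return torch.cat([x, edges], dim=1)   # concatenate edge maps with the input
```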
Loss function
In this paper, we take the compound MSE+Charbonnier loss as the final training target of the network. The output of the first stage of the network is $R_\theta(x_t, t)$, and the output of the second stage, at timestep $t-1$, is $R_\theta(x_{t-1}, t-1)$; the MSE loss $L_{\mathrm{MSE}}$ measures the pixel-wise error between the network outputs and the NDCT image $x_0$.

Charbonnier loss [30, 31] is a smooth approximation of the L1 loss and is used to improve the performance of the model. $L_{\mathrm{char}}$ calculates the loss between the network output $\hat{x}_0$ and the NDCT image $x_0$:

$$L_{\mathrm{char}} = \sqrt{\left\| \hat{x}_0 - x_0 \right\|^2 + \epsilon^2},$$

where $\epsilon$ is a small constant. The final compound loss is

$$L = \lambda_1 L_{\mathrm{MSE}} + \lambda_2 L_{\mathrm{char}},$$

where $\lambda_1$ and $\lambda_2$ are predefined constants; $\lambda_1 = 1.5$ and $\lambda_2 = 0.05$ in this paper.
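A minimal PyTorch sketch of this compound objective, with the weights given above, might look as follows; applying it to both stage outputs against $x_0$ is an assumption about the training details.

```python
import torch

def charbonnier(pred, target, eps=1e-3):
    """Charbonnier loss, a smooth and robust approximation of L1."""
    return torch.mean(torch.sqrt((pred - target) ** 2 + eps ** 2))

def compound_loss(pred, target, lam1=1.5, lam2=0.05):
    """L = lam1 * MSE + lam2 * Charbonnier, the weights used in this paper."""
    mse = torch.mean((pred - target) ** 2)
    return lam1 * mse + lam2 * charbonnier(pred, target)

# Example: supervise both stage outputs with the NDCT image x0 (assumed usage).
# loss = compound_loss(stage1_out, x0) + compound_loss(stage2_out, x0)
```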
Experiments and results
In this section, we explain the data set used for training and testing the proposed model, as well as the configuration of the experiment. Then we present the experimental results and evaluate the noise reduction performance of the model in terms of visual effects and quantitative indicators.
Mayo dataset
The 2016 NIH-AAPM-Mayo Clinic Low-Dose CT Grand Challenge [32] provided a total of 2,378 abdominal CT images of size 512 × 512 from 10 patients. The images were scanned and reconstructed at a tube voltage of 120 kVp and an effective dose of 200 mAs, with slice thicknesses of 1 mm and 3 mm. To simulate low dose levels, Poisson noise was inserted into the projection data of each case to obtain LDCT images corresponding to one-quarter of the full dose. We used the 3 mm slices: 2,167 images from nine patients for training and 211 images from one patient for testing.
Piglet dataset
The real Piglet dataset [33] provides 850 piglet CT images of size 512 × 512 with a slice thickness of 0.625 mm. At a tube voltage of 100 kVp, a GE scanner performed CT scanning and reconstruction of the piglets with tube currents ranging from 300 mAs down to 15 mAs. The 300 mAs acquisition serves as the conventional dose level, and the remaining images were obtained by reducing the dose to 50 %, 25 %, 10 % and 5 % of the original dose, respectively. These four dose levels (50 %, 25 %, 10 % and 5 %) were selected for training and testing: for each dose, 525 piglet images were used as training input and 40 images were used for testing.
Experimental detail setting
We use U-Net as the backbone of the proposed CECDM with an input size of 3 × 512 × 512. The network is optimized with the Adam optimizer, a batch size of four, and a learning rate of 2 × 10−4, for a total of 100 k training iterations. α1, …, α_T follow the cosine timestep schedule [34], decreasing from 0.999 to 0 with s=0.008. The experiments were implemented in the PyTorch framework, and the network was trained and tested on a computer equipped with an NVIDIA GeForce RTX 2080 SUPER GPU.
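One plausible reading of this schedule is the cosine form of Nichol and Dhariwal [34] normalized so that α decreases from 1 at t = 0 towards 0 at t = T; the exact clipping that yields the 0.999 starting value is not specified in the text and is an assumption in the sketch below.

```python
import math
import torch

def cosine_alpha_schedule(T=10, s=0.008, alpha_max=0.999):
    """Monotonically decreasing alpha_t based on the cosine schedule of [34]."""
    t = torch.arange(T + 1, dtype=torch.float32)
    f = torch.cos(((t / T + s) / (1 + s)) * math.pi / 2) ** 2
    alphas = f / f[0]                            # normalize so the schedule starts at 1
    alphas = torch.clamp(alphas, max=alpha_max)  # cap the start at 0.999 (assumed)
    alphas[-1] = 0.0                             # force the end point (LDCT image) to 0
    return alphas
```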
Analysis of experimental results
To test the effectiveness of the proposed CECDM algorithm, it is compared with REDCNN [12], QAE [13], EDCNN [14], CTformer [19], and CoreDiff [25] in terms of visual assessment and quantitative assessment. The objective metrics used in the quantitative assessment are the peak signal-to-noise ratio (PSNR), based on pixel grey-level differences; the structural similarity index measure (SSIM), based on structural differences [35]; the gradient magnitude similarity deviation (GMSD), based on gradient changes [36]; the feature similarity index measure (FSIM), based on feature differences [37]; and the pixel-domain visual information fidelity (VIFs), based on visual perception [38]. GMSD measures the similarity of the gradient magnitude maps of the reference and processed images, and a lower GMSD score indicates a better result. For all other metrics, higher values indicate better quality of the denoised CT images. The comparison methods are run from their official code, with parameters set according to the recommendations of the corresponding papers.
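For reference, PSNR and SSIM can be computed with scikit-image as in the sketch below; evaluating inside the [−160, 240] HU display window (data_range = 400) is an assumption about the evaluation setup, and GMSD, FSIM, and VIFs require additional IQA implementations not shown here.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def psnr_ssim(denoised_hu, reference_hu, lo=-160.0, hi=240.0):
    """PSNR/SSIM between a denoised slice and its NDCT reference, clipped to a HU window."""
    den = np.clip(denoised_hu, lo, hi)
    ref = np.clip(reference_hu, lo, hi)
    data_range = hi - lo                                  # 400 HU for the [-160, 240] window
    psnr = peak_signal_noise_ratio(ref, den, data_range=data_range)
    ssim = structural_similarity(ref, den, data_range=data_range)
    return psnr, ssim
```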
Mayo dataset visual assessment
Figure 4 shows the results of processing two representative slices (denoted Case 1 and Case 2) from the Mayo test dataset with the different methods. All axial CT images are displayed in the [−160, 240] HU window. Because of the insufficient number of incident X-ray photons, LDCT images contain considerable speckle noise and edge artifacts, making it difficult for physicians to identify clinically significant lesion information. Clear lesions can be observed in the NDCT images (Figure 4(b1) and (b2)), whereas the LDCT images contain severe artifacts and noise, as shown in Figure 4(a1) and (a2), making accurate delineation difficult. To show the noise reduction more clearly, we also zoom in on three ROI (region of interest) areas (ROI1 in the lower right corner of Case 1, ROI2 in the upper left corner of Case 2, and ROI3 in its lower right corner) and mark them with red arrows and blue circles for better comparison. In general, all six methods achieve a certain degree of denoising, and the different methods suppress the noise and artifacts in LDCT images to different degrees.

Comparison of processed images by different methods for Case 1 and Case 2. (a1, a2) LDCT, (b1, b2) NDCT, (c1, c2) REDCNN, (d1, d2) QAE, (e1, e2) EDCNN, (f1, f2) CTformer, (g1, g2) CoreDiff, (h1, h2) CECDM. ROI1 in the lower right corner of Case 1, ROI2 in the upper left corner of Case 2, and ROI3 in the lower right corner.
In contrast, REDCNN (Figure 4(c1) and (c2)), while eliminating some noise, suffers from blurring and over-smoothing, so texture details are lost, such as the details marked by the red arrow in ROI2 in the upper left corner of Figure 4(c2). QAE (Figure 4(d1) and (d2)) and CoreDiff (Figure 4(g1) and (g2)) can significantly eliminate noise, and their detail retention is somewhat better than that of REDCNN, but the visual results of these two methods remain blurry. EDCNN (Figure 4(e1) and (e2)) and CTformer (Figure 4(f1) and (f2)) preserve image detail better than the other methods, but fine streak artifacts are still visible at the red arrows of ROI1 and ROI2. By comparison, CECDM achieves better results in effectively removing noise and artifacts while preserving tissue and structure. The lesion contours in ROI1 (blue circles in Figure 4(e1) and (h1)) are easy to identify; the lesion contours produced by EDCNN and CECDM are the clearest, while those of the other methods are fuzzy, which confirms the role of the edge-enhancement module. In addition, CECDM shows the clearest tissue edges in ROI3 in Figure 4(h2). Compared with the other images of ROI3 in Figure 4, CECDM removes noise and artifacts most effectively, while the other methods leave significant residual artifacts. These results show that CECDM has good texture preservation and noise suppression ability.
Mayo dataset quantitative assessment
To quantitatively evaluate the denoising effect of the different methods, the average PSNR, SSIM, GMSD, FSIM, and VIFs values over all test images are given in Table 1; the best value for each metric is shown in bold. The proposed method performs well on all objective metrics, with clear improvements in PSNR and SSIM.
Quantitative results of different algorithms on the Mayo dataset (mean ± SD).
| Method | PSNR ↑ | SSIM ↑ | GMSD ↓ | FSIM ↑ | VIFs ↑ |
|---|---|---|---|---|---|
| LDCT | 29.2443 ± 2.1138 | 0.8838 ± 0.0372 | 0.0804 ± 0.0190 | 0.9640 ± 0.0299 | 0.4834 ± 0.0531 |
| REDCNN | 33.0346 ± 1.6658 | 0.9188 ± 0.0271 | 0.0622 ± 0.0113 | 0.9714 ± 0.0270 | 0.5507 ± 0.0451 |
| QAE | 33.0647 ± 1.8117 | 0.9221 ± 0.0282 | 0.0559 ± 0.0124 | 0.9731 ± 0.0283 | 0.5481 ± 0.0486 |
| EDCNN | 32.7956 ± 1.6544 | 0.9188 ± 0.0274 | 0.0609 ± 0.0111 | 0.9717 ± 0.0264 | 0.5392 ± 0.0458 |
| CTformer | 33.0285 ± 1.7682 | 0.9199 ± 0.0291 | 0.0558 ± 0.0122 | 0.9720 ± 0.0298 | 0.5435 ± 0.0480 |
| CoreDiff | 33.1065 ± 1.8678 | 0.9250 ± 0.0273 | 0.0532 ± 0.0124 | 0.9758 ± 0.0260 | 0.5573 ± 0.0496 |
| CECDM | **33.4405 ± 1.8602** | **0.9278 ± 0.0266** | **0.0512 ± 0.0117** | **0.9794 ± 0.0218** | **0.5662 ± 0.0488** |
As can be seen from Table 1, the differences between REDCNN, QAE, and CTformer in PSNR are small, but REDCNN has the worst GMSD score, consistent with the visual impression that its images are too smooth and have low contrast. EDCNN performs relatively poorly on PSNR, GMSD, and VIFs, but its gap to the other methods on the remaining metrics is small. QAE and CTformer are superior to the two typical CNN-based methods above on SSIM and FSIM. CoreDiff and CECDM lead the other methods on all metrics, which shows that diffusion models hold encouraging promise in the field of LDCT post-processing.
CECDM achieves the best values on all reported metrics, indicating that it achieves better noise/artifact suppression and feature preservation, with results close to the corresponding NDCT images. In summary, CECDM is superior to the other comparison methods in both visual quality and quantitative analysis.
Piglet dataset visual assessment
Figure 5 shows the results of processing a representative piglet slice (Case 3) with the different methods at different doses (50 %, 25 %, 10 %, 5 %). All CT images are displayed in the [−160, 240] HU window. Figure 5(a1)–(a4) shows LDCT images at 150 mAs, 75 mAs, 30 mAs, and 15 mAs, and Figure 5(b1)–(b4) shows the corresponding NDCT images; the remaining panels show the results of RED-CNN, Q-AE, EDCNN, CTformer, CoreDiff, and the proposed CECDM. Comparing the LDCT and NDCT images, the lower the X-ray dose, the stronger the noise in the LDCT images. When the tube current drops to 15 mAs, the quality of the LDCT image is significantly lower than that of the NDCT image, and it is difficult to observe the tissue structure and fine details.

Image comparison of different processing methods in piglet dataset. (a1–a4) LDCT (the tube currents are in order 150 mAs, 75 mAs, 30 mAs and 15 mAs), (b1–b4) NDCT, (c1–c4) REDCNN, (d1–d4) QAE, (e1–e4) EDCNN, (f1–f4) CTformer, (g1–g4) CoreDiff, (h1–h4) CECDM.
We zoom in on two ROI areas (marked by red rectangles in Figure 5(b)) for better comparison and train and test on the four LDCT doses separately; the methods above show different noise-reduction performance when tested on the 150 mAs and 75 mAs datasets. Similar to the results on the Mayo dataset, RED-CNN, EDCNN, CTformer, and Q-AE effectively suppress noise at the higher doses (i.e., 50 % and 25 %). However, the RED-CNN images are too smooth (ROI regions in the lower right corners of Figure 5(c1) and (c2)). Although EDCNN retains edge information and has a good noise-reduction effect, its images still show blurred details (as in Figure 5(e1)–(e2)). CTformer obtains improved images of better quality, but a smoothing effect remains (upper-left ROI regions in Figure 5(f1)–(f2)). The results of CoreDiff and CECDM are superior to those of the above methods, and the method proposed in this paper performs well in both ROI regions (Figure 5(h1)–(h2)). When the tube current is reduced to 30 mAs and 15 mAs, the images are severely contaminated; the two diffusion-model-based methods, CoreDiff and the proposed CECDM, achieve remarkable denoising effects and are clearly superior to the other methods.
Piglet dataset quantitative assessment
Figure 5 also reports the PSNR and SSIM values at the different noise levels, with the optimal and sub-optimal values denoted in red and blue, respectively. As can be seen, the X-ray dose is closely related to the LDCT denoising results of the six algorithms on the piglet test set: the higher the dose, the better the image metrics, and the lower the dose, the worse they become.
When the tube current is 150 mAs or 75 mAs, the metrics of RED-CNN improve, and it obtains the optimal or sub-optimal value on some of them; however, the gap to the other methods is not large. When the tube currents are 30 mAs and 15 mAs, the cold-diffusion-based CoreDiff and CECDM are clearly better than the other methods, and most of their metrics are optimal. In general, the proposed CECDM method achieves good results for LDCT image denoising at different doses and shows strong robustness.
Ablation studies
In this section, we design several sets of ablation studies to analyze and verify the performance of the proposed CECDM model under different network components and compound loss configurations. It is worth mentioning that no cross-validation strategy is used in these experiments; instead, a fixed test split from the Mayo dataset is used.
Diffusion timesteps T
The performance of CECDM in training and inference was evaluated at T=1, 10, 50, and 200. Figure 6 shows the denoised images for the different T values. When T=1, CECDM performs one-step restoration: the edges of the denoised image are blurred and the texture details are unclear. As T increases, the tissue boundaries become clearer. At T=200, the detail retention is better than at T=50, but the inference time also increases significantly.

Ablation studies of CECDM with different T settings (a) LDCT, (b) NDCT, (c) T=1, (d) T=10, (e) T=50, (f) T=200.
When T=10, the PSNR and SSIM values of CECDM are the highest, and the image after noise removal is the closest to NDCT visually. When T≥50, the quantitative performance of CECDM decreased gradually. When T=200, PSNR increases to a certain extent. Therefore, considering the sharpness of the denoised image, quantitative performance, and inference time of the method, T=10 is a suitable setting for CECDM.
Network components and compound loss functions
The TAM and Sobel+CBAM modules are used to improve the network structure, and the compound loss function is used to optimize network performance. To prove the effectiveness of these improvements, a series of ablation studies is conducted to evaluate their effects on denoising performance. The quantitative results are shown in Table 2, from which it can be seen that the proposed full CECDM is the most effective. Each of the four ablated configurations is improved to some extent, with Sobel+CBAM providing the most significant gain. Although the other improvements increase the overall metrics only slightly, they still contribute to small details. In summary, the improvement strategies proposed in this paper have a positive effect on the quality of the denoised images.
Quantitative analysis of different improvements.
| Configuration | TAM | Sobel+CBAM | MSE | CHAR | PSNR | SSIM |
|---|---|---|---|---|---|---|
| TAM | √ |  | √ | √ | 34.4340 | 0.9251 |
| Sobel+CBAM |  | √ | √ | √ | 34.6269 | 0.9269 |
| MSE | √ | √ | √ |  | 34.6050 | 0.9265 |
| CHAR | √ | √ |  | √ | 34.5024 | 0.9255 |
| CECDM | √ | √ | √ | √ | 34.7412 | 0.9278 |
Conclusion and discussion
To suppress the radiation dose and reduce the noise and artifacts in CT images, we proposed a low-dose CT image denoising method based on the combination of edge enhancement and a cold diffusion model, called CECDM, which replaces random Gaussian noise with the LDCT image as the end point of the diffusion process, simplifies the forward degradation process, and greatly reduces the number of sampling steps. Based on a U-Net structure, the trainable Sobel operator and the CBAM attention module are introduced, and a compound MSE+Charbonnier loss guides the training of the recovery network. Experiments on the Mayo and Piglet datasets show that CECDM outperforms all comparison algorithms on both subjective visual and objective evaluation metrics, effectively removes noise and artifacts from LDCT images, and improves the clarity of the reconstructed images. The results also indicate that the model has strong generalization ability.
Although the cold diffusion architecture greatly accelerates the inference of the diffusion model, a gap in inference speed remains with respect to classical LDCT denoising methods (for example, QAE denoises a single image in 0.029 s), and this can be further optimized in the future. Beyond inference speed, more in-depth studies are still needed on datasets from different clinical application scenarios and LDCT images at different radiation doses in order to achieve better noise reduction and strengthen the generalization ability of the model.
Funding source: the Fundamental Research Program of Shanxi Province
Award Identifier / Grant number: 202303021211148
Funding source: the Patent Transformation Project of Shanxi Province
Award Identifier / Grant number: 202302006
- Research ethics: This work involved human subjects or animals in its research. Approval of all ethical and experimental procedures and protocols was granted by the Biological and Medical Ethics Committee of North University of China (Application No. NUC20200827-003), and the work was performed in line with the project "Research on artifact suppression algorithms for low-dose CT".
- Informed consent: Not applicable.
- Author contributions: The author has accepted responsibility for the entire content of this manuscript and approved its submission.
- Use of Large Language Models, AI and Machine Learning Tools: None declared.
- Conflict of interest: The author states no conflict of interest.
- Research funding: This work was supported by the Fundamental Research Program of Shanxi Province under Grant 202303021211148 and the Patent Transformation Project of Shanxi Province under Grant 202302006.
- Data availability: The raw data can be obtained on request from the corresponding author.
References
1. Brenner, DJ, Hall, EJ. Computed tomography – an increasing source of radiation exposure. N Engl J Med 2007;357:2277–84. https://doi.org/10.1056/nejmra072149.
2. Mohd, SSV, George, SN. A review on medical image denoising algorithms. Biomed Signal Process Control 2020;61:102036. https://doi.org/10.1016/j.bspc.2020.102036.
3. Arabi, H, Zaidi, H. Spatially guided nonlocal mean approach for denoising of PET images. Med Phys 2020;47:1656–69. https://doi.org/10.1002/mp.14024.
4. Fumene Feruglio, P, Vinegoni, C, Gros, J, Sbarbati, A, Weissleder, R. Block matching 3D random noise filtering for absorption optical projection tomography. Phys Med Biol 2010;55:5401–15. https://doi.org/10.1088/0031-9155/55/18/009.
5. Ma, J, Huang, J, Feng, Q, Zhang, H, Lu, H, Liang, Z, et al. Low-dose computed tomography image restoration using previous normal-dose scan. Med Phys 2011;38:5713–31. https://doi.org/10.1118/1.3638125.
6. Ma, Y, Wei, B, Feng, P, He, P, Yamauchi, Y, Wang, G. Low-dose CT image denoising using a generative adversarial network with a hybrid loss function for noise learning. IEEE Access 2020;8:67519–29. https://doi.org/10.1109/access.2020.2986388.
7. Chen, Y, Chen, W, Yin, X, Ye, X, Bao, X, Luo, L, et al. Improving low-dose abdominal CT images by weighted intensity averaging over large-scale neighborhoods. Eur J Radiol 2011;80:e42–9. https://doi.org/10.1016/j.ejrad.2010.07.003.
8. Wang, L, Liu, Y, Wu, R, Liu, Y, Yan, R, Ren, S, et al. Image processing for low-dose CT via novel anisotropic fourth-order diffusion model. IEEE Access 2022;10:50114–24. https://doi.org/10.1109/access.2022.3172975.
9. Hajiaboli, MR. An anisotropic fourth-order diffusion filter for image noise removal. Int J Comput Vis 2010;92:177–91. https://doi.org/10.1007/s11263-010-0330-1.
10. Zhong, J, Ning, R, Conover, D. Image denoising based on multiscale singularity detection for cone beam CT breast imaging. IEEE Trans Med Imag 2004;23:696–703. https://doi.org/10.1109/tmi.2004.826944.
11. Chen, Y, Shi, L, Feng, Q, Yang, J, Shu, H, Luo, L, et al. Artifact suppressed dictionary learning for low-dose CT image processing. IEEE Trans Med Imag 2014;33:2271–92. https://doi.org/10.1109/tmi.2014.2336860.
12. Chen, H, Zhang, Y, Kalra, MK, Lin, F, Chen, Y, Liao, P, et al. Low-dose CT with a residual encoder-decoder convolutional neural network. IEEE Trans Med Imag 2017;36:2524–35. https://doi.org/10.1109/tmi.2017.2715284.
13. Fan, F, Shan, H, Kalra, MK, Singh, R, Qian, G, Getzin, M, et al. Quadratic autoencoder (Q-AE) for low-dose CT denoising. IEEE Trans Med Imag 2020;39:2035–50. https://doi.org/10.1109/tmi.2019.2963248.
14. Liang, T, Jin, Y, Li, Y, Wang, T, Feng, S, Lang, C. EDCNN: edge enhancement-based densely connected network with compound loss for low-dose CT denoising [Internet]. arXiv; 2020. https://arxiv.org/abs/2011.00139 [Accessed 16 Jul 2024].
15. Shan, H, Padole, A, Homayounieh, F, Kruger, U, Khera, RD, Nitiwarangkul, C, et al. Competitive performance of a modularized deep neural network compared to commercial algorithms for low-dose CT image reconstruction. Nat Mach Intell 2019;1:269–76. https://doi.org/10.1038/s42256-019-0057-9.
16. Kang, E, Koo, HJ, Yang, DH, Seo, JB, Ye, JC. Cycle-consistent adversarial denoising network for multiphase coronary CT angiography. Med Phys 2018;46:550–62. https://doi.org/10.1002/mp.13284.
17. Yang, Q, Yan, P, Zhang, Y, Yu, H, Shi, Y, Mou, X, et al. Low-dose CT image denoising using a generative adversarial network with Wasserstein distance and perceptual loss. IEEE Trans Med Imag 2018;37:1348–57. https://doi.org/10.1109/tmi.2018.2827462.
18. Li, Z, Liu, Y, Shu, H, Lu, J, Kang, J, Chen, Y, et al. Multi-scale feature fusion network for low-dose CT denoising. J Digit Imag 2023;36:1808–25. https://doi.org/10.1007/s10278-023-00805-0.
19. Wang, D, Fan, F, Wu, Z, Liu, R, Wang, F, Yu, H. CTformer: convolution-free Token2Token dilated vision transformer for low-dose CT denoising. Phys Med Biol 2023;68:065012. https://doi.org/10.1088/1361-6560/acc000.
20. Kang, J, Liu, Y, Zhang, P, Guo, N, Wang, L, Du, Y, et al. FSformer: a combined frequency separation network and transformer for LDCT denoising. Comput Biol Med 2024;173:108378. https://doi.org/10.1016/j.compbiomed.2024.108378.
21. Yan, R, Liu, Y, Liu, Y, Wang, L, Zhao, R, Bai, Y, et al. Image denoising for low-dose CT via convolutional dictionary learning and neural network. IEEE Trans Comput Imag 2023;9:83–93. https://doi.org/10.1109/tci.2023.3241546.
22. Ho, J, Jain, A, Abbeel, P. Denoising diffusion probabilistic models [Internet]. arXiv; 2020. https://arxiv.org/abs/2006.11239.
23. Xia, W, Lyu, Q, Wang, G. Low-dose CT using denoising diffusion probabilistic model for 20× speedup [Internet]. arXiv; 2022. https://arxiv.org/abs/2209.15136 [Accessed 26 Sep 2023].
24. Liu, X, Xie, Y, Diao, S, Tan, S, Liang, X. Diffusion probabilistic priors for zero-shot low-dose CT image denoising. Med Phys 2024. https://doi.org/10.1002/mp.17431.
25. Gao, Q, Li, Z, Zhang, J, Zhang, Y, Shan, H. CoreDiff: contextual error-modulated generalized diffusion model for low-dose CT denoising and generalization. IEEE Trans Med Imag 2024;43:745–59. https://doi.org/10.1109/tmi.2023.3320812.
26. Bansal, A, Borgnia, E, Chu, HM, Li, JS, Kazemi, H, Huang, F, et al. Cold diffusion: inverting arbitrary image transforms without noise [Internet]. arXiv; 2022. https://arxiv.org/abs/2208.09392 [Accessed 25 Jun 2023].
27. Woo, S, Park, J, Lee, JY, Kweon, IS. CBAM: convolutional block attention module. In: Computer Vision – ECCV 2018; 2018:3–19 pp. https://doi.org/10.1007/978-3-030-01234-2_1.
28. Sobel, I, Feldman, G. An isotropic 3×3 image gradient operator [Internet]. Semantic Scholar; 1990. https://www.semanticscholar.org/paper/An-Isotropic-3%C3%973-image-gradient-operator-Sobel-Feldman/1ab70add6ba3b85c2ab4f5f6dc1a448e57ebeb30.
29. Li, H, Yang, X, Yang, S, Wang, D, Jeon, G. Transformer with double enhancement for low-dose CT denoising. IEEE J Biomed Health Inf 2023;27:4660–71. https://doi.org/10.1109/jbhi.2022.3216887.
30. Lai, WS, Huang, JB, Ahuja, N, Yang, MH. Fast and accurate image super-resolution with deep Laplacian pyramid networks. IEEE Trans Pattern Anal Mach Intell 2019;41:2599–613. https://doi.org/10.1109/tpami.2018.2865304.
31. Chen, Z, Niu, C, Gao, Q, Wang, G, Shan, H. LIT-former: linking in-plane and through-plane transformers for simultaneous CT image denoising and deblurring. IEEE Trans Med Imag 2024. https://doi.org/10.1109/tmi.2024.3351723.
32. Zhang, J, Gong, W, Ye, L, Wang, F, Zhibo, S, Cheng, Y. A review of deep learning methods for denoising of medical low-dose CT images. Comput Biol Med 2024;171:108112. https://doi.org/10.1016/j.compbiomed.2024.108112.
33. Yi, X, Babyn, P. Sharpness-aware low-dose CT denoising using conditional generative adversarial network. J Digit Imag 2018;31:655–69. https://doi.org/10.1007/s10278-018-0056-0.
34. Nichol, A, Dhariwal, P. Improved denoising diffusion probabilistic models [Internet]. arXiv; 2021. https://arxiv.org/abs/2102.09672.
35. Wang, Z, Bovik, AC, Sheikh, HR, Simoncelli, EP. Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 2004;13:600–12. https://doi.org/10.1109/tip.2003.819861.
36. Xue, W, Zhang, L, Mou, X, Bovik, AC. Gradient magnitude similarity deviation: a highly efficient perceptual image quality index. IEEE Trans Image Process 2014;23:684–95. https://doi.org/10.1109/tip.2013.2293423.
37. Zhang, L, Zhang, L, Mou, X, Zhang, D. FSIM: a feature similarity index for image quality assessment. IEEE Trans Image Process 2011;20:2378–86. https://doi.org/10.1109/tip.2011.2109730.
38. Sheikh, HR, Bovik, AC. Image information and visual quality. IEEE Trans Image Process 2006;15:430–44. https://doi.org/10.1109/TIP.2005.859378.
Supplementary Material
This article contains supplementary material (https://doi.org/10.1515/bmt-2024-0362).
© 2024 the author(s), published by De Gruyter, Berlin/Boston
This work is licensed under the Creative Commons Attribution 4.0 International License.