Article, Open Access

Automated lesion detection in endoscopic imagery for small animal models – a pilot study

  • Thomas Eixelberger, Ralf Hackner, Qi Fang, Bisan Abdalfatah Zohud, Michael Stürzl, Elisabeth Naschberger and Thomas Wittenberg
Published/Copyright: September 9, 2025

Abstract

Objectives

Small animal models, particularly mice, are crucial for studying gastrointestinal diseases like colorectal cancer. Tumor assessment via colonoscopy generates large video datasets, necessitating automated analysis due to limited resources and time-consuming manual review.

Methods

We employed a YOLOv7-based deep learning model pre-trained on human polyp images to detect tumors in mouse colonoscopy videos. Detection was enhanced using a stool detector and a color-based filter. Lesions were classified from ‘0’ (no tumor) to ‘5’ (tumor >50 % of colon diameter) using a custom ratio-based method. The system was evaluated on 150 videos from 28 mice over 6 weeks, with 125 videos containing tumors.

Results

Initial detection yielded a Precision of 0.576, Recall of 0.916, and Accuracy of 0.593. Adding the stool detector improved results to 0.932, 0.946, and 0.897, respectively. Compared to expert annotations, classification reached 0.759 Precision, 0.774 Recall, and 0.774 Accuracy over all five classes.

Conclusions

The proposed approach reliably detects and classifies colon tumors in mice, offering real-time support for preclinical endoscopic studies. Further evaluation will provide more insights into its performance.

Introduction

Small animal models have been established in the past years as key tools to study gastrointestinal diseases, including colitis in the gut mucosal immune system, as well as cancer development [1], [2], [3]. Such animal models, particularly mice, are also used for pre-clinical drug testing. They have provided valuable insights into the pathogenesis of various diseases and have also been used as model systems of mucosal immune responses, e.g. by investigating the interplay of different immune cells or analyzing the gut-vascular-barrier function [4]. Nevertheless, in order to analyze tumor development, inflammatory bowel disease (IBD) or physiological changes of the colon tissue, the animals have to be sacrificed at the endpoint. Hence, small animal colonoscopy has proven to be a non-destructive protocol for medical and immunological research to examine the colon for inflammatory, vascular or neoplastic changes in vivo over time [5], [6].

In the case of mice with colonic tumors or colitis, a short rigid endoscope with a small diameter is used to examine the mice’s colon. The mice are anesthetized and the endoscope is inserted via the anus. The colon is then filled with air to facilitate the examination. This procedure allows for the acquisition and storage of high-resolution (HD) colonoscopy image sequences, which can then be used for retrospective visual inspection to grade tumors and assess inflammation [4].

However, due to limited resources, including the time required to screen and evaluate the colonoscopic videos, there is a need for automated assessment of the large-scale video data obtained from small animal experiments. Since deep learning and machine learning approaches have recently been introduced into clinical workflows to assist and support endoscopists in routine colonoscopic screening by providing visual hints about possible lesions on the video monitors [7], the goal of this work is to explore the applicability and usefulness of such deep learning approaches for assessing similar image data obtained from small animals [8].

Objectives

The primary objective of this work is to evaluate and support the detection of colorectal cancer in mouse models through the use of image analysis and deep learning methods applied to colonoscopy videos obtained from mice. This research aims to investigate and establish an AI-based approach for automatically detecting and classifying colorectal tumors in mice using a validated tumor scoring system. By comparing automatic detections with the findings of experts, we evaluate our approach and demonstrate its use in cancer research.

Related work

Mouse models

Small animal (i.e., mouse) models are important tools in cancer research, providing critical insights into tumor biology, genetic and environmental influences on cancer development, and the efficacy of potential therapies [9]. Due to their genetic similarities to humans, short reproductive cycles, and the possibility to manipulate their genome, mice serve as valuable models for studying various aspects of cancer. For example, Tentler et al. [10] discuss specific patient-derived tumor xenograft (PDTX) disease examples and give an overview of the opportunities and limitations of such models in cancer drug development. Hidalgo et al. [11] also discuss challenges and limitations in the field of PDX models and introduce a European consortium dedicated to PDX models. Hung et al. [12] detail the creation of a mouse model for sporadic and metastatic colon tumors and its application in evaluating drug treatments.

Colonoscopy for mice

Colonoscopy is an essential technique in gastrointestinal patient care as well as in research, as it provides direct visualization of the colon’s interior. In mouse models, colonoscopy is currently used to study various gastrointestinal diseases such as colon cancer [5], [13] and to monitor disease progression [14], [15], [16]. Olson et al. [13] have investigated the usage of colonoscopy for colonic diseases in animal models and suggest that fewer mice are needed because the same animals can be examined serially. Kodani et al. [16] have developed and proposed a procedure for documenting the endoscopy of mice and its results. They introduced a novel scoring system consisting of three parts: (1) assessment of the extent and severity of colorectal inflammation, (2) quantitative recording of tumor lesions, and (3) numerical sorting of clinical cases by their pathological and research relevance.

Deep-learning-based lesion detection

Early diagnosis and treatment of gastrointestinal diseases is primarily based on lesion detection during an endoscopic examination of the gastrointestinal tract, and the outcome of the examination mainly depends on the expertise of the endoscopist. In the past years, machine learning and deep learning approaches, and especially deep convolutional neural networks (DCNNs), have demonstrated their potential for automatic and enhanced lesion detection [17]. Wittenberg et al. [7] provide an overview of the history and current developments in the field of automated machine-learning-based polyp detection over the past 25 years and outline three development stages of AI-based lesion detection in colonoscopy. Early experiments (phase 1) utilized handcrafted geometric features and simple decision schemes for the detection of prominent pedicled polyps. This phase was followed by texture-based approaches employing machine learning methods for tissue classification (phase 2). The latest advancements (phase 3) involve featureless methods, relying on deep convolutional neural networks for automated polyp detection and classification.

Currently, a considerable amount of research on novel DCNN architectures (such as YOLO networks) can be observed in this specific field, driven by the wide availability of public colonoscopy and endoscopy image data collections [18], [19], [20], [21] for network training and testing. Besides the standard scores (F1, precision, accuracy) used for evaluation, the run times of the different network types and the hardware used are also of interest.

Recent studies on automatic polyp detection indicate that the one-stage YOLO (‘you only look once’) architecture by Redmon et al. [22] is capable of achieving both high performance scores and real-time inference. Pacal et al. [23] recently improved the performance of their YOLOv3 and YOLOv4 network architectures for polyp detection by employing advanced augmentation processes during the training step to virtually increase the amount of training data. Similarly, Li et al. [24] utilized large training datasets to achieve sensitivity, specificity, and F1-scores of 74.1 %, 85.1 %, and 83.3 %, respectively. They also demonstrated that a YOLOv3 architecture can process up to 61.2 frames per second on a single Nvidia RTX2070. Recently, Wan et al. [25] evaluated YOLOv5 on the Kvasir dataset as well as their own collection of colonoscopy data, achieving an F1-score of 0.907. Oliveira et al. [26] trained three versions of a YOLO network, namely the YOLOv5, YOLOv7, and YOLOv8 architectures, initially on the PICCOLO [18] dataset and then additionally with a comprehensive collection of polyp images. They achieved an accuracy of 92.2 %, a sensitivity of 69 %, an F1-score of 74 % and a mAP of 76.8 %.

Methods

The following section outlines the materials and methods used in conducting the experimental study, providing a comprehensive overview of the research approach and procedures employed.

Animals and protocols

Both female and male mice were included in the study. These mice were housed in specific pathogen-free conditions and had unrestricted access to a standard diet and water. They underwent routine pathogen screening following the guidelines of the Federation of European Laboratory Animal Science Associations (FELASA) [27]. All experiments were approved by the government of Lower Franconia and followed current guidelines for animal experiments.

To induce colorectal cancer (CRC), syngeneic tumor cells (KPN or VAKPT organoids) [28] were injected orthotopically into the colon wall using a high-resolution micro-endoscopy instrument (see Section “Micro endoscopy system”) while the mice were under anesthesia with Isoflurane. Following the injection (week 0), the mice were monitored on a weekly basis using the micro-endoscopy system until reaching the endpoint at week 6. During these regular follow-ups, high-definition (HD) endoscopic videos were recorded and stored digitally.

Micro endoscopy system

The experimental micro-endoscopy system (Karl-Storz, Tuttlingen, Germany) used for the injection of the syngeneic tumor cells as well as for the regular visual screening and data recording of the mouse colonoscopies is depicted in Figure 1 [8]. The setup consists of a camera head (1) attached to a rigid endoscope (2), a combined gas source, needed for dilating the colon, and light source (3), required for illumination, which is connected to the endoscope by a glass fiber bundle, the endoscopy control system including a USB storage device (4), a digital flat-screen monitor to depict the captured video data in real time (5), a Plexiglas chamber to narcotise the mice with Isoflurane gas (6), an absorption filter for the gas (7), and a Falcon tube in which the mouse is exposed to the anesthesia (8).

Figure 1:
Setup of the micro-colonoscopy system showing the equipment used for endoscopic mouse examinations: camera head (1) with attached rigid endoscope (2), light and gas source (3), endoscopy system with USB storage (4), flat-screen monitor (5), Plexiglas chamber to narcotise mice with Isoflurane gas (6), absorption filter for the gas (7), and Falcon tube (8).

Image data

From the Division of Molecular and Experimental Surgery of the Universitätsklinikum Erlangen we received n_seq = 150 colonoscopic videos of mice, captured with the micro-endoscopy device described in Section “Micro endoscopy system” and prepared with the protocol defined in Section “Animals and protocols”. The videos were captured from n_mice = 28 different animals, starting at week 1 of the clinical study. Within this data collection, n_tumor = 125 videos contain at least one tumor, while n_notumor = 25 depict none. In total, n_lesions = 130 lesions are depicted in the data. The collected videos have a total length of t_total = 1:05:08 h. The average length is t_μ = 26 s, with a standard deviation of t_σ = 26 s. The median length is t_median = 18 s, with a minimum video duration of t_min = 7 s and a maximum of t_max = 3:30 min. All videos have a spatial resolution of 1,920 × 1,080 pixels and a frame rate of 30 frames per second. Every video was manually classified by a clinical expert according to a tumor score based on Becker et al. [6]. Each individual mouse was implanted with exactly one single tumor, and each video corresponds to one colonoscopy procedure of one mouse. Based on the video data, the experts classified the depicted tumor into one out of six possible classes c ∈ {0, 1, 2, 3, 4, 5}. To achieve this, they maneuvered the endoscope to examine the entire colon and assessed the depicted tumor’s size in relation to the colon size. They also annotated the sub-sequence along with the corresponding areas where the tumor was visible. Figure 2 shows some example frames from the recorded videos with increasing tumor scores ranging from ‘0’ to ‘5’ (top left to bottom right).

Figure 2:
Example images for different tumor scores (“0”–“5”) from the received video sequences; (a) tumor score ‘0’, (b) tumor score ‘1’, (c) tumor score ‘2’, (d) tumor score ‘3’, (e) tumor score ‘4’, and (f) tumor score ‘5’.

Figure 3 depicts the distribution of the annotated tumor scores within the video data collection. The pie chart (left) shows the distribution of every score within the whole data set. The violin plot (right) reveals the trend of the distribution from the starting point at week 1 to the endpoint at week 6.

Figure 3:
Distribution of different tumor scores of the available video sequences (left); distribution of tumor score from week 1 to 6 of the clinical study (right).

Network architecture

For our proposed lesion detection approach, we utilized a YOLOv7-tiny detection network proposed by Wang et al. [29]. The YOLOv7-tiny architecture demonstrates effective performance in lesion detection while ensuring real-time processing capabilities, even on lower-end hardware [8]. It delivers a good balance between performance and efficiency.

Training of the network

The network used was originally trained on an image data collection consisting of 36,596 human colonoscopic images with annotated lesions. The training data combine publicly available colonoscopy images from the LDPolyp data collection [30] (21,839 images) with a non-public data collection (14,757 images) from the Deep Colonoscopy project [31].

During the training process of the YOLOv7-tiny network, a learning rate of 0.01 was used, along with the stochastic gradient descent (SGD) optimizer with a momentum of 0.937. Data augmentation techniques were automatically applied, including random variations of the HSV color-space weights with fractions of hue h = 0.015, saturation s = 0.7, and value v = 0.4. The images were also subjected to random translations (±0.1 %), scaling (±0.5 %), as well as flipping (50 %). Additionally, the YOLO framework’s own augmentation steps were employed, such as mix-up (5 %) and creating a mosaic (100 %) from the input data. The batch size for the training data was set to 32. After 200 epochs of training, the network was tested on various publicly available datasets, namely the CVC-ClinicDB [32], the Kvasir [19], and the PICCOLO [18] datasets. Overall, the trained network achieved for human colonoscopic lesions a precision of prec = 0.92, a recall of rec = 0.90, and an F1-score of 0.91, using an intersection over union (IoU) threshold of 30 %. These performance metrics demonstrate the effectiveness of the trained network in accurately detecting lesions in human colonoscopic image data.
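For orientation, the training settings listed above can be summarized in a YOLO-style hyperparameter dictionary. The following is a minimal sketch assuming key names that follow the YOLOv5/YOLOv7 hyperparameter-file convention (lr0, momentum, hsv_h, …); it is an illustration of the reported values, not the authors’ actual configuration file.

```python
# Hypothetical summary of the reported training settings, using YOLO-style
# hyperparameter names (the key names are an assumption; values are from the text).
hyperparameters = {
    "lr0": 0.01,        # initial learning rate
    "momentum": 0.937,  # SGD momentum
    "hsv_h": 0.015,     # random hue variation
    "hsv_s": 0.7,       # random saturation variation
    "hsv_v": 0.4,       # random value variation
    "translate": 0.1,   # random translation (as reported)
    "scale": 0.5,       # random scaling (as reported)
    "fliplr": 0.5,      # horizontal flip probability (50 %)
    "mixup": 0.05,      # mix-up augmentation probability (5 %)
    "mosaic": 1.0,      # mosaic augmentation probability (100 %)
}
batch_size = 32
epochs = 200
```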

Pre- and post-processing steps

As there exist some visual differences between colonoscopic images of ‘mice and men’ (Steinbeck [33]), such as differences in the texture and vascularization of the colon tissue as well as missing flexures and missing haustra in the mouse colon, some pre- and post-processing steps were added to the DCNN lesion detection approach (see Section “Network architecture”).

Tracking of detection results

During the continuous analysis of colonoscopy videos, the detection network may produce false detections (‘false positives’) or fail to detect lesions in individual frames (‘false negatives’). To resolve this matter, we included the ‘Simple Online and Realtime Tracking’ (SORT) algorithm, originally proposed by Bewley et al. [34], as an enhancement of the neural network output. This approach can be used to track a set of multiple bounding boxes of detected objects of interest over a predefined number of successive image frames. To validate the detection of lesions, we require that the related bounding boxes can be tracked over at least 10 consecutive frames and furthermore that the overlap between these bounding boxes exceeds 50 %. Internally, the SORT algorithm utilizes a Kalman filter to predict missing detections from the YOLOv7-tiny network, hence allowing for the interpolation of up to three missing predictions. This helps to improve the overall accuracy and consistency of the detection results.
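To make the acceptance rule concrete, the sketch below checks whether a tracked lesion fulfills the two criteria stated above (at least 10 consecutive frames, bounding-box overlap above 50 %). It operates on boxes already associated to one track (e.g., by SORT) and is an illustrative simplification, not the authors’ implementation; the tracker itself and the Kalman-filter interpolation are assumed to run upstream.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def is_valid_lesion_track(track_boxes, min_frames=10, min_overlap=0.5):
    """Accept a track only if it spans enough consecutive frames and the boxes of
    consecutive frames overlap sufficiently (sketch of the validation rule above)."""
    if len(track_boxes) < min_frames:
        return False
    return all(iou(a, b) > min_overlap for a, b in zip(track_boxes, track_boxes[1:]))
```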

Out-of-colon detection

In many colonoscopy video sequences, there exist short segments where the tip of the endoscope is still positioned outside the mouse colon. Unfortunately, the deep neural network is unable to identify these areas and thus yields false-positive detections in these sequences. To deal with this issue and prevent such false detections, we leverage the color information of the image frames. To detect such ‘outside frames’, the RGB images are first converted to the HSV color space. Then we extract all pixels that are not classified as ‘red’, i.e. pixels whose hue lies within the range h ∈ [30°, 300°], away from the red hues near 0°/360°. Additionally, frames taken outside of the colon tend to be darker; hence, we detect areas in the frame where the pixel value in the value channel is v < 14. These two masks are then combined, and if the area of the masked region exceeds 50 % of the frame, it is classified as ‘being outside of the colon’.
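A minimal OpenCV sketch of this color-based check is given below. It treats ‘not red’ as hue inside [30°, 300°] and ‘too dark’ as a low value-channel intensity, as described above; the exact hue interpretation and the value scale (0–255) are assumptions, and this is an illustration rather than the authors’ code.

```python
import cv2
import numpy as np

def is_outside_colon(frame_bgr, hue_range=(30, 300), value_thresh=14, area_frac=0.5):
    """Heuristic 'outside the colon' check based on the frame's colour statistics.

    Non-red pixels (hue in [30°, 300°], an interpretation of the text) and very
    dark pixels (value < value_thresh, assuming a 0-255 scale) are masked; if the
    mask covers more than area_frac of the frame, the frame is flagged as being
    outside the colon. OpenCV stores hue as 0-179 (degrees / 2)."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    hue_deg = hsv[:, :, 0].astype(np.uint16) * 2   # back to 0-359 degrees
    value = hsv[:, :, 2]

    not_red = (hue_deg >= hue_range[0]) & (hue_deg <= hue_range[1])
    too_dark = value < value_thresh
    mask = not_red | too_dark
    return mask.mean() > area_frac
```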

Fecal debris

In some videos, there exist image segments that depict fecal debris (stool), see Figure 4b, which are often misclassified as tumors by the deep neural network, as fecal debris in mouse colonoscopy shows similarities to lesions in the human colon, resulting in false-positive detections. To address this issue, we included the artifact detector proposed by Kress et al. [35] in our system, which is able to detect organic artifacts such as fecal debris or blood as well as objects such as surgical instruments. Specifically, we utilized the ‘stool’ class from the segmentation network, considering only detected objects with a confidence above the threshold θ_conf^debris = 0.01. However, not all fecal debris is detected correctly by this approach. To further improve the accuracy, we applied an additional image analysis filter to identify ‘yellowish’ pixels. This was achieved by converting the RGB image to the HSV color space and extracting the pixels of the hue channel within the range h ∈ [20°, 65°]. The extracted ‘yellowish’ pixels were then combined with the detections from the segmentation network. If the combined area exceeded 50 % of the bounding box area, the detection was classified as ‘stool’ instead of a ‘tumor’.
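The combination rule can be sketched as follows: ‘yellowish’ pixels are merged with the stool segmentation mask, and a tumor candidate is relabeled as stool when the merged mask covers more than half of its bounding box. The function name, the box format, and the OpenCV hue scale are assumptions made for illustration only.

```python
import cv2
import numpy as np

def relabel_stool(frame_bgr, bbox, stool_mask, hue_range=(20, 65), area_frac=0.5):
    """Sketch of the fecal-debris filter described above (not the authors' code).

    bbox is a tumor candidate given as (x, y, w, h); stool_mask is the binary
    output of the artifact segmentation network. 'Yellowish' is taken as hue in
    [20°, 65°]; OpenCV's hue scale (0-179 = degrees / 2) is assumed."""
    x, y, w, h = bbox
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    hue_deg = hsv[:, :, 0].astype(np.uint16) * 2
    yellowish = (hue_deg >= hue_range[0]) & (hue_deg <= hue_range[1])

    combined = yellowish | (stool_mask > 0)
    roi = combined[y:y + h, x:x + w]
    # If 'stool-like' pixels cover more than half the box, treat the detection as stool.
    return "stool" if roi.mean() > area_frac else "tumor"
```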

Figure 4:
Example images of the detection network and the stool detector. The green bounding box represents the prediction of the neural network. (a) True positive detected stool (stool augmented in pink), (b) false negative stool, and (c) false positive detected tumor.

Classification algorithm

According to Becker et al. [6], tumors in mice can be classified into six different groups according to the average diameter d of the lesion. For the classification of the detected lesions we approximate the average diameter as d = (w + h)/2, where w is the width and h the height of the detected bounding box (see Figure 5). The approximated diameter d (in pixels) is then scaled by the diameter D of the endoscopic image, yielding the ratio r = d/D. Using this ratio r, the detected lesions can be classified according to the tumor score S:

(1) $S = \begin{cases} 0, & r = 0 \\ 1, & 0 < r \le 0.1 \\ 2, & 0.1 < r \le 0.125 \\ 3, & 0.125 < r \le 0.25 \\ 4, & 0.25 < r \le 0.5 \\ 5, & r > 0.5 \end{cases}$
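A direct translation of Eq. (1) into code might look like the following sketch; it takes the bounding-box width and height and the image diameter D as inputs and mirrors the interval boundaries above.

```python
def tumor_score(w, h, image_diameter):
    """Tumor score S from a detected bounding box, following Eq. (1).

    w and h are the box width and height in pixels, image_diameter is the
    diameter D of the circular endoscopic image (minimal illustrative sketch)."""
    if w <= 0 or h <= 0:
        return 0                              # no detection -> score 0
    r = ((w + h) / 2.0) / image_diameter      # approximate diameter ratio r = d / D
    if r <= 0.1:
        return 1
    if r <= 0.125:
        return 2
    if r <= 0.25:
        return 3
    if r <= 0.5:
        return 4
    return 5
```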
Figure 5:
Example images of lesion detection network and classification. The green bounding boxes represent the prediction of the neural network. Images (a–e) are correctly classified as lesions with tumor scores ‘1’ – ‘5’; (f) lesion classified falsely as tumor score ‘4’ instead of ‘3’.

During the imaging procedure the colonoscope is withdrawn through the colon, and hence the apparent tumor size depends strongly on the viewing angle and the distance between the endoscope tip and the lesion. To detect and classify a tumor in the mouse colon, the endoscopist tries to keep the tip of the endoscope steady to capture a comprehensive view of the colon for visual classification (relationship of tumor size to colon size). This specific time span was used to calculate the tumor score for our evaluation. The class c that was most frequently predicted by the algorithm within this sub-sequence was utilized for the evaluation.
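As an illustration, this per-sub-sequence aggregation amounts to a simple majority vote over the per-frame scores (a sketch, not the authors’ evaluation code):

```python
from collections import Counter

def sequence_score(frame_scores):
    """Most frequently predicted tumor score within the annotated sub-sequence."""
    return Counter(frame_scores).most_common(1)[0][0]

# Example: sequence_score([4, 4, 3, 4, 5, 4]) -> 4
```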

Results

The following section presents the results and findings obtained from the evaluation of our tumor detection and classification approach.

Detection results

The complete sequence of a video, S = {1, …, N}, includes all frames of the video, while the sub-sequence S_pred ⊆ S includes all frames in which the AI predicts a tumor. S_label ⊆ S is defined as the sub-sequence in which an expert detects a tumor. A frame is defined as correct when the Intersection over Union (IoU) between the predicted region and the annotated region in this single frame is over 50 %. Otherwise, if the predicted region has an IoU ≤50 %, the frame is defined as incorrect. If the experts annotated a region in a frame but the AI detects no tumor, the frame is classified as overseen. Based on the available image data and AI-based methods for lesion and debris detection in colonoscopic image sequences of mice, the following groups can be identified (a code sketch of these rules follows the list):

  1. True-positive (TP): A sub-sequence is true-positive if more than 50 % of S_label is classified as correct, see example in Figure 4a.

  2. False-positive (FP): A sub-sequence is false-positive if more than 50 % of S_pred is incorrectly predicted. Such objects are either remains of stool or tissue folds. See Figure 4b and c for examples.

  3. False-negative (FN): A sub-sequence is false-negative if more than 50 % of S_label is overseen.

  4. True-negative (TN): A sequence is defined as true-negative if there is neither S_label nor S_pred in the sequence.
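To make these rules concrete, the sketch below assigns a (sub-)sequence to one of the four groups from per-frame counts. The order in which the rules are checked and the handling of cases not covered by the definitions above are assumptions made for illustration; this is not the authors’ evaluation code.

```python
def grade_sequence(n_label, n_correct, n_overseen, n_pred, n_incorrect):
    """Assign a video (sub-)sequence to TP / FP / FN / TN (illustrative sketch).

    n_label     -- frames with an expert annotation (|S_label|)
    n_correct   -- annotated frames with a matching prediction (IoU > 50 %)
    n_overseen  -- annotated frames without any prediction
    n_pred      -- frames with an AI prediction (|S_pred|)
    n_incorrect -- predicted frames with IoU <= 50 %
    """
    if n_label == 0 and n_pred == 0:
        return "TN"                              # nothing annotated, nothing predicted
    if n_label > 0 and n_overseen > 0.5 * n_label:
        return "FN"                              # more than half of S_label overseen
    if n_pred > 0 and n_incorrect > 0.5 * n_pred:
        return "FP"                              # more than half of S_pred incorrect
    if n_label > 0 and n_correct > 0.5 * n_label:
        return "TP"                              # more than half of S_label correct
    return "undetermined"                        # corner case not covered by the rules
```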

The performance metrics are calculated with the following equations

(2) $\mathrm{Precision} = \frac{TP}{TP + FP}, \quad \mathrm{Recall} = \frac{TP}{TP + FN}, \quad \mathrm{F1\text{-}score} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \quad \mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$
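For reference, these definitions can be computed directly from the sequence-level counts; the helper below is a minimal sketch and, as a usage example, reproduces the first row of Table 1.

```python
def detection_metrics(tp, fp, fn, tn):
    """Sequence-level performance metrics according to Eq. (2)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# Reproducing the first row of Table 1 (detection without the fecal debris detector):
# detection_metrics(76, 56, 7, 16)
# -> precision 0.576, recall 0.916, F1 0.707, accuracy 0.593
```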

The inclusion of the fecal debris detector has significantly improved the performance of the lesion detection approach, as documented in Table 1. The fecal debris detection itself achieved a precision of prec = 1.000, a recall of rec = 0.940, and an F1-score of 0.969.

Table 1:

Results of the lesion detection approach with and without the fecal debris detector.

| | TP | FP | FN | TN | Precision | Recall | F1-score | Accuracy |
|---|---|---|---|---|---|---|---|---|
| Detection without fecal debris detector | 76 | 56 | 7 | 16 | 0.576 | 0.916 | 0.707 | 0.593 |
| Detection with fecal debris detector | 123 | 9 | 7 | 16 | 0.932 | 0.946 | 0.939 | 0.897 |

  1. Best performance metrics highlighted in bold.

Classification results

The performance metrics of the classification approach are presented in Table 2, while Figure 5 showcases true-positive examples for each tumor score and one example with an incorrect classification.

Table 2:

Results of the classification algorithm, with the best performance metrics highlighted in bold.

| Tumor score | Count | TP | FP | FN | Precision | Recall | F1-score | Accuracy |
|---|---|---|---|---|---|---|---|---|
| ‘0’ | 25 | 16 | 7 | 9 | 0.696 | 0.640 | 0.667 | 0.640 |
| ‘1’ | 20 | 13 | 8 | 7 | 0.619 | 0.650 | 0.634 | 0.650 |
| ‘2’ | 7 | 3 | 3 | 4 | 0.500 | 0.429 | 0.462 | 0.429 |
| ‘3’ | 15 | 10 | 6 | 5 | 0.625 | 0.667 | 0.645 | 0.667 |
| ‘4’ | 38 | 37 | 14 | 1 | 0.725 | 0.974 | 0.831 | 0.974 |
| ‘5’ | 50 | 41 | 0 | 9 | 1.000 | 0.820 | 0.901 | 0.820 |
| All classes | 155 | 120 | 38 | 35 | 0.759 | 0.774 | 0.767 | 0.774 |

The confusion matrix (Figure 6) provides a visual representation of the classification performance.

Figure 6:
The confusion matrix provided depicts the classification of tumor scores, ranging from ‘0’ to ‘5’, as described by Becker et al. [6].

Discussion

The lesion detection network originally had a tendency to misclassify fecal debris as tumor lesions (see Table 1 for detection results). This phenomenon stems from the circumstance that the initial polyp detector was trained solely on human colonoscopic image data, where polyps are frequently covered by stool remains. Hence, the deep neural network has learned to associate stool with the presence of polyps in human data, which is not the case for mouse data. Thus, the inclusion of a fecal debris detector was a fundamental point in our research. Figure 4a shows correctly detected stool remains in mice, while Figure 4b depicts overseen fecal debris. Nevertheless, the overseen mouse debris depicts different colors compared to the correctly detected instances and appears more ‘reddish’ than ‘yellowish’. The applied fecal debris detector only missed three instances of stool and never falsely classified a tumor as stool. Thus the precision of the detection network is significantly improved from prec = 0.576 to prec = 0.932, which is close to the precision (0.91) achieved when applying it to human colonoscopic images [31]. The slightly higher value can be attributed to the fecal debris detector, which cannot be applied to human data: some human polyps may be obscured by stool, and excluding these detections could lead to missed diagnoses. The value for the recall is even higher due to the lower confidence threshold (θ_conf^mice = 0.2 for mouse data compared to θ_conf^human = 0.3 for human data), hence resulting in a higher sensitivity of the deep neural network. The lower confidence threshold is used because the non-inflamed parts of the mouse colon tissue depict less vascular texture compared to human colon tissue; the lesion detection network therefore reacts less sensitively on mouse colon data, and the lower confidence value compensates for this effect. Additionally, the SORT algorithm [34] is employed to filter out single false-positive detections.

Many of the false-positive tumor detections in the colonoscopic mouse imagery are actually related to the subtle tissue folds of the colon (see Figure 4c for an example). In contrast, polyps inside the human colon are often located on haustra. Similar to the association made with polyps covered by stool remains, the deep neural network also tends to consider these folds as an indication of tumors. However, this is not as much of a concern in mouse data, since the mouse colon has fewer haustra compared to the human colon. To minimize these errors, a possible solution in future investigations could be to remove images of polyps on haustra from the training dataset. This step could ensure that the deep neural network is trained without learning the association between haustra and tumors.

Our approach specifically demonstrates robustness regarding the detection and classification of lesions with high tumor scores, as indicated by Table 2 and the confusion matrix shown in Figure 6. The distribution of tumor scores is unbalanced due to the clinical study protocol described in Section “Animals and protocols”. At the initial stage of the study (week 1), mice exhibit no or minimal detectable tumors (score ‘0’ or ‘1’). By week 2, tumors are likely to have reached score ‘3’ or ‘4’, depending on their individual growth speed. From week 3 to 6, most tumors are classified as tumor score ‘5’ (see Figure 3 for the tumor score distribution). As a result, tumor score ‘2’ is underrepresented in the dataset.

The classification algorithm also encounters difficulties in correctly classifying tumors with a tumor score of ‘2’ or ‘3’. One such issue is illustrated in Figure 5f, where the tumor grows at an approximate angle of 45°. The deep neural network used for detection is, by design, not capable of producing rotated bounding boxes. Consequently, a small but elongated tumor results in a square-shaped bounding box, leading to an overestimated average tumor diameter. To address this, the use of rotated bounding boxes or segmentation of the exact tumor area would be helpful in a next step. Additionally, the expert annotations are currently estimated visually based on the image data rather than by precise measurement or biopsy. Thus, the classification remains subjective and the borders between the classes are vague.

Conclusions

Our approach shows robustness in detecting and classifying tumors within the mouse colon, making it a potentially valuable tool for researchers conducting preclinical studies. The real-time capability would allow using our approach directly during endoscopy procedures. Further evaluation data will provide more insights into the performance of the proposed method. Based on the results obtained, re-training of the lesion detection network (e.g. without lesions on haustra) could be considered to enhance the detection rate. To improve the classification accuracy of tumor scores ‘2’ and ‘3’, the usage of the YOLOv7-mask framework (as described in Wang et al. [29]) could be useful.


Corresponding author: Thomas Eixelberger, Chair of Visual Computing, Friedrich-Alexander-University Erlangen-Nürnberg, Cauerstr. 11, 91058 Erlangen, Germany; and Digital Health and Analytics, Fraunhofer Institute for Integrated Circuits IIS, Am Wolfsmantel 33, 91058 Erlangen, Germany, E-mail:

Award Identifier / Grant number: TRR/SFB 305 B08

Award Identifier / Grant number: TRR/SFB 305 Z1

  1. Research ethics: Ethics approval and consent to participate: The mice underwent routine pathogen screening following the guidelines of the Federation of European Laboratory Animal Science Associations (FELASA). All experiments were approved by the government of Lower Franconia and followed current guidelines for animal experiments.

  2. Informed consent: Not applicable.

  3. Author contributions: T.E: Methodology (equal); Image processing (lead); Statistical evaluation (lead); writing-original draft (lead); Writing-review and editing (equal). R.H: Methodology (equal); Image processing (support); Q.F: Execution of animal study (equal); Ground-truth annotation (equal). B.Z: Execution of animal study (equal); Ground-truth annotation (equal). M.S: Conceptualization of animal study; Supervision (equal). E.N: Organizational execution of animal study; Writing-original draft (support); Writing-review and editing (equal); Supervision (equal). T.W: Methodology (equal); Writing-original draft (support); Writing-review and editing (equal); Supervision (equal).

  4. Use of Large Language Models, AI and Machine Learning Tools: None declared.

  5. Conflict of interest: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.

  6. Research funding: This work has partially been supported by the German Research Society (DFG) under the Grant TRR/SFB 305, sub projects B08 (to E.N) & Z1 (to T.W).

  7. Data availability: Not applicable.

References

1. Chen, C, Neumann, J, Kühn, F, Lee, SM, Drefs, M, Andrassy, J, et al. Establishment of an endoscopy-guided minimally invasive orthotopic mouse model of colorectal cancer. Cancers 2020;12:3007. https://doi.org/10.3390/cancers12103007.

2. Taketo, MM, Edelmann, W. Mouse models of colon cancer. Gastroenterology 2009;136:780–98. https://doi.org/10.1053/j.gastro.2008.12.049.

3. Tammariello, AE, Milner, JA. Mouse models for unraveling the importance of diet in colon cancer prevention. J Nutr Biochem 2010;21:77–88. https://doi.org/10.1016/j.jnutbio.2009.09.014.

4. Regensburger, D, Tenkerian, C, Pürzer, V, Schmid, B, Wohlfahrt, T, Stolzer, I, et al. Matricellular protein SPARCL1 regulates blood vessel integrity and antagonizes inflammatory bowel disease. Inflamm Bowel Dis 2021;27:1491–502. https://doi.org/10.1093/ibd/izaa346.

5. Becker, C, Fantini, MC, Wirtz, S, Nikolaev, A, Kiesslich, R, Lehr, HA, et al. In vivo imaging of colitis and colon cancer development in mice using high resolution chromoendoscopy. Gut 2005;54:950–4. https://doi.org/10.1136/gut.2004.061283.

6. Becker, C, Fantini, MC, Neurath, MF. High resolution colonoscopy in live mice. Nat Protoc 2006;1:2900–4. https://doi.org/10.1038/nprot.2006.446.

7. Wittenberg, T, Raithel, M. Artificial intelligence-based polyp detection in colonoscopy: where have we been, where do we stand, and where are we headed? Visc Med 2020;36:428–38. https://doi.org/10.1159/000512438.

8. Eixelberger, T, Fang, Q, Zohud, BA, Hackner, R, Jackstadt, R, Stürzl, M, et al. Automated lesion detection in endoscopic imagery for small animal models. In: Maier, A, Deserno, TM, Handels, H, Maier-Hein, K, Palm, C, Tolxdorff, T, editors. Bildverarbeitung für die Medizin 2024. Wiesbaden: Springer Fachmedien Wiesbaden; 2024:190–5 pp. https://doi.org/10.1007/978-3-658-44037-4_54.

9. Zhang, W, Moore, L, Ji, P. Mouse models for cancer research. Chin J Cancer 2011;30:149–52. https://doi.org/10.5732/cjc.011.10047.

10. Tentler, JJ, Tan, AC, Weekes, CD, Jimeno, A, Leong, S, Pitts, TM, et al. Patient-derived tumour xenografts as models for oncology drug development. Nat Rev Clin Oncol 2012;9:338–50. https://doi.org/10.1038/nrclinonc.2012.61.

11. Hidalgo, M, Amant, F, Biankin, AV, Budinská, E, Byrne, AT, Caldas, C, et al. Patient-derived xenograft models: an emerging platform for translational cancer research. Cancer Discov 2014;4:998–1013. https://doi.org/10.1158/2159-8290.CD-14-0001.

12. Hung, KE, Maricevich, MA, Richard, LG, Chen, WY, Richardson, MP, Kunin, A, et al. Development of a mouse model for sporadic and metastatic colon tumors and its use in assessing drug treatment. Proc Natl Acad Sci 2010;107:1565–70. https://doi.org/10.1073/pnas.0908682107.

13. Olson, TJP, Halberg, RB. Experimental small animal colonoscopy. In: Miskovitz, P, editor. Colonoscopy. Rijeka: IntechOpen; 2011: Ch. 19.

14. Adachi, T, Hinoi, T, Sasaki, Y, Niitsu, H, Saito, Y, Miguchi, M, et al. Colonoscopy as a tool for evaluating colorectal tumor development in a mouse model. Int J Colorectal Dis 2014;29:217–23. https://doi.org/10.1007/s00384-013-1791-9.

15. Soletti, RC, Alves, KZ, de Britto, MAP, de Matos, DG, Soldan, M, Borges, HL, et al. Simultaneous follow-up of mouse colon lesions by colonoscopy and endoluminal ultrasound biomicroscopy. World J Gastroenterol 2013;19:8056–64. https://doi.org/10.3748/wjg.v19.i44.8056.

16. Kodani, T, Rodriguez-Palacios, A, Corridoni, D, Lopetuso, L, Di Martino, L, Marks, B, et al. Flexible colonoscopy in mice to evaluate the severity of colitis and colorectal tumors using a validated endoscopic scoring system. United States; 2013. https://doi.org/10.3791/50843-v.

17. Hoerter, N, Gross, SA, Liang, PS. Artificial intelligence and polyp detection. Curr Treat Options Gastroenterol 2020;18:120–36. https://doi.org/10.1007/s11938-020-00274-2.

18. Sanchez-Peralta, LF, Pagador, JB, Picón, A, Calderon, AJ, Polo, F, Andraka, N, et al. PICCOLO white-light and narrow-band imaging colonoscopic dataset: a performance comparative of models and datasets. Appl Sci 2020;10. https://doi.org/10.3390/app10238501.

19. Pogorelov, K, Randel, KR, Griwodz, C, Eskeland, SL, de Lange, T, Johansen, D, et al. KVASIR: a multi-class image dataset for computer aided gastrointestinal disease detection. In: Proc’s 8th ACM on Multimedia Systems Conference. MMSys’17. Taipei, Taiwan; 2017:164–9 pp. https://doi.org/10.1145/3083187.3083212.

20. Wang, W, Tian, J, Zhang, C, Luo, Y, Wang, X, Li, J. An improved deep learning approach and its applications on colonic polyp images detection. BMC Med Imag 2020;20. https://doi.org/10.1186/s12880-020-00482-3.

21. Ali, S, Jha, D, Ghatwary, N, Realdon, S, Cannizzaro, R, Salem, OE, et al. A multi-centre polyp detection and segmentation dataset for generalisability assessment. Sci Data 2023;10:75. https://doi.org/10.1038/s41597-023-01981-y.

22. Redmon, J, Divvala, S, Girshick, R, Farhadi, A. You only look once: unified, real-time object detection. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016:779–88 pp. https://doi.org/10.1109/cvpr.2016.91.

23. Pacal, I, Karaman, A, Karaboga, D, Akay, B, Basturk, A, Nalbantoglu, U, et al. An efficient real-time colonic polyp detection with YOLO algorithms trained by using negative samples and large datasets. Comput Biol Med 2022;141:105031. https://doi.org/10.1016/j.compbiomed.2021.105031.

24. Li, JW, Chia, T, Fock, KM, Chong, KDW, Wong, YJ, Ang, TL. Artificial intelligence and polyp detection in colonoscopy: use of a single neural network to achieve rapid polyp localization for clinical use. J Gastroenterol Hepatol 2021;36:3298–307. https://doi.org/10.1111/jgh.15642.

25. Wan, J, Chen, B, Yu, Y. Polyp detection from colorectum images by using attentive YOLOv5. Diagnostics 2021;11. https://doi.org/10.3390/diagnostics11122264.

26. Oliveira, F, Barbosa, D, Paçal, I, Leite, D, Cunha, A. Automatic detection of polyps using deep learning. Switzerland: Springer Nature; 2024:254–63 pp. https://doi.org/10.1007/978-3-031-60665-6_19.

27. Guillen, J. FELASA guidelines and recommendations. JAALAS 2012;51:311–21.

28. Jackstadt, R, van Hooff, SR, Leach, JD, Cortes-Lavaud, X, Lohuis, JO, Ridgway, RA, et al. Epithelial NOTCH signaling rewires the tumor microenvironment of colorectal cancer to drive poor-prognosis subtypes and metastasis. Cancer Cell 2019;36:319–36.e7. https://doi.org/10.1016/j.ccell.2019.08.003.

29. Wang, CY, Bochkovskiy, A, Liao, HYM. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2023. https://doi.org/10.1109/CVPR52729.2023.00721.

30. Ma, Y, Chen, X, Cheng, K, Li, Y, Sun, B. LDPolypVideo benchmark: a large-scale colonoscopy video dataset of diverse polyps. In: Medical Image Computing and Computer Assisted Intervention – MICCAI 2021; 2021:387–96 pp. https://doi.org/10.1007/978-3-030-87240-3_37.

31. Eixelberger, T, Wolkenstein, G, Hackner, R, Bruns, V, Mühldorfer, S, Geissler, U, et al. YOLO networks for polyp detection: a human-in-the-loop training approach. Curr Dir Biomed Eng 2022;8:277–80. https://doi.org/10.1515/cdbme-2022-1071.

32. Bernal, J, Sánchez, FJ, Fernández-Esparrach, G, Gil, D, Rodríguez, C, Vilariño, F. WM-DOVA maps for accurate polyp highlighting in colonoscopy: validation vs. saliency maps from physicians. Comput Med Imag Graph: Off J Comput Med Imaging Soc 2015;43:99–111. https://doi.org/10.1016/j.compmedimag.2015.02.007.

33. Steinbeck, J. Of mice and men. New York: Covici-Friede; 1937.

34. Bewley, A, Ge, Z, Ott, L, Ramos, F, Upcroft, B. Simple online and realtime tracking. In: 2016 IEEE International Conference on Image Processing (ICIP); 2016:3464–8 pp. https://doi.org/10.1109/ICIP.2016.7533003.

35. Kress, V, Wittenberg, T, Raithel, M, Bruns, V, Lehmann, E, Eixelberger, T, et al. Automatic detection of foreign objects and contaminants in colonoscopic video data using deep learning. Curr Dir Biomed Eng 2022;8:321–4. https://doi.org/10.1515/cdbme-2022-1082.

Received: 2025-05-07
Accepted: 2025-08-28
Published Online: 2025-09-09

© 2025 the author(s), published by De Gruyter, Berlin/Boston

This work is licensed under the Creative Commons Attribution 4.0 International License.
