
IG-YOLOv5-based underwater biological recognition and detection for marine protection

  • Jialu Huo and Qing Jiang
Published/Copyright: December 29, 2023

Abstract

Underwater biological detection is of great significance to marine protection. However, traditional target detection techniques face challenges such as insufficient feature extraction for small targets and a low feature utilization rate. To address these challenges, an underwater biological detection model, IG-YOLOv5, based on the idea of feature reuse is proposed. An Improved-Ghost module with feature reuse is designed: it adds a batch normalization operation to the identity mapping branch, fuses features with the Add operation, and uses the Sigmoid Linear Unit activation function, which is smoother around zero. The proposed model uses the Improved-Ghost module to reconstruct the CSPDarknet structure of YOLOv5, realizing both a lighter model and improved accuracy. In addition, to cope with changes in target size and shape in the underwater environment, the loss function is optimized to Wise-IoU v3, which improves the accuracy and robustness of the detection results. The results show that the IG-YOLOv5 model performs well on the 2021URPC data set, with 0.5 mAP reaching 74.2%, 4.3 percentage points higher than that of the YOLOv5 model, while requiring 2.7 GFLOPs fewer floating-point operations. In summary, the IG-YOLOv5 model achieves high accuracy and robustness in underwater target detection, the Wise-IoU criterion evaluates the quality of detection results more accurately, and the model is suitable for underwater robots, underwater monitoring, and other fields with practical application value.

1 Introduction

Underwater biological detection can provide a detailed observation and evaluation of marine biological community, so as to better understand the distribution, quantity, and ecological role of marine life [1,2]. This information holds substantial importance for marine resource management and preservation, offering a crucial scientific foundation for the efficient utilization of these resources. It is helpful in the rational development of underwater resources, especially marine resources, and will bring high-quality development to coastal areas [3,4]. Scientists can monitor the changes of marine biodiversity, identify rare species, monitor the number of different species, and understand their distribution. This is very important for protecting endangered species and maintaining ecological balance. In addition, underwater biological detection can also help to assess the health of the marine environment, monitor and predict the changes in the marine environment, assess the influence of human actions on marine ecosystems and the environment, and take effective management measures. Moreover, it urges the government, environmental protection organizations, and relevant parties to take action to jointly protect our precious marine resources. For example, using detection technology, we can track the migration pattern of fish, avoid overfishing, and ensure the sustainable development of fisheries. Scientists use biological detection technology to monitor the impact of marine pollution on marine life, such as plastic pollution, chemical pollution, or noise pollution. Through long-term data collection, we can better understand how pollution affects marine ecology and formulate corresponding countermeasures.

In summary, by adopting underwater biological detection technology, we can better protect marine resources, maintain ecological balance, and ensure that future generations benefit from a healthy marine environment.

Underwater biological detection faces many technical and environmental challenges. For example, changes in water depth, water quality, and lighting conditions hamper underwater cameras and other visual equipment, and visibility is often poor. Compared with the land environment, data transmission underwater is slower and more difficult, especially wireless transmission. The current approaches for target detection can be categorized into two groups: conventional methods and deep learning-based techniques [5].

The majority of conventional techniques revolve around region-based target detection: a series of candidate frames is generated by a candidate frame generation algorithm, and the frames are classified and regressed one by one [6]. This approach is mainly based on hand-designed feature extractors and classifiers and usually adopts sliding-window detection. However, it has the following limitations: (1) the feature extractor is hand-crafted and has no learnable parameters, (2) sliding-window detection is computationally expensive, (3) target localization is inaccurate, (4) it is not suitable for complex scenes, and (5) it is difficult to cope with occlusion and posture changes.

In the realm of deep learning-based approaches, significant advancements have been achieved within the domain of target detection [7]. R-CNN [8], Fast R-CNN [9], and Faster R-CNN [10] belong to the two-stage target detection family, but they cannot meet real-time detection requirements. In terms of model architecture, more models offering improvements in both speed and accuracy have been proposed, such as SSD [11], RetinaNet [12], and the YOLO family (YOLO [13], YOLO9000 [14], YOLOv3 [15], YOLOv4 [16], YOLOv5 [17], etc.). Multi-scale detection techniques provide a direction for addressing the detection of small objects and further improve detection accuracy; there are also improvements on the prediction side, such as the popularization of anchor-free [18] prediction. In recent years, these methods have made progress in image recognition, data analysis, automatic detection, and model prediction, but many challenges remain, such as too many model parameters, slow training, and low inference efficiency.

In recent years, underwater target detection has received continuous attention, and many scholars have conducted in-depth research in this field.

In 2019, Li et al. [19] addressed the degradation of underwater images with an enhancement preprocessing algorithm based on contrast-limited adaptive histogram equalization. Zhao et al. [20] overcame the problem of few and homogeneous data samples by using data augmentation and a Retinex-based image preprocessing algorithm. In 2020, Jia and Liu [21] used the MSRCR (Multi-Scale Retinex with Color Restoration) [22] algorithm to enhance underwater images. In 2021, Liu and Liang [23] introduced an enhancement approach for underwater optical images that relies on background light estimation and improved adaptive transmission fusion. In 2022, Hao et al. [24] used an optimized MSRCR algorithm to enhance underwater images. In 2023, Chen et al. [25] introduced a contrast-limited adaptive histogram equalization algorithm to preprocess input images and cope with weak underwater light.

The research listed above focuses on improving the detection effect by image preprocessing algorithm and enhancing underwater image data, but there is still room for improvement in extracting effective features and feature utilization. At the same time, the aforementioned methods have not solved the problems of large model and many parameters and failed to achieve both speed and accuracy. Therefore, the IG-YOLOv5 model is put forward in this study, aiming at lightening the model in the field of underwater biological detection and improving the detection performance and speed.

IG-YOLOv5 model improves accuracy, lightweight, and detection speed, which has a far-reaching impact on marine protection. First of all, accurate data is helpful to formulate more effective protected area policies, conduct population recovery assessment, and support ecological research; second, it reduces the need for direct observation of divers, thus reducing the interference of human activities on the fragile marine environment, which is particularly useful for studying protected, remote, or difficult-to-reach areas. Then, it improves the detection speed of the model; provides almost real-time monitoring; quickly identifies sudden environmental threats, such as illegal fishing, pollution leakage, or other activities that endanger biodiversity; and relevant departments can intervene more quickly, which may prevent or alleviate the damage to marine ecosystems. Finally, the sharing of IG-YOLOv5 model can improve the public’s knowledge and understanding of marine protection issues.

The IG-YOLOv5 model is experimentally proven to be effective. The innovation of this study lies in:

  1. To maintain model accuracy while reducing parameters, the structure is re-parameterized for effective feature reuse, and an underwater target recognition network model, IG-YOLOv5, based on the idea of feature reuse is proposed.

  2. The Improved-Ghost, a new module with feature reuse, is designed, and the YOLOv5 backbone network is reconstructed with it. The resulting model is lightweight and has an efficient inference framework.

  3. To enhance the model’s generalization and robustness for underwater target detection, a strategic allocation of gradient gain is implemented in the loss function. This results in optimizing the loss function as Wise-IoU v3 Loss, incorporating bounding box regression along with a dynamic focusing mechanism.

The remaining sections of this article are outlined as follows: Section 2 introduces the YOLOv5 structure. Section 3 discusses the theory behind the IG-YOLOv5 model. In Section 4, the performance of the IG-YOLOv5 model is assessed using the 2021 URPC data set. Section 5 concludes the article with final remarks.

2 Related technologies

2.1 YOLOv5

The first official version of YOLOv5 was released by Ultralytics on June 25, 2020, and it was improved on the basis of YOLOv4. YOLOv5 model integrates CSPDarknet53 network, multi-scale detection strategy, and Squeeze and Excitation attention [26] mechanism to enhance the model’s performance. YOLOv5 has a more flexible architecture, which provides four predefined model architectures, namely, S, M, L, and X. Among them, YOLOv5’s S model is a lightweight target detection model, which adopts a relatively small network structure, ensures high detection accuracy, has faster reasoning speed and lower memory requirements, and is suitable for embedded devices and mobile applications. This experiment is based on YOLOv5’s S model.

The YOLOv5’s S model network architecture consists of a progressive convolutional neural network. It can be segmented into four primary sections: input layer, backbone layer, neck layer, and head layer [16]. Figure 1 illustrates the network configuration of YOLOv5-6.0.

Figure 1: Network architecture diagram of YOLOv5-6.0.

Input module: The input image size of the YOLOv5 model is 640*640, and this part adopts operations such as Mosaic data augmentation, adaptive image resizing, and anchor box computation optimization [16].

The backbone network realizes the feature extraction function. It uses a convolutional neural network to extract image features, and different feature extraction architectures such as ResNet [27] and DarkNet are commonly used. In YOLOv5 6.0, the backbone is mainly composed of the CBS module (C stands for Conv, B for BatchNorm2d, and S for the Sigmoid Linear Unit (SiLU) activation function), the CSPDarkNet53 structure, and the SPPF (Fast Spatial Pyramid Pooling) module. Compared with the previous version, the first layer of version 6.0 replaces the Focus module with a 6*6 convolution layer, which makes the whole structure more efficient on GPU devices. The CBS module encapsulates a Conv2d layer, batch normalization, and the SiLU activation (a Sigmoid followed by an element-wise multiplication). YOLOv5 draws on the idea of CSPNet and applies it to the DarkNet53 backbone. In version 6.0, the C3 module replaces the older BottleneckCSP module, which reduces parameters and improves detection accuracy; the C3 module contains three convolution layers and multiple Bottleneck modules. YOLOv5-6.0 also adopts the SPPF module, in which several cascaded pooling kernels of reduced size replace the single large pooling kernels of the old Spatial Pyramid Pooling module; this fuses feature maps from different receptive fields, enriches the expressive ability of the feature maps, and improves running speed. SPPF consists of two CBS modules and three MaxPool2d operations, which enhance the invariance of image features and reduce the dimensionality of the information extracted from the convolution layers.
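As a concrete illustration of the modules described above, the following is a minimal PyTorch sketch of the CBS and SPPF blocks. It follows the textual description rather than the official Ultralytics source, so the class names, default kernel sizes, and channel handling are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CBS(nn.Module):
    """Conv -> BatchNorm2d -> SiLU, as described for the CBS block."""
    def __init__(self, c_in, c_out, k=1, s=1, p=None):
        super().__init__()
        p = k // 2 if p is None else p                 # same-style padding
        self.conv = nn.Conv2d(c_in, c_out, k, s, p, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class SPPF(nn.Module):
    """Fast Spatial Pyramid Pooling: three cascaded applications of one small
    max-pool replace the single large pooling kernels of the older SPP block."""
    def __init__(self, c_in, c_out, k=5):
        super().__init__()
        c_hidden = c_in // 2
        self.cv1 = CBS(c_in, c_hidden, 1, 1)
        self.cv2 = CBS(c_hidden * 4, c_out, 1, 1)
        self.pool = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x):
        x = self.cv1(x)
        y1 = self.pool(x)
        y2 = self.pool(y1)
        y3 = self.pool(y2)
        # fuse feature maps from different receptive fields
        return self.cv2(torch.cat((x, y1, y2, y3), dim=1))
```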

Neck module draws on the thought of feature pyramid FPN [28] and PANet [29] to realize the transmission of semantic information. YOLOv5 6.0 is similar to previous versions.

There are three detection layers in the Head network, corresponding to the three feature maps of different sizes produced by the Neck. The Head can be subdivided into two parts: the detection head and the post-processing step. The detection head is a branch network operating on each feature map; it predicts the position, category, and bounding box of each object. The post-processing step then filters the head's predictions with the non-maximum suppression algorithm to remove repeated and conflicting prediction boxes and obtain the final detection results.
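A minimal sketch of this post-processing step is shown below, using the NMS operator from torchvision; the confidence and IoU thresholds are illustrative placeholders, not the values used by YOLOv5.

```python
import torch
from torchvision.ops import nms

def postprocess(boxes, scores, iou_thresh=0.45, conf_thresh=0.25):
    """Filter raw head predictions: drop low-confidence boxes, then suppress
    overlapping boxes with non-maximum suppression.
    boxes: (N, 4) tensor in (x1, y1, x2, y2) format; scores: (N,) confidences."""
    keep_conf = scores > conf_thresh
    boxes, scores = boxes[keep_conf], scores[keep_conf]
    keep = nms(boxes, scores, iou_thresh)          # indices of boxes to keep
    return boxes[keep], scores[keep]
```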

2.2 Ghost module

The Ghost module was proposed by Han et al. in 2020. The basic Ghost module splits the original convolutional layer into two parts and employs a reduced number of filters to produce the intrinsic feature maps [30].

As depicted in Figure 2, the Ghost module comprises conventional convolution, grouped convolution, and an identity operation.

Figure 2: Ghost module [30].

Ordinary convolution operation is used to generate a compact-sized feature map, and the number of output channels at this time is m, which is less than the total output channels n.

Formulating the operation of a regular convolutional layer to generate m feature maps involves

(1) $Y = X * f$,

where $*$ signifies the convolution operation, $Y \in \mathbb{R}^{h \times w \times m}$ represents the intrinsic $m$-channel feature map, and $f \in \mathbb{R}^{c \times k \times k \times m}$ stands for the employed filters. Besides, $h$ and $w$ denote the output data's height and width, while $k \times k$ corresponds to the kernel size of the filters $f$.

To obtain the remaining maps and reach the desired $n$ feature maps, grouped convolution applies cheap linear operations to the intrinsic feature maps to generate more features:

(2) $y_{ij} = \Phi_{i,j}(y_i), \quad i = 1, \ldots, m, \; j = 1, \ldots, s$,

where $y_i$ stands for the $i$th intrinsic feature map within $Y$, and $\Phi_{i,j}$ is the $j$th linear operation (excluding the final identity mapping) responsible for producing the $j$th ghost feature map $y_{ij}$.

Finally, as shown in Figure 2, the intrinsic feature maps are carried through by an identity mapping and combined with the ghost feature maps to form the output.

The Ghost module employs standard convolution to create initial feature maps and then utilizes economical linear operations for feature enhancement and channel expansion. Because of these procedures, the Ghost module produces an equal count of feature maps as the conventional convolution layer. This allows seamless integration of the Ghost module into existing network architectures and mitigates computational expenses.
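For illustration, a minimal PyTorch sketch of the Ghost module as described in [30] is given below: a primary convolution produces the intrinsic maps, a cheap grouped (depthwise) convolution produces the ghost maps, and the two parts together give the $n$ output channels. The ratio, kernel sizes, and use of ReLU inside the block are illustrative assumptions (the ratio should evenly divide the output channels).

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Sketch of the Ghost module [30]: a primary convolution makes m intrinsic
    feature maps, a cheap depthwise convolution makes the remaining ghost maps
    from them, and intrinsic + ghost maps together give the n output channels."""
    def __init__(self, c_in, c_out, ratio=2, k=1, dw_k=3):
        super().__init__()
        c_intrinsic = c_out // ratio                   # m intrinsic channels
        c_ghost = c_out - c_intrinsic                  # remaining ghost channels
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, c_intrinsic, k, 1, k // 2, bias=False),
            nn.BatchNorm2d(c_intrinsic),
            nn.ReLU(inplace=True),
        )
        self.cheap = nn.Sequential(                    # grouped / depthwise conv
            nn.Conv2d(c_intrinsic, c_ghost, dw_k, 1, dw_k // 2,
                      groups=c_intrinsic, bias=False),
            nn.BatchNorm2d(c_ghost),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        y = self.primary(x)                            # intrinsic maps
        return torch.cat([y, self.cheap(y)], dim=1)    # intrinsic + ghost maps
```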

However, the efficiency of the serial operators in the Ghost module still leaves room for improvement, and this study proposes a more efficient module on this basis.

3 Method

3.1 Improved-Ghost module

Improved-Ghost module is optimized on the basis of Ghost and RepGhost [31] module. Improved-Ghost continues to follow the idea of Ghost feature reuse and realizes implicit feature reuse through re-parameterization, which is convenient for the model to better understand and identify the objects in the image and ensure high detection accuracy in underwater environment. On the one hand, Improved-Ghost provides richer and more comprehensive information by combining different levels of features, which is helpful to improve the understanding and detection accuracy of IG-YOLOv5 model. By integrating multi-scale features, it is convenient for the network to obtain information from different levels, improve the recognition ability of targets in complex background, and thus maintain high detection accuracy in complex underwater environment. On the other hand, the connection mode of feature reuse makes gradient and information flow more effectively in the network, which is helpful to train deeper and stronger models.

The specific optimization contents are as follows:

  1. Add operation with feature fusion function is used to replace the traditional Concat operation, thus improving efficiency.

    In terms of information integration of feature maps, the Concat operation splices two tensors and expands their dimensions, whereas the Add operation sums the tensors element-wise: the amount of information in each channel increases while the tensor dimensions remain unchanged, which is beneficial to the final classification of images.

    Under the same conditions, with an independent convolution kernel for each output channel, the Concat and Add operations can be expressed as follows:

    (3) $\mathrm{Out}_{\mathrm{concat}} = \sum_{i=1}^{n} X_i * K_i + \sum_{i=1}^{n} Y_i * K_{i+n}$,

    (4) $\mathrm{Out}_{\mathrm{add}} = \sum_{i=1}^{n} (X_i + Y_i) * K_i$,

    where $X_i$ and $Y_i$ represent the $i$th channels of the two inputs and $K_i$ represents the convolution kernel of the corresponding channel. From equations (3) and (4), it is clear that the Add operation offers benefits in terms of parameter and computation reduction; a short numerical illustration is given after this list.

  2. Replace the original ReLU function with the SiLU activation function, which is smoother around zero, and move it after the Add operation to enable re-parameterization and speed up inference.

The ReLU (Rectified Linear Unit) function is a piecewise linear function with good computational properties that improves the training efficiency of neural networks. Compared with ReLU, the SiLU function is smoother near zero. SiLU is built on the Sigmoid function, whose output is limited to the range (0,1), and it achieves better performance in visual tasks.

The specific formulas of ReLU function and SiLU function are as follows:

(5) $f_{\mathrm{ReLU}}(x) = \begin{cases} x, & x > 0 \\ 0, & x \le 0, \end{cases}$

(6) $f_{\mathrm{SiLU}}(x) = x \cdot \mathrm{sigmoid}(x) = \dfrac{x}{1 + e^{-x}}$.

From Figures 3 and 4, the smoother and more expressive nature of the SiLU function around zero becomes apparent.

Figure 3: ReLU function diagram.

Figure 4: SiLU function diagram.

SiLU is a smooth function, which means that it can create a smoother feature map and help capture subtle patterns in data. Its performance on negative values enables it to transmit negative activation, which is helpful for the network to capture more complex relationships between features. In addition, it can also allow the network to adaptively adjust the slope of activation during the training process, which is helpful for more effective gradient flow and richer feature extraction. SiLU is used in the Improved-Ghost module, because it allows the gradient to flow more effectively in the deep network and helps the network learn more abundant feature representation.

  3. Add a BN (Batch Normalization) operation to the identity mapping branch to introduce nonlinearity during training, which makes the branch convenient to fuse for fast inference. BN stabilizes the inputs of the neural network by reducing the "internal covariate shift." Specifically, it normalizes the input at each layer of the network so that its mean is close to 0 and its variance close to 1. In this way, convergence is accelerated, over-fitting is reduced, and the performance of the model is further improved. BN is used in the Improved-Ghost module to increase the diversity of feature maps and encourage the module to learn more diversified feature representations.
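The practical difference between the two fusion operations in point 1 can be seen directly in tensor shapes and in the parameter count of the convolution that follows; the short, illustrative PyTorch comparison below uses arbitrary channel counts:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 80, 80)          # two feature maps with identical shapes
y = torch.randn(1, 64, 80, 80)

cat_out = torch.cat([x, y], dim=1)      # Concat: channels double -> (1, 128, 80, 80)
add_out = x + y                         # Add: shape unchanged    -> (1, 64, 80, 80)

# A following 3x3 convolution to 64 output channels therefore needs
# twice as many weights after Concat as after Add.
conv_after_cat = nn.Conv2d(128, 64, 3, padding=1, bias=False)
conv_after_add = nn.Conv2d(64, 64, 3, padding=1, bias=False)
print(sum(p.numel() for p in conv_after_cat.parameters()))  # 73728
print(sum(p.numel() for p in conv_after_add.parameters()))  # 36864
```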

The structure of the Improved-Ghost module is illustrated in Figure 5, featuring two branches: a shortcut branch and a connected branch responsible for implementing the operators. The module differs slightly between the training and inference stages. During inference, the operator branch is more concise, containing only a 1*1 convolution, a conventional convolution, and the SiLU activation function, which is advantageous in terms of computation and inference speed.

Figure 5: Structure diagram of the Improved-Ghost module.
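To make the description concrete, the following is a hedged, training-time PyTorch sketch of an Improved-Ghost-style block built from the three optimizations above (Add fusion, SiLU after the Add, and BN on the identity branch). It assumes a primary 1*1 convolution followed by a cheap depthwise convolution as the operator branch; the exact layer configuration of the authors' module may differ, and the re-parameterization that folds the BN-only branch for inference is omitted.

```python
import torch
import torch.nn as nn

class ImprovedGhost(nn.Module):
    """Training-time sketch of an Improved-Ghost-style block: a primary 1x1
    convolution produces intrinsic maps, a cheap depthwise convolution produces
    additional maps, the identity branch carries the intrinsic maps through a
    BatchNorm, the branches are fused by Add (not Concat), and SiLU is applied
    after the fusion. The BN-only identity branch can be folded into the
    depthwise convolution for inference (re-parameterization), which is not
    shown here."""
    def __init__(self, c_in, c_out, dw_k=3):
        super().__init__()
        self.primary = nn.Sequential(                  # 1x1 conv -> BN
            nn.Conv2d(c_in, c_out, 1, 1, 0, bias=False),
            nn.BatchNorm2d(c_out),
        )
        self.cheap = nn.Sequential(                    # depthwise conv -> BN
            nn.Conv2d(c_out, c_out, dw_k, 1, dw_k // 2,
                      groups=c_out, bias=False),
            nn.BatchNorm2d(c_out),
        )
        self.identity_bn = nn.BatchNorm2d(c_out)       # BN on the shortcut branch
        self.act = nn.SiLU()                           # activation after Add

    def forward(self, x):
        y = self.primary(x)
        return self.act(self.cheap(y) + self.identity_bn(y))  # fusion by Add
```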

3.2 IG-YOLOv5 model

Underwater target detection needs both accuracy and efficiency. Although the YOLOv5 network structure has obvious advantages in real-time detection, it still suffers from too many model parameters, slow training, and low inference efficiency. To achieve efficient, real-time, and accurate underwater target detection, it needs to be improved further. Therefore, the YOLOv5 network is modified to address these problems.

YOLOv5-6.0 uses a structure called CSPDarknet to extract deep features. CSP stands for Cross Stage Partial, a network design strategy that improves information flow and gradient flow by sharing features between different stages of the network through partial cross-stage skip connections, thus improving performance and reducing computational cost. Darknet is a lightweight neural network foundation. Combining the CSP strategy with the Darknet structure creates an efficient and powerful model suitable for real-time target detection and other computer vision tasks. The CSPDarknet structure splits the input feature map into two segments: one continues to pass forward, and the other is processed and then spliced with the first. This structure helps avoid the vanishing gradient problem. In terms of parameters, although they are reduced compared with the previous version, there is still considerable room for optimization. In terms of detection effect, it handles large-target detection well, but there is significant potential for improvement on small-target detection tasks. Therefore, this article reconstructs the backbone layer of YOLOv5 with the Improved-Ghost module, making the structure lighter and giving it stronger learning ability.

The network constructed in this article consists of five CBS modules, four Improved-Ghost modules, and one SPPF module. The structural diagram is shown in Figure 6.

Figure 6: IG-YOLOv5 model network structure diagram.
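Consistent with this composition (five CBS modules, four Improved-Ghost modules, and one SPPF module), a schematic assembly of the reconstructed backbone might look like the sketch below, reusing the CBS, SPPF, and ImprovedGhost classes sketched earlier. Channel widths and strides are illustrative placeholders, not the exact configuration, and the taps that feed intermediate feature maps to the neck are omitted.

```python
import torch.nn as nn

# Schematic reconstruction of the IG-YOLOv5 backbone: CBS blocks handle
# downsampling, Improved-Ghost blocks replace the original C3 stages, and
# SPPF closes the stack. Uses the CBS, ImprovedGhost, and SPPF sketches above.
def build_ig_backbone():
    return nn.Sequential(
        CBS(3, 32, k=6, s=2, p=2),     # stem (6x6 convolution, as in YOLOv5-6.0)
        CBS(32, 64, k=3, s=2),
        ImprovedGhost(64, 64),
        CBS(64, 128, k=3, s=2),
        ImprovedGhost(128, 128),
        CBS(128, 256, k=3, s=2),
        ImprovedGhost(256, 256),
        CBS(256, 512, k=3, s=2),
        ImprovedGhost(512, 512),
        SPPF(512, 512),
    )
```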

IG-YOLOv5 uses mosaic algorithm in image processing to enhance data, which effectively simulates the visual conditions in underwater environment, and is also an effective means for the model to deal with different underwater environments, especially in underwater environment, which helps the model to adapt to the appearance of targets under different visual angles, different distances, and different light conditions. Based on the Improved-Ghost module, the backbone network is reconstructed, and using the feature reuse mechanism, the model’s understanding of complex or low-contrast underwater scenes is enhanced by copying key feature maps and enhancing and reorganizing them at multiple levels. In addition, the Improved-Ghost module introduces lightweight original features, which can capture the fine structure of data without significantly increasing the calculation cost. In the underwater environment, this can help the model to distinguish fuzzy or partially occluded objects. In summary, the aforementioned adjustments make IG-YOLOv5 more efficient and accurate in underwater target detection, especially when facing the unique visual challenges of underwater environment.

3.3 Wise-IoU loss

The effectiveness of target detection relies on the formulation of the loss function. YOLOv5-6.0 uses CIoU [32] as the loss function to help improve the accuracy and stability of target detection. CIoU is an improvement based on IoU metric and DIoU [33] metric, which considers the position difference and size difference between the prediction frame and the real frame. The distance and overlap between the pre-selected frame and the real frame are optimized during training. Compared with the traditional IoU, it has better performance in dealing with targets with different scales and shapes. There is still much room for optimization in terms of robustness and computational complexity.

Therefore, in this study, Wise-IoU [34] is used to design the loss function. Wise-IoU uses a dynamic non-monotonic focusing mechanism instead of plain IoU to evaluate the quality of anchor boxes and applies a sensible gradient gain allocation strategy. This allows the loss to focus on anchor boxes of ordinary quality, improving detection ability as a whole and yielding better robustness. Wise-IoU considers not only the position and scale information between the prediction box and the ground-truth box but also their shape information. Compared with CIoU, Wise-IoU handles targets of different shapes, sizes, and orientations better, thus improving the robustness of target detection. Wise-IoU reduces computational complexity by introducing a weight matrix; compared with CIoU, it trains and infers faster and performs better on devices with limited computing resources. At the same time, the weight matrix controls the importance of different parts, improving the interpretability of the loss function. This enhancement can boost the precision and resilience of target detection in real-world scenarios.

Wise-IoU Loss has three focusing mechanisms, namely, Wise-IoU v1, Wise-IoU v2, and Wise-IoU v3. Wise-IoU v1 constructs an attention-based bounding box loss: it builds distance attention from a distance measure and obtains two attention terms, $R_{\mathrm{WIoU}}$ and $L_{\mathrm{IoU}}$. $R_{\mathrm{WIoU}}$, ranging from 1 to $e$, significantly amplifies $L_{\mathrm{IoU}}$ for ordinary-quality anchor boxes, while $L_{\mathrm{IoU}}$, ranging from 0 to 1, significantly reduces $R_{\mathrm{WIoU}}$ for high-quality anchor boxes, so that when the anchor box overlaps the target box well, attention focuses on the distance between their center points. Wise-IoU v2 constructs a monotonic focusing coefficient and introduces the mean of $L_{\mathrm{IoU}}$ as a normalization factor, which makes the model focus on difficult targets, improves classification performance, and expedites convergence in the later stages of training. Wise-IoU v3 defines an outlier degree $\beta$ to characterize the quality of an anchor box; a non-monotonic focusing coefficient is constructed from $\beta$ and applied to Wise-IoU v1. To improve the positioning ability of the IG-YOLOv5 model for small underwater targets, this study optimizes the loss function as Wise-IoU v3 with the dynamic focusing mechanism; the specific formula is as follows:

(7) $L_{\mathrm{WIoUv3}} = r \, L_{\mathrm{WIoUv1}}, \qquad r = \dfrac{\beta}{\delta \, \alpha^{\beta - \delta}}$,

where $r$ represents the gradient gain, and $\alpha$ and $\delta$ are hyperparameters.

Among them, the outlier is defined as:

(8) $\beta = \dfrac{L_{\mathrm{IoU}}}{\overline{L_{\mathrm{IoU}}}} \in [0, +\infty)$,

where $L_{\mathrm{IoU}}$ is the IoU loss of the current anchor box and $\overline{L_{\mathrm{IoU}}}$ is its exponential moving average.
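A hedged PyTorch sketch of this loss is given below for axis-aligned boxes in (x1, y1, x2, y2) format. The structure follows equations (7) and (8) and the Wise-IoU v1 distance attention; the hyperparameter values (alpha, delta, and the momentum of the running mean) are illustrative and not necessarily those used in this study.

```python
import torch

def wise_iou_v3(pred, target, iou_mean, alpha=1.9, delta=3.0, momentum=0.01):
    """Sketch of Wise-IoU v3 [34] for (N, 4) boxes in (x1, y1, x2, y2) format.
    `iou_mean` is the running mean of the IoU loss maintained by the caller."""
    # intersection / union
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + 1e-7)
    l_iou = 1.0 - iou

    # smallest enclosing box -> distance attention R_WIoU (Wise-IoU v1)
    c_lt = torch.min(pred[:, :2], target[:, :2])
    c_rb = torch.max(pred[:, 2:], target[:, 2:])
    cw, ch = (c_rb - c_lt)[:, 0], (c_rb - c_lt)[:, 1]
    px = (pred[:, 0] + pred[:, 2]) / 2 - (target[:, 0] + target[:, 2]) / 2
    py = (pred[:, 1] + pred[:, 3]) / 2 - (target[:, 1] + target[:, 3]) / 2
    r_wiou = torch.exp((px ** 2 + py ** 2) / (cw ** 2 + ch ** 2 + 1e-7).detach())
    l_wiou_v1 = r_wiou * l_iou

    # non-monotonic focusing coefficient (Wise-IoU v3), eqs. (7)-(8)
    beta = l_iou.detach() / iou_mean                  # outlier degree
    r = beta / (delta * alpha ** (beta - delta))      # gradient gain
    new_mean = (1 - momentum) * iou_mean + momentum * l_iou.detach().mean()
    return (r * l_wiou_v1).mean(), new_mean
```

In training, `iou_mean` would be carried across batches so that the outlier degree reflects the current average quality of the anchor boxes.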

Images in underwater environment are usually affected by visual degradation, such as dim light, turbid water, and particle interference. This will seriously affect the performance of traditional IoU loss functions, because they usually cannot handle these visual artifacts well. Wise-IoU v3 can guide the model prediction more effectively by considering the “benefit degree,” thus improving the accuracy of detection in complex environment, which is helpful to predict the boundary box of the target more accurately even if the target is irregular in shape or deformed in complex underwater environment.

4 Experiments

4.1 Experimental platform and data set

The hardware environment of this experiment is Windows 11(x64) operating system, 32G memory, and NVIDIA RTX 3060 (6GB) GPU. The software environment is Pytorch 1.10.0 architecture and PyCharm platform. The specific experimental environment configuration is shown in Table 1.

Table 1

Configuration of experimental environment

Project Experimental environment
Operating system Windows 11(x64)
CPU 12th Gen Intel(R) Core(TM) i9-12900H 2.50GHz
GPU NVIDIA RTX3060(6GB)
Memory size 32GB
Python 3.9.13
Accelerated environment CUDA11.6

In this study, the pre-training weight file used in the experiment is Yolov5s.pt, and the default hyperparameters are used for data augmentation. The learning rate is set to 0.01, the number of training epochs to 300, the batch size to 16, and the number of data-loading workers to 8; the image resolution is unified to 640*640. The relevant experimental parameters are configured as shown in Table 2.

Table 2

Configuration of experimental parameters

Parameter Configuration
Weights Yolov5s.pt
Hyp hyp.scratch-low.yaml
Epochs 300
Learning rate 0.01
Batch size 16
Workers 8
Image size 640*640
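For reference, a training run with the settings in Tables 1 and 2 would correspond to a command along the following lines, assuming the standard Ultralytics YOLOv5 train.py script; the data set YAML name (urpc2021.yaml) is a placeholder for illustration:

```
python train.py --weights yolov5s.pt --data urpc2021.yaml --hyp hyp.scratch-low.yaml \
    --epochs 300 --batch-size 16 --imgsz 640 --workers 8
```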

The data set utilized in this experiment is named 2021URPC, which was provided in the underwater vehicle target detection algorithm competition. There are 6,575 underwater images in this data set, which are divided into five categories: holothurian, echinus, scallop, starfish, and waterweeds. Figure 7 illustrates a sample from the data set.

Figure 7: Data set example.

As depicted in Figure 8(a), the number of targets in each category is analyzed: sea urchins are the most abundant, followed by scallops, starfish, sea cucumbers, and aquatic plants. As shown in Figure 8(b), the normalized target location map indicates a higher density of targets horizontally and a comparatively more scattered distribution vertically. In addition, the normalized target size map in Figure 8(c) shows that target sizes are relatively concentrated, with the majority being small. To meet the needs of training, validation, and testing, the data set is randomly divided into a training set, a validation set, and a test set containing 4,141, 1,776, and 658 images, respectively.

Figure 8: Statistical results of the data set. (a) Bar chart of the number of targets in each class; (b) normalized target location map; (c) normalized target size map.

4.2 Evaluation indicators

In the experiments, precision, recall, mean average precision (mAP), and floating-point operations (FLOPs) are used as performance evaluation indexes for the target detection method. The precision, recall, and mAP formulas are as follows:

(9) $\mathrm{Precision} = \dfrac{TP}{TP + FP} \times 100\%$,

(10) $\mathrm{Recall} = \dfrac{TP}{TP + FN} \times 100\%$,

(11) $\mathrm{mAP} = \dfrac{1}{K} \sum_{k=1}^{K} \mathrm{AP}(P, R, k)$,

where TP represents correctly detected positive samples, FP stands for negative samples incorrectly detected as positive, and FN denotes positive samples that are missed. mAP is obtained from the PR curve. In this experiment, the average precision is computed at an IoU threshold of 0.5, and $K$ signifies the number of target detection categories; within this study, $K = 5$.

FLOPs, the number of floating-point operations, is generally used to measure the complexity of a model and usually only counts the multiplication and addition operations:

(12) $\mathrm{FLOPs} = \mathrm{params} \times H \times W$.
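As a small illustration of how these indexes are computed, the following sketch evaluates precision and recall from detection counts and computes a VOC-style all-point average precision from a PR curve; it is a generic reference implementation, not the exact evaluation code used for the experiments.

```python
import numpy as np

def precision_recall(tp, fp, fn):
    """Precision and recall from per-class detection counts (eqs. (9)-(10))."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def average_precision(recalls, precisions):
    """Area under a PR curve whose recall values are sorted in increasing order;
    mAP is the mean of this AP over the K classes (eq. (11))."""
    r = np.concatenate(([0.0], recalls, [1.0]))
    p = np.concatenate(([1.0], precisions, [0.0]))
    # enforce a monotonically decreasing precision envelope before integrating
    p = np.maximum.accumulate(p[::-1])[::-1]
    return float(np.sum((r[1:] - r[:-1]) * p[1:]))
```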

4.3 URPC data set

To validate the underwater target detection capability of the IG-YOLOv5, an evaluation was conducted comparing both the YOLOv5 and the IG-YOLOv5. The test set’s detection outcomes using the IG-YOLOv5 model are depicted in Figure 9.

Figure 9: Experimental results of the IG-YOLOv5 model.

As shown in Figure 10, the IG-YOLOv5 model improves the detection of all target categories, especially the echinus (sea urchin) category, which reaches a 0.5_mAP of 91.5%. The average precision (0.5 mAP) of the IG-YOLOv5 model over all categories is 74.2%.

Figure 10: Precision-recall curve of the IG-YOLOv5 model.

From the precision-recall curve, it can be seen that the IG-YOLOv5 model performs well on four categories: holothurian, echinus, scallop, and starfish, all of which exceed or approach the average, but it still does not reach an ideal level on the waterweeds category. Table 3 compares the number of samples per category with the evaluation indicators. As evident from the table, the evaluation indexes for sea urchins, which have abundant data and are easy to identify, are higher than those of other species, whereas the indexes for aquatic plants, which have very few samples and shapes that are hard to localize, rank last. Analysis shows that the low scores for aquatic plants are mainly due to the very small number of training samples and the difficulty of recognizing their shape.

Table 3

Comparison between the number of data sets and evaluation indicators

Category Quantity Precision Recall 0.5 mAP
Holothurian 6,074 0.784 0.63 0.692
Echinus 24,346 0.883 0.854 0.915
Scallop 8,687 0.776 0.709 0.777
Starfish 7,180 0.86 0.816 0.88
Waterweeds 82 0.626 0.667 0.443

IG-YOLOv5 does not reach an ideal value in identifying aquatic plants, but compared with the YOLOv5 model it improves precision by 19%, recall by 33.34%, and 0.5_mAP by 24.2%. The data show that IG-YOLOv5 has clear advantages for organisms with few training samples and hard-to-identify morphology.

The confusion matrix assesses the result accuracy of the IG-YOLOv5 model. In the confusion matrix, columns depict the predicted category proportions, while rows indicate the actual category proportions in the data, as shown in Figure 11. The findings from Figure 11 demonstrate that the accurate prediction rates for echinus, starfish, and scallop categories are 89, 85, and 77%, respectively. This suggests the IG-YOLOv5 model’s strong accuracy.

Figure 11: Confusion matrix of the IG-YOLOv5 model.

In addition, this study also provides the curve of loss value, including frame loss, target loss, and classification loss. In this study, the loss function is optimized. Wise-IoU v3 is used as the loss function of the bounding box, and the lower the value, the higher the accuracy. Target loss refers to the mean of target detection loss, with lower values indicating greater accuracy. A smaller classification loss value corresponds to higher accuracy. As shown in Figure 12, with the increase of the number of iterations, the loss value shows a steady decline and finally stability, and reaches convergence after 200 iterations.

Figure 12: Loss value variation curve for the data set.

4.4 Comparative experimental results and analysis

To further validate the superiority of the proposed IG-YOLOv5 model, it is trained and tested on the 2021 URPC data set and compared with Faster R-CNN, RTMDet [35], VarifocalNet [36], GFL [37], and YOLOv5 using the same evaluation metrics. Table 4 displays the comparison of specific results.

Table 4

Comparison of experimental results

Method Backbone 0.5 mAP Model size/M GFLOPs
Faster-RCNN [10] ResNet-50 57.2 41.37 128
RTMDet [35] CSPNeXt 66.6 8.89 14.84
VarifocalNet [36] ResNet-50 61.8 32.89 198.66
GFL [37] ResNet-50 56.3 32.44 122.88
YOLOv5 [17] CSPDarkNet 69.9 6.70 15.8
IG-YOLOv5 Improved-Ghost 74.2 5.77 13.1

Faster R-CNN, a typical representative of two-stage target detection, is a classic detection model built on a ResNet-50 backbone. RTMDet, which uses an RoI Transformer module and a feature pyramid network, can achieve state-of-the-art results on multiple tasks; using CSPNeXt as the backbone, its 0.5 mAP reaches 66.6%, better than Faster R-CNN. VarifocalNet is a target detection algorithm using the Varifocal Loss function, with a ResNet-50 backbone. GFL is a loss-function algorithm designed to address class imbalance in target detection, also with a ResNet-50 backbone; it does not perform well in terms of 0.5 mAP, model size, or GFLOPs.

By comparing the experimental results, the IG-YOLOv5 model has obvious advantages over the comparison models in mAP, model size, and FLOPs. IG-YOLOv5 is 17.9 percentage points higher than the GFL model in mAP, 35.6 M smaller than the Faster R-CNN model, and uses about 1/15 of the FLOPs of the VarifocalNet model.

The IG-YOLOv5 model outperforms the YOLOv5 model on all evaluation metrics. It is particularly impressive on mAP, where the IG-YOLOv5 model increases the mAP by 4.3 percentage points over the YOLOv5 model at an IoU threshold of 0.5.

The IG-YOLOv5 model can effectively improve the mAP while reducing the computational complexity, and it is more suitable for the real-time detection of underwater organisms.

4.5 Ablation experiments results and analysis

This study conducts ablation experiments to assess how the individual enhancements affect model performance. We reconstruct the backbone network using the designed Improved-Ghost module and, separately, optimize the loss function with Wise-IoU v3; the ablation runs verify the contribution of each modification to network performance. The results of the experiments are presented in Table 5.

Table 5

Comparison of ablation experiments

Model Improved-Ghost Wise-IoU v3 P/% R/% 0.5 mAP/% Model size/M GFLOPS
YOLOv5 — — 78.3 64.8 69.9 6.70 15.8
A √ — 79.6 71.9 73.8 5.77 13.1
B — √ 77.5 66 74.7 6.70 15.8
IG-YOLOv5 √ √ 78.6 73.5 74.2 5.77 13.1

The "√" in the table signifies that the corresponding enhancement is used. From Table 5, it can be concluded that adding either the Improved-Ghost module or the Wise-IoU v3 loss function improves overall detection performance (most clearly the mAP value) compared with the YOLOv5 baseline, and that the Improved-Ghost module additionally reduces the model's size and computation.

From the point of view of accuracy, Model A performs best in this index, reaching 79.6%. From the recall index, IG-YOLOv5 scored the highest, with 73.5%. On the mAP value, the B model and IG-YOLOv5 are basically the same, which are 74.7 and 74.2%, respectively. In terms of the model size reflecting the complexity and storage requirements of the model, both the A model and the IG-YOLOv5 model are 5.77 M, which is smaller than the YOLOv5 and B model. Type A and IG-YOLOv5 have the lowest GFLOPS, both of which are 13.1, which means that they may be more efficient in calculation.

In terms of functional characteristics, the IG-YOLOv5 model performs well across many indicators, which shows that the Improved-Ghost module and Wise-IoU v3 together bring a clear performance improvement. Model B, which uses Wise-IoU v3 alone, performs well on 0.5 mAP, but its P and R are slightly lower, suggesting that this change benefits the IoU-related performance of the model while its effect on other indicators is mixed. The GFLOPS analysis shows that model A and IG-YOLOv5 are the most computationally efficient, which is especially important for application scenarios that require rapid response or have limited computing resources.

After reconstructing the backbone structure, the model increases the mAP value by 3.9% and reduces GFLOPs by 2.7. Improved-Ghost’s reconstruction of the YOLOv5 backbone significantly bolsters the network’s functionality.

Although the mAP value after optimizing the loss function Wise-IoU v3 based on the reconstruction of the backbone structure is slightly smaller than that after optimizing the loss function Wise-IoU v3 alone, from a comprehensive point of view, the model in this study has advantages in accuracy, recall, mAP value, and model lightweight.

In this study, the YOLOv5 and IG-YOLOv5 models are tested and compared on the same data set. Figure 13 compares the experimental results, in which (a) is the ground truth, (b) is the YOLOv5 model, and (c) is the IG-YOLOv5 model. The figure clearly shows instances of missed and false detections by the YOLOv5 model. The IG-YOLOv5 model reduces the false detection rate caused by the complex environment and improves detection accuracy.

Figure 13: Improved comparison example of experimental results: (a) true value (left), (b) YOLOv5 (middle), and (c) IG-YOLOv5 (right).

4.6 Ablation experiments for IoU

To illustrate the advantages of Wise-IoU in the IG-YOLOv5 model, ablation experiments on the IoU loss were compared. Model A uses IG-YOLOv5 with CIoU as the loss function, and model C uses IG-YOLOv5 with SIoU.

As can be seen from Table 6, IG-YOLOv5 model has the highest recall rate, which is nearly 2% higher than the other two models. In terms of 0.5 mAP, compared with A and C models, the IG-YOLOv5 model is in the first place. Although the accuracy of C model is slightly higher, IG-YOLOv5 performs better in both recall and 0.5 mAP. Wise-IoU v3 method used by IG-YOLOv5 is more effective than CIoU and SIoU in improving recall rate and overall mAP.

Table 6

Comparison of ablation experiments for IoU

Model IoU P/% R/% 0.5 mAP/%
A CIoU 79.6 71.9 73.8
C SIoU 79.7 71.5 73.0
IG-YOLOv5 Wise IoU v3 78.6 73.5 74.2

4.7 Experimental results and analysis of adding noise

To further verify the robustness of IG-YOLOv5, a noise-corrupted version of the 2021URPC data set was created by adding salt-and-pepper noise to all 6,575 pictures. Figure 14 compares the data before and after adding noise.

Figure 14: Comparison of data before and after adding noise: raw data (left), data after adding noise (right).
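A minimal sketch of how such salt-and-pepper corruption can be generated is shown below; the noise amount and salt/pepper ratio are illustrative assumptions, since the exact corruption level used in the experiment is not restated here.

```python
import numpy as np

def add_salt_pepper(image, amount=0.02, salt_ratio=0.5):
    """Corrupt a uint8 image (grayscale or color) with salt-and-pepper noise.
    `amount` is the fraction of corrupted pixels and is an illustrative value."""
    noisy = image.copy()
    h, w = image.shape[:2]
    n = int(amount * h * w)
    ys, xs = np.random.randint(0, h, n), np.random.randint(0, w, n)
    n_salt = int(n * salt_ratio)
    noisy[ys[:n_salt], xs[:n_salt]] = 255   # salt (white) pixels
    noisy[ys[n_salt:], xs[n_salt:]] = 0     # pepper (black) pixels
    return noisy
```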

In this study, the YOLOv5 and IG-YOLOv5 models are tested and compared on the same noisy data set. As can be seen from Table 7, the precision of IG-YOLOv5 is 14.3 percentage points higher than that of YOLOv5, a considerable improvement for the target detection task. The recall increases by 2.1 percentage points; although the increase is smaller than that of precision, it shows that IG-YOLOv5 detects more positive samples. At an IoU threshold of 0.5, the mAP of IG-YOLOv5 is 61%, 1.2 percentage points higher than the 59.8% of YOLOv5. IG-YOLOv5 outperforms YOLOv5 in all indicators shown in Table 7, which further confirms its advantages in detection performance. From this analysis, it can be concluded that IG-YOLOv5 is not only superior to YOLOv5 in the key performance indicators but also shows clear advantages in model size and computational complexity.

Table 7

Comparison of ablation experiments after adding noise

Model P/% R/% 0.5 mAP/% Model size/M GFLOPS
YOLOv5 51 64.2 59.8 28 15.8
IG-YOLOv5 65.3 66.3 61 24 13.1

Figure 15 shows the comparison of experimental results of the IG-YOLOv5 model on the noisy data set, in which (a) is the real value, (b) is the YOLOv5 model, and (c) is the IG-YOLOv5 model. It can be seen from the figure that both IG-YOLOv5 and YOLOv5 have some false detection phenomena, but compared with YOLOv5, the IG-YOLOv5 model shows higher confidence. This makes IG-YOLOv5 have higher potential value in various application scenarios.

Figure 15: Comparative example of experimental results on noisy data sets: (a) true value (left), (b) YOLOv5 (middle), and (c) IG-YOLOv5 (right).

5 Conclusion

In this study, we propose an underwater target recognition network model, IG-YOLOv5, based on the idea of feature reuse. Building on this idea, the Improved-Ghost module is proposed, the backbone structure of YOLOv5 is reconstructed to make the model lightweight, and the loss function is optimized as Wise-IoU, which improves the performance and accuracy of the model while keeping an efficient computation speed. The experimental outcomes show that the proposed IG-YOLOv5 model performs satisfactorily on the 2021URPC public data set and is more accurate and efficient than the Faster R-CNN, RTMDet, VarifocalNet, GFL, and YOLOv5 models in underwater target detection.

IG-YOLOv5 model has a wide range of potential practical applications. Robots can use IG-YOLOv5 to identify topographic features, such as seabed cracks, stones or other obstacles, and map seabed topography through real-time image analysis. In the search and rescue mission of missing ships or planes, underwater robots can use IG-YOLOv5 to identify and locate lost objects or other important clues. In the security monitoring of public places or private properties, IG-YOLOv5 can be used to detect and track unauthorized persons or vehicles. Of course, IG-YOLOv5 can also be trained to identify abnormal or suspicious behaviors, such as intrusion, fighting, or other security threats, and give an alarm in real time. To sum up, the high precision and computational efficiency of IG-YOLOv5 make it an ideal choice for underwater robots and monitoring systems, which can provide real-time and accurate target detection and tracking functions and meet the needs of various practical applications.

The advantages of high performance, real-time detection, and a lightweight design make the IG-YOLOv5 model suitable for practical deployment in marine protection and underwater monitoring. Of course, there are challenges and limitations when deploying in actual underwater scenes, such as hardware limitations; long-term deployment in remote or difficult-to-access locations may lead to maintenance problems, so the system needs to recover automatically from failures and run reliably. In addition, it is necessary to (1) ensure that no private information unrelated to the research is captured during deployment; (2) ensure that technology deployment does not negatively affect breeding areas, mammal migration routes, or other sensitive areas; and (3) consider how the technology interacts with the existing ecosystem and its possible long-term impact.

To make the IG-YOLOv5 model contribute to a wider field of marine science and protection, it is planned to provide the IG-YOLOv5 model to researchers and organizations involved in marine protection, so as to obtain data feedback in actual marine scenes. In terms of technology, it is planned to enhance it in the following aspects to further improve its performance and versatility.

  1. The speed of model processing is further improved by hardware acceleration.

  2. By training the model on more diverse data sets, the generalization ability of the model is enhanced, so that it can identify a wider range of object categories and perform well in an unprecedented environment.

  3. Considering the spatial characteristics of underwater environment, develop technologies that can understand and manipulate three-dimensional spatial information, so as to improve the understanding of dynamic objects and complex scenes.

  4. Integrating more advanced fault detection mechanism and allowing system self-diagnosis and recovery is very important for remote or difficult access deployment.

The work on this underwater biological detection model, which must observe rapidly moving marine organisms and cope with dynamic environments, shows that real-time detection is essential; this is the key requirement for scenarios that need rapid response, such as autonomous vehicles, emergency response systems, or real-time transaction analysis. At the same time, interdisciplinary collaboration is essential for achieving more comprehensive and innovative solutions to complex projects.

The work of this study will bring new development opportunities to the field of underwater target detection. The IG-YOLOv5 model excels not only in precise underwater target detection and identification but also holds significance across various application scenarios, such as marine resources development, marine environmental monitoring, and maritime security. Moving forward, we will persist in refining and enhancing the underwater target detection model to attain heightened accuracy and resilience. Through continuous research and improvement, we will provide more reliable technical support for underwater environment and development and create more opportunities for human beings to explore unknown fields.

  1. Funding information: This work was not supported by any funding projects.

  2. Conflict of interest: The authors report no conflict of interest.

  3. Data availability statement: Data are contained within the article or Supplementary Materials.

References

[1] Wang X, Zhu Y, Li D, Zhang G. Underwater target detection based on reinforcement learning and ant colony optimization. J Ocean Univ China. 2022;21(2):323–30.10.1007/s11802-022-4887-4Search in Google Scholar

[2] Zhou X, Ding W, Jin W. Microwave-assisted extraction of lipids, carotenoids, and other compounds from marine resources. Innovative and emerging technologies in the bio-marine food sector. 2022. p. 375–94.10.1016/B978-0-12-820096-4.00012-2Search in Google Scholar

[3] Gao S, Sun H, Huang X, Hui Y, Ge S. Performance audit evaluation of marine development projects based on SPA and BP neural network model. Open Geosci. 2023;15:20220470.10.1515/geo-2022-0470Search in Google Scholar

[4] Sun H, Gao S, Liu J, Liu W. Research on comprehensive benefits and reasonable selection of marine resources development types. Open Geosci. 2022;14:141–50.10.1515/geo-2022-0341Search in Google Scholar

[5] Zhang W, Sun W. Research on small moving target detection algorithm based on complex scene. J Phys: Conf Ser. 2021;1738(1):1742–6596.10.1088/1742-6596/1738/1/012093Search in Google Scholar

[6] Fu H, Song G, Wang Y. Improved YOLOv4 marine target detection combined with CBAM. Symmetry. 2021;13(4):623.10.3390/sym13040623Search in Google Scholar

[7] Francesco P, Philip HS, Torr P, Dokania K. An impartial take to the cnn vs transformer robustness contest. Computer Vision – ECCV 2022: 17th European Conference, Tel 21 Aviv, Israel, October 23–27, 2022, Proceedings, Part XIII. Cham: Springer Nature Switzerland; 2022. p. 466–80.10.1007/978-3-031-19778-9_27Search in Google Scholar

[8] Girshick R, Donahue J, Darrell T, Malik J. Rich feature hierarchies for accurate object detection and semantic segmentation. CoRR. 2013.10.1109/CVPR.2014.81Search in Google Scholar

[9] Girshick R. Fast R-CNN. Comput Vis Pattern Recognit. arXiv. 2015;1504:08083.10.1109/ICCV.2015.169Search in Google Scholar

[10] Ren S, He K, Girshick R, Sun J. Faster R-CNN: Towards real-time object detection with region proposal networks. Comput Vis Pattern Recognit, arXiv. 2015;1506:01497.Search in Google Scholar

[11] Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, et al. SSD:Single shot multi-box detector. 14th European Conference on Computer Vision. Cham: Springer; 2016. p. 21–37.10.1007/978-3-319-46448-0_2Search in Google Scholar

[12] Lin T, Goyal P, Girshick R, He K, Dollar P. Focal loss for dense object detection. IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE; 2017.10.1109/ICCV.2017.324Search in Google Scholar

[13] Redmon J, Divvala S, Girshick R, Farhadi A. You only look once: Unified, real-time object detection. IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE; 2016.10.1109/CVPR.2016.91Search in Google Scholar

[14] Redmon J, Ali F. YOLO9000: better, faster, stronger. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2017. p. 7263–71.10.1109/CVPR.2017.690Search in Google Scholar

[15] Redmon J, Ali F. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767; 2018.Search in Google Scholar

[16] Bochkovskiy A, Wang C, Liao HM. Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934; 2020.Search in Google Scholar

[17] Zhu X, Lyu S, Wang X, Zhao Q. TPH-YOLOv5: Improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios. Proceedings of the IEEE/CVF international conference on computer vision; 2021. p. 2778–88.10.1109/ICCVW54120.2021.00312Search in Google Scholar

[18] Tian Z, Shen C, Chen H, He T. FCOS: Fully convolutional one-stage object detection. Comput Vis Pattern Recognit. arXiv:1904.01355. 2019.10.1109/ICCV.2019.00972Search in Google Scholar

[19] Li Q, Li Y, Niu J. Real-time detection of underwater fish targets based on improved YOLO transfer learning. Pattern Recognit Artif Intell. 2019;32(3):193–203. in Chinese.Search in Google Scholar

[20] Zhao D, Liu X, Sun Y, Wu R, Hong J, Ruan C. Underwater crab identification method based on machine vision. J Agric Mach. 2019;50(3):151–8. (in Chinese).Search in Google Scholar

[21] Jia Z, Liu X. Target detection of marine animals based on YOLO and image enhancement. Electron Meas Technol. 2020;43(14):84–8. (in Chinese.Search in Google Scholar

[22] Teng L, Xue F, Bai Q. Remote sensing image enhancement via edge-preserving multiscale retinex. IEEE Photonics J. 2019;1–10.10.1109/JPHOT.2019.2902959Search in Google Scholar

[23] Liu K, Liang Y. Enhancement of underwater optical images based on background light estimation and improved adaptive transmission fusion. Opt Express. 2021;29(18):28307–28.10.1364/OE.428626Search in Google Scholar PubMed

[24] Hao K, Wang K, Wang B, Zhao L, Wang BB, Wang CQ. Underwater biological detection algorithm based on image enhancement and improvement of YOLOv3. J Jilin Univ (Eng Ed). 2022;52(5):1088–97. (in Chinese).Search in Google Scholar

[25] Chen YL, Dong SJ, Sun SZ, Yan KB. Improved detection algorithm of underwater biological targets in low light of YOLOv5 [J/OL]. J Beijing Univ Aeronaut Astronaut. 2023;7:1–13. in Chinese 10.13700/J.BH.1001-5965.Search in Google Scholar

[26] Xu Q, Su J, Wang Y, Zhang J, Zhong Y. Few-Shot learning based on double pooling squeeze and excitation attention. Electronics. 2023;12(1):27.10.3390/electronics12010027Search in Google Scholar

[27] He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. Comput Vis Pattern Recognit. arXiv. 2015;1512:03385.10.1109/CVPR.2016.90Search in Google Scholar

[28] Lin T, Dollár P, Girshick RB, He K, Hariharan B, Belongie SJ. Feature pyramid networks for object detection. IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016. p. 936–44.10.1109/CVPR.2017.106Search in Google Scholar

[29] Shu L, Lu Q, Haifang Q, Shi J, Jia J. Path aggregation network for instance segmentation. Comput Vis Pattern Recognit. arXiv. 01534, 1803. p. 2018.Search in Google Scholar

[30] Han K, Wang Y, Tian Q, Guo J, Xu C, Xu C. Ghostnet: More features from cheap operations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2020. p. 1580–9.10.1109/CVPR42600.2020.00165Search in Google Scholar

[31] Chen C, Guo Z, Zeng H, Xiong P, Dong J. RepGhost: A hardware-efficient ghost module via re-parameterization. arXiv. 2022;2211:06088v1.Search in Google Scholar

[32] Zheng Z, Wang P, Ren D, Liu W, Ye R, Hu Q, et al. Enhancing Geometric Factors in Model Learning and Inference for Object Detection and Instance Segmentation. IEEE Trans Cybern. 2020;52(8):8574–86.10.1109/TCYB.2021.3095305Search in Google Scholar PubMed

[33] Zheng ZH, Wang P, Liu W, Li J, Ye R, Ren D. Distance-IoU Loss: Faster and better learning for bounding box regression. Proceedings of the AAAI Conference on Artificial Intelligence; 2020. p. 12993–3000.10.1609/aaai.v34i07.6999Search in Google Scholar

[34] Tong Z, Chen Y, Xu Z, Yu R. Wise-IoU: Bounding box regression loss with dynamic focusing mechanism. Comput Vis Pattern Recognit. arXiv. 2023;2301:10051.Search in Google Scholar

[35] Lyu CQ, Zhang WW, Huang HA, Zhou Y, Wang YD, Liu YY, et al. RTMDet: An empirical study of designing real-time object detectors. Comput Vis Pattern Recognit. arXiv. 2022;2212:07784.Search in Google Scholar

[36] Zhang H, Wang Y, Dayoub F, Sunderhauf N. VarifocalNet: An IoU-aware Dense Object Detector. Comput Vis Pattern Recognit, arXiv: 2008.13367; 2021.10.1109/CVPR46437.2021.00841Search in Google Scholar

[37] Li X, Wang W, Wu L, Chen S, Hu X, Li J, et al. Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection. arXiv:2006.04388; 2020.

Received: 2023-08-18
Revised: 2023-10-30
Accepted: 2023-11-24
Published Online: 2023-12-29

© 2023 the author(s), published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.