Article Open Access

Intelligent implementation of muscle strain identification algorithm in Mi health exercise induced waist muscle strain

  • Wei Hu, Hao Zhong and Changyong Wang
Published/Copyright: August 21, 2025

Abstract

Because doctors must interpret B-ultrasound medical images and formulate treatment strategies from the diagnostic results, diagnostic efficiency is relatively low. To improve the accuracy and efficiency of identifying lumbar muscle strain, an intelligent recognition and filtering algorithm is proposed that builds on the You Only Look Once version 3 algorithm and combines a sliding window with histogram of oriented gradient features to avoid misdiagnosis caused by artifact interference. The results show that the proposed algorithm achieves an accuracy of 95.2% and a recall of 94.7% on standard hospital equipment. The algorithm can effectively reduce the possibility of misdiagnosis and improve the reliability of clinical diagnosis. At the same time, it can effectively identify cases of lumbar muscle strain, reduce missed diagnoses, and ensure that patients receive timely treatment. The proposed algorithm attains an F1 score of 94.9% on standard devices, 95.7% on high-resolution devices, and 93.4% on mobile ultrasound devices, maintaining good recognition performance across different devices and showing strong adaptability. The algorithm can be widely applied to the ultrasound diagnosis of lumbar muscle strain in hospitals, providing effective decision support for doctors and ensuring that patients receive necessary medical services in a timely manner. It also provides important support for precision medicine and personalized treatment, and is expected to improve patients' health and quality of life.

1 Introduction

Mi health exercise can enhance body coordination and flexibility. However, excessive exercise or forced training without sufficient warm-up can lead to waist muscle strain (WMS). WMS refers to muscle or tendon damage caused by excessive or incorrect use of the lower back muscles. Early and accurate identification of muscle strain is crucial for effective intervention and prevention of further damage [1]. In current clinical practice, the identification of WMS relies mainly on the experience and subjective evaluation of doctors. This manual approach is easily influenced by factors such as the doctor's experience and fatigue level, which may lead to misdiagnosis or missed diagnosis. The current You Only Look Once version 3 (Yolov3) object detection method suffers from low accuracy when processing B-ultrasound images and may fail to identify the exact location and severity of the damage, so it can serve only as a preliminary identification tool and has limitations with respect to high-precision, efficient diagnosis [2]. The histogram of oriented gradient (HOG) feature is a descriptor used for object detection in computer vision and image processing. HOG features are constructed by computing and aggregating histograms of gradient directions over local regions of an image and are mainly used for object detection and recognition. The sliding window (SW) combined with HOG feature descriptors is a commonly used technique for object detection and recognition in image processing and computer vision [3]. Because traditional Yolov3 methods are prone to artifact interference when processing B-ultrasound images, this study proposes a hierarchical intelligent recognition method that builds on Yolov3 and combines SW and HOG features. The method can greatly improve the diagnostic accuracy of WMS while maintaining efficiency, effectively reducing misdiagnosis and missed diagnosis. The research aims to provide a more efficient and accurate WMS diagnostic method, offering a new approach to intelligent diagnosis in medical imaging and promoting the development of precision medicine.

The remainder of this article is organized as follows. Section 2 reviews medical image recognition technology and research related to Yolov3. Section 3 presents the intelligent WMS recognition algorithm combining Yolov3 and HOG features: Section 3.1 covers WMS detection and recognition based on Yolov3, and Section 3.2 describes the WMS filtering method based on SW and HOG features. Section 4 analyzes the results: Section 4.1 reports the WMS recognition and detection performance of Yolov3, and Section 4.2 presents the filtering effect of SW and HOG features. Section 5 concludes the study.

2 Related works

In current medicine, image recognition technology is an important research direction, and researchers have studied it extensively for disease prevention and personalized treatment. Yang et al. developed automatic image recognition software to evaluate slit-lamp photographs at different magnifications for measuring tear meniscus height; the software was effective and highly accurate [4]. Gu et al. constructed a transfer learning framework to improve the diagnostic efficiency of epithelial cancer. The method learned the distinguishing features of individual confocal laser endomicroscopy frames and handled the variable length and irregular shape of concatenated features, achieving an accuracy of 84.1% [5]. Dourado et al. developed an open medical Internet of Things framework for online medical image recognition, analyzing images of cerebrovascular accidents, pulmonary nodules, and skin; the framework reached an accuracy of 92% with a certain degree of reliability and effectiveness [6]. To classify COVID-19 in chest X-ray images, Abbas et al. proposed a convolutional neural network (CNN) based on decomposition, transfer, and composition that handled irregularities in the image dataset and achieved an accuracy of 93.1% [7]. For the effective diagnosis of carotid atherosclerosis, Zhang et al. used a geodesic star constraint graph cut to segment lumen boundaries and recognized the lumen with a support vector machine; the method showed good accuracy and flexibility [8]. Valkonen et al. designed a digital mask for automatic epithelial cell detection using a deep CNN to facilitate breast cancer diagnosis; visual evaluation of the images showed good differentiation of epithelial cells, indicating the practicality and effectiveness of the method [9].

Yolov3 has great application value in video surveillance, traffic detection, and medical image analysis, and many researchers have studied it. Wang and Liu combined Yolov3 with hybrid dilated convolution to address background noise when recognizing partially occluded targets in complex backgrounds; the growth in parameter count was suppressed and the recognition of occluded targets improved, with an accuracy of 90.36% [10]. Cao et al. used MobileNet and spatial pyramid pooling to improve Yolov3 for identifying microalgae species, reducing positional errors in detecting small targets and enhancing detection performance; the average accuracy of the method was 98.90% [11]. Jia et al. developed an improved Yolov3 network to enhance the model's generalization to pixel features for detecting abnormal cervical cells; the sensitivity of the network improved, with a mAP of 78.87% [12]. Zou et al. used Yolov3 and CNN encoders to detect and extract license plate positions for license plate detection and recognition, achieving accurate localization and character recognition with good performance [13]. Chun et al. focused on face detection in complex environments and used Yolov3 to search classification datasets and set score values to find prior boxes; multiple sets of images were predicted and the optimal score value was found, showing good performance [14]. Taheri Tajar et al. developed a lightweight Tiny-YOLOv3 vehicle detection method to recognize, locate, and classify vehicles in video sequences; after pruning, simplified training, and the removal of unnecessary layers, the mAP of the method reached 95.05% [15].

In summary, medical image recognition technology has made remarkable progress, but challenges and limitations remain. To enhance the accuracy and efficiency of WMS medical image recognition, an intelligent recognition algorithm combining Yolov3 and HOG features is designed. The aim of this study is to expand the application of intelligent algorithms in disease diagnosis, treatment planning, and surgical assistance in combination with practical clinical needs.

3 Yolov3 combined with HOG feature WMS intelligent detection algorithm design

This study achieves detection and recognition of WMS medical images based on Yolov3. At the same time, a WMS filtering method combining SW and HOG features is designed to ensure intelligent recognition of WMS medical images. The workflow of the proposed algorithm is shown in Figure 1.

Figure 1: The workflow of the proposed algorithm.

3.1 Detection and recognition of WMS based on Yolov3

The general method for examining WMS caused by Mi health exercise is for doctors to press on the patient’s lower back and perform B-ultrasound examination on the tenderness points. For the obtained B-ultrasound images, the study chose to extract key feature information through the Yolov3 algorithm to achieve preliminary recognition of WMS. The advantage of this method is that it can recognize multiple targets simultaneously and has high accuracy and real-time performance. Figure 2 shows the WMS recognition algorithm.

Figure 2: Identification process of WMS.

In Figure 2, the object detection network receives the collected B-ultrasound images as input and separates low-confidence and high-confidence regions. The filtering network then takes the low-confidence regions together with reference regions as input and filters them again. The final output consists of the re-filtered regions and the high-confidence regions, where the high-confidence regions are those that still satisfy the confidence threshold after filtering. Yolov3 is selected for preliminary identification, and its backbone is Darknet-53. Darknet-53 mainly consists of convolutional layers, residual modules, and fully connected layers. Its design draws on ResNet and DenseNet, giving it a deeper structure and stronger feature extraction capability, which enables Yolov3 to better capture semantic information in images and improve object detection accuracy [16]. Compared with Yolov2, Yolov3 adds residual modules and a Feature Pyramid Network (FPN) for multi-scale object detection. Yolov3's feature extractor is a residual model containing 53 convolutional layers. Unlike Darknet-19, Darknet-53 uses residual units, which allows Yolov3 to build a much deeper structure. Figure 3 shows the Yolov3 network structure.
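As a concrete illustration of the CBL and residual building blocks just described, the following minimal sketch in MXNet Gluon (the framework used for the experiments in Section 4.1) shows one possible definition; it is a simplified assumption, not the authors' actual network code.

```python
from mxnet.gluon import nn

def cbl(channels, kernel, stride=1):
    """CBL block: Convolution + BatchNorm + Leaky ReLU (small negative-region
    slope; 0.1 is the value commonly used in Darknet implementations)."""
    block = nn.HybridSequential()
    block.add(nn.Conv2D(channels, kernel, strides=stride,
                        padding=kernel // 2, use_bias=False),
              nn.BatchNorm(),
              nn.LeakyReLU(0.1))
    return block

class ResidualUnit(nn.HybridBlock):
    """Darknet-53 residual unit: 1x1 bottleneck, 3x3 convolution, identity skip."""
    def __init__(self, channels, **kwargs):
        super().__init__(**kwargs)
        self.body = nn.HybridSequential()
        self.body.add(cbl(channels // 2, 1), cbl(channels, 3))

    def hybrid_forward(self, F, x):
        # The skip connection lets gradients bypass the two convolutions,
        # which is what allows Darknet-53 to be much deeper than Darknet-19.
        return x + self.body(x)
```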

Figure 3: Schematic diagram of the Yolov3 network structure.

In Figure 3, a B-ultrasound image of size 544 × 544 is input, and the Yolov3 network extracts feature maps at different scales and fuses them with upsampled feature maps to output predicted feature maps at three different sizes. Yolov3 is built largely from convolution, batch normalization, and Leaky ReLU activation, together forming the CBL block. Leaky ReLU prevents the "dying ReLU" problem by assigning a small negative-region slope (for example, 0.01) so that neurons remain active for negative inputs. There are three prediction layers, and candidate boxes are evenly allocated to each prediction layer. Each candidate box carries a tensor of bounding box position and confidence, as well as target category information. The loss function of Yolov3 includes the candidate box position loss $L_{\mathrm{can}}$, the confidence loss $L_{\mathrm{conf}}$, and the classification loss $L_{\mathrm{class}}$. $L_{\mathrm{conf}}$ measures the difference between the model's confidence that a predicted box contains the target object and the true label, and is given by Eq. (1) [17].

(1) $L_{\mathrm{conf}} = -\sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{\mathrm{obj}}\left[\hat{C}_{ij}\log(C_{ij}) + (1-\hat{C}_{ij})\log(1-C_{ij})\right] - \lambda\sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{\mathrm{noobj}}\left[\hat{C}_{ij}\log(C_{ij}) + (1-\hat{C}_{ij})\log(1-C_{ij})\right],$

where $I_{ij}^{\mathrm{obj}}$ indicates whether the $j$th candidate box of the $i$th grid cell contains the detection target, $C_{ij}$ is the confidence of the candidate box, $\hat{C}_{ij}$ is the confidence of the reference box, usually taken as 0 or 1, and $\lambda$ is the loss weight applied to candidate boxes that do not contain the target. $L_{\mathrm{class}}$ is given by Eq. (2).

(2) $L_{\mathrm{class}} = -\sum_{i=0}^{S^2} I_{ij}^{\mathrm{obj}}\sum_{c\in \mathrm{classes}}\left[\hat{P}_{ij}(c)\log(P_{ij}(c)) + (1-\hat{P}_{ij}(c))\log(1-P_{ij}(c))\right],$

where $P_{ij}(c)$ is the predicted probability that the $j$th candidate box of the $i$th grid cell belongs to class $c$, and $\hat{P}_{ij}(c)$ is the corresponding true value. The filtering network performs image classification with a ResNet. The template area of the candidate region is used as input, and the corresponding reference region is used to synthesize an image of size 128 × 128. The Hinge function, which plays a crucial role in maximum-margin classification, is selected as the loss function of the filtering network and is given by Eq. (3) [18].

(3) $\mathrm{Loss}(y) = \max(0, 1 - y\hat{y}),$

where $y \in \{-1, 1\}$ is the true label of the sample and $\hat{y}$ is the model's predicted value. When the sample is correctly classified, the loss is 0; when it is misclassified, the loss is proportional to its distance from the hyperplane, i.e., $1 - y\hat{y}$. To evaluate the overall detection accuracy of the algorithm and the detection accuracy for individual targets, mean average precision (mAP) and AP are selected. Figure 4 shows how AP is computed.
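A compact NumPy sketch of Eqs. (1)-(3) is given below; the array shapes, the `eps` guard, and the default value of λ are illustrative assumptions rather than the authors' settings.

```python
import numpy as np

def confidence_loss(c_hat, c_pred, obj_mask, lam=0.5, eps=1e-7):
    """Eq. (1): binary cross-entropy on box confidences, with boxes that
    contain an object weighted 1 and background boxes weighted lam."""
    bce = -(c_hat * np.log(c_pred + eps) + (1 - c_hat) * np.log(1 - c_pred + eps))
    return float(np.sum(obj_mask * bce) + lam * np.sum((1 - obj_mask) * bce))

def class_loss(p_hat, p_pred, obj_mask, eps=1e-7):
    """Eq. (2): per-class binary cross-entropy, accumulated only for boxes
    that contain an object (obj_mask is broadcast over the class axis)."""
    bce = -(p_hat * np.log(p_pred + eps) + (1 - p_hat) * np.log(1 - p_pred + eps))
    return float(np.sum(obj_mask[..., None] * bce))

def hinge_loss(y_true, y_pred):
    """Eq. (3): hinge loss of the filtering network, with labels in {-1, +1}."""
    return float(np.mean(np.maximum(0.0, 1.0 - y_true * y_pred)))
```

During training, the first two terms would be summed with the position loss $L_{\mathrm{can}}$ to form the total detection objective.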

Figure 4: Schematic diagram of the AP value solving process.

In Figure 4, evaluating object detection requires the predicted boxes and the corresponding ground-truth boxes for each image. For each image, the ground-truth values (GTs) comprise the four true border-position values and the object category, while the predicted values (Dets) comprise the four predicted border-position values, the confidence, and the object category. By matching the Dets against the GTs, AP can be calculated.
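The AP computation sketched in Figure 4 can be prototyped as follows; the IoU threshold of 0.5 and the greedy matching are common conventions assumed here, not details stated in the article.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / (union + 1e-9)

def average_precision(dets, gts, iou_thr=0.5):
    """dets: list of (confidence, box); gts: list of ground-truth boxes.
    Detections are ranked by confidence, greedily matched to unused GTs,
    and AP is the area under the interpolated precision-recall curve."""
    dets = sorted(dets, key=lambda d: d[0], reverse=True)
    used = [False] * len(gts)
    tp = np.zeros(len(dets))
    for k, (_, box) in enumerate(dets):
        ious = [iou(box, g) for g in gts]
        if ious:
            j = int(np.argmax(ious))
            if ious[j] >= iou_thr and not used[j]:
                tp[k], used[j] = 1.0, True
    cum_tp = np.cumsum(tp)
    recall = cum_tp / max(len(gts), 1)
    precision = cum_tp / (np.arange(len(dets)) + 1)
    precision = np.maximum.accumulate(precision[::-1])[::-1]  # monotone envelope
    return float(np.trapz(precision, recall))
```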

3.2 Method for filtering WMS based on SW and HOG features

In the WMS recognition process, the target detection network preliminarily identifies the possible strain areas of each muscle ultrasound image, and then filters the low confidence areas in the target detection network again through a filtering network. HOG is a method used to describe images’ local texture features. It divides the image into small local regions and calculates the gradient direction histogram within each region. These histograms can represent the texture features of local regions, used to represent the shape and appearance of the target object. Figure 5 shows the HOG feature extraction.

Figure 5: Schematic diagram of the HOG feature extraction process.

In Figure 5, the image within the detection window is first pre-processed and normalized to reduce the influence of illumination changes and local shadows. After the gradients are computed, each pixel's gradient is projected into the orientation histogram of its cell with a specified weight, and overlapping blocks are contrast-normalized. The histogram vectors of all blocks are concatenated to form the HOG feature vector. The horizontal and vertical gradients of the pixel at position $(x, y)$ are given by Eq. (4) [19].

(4) $G_x(x, y) = I(x+1, y) - I(x-1, y), \qquad G_y(x, y) = I(x, y+1) - I(x, y-1),$

where $I(x, y)$ denotes the grayscale value of the pixel at $(x, y)$. The gradient magnitude is given by Eq. (5).

(5) $r(x, y) = \sqrt{G_x(x, y)^2 + G_y(x, y)^2}.$

The gradient direction is given by Eq. (6).

(6) $\theta(x, y) = \arctan\frac{G_y(x, y)}{G_x(x, y)}.$

To restrict the gradient direction to the unsigned range $[0, \pi)$, it is folded as in Eq. (7).

(7) $\theta(x, y) = \begin{cases} \theta(x, y) + \pi, & \theta(x, y) < 0 \\ \theta(x, y), & \text{otherwise.} \end{cases}$
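Eqs. (4)-(7) translate directly into a few lines of NumPy; the sketch below uses centered differences on the image interior and folds negative angles into the unsigned range.

```python
import numpy as np

def hog_gradients(image):
    """Gradient magnitude and unsigned orientation of a 2-D grayscale image,
    following Eqs. (4)-(7); border pixels are left at zero for simplicity."""
    img = image.astype(np.float64)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # G_x = I(x+1, y) - I(x-1, y)
    gy[1:-1, :] = img[2:, :] - img[:-2, :]   # G_y = I(x, y+1) - I(x, y-1)
    magnitude = np.sqrt(gx ** 2 + gy ** 2)   # Eq. (5)
    theta = np.arctan2(gy, gx)               # signed direction, cf. Eq. (6)
    theta = np.where(theta < 0, theta + np.pi, theta)  # unsigned fold, Eq. (7)
    return magnitude, theta
```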

In order to ensure spatial consistency between a candidate region and its corresponding reference region, the study adds SW to search for reference regions. SW is a search technique that moves a fixed-size window across an image and performs object detection at each position. The window is usually rectangular, and its shape and size can be chosen to suit the target. At each window position, features are extracted from the enclosed image region and classified to determine whether the target object is present. When SW is combined with HOG feature descriptors for object detection, a fixed-size window is first defined and then slid to every position in the image; at each position, the image area within the window is used to extract a HOG descriptor, and the classifier decides, based on the trained model, whether the target object is present and returns the corresponding result. The combination of SW and HOG descriptors can therefore search images for target objects effectively. The HOG feature extractor and SW are trained to extract effective features from the detected regions. Figure 6 shows the reference and candidate regions in a WMS medical image.
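The SW-plus-HOG search described above can be sketched as the following loop; the window size, stride, and the `classifier` object (anything exposing scikit-learn's `decision_function`, such as a trained linear SVM) are illustrative assumptions.

```python
import numpy as np
from skimage.feature import hog

def sliding_window_detect(image, classifier, win=(128, 128), stride=32):
    """Slide a fixed-size window over a grayscale image, compute a HOG
    descriptor for each window, and keep the windows the classifier accepts."""
    detections = []
    height, width = image.shape
    for y in range(0, height - win[1] + 1, stride):
        for x in range(0, width - win[0] + 1, stride):
            patch = image[y:y + win[1], x:x + win[0]]
            feat = hog(patch, orientations=9, pixels_per_cell=(8, 8),
                       cells_per_block=(2, 2), block_norm='L2-Hys')
            score = float(classifier.decision_function([feat])[0])
            if score > 0:                      # positive margin -> target present
                detections.append((x, y, win[0], win[1], score))
    return detections
```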

Figure 6: Candidate area (a) and reference area (b) for damaged areas.

In Figure 6, the red box in the reference area is expanded outward into a green box. The height of the expansion is twice the height $h$ of the blue box in the candidate area, while the width is expanded slightly within a certain limit. Assuming the candidate area in Figure 6(a) has width $w$, the height $H_G$ of the green box in the reference area is given by Eq. (8).

(8) $H_G = \max(h \times h_m, H_{\min}),$

where $h_m$ is the magnification factor applied to the candidate area's height to obtain the height of the green box in the reference area, and $H_{\min}$ is a lower bound that prevents that height from becoming too small. The width $W_G$ of the green box in the reference area is given by Eq. (9).

(9) $W_G = \min(\max(w \times w_m, W_{\min}), W_{\max}),$

where $w_m$ is the magnification factor applied to the candidate area's width to obtain the width of the green box in the reference area, and $W_{\min}$ and $W_{\max}$ are the lower and upper bounds on that width, respectively. After the green box of the reference area is fixed as the template area, a mask of the same size is generated. The mask preserves only the texture around the damaged area and prevents interference from the damaged area itself. A mask is usually a binary image: pixels in the damaged area are set to 0, indicating the obstructed or damaged region, while the remaining pixels are set to 1, indicating the preserved region. The SW search is then performed on the corresponding reference image, with the overlap rate between the sliding window and the green box of the reference area set to 0.95. After the search area is determined, HOG texture features are extracted. As the window slides over the search area, each window image is multiplied by the mask to obtain a new window image, from which HOG texture features are extracted. The cosine similarity between the resulting descriptors is given by Eq. (10) [20].

(10) $\cos(A, B) = \frac{A \cdot B}{\|A\|_2 \|B\|_2},$

where A and B are the HOG feature vectors of the new window image and the reference area green box, respectively. Based on cosine similarity, the cosine distance between A and B is calculated by Eq. (11).

(11) $\mathrm{dist}(A, B) = 1 - \cos(A, B) = 1 - \frac{A \cdot B}{\|A\|_2 \|B\|_2}.$
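Putting the mask, the HOG descriptor, and Eqs. (10)-(11) together, a low-confidence candidate can be filtered as sketched below; the distance threshold is a placeholder assumption, not a value reported in the article.

```python
import numpy as np
from skimage.feature import hog

def masked_hog(window, mask):
    """HOG descriptor of a window after zeroing the damaged region (mask == 0)."""
    return hog(window * mask, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm='L2-Hys')

def cosine_distance(a, b, eps=1e-9):
    """Eqs. (10)-(11): cosine distance derived from cosine similarity."""
    cos_ab = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    return 1.0 - float(cos_ab)

def keep_candidate(candidate, reference, mask, threshold=0.3):
    """Keep a low-confidence candidate only when its masked HOG descriptor is
    close (small cosine distance) to the reference-region descriptor."""
    return cosine_distance(masked_hog(candidate, mask),
                           masked_hog(reference, mask)) < threshold
```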

After the Yolov3 preliminary recognition network and the HOG-feature and SW filtering network are trained and tested, the best integrated model is obtained. The integrated model is jointly trained and optimized to identify strain areas in lumbar muscle strain medical images, ensuring the efficiency and accuracy of the overall algorithm. WMS image samples $X_1$ and $X_2$ are input, passed through the preliminary network and the filtering network in sequence, and mapped to high-dimensional representations $X_1^i$ and $X_2^i$. Finally, the similarity between the two samples is measured by the distance $D_w$, as shown in Eq. (12).

(12) $D_w(X_1, X_2) = \|X_1 - X_2\| = \left[\sum_{i=1}^{p}(X_1^i - X_2^i)^2\right]^{\frac{1}{2}},$

where $p$ is the dimensionality of the mapped feature vector.
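Eq. (12) is simply the Euclidean distance between the two mapped feature vectors; the short check below (plain NumPy, not the authors' code) makes that explicit.

```python
import numpy as np

def pair_distance(x1, x2):
    """Eq. (12): Euclidean distance between two p-dimensional feature vectors."""
    x1 = np.asarray(x1, dtype=float).ravel()
    x2 = np.asarray(x2, dtype=float).ravel()
    return float(np.sqrt(np.sum((x1 - x2) ** 2)))  # equals np.linalg.norm(x1 - x2)
```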

4 Performance analysis of WMS recognition system based on multimodal features

The study separately tested the WMS recognition performance based on Yolov3 and the WMS filtering performance combining SW and HOG features, and finally validated the integrated Yolov3-HOG algorithm.

4.1 Recognition and detection effect of WMS based on Yolov3

This study conducted the object detection and filtering network experiments on a laboratory server running CentOS Linux release 7.6.1810, with an Intel Xeon E5-2683 v3 CPU, an NVIDIA GeForce GTX 1080 GPU, Python 3.7.9, MXNet 1.7.0, and the Adam optimization algorithm. The training and testing sets were split 80 and 20%, respectively. The input image size was 544 × 544, the training batch size was 20, the number of iterations was 400, the weight decay was 0.0005, and the momentum was 0.9. The COCO dataset was selected to assess the detection effectiveness of the research method. Figure 7 shows the mAP results of five detection runs and the training loss under different learning rates.
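For reference, the reported experimental settings can be collected in a single configuration dictionary; this is only a convenience sketch of the values listed above, not the authors' training script.

```python
# Experimental settings reported in Section 4.1, gathered for reproducibility.
train_config = {
    "os": "CentOS Linux 7.6.1810",
    "framework": "MXNet 1.7.0 / Python 3.7.9",
    "optimizer": "Adam",
    "input_size": (544, 544),
    "batch_size": 20,
    "iterations": 400,
    "weight_decay": 5e-4,
    "momentum": 0.9,
    "train_test_split": (0.8, 0.2),
    "learning_rates_tested": [1e-4, 2e-4, 3e-4],  # 1e-4 converged best
}
```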

Figure 7: The detection effect of the research method. (a) mAP comparison. (b) Loss values at different learning rates.

In Figure 7(a), the mAP was 0.4006, indicating that Yolov3 had a good ability to detect WMS regions. In Figure 7(b), regardless of the learning rate, the loss curve dropped rapidly within the first five iterations. At a learning rate of 0.0001 the loss converged to about 2.1 with a relatively smooth curve, indicating stable convergence; at 0.0002 the loss converged to approximately 2.8; and at 0.0003 it converged to about 3.0, also with a relatively smooth curve. The algorithm therefore performed best at a learning rate of 0.0001. To verify Yolov3 on the training and testing sets, all other experimental settings were kept consistent and the results were compared with Yolov2. Figure 8 compares the accuracy results.

Figure 8: Accuracy results for different model training and test sets. (a) Yolov2. (b) Yolov3.

In Figure 8(a), the accuracy on the Yolov2 training set was 98.73%. An abnormal jump appeared in the test-set curve at around 75 iterations, which is attributed to the randomness of the detection results. In Figure 8(b), the accuracy on the Yolov3 training set was 99.25%, and the test set converged to an accuracy of 93.42%, indicating good detection performance for Yolov3.

4.2 WMS comprehensive model filtering effect

In the test, to keep the height and width of the green box in the reference area within an appropriate range, the experimental settings were $H_{\min} = 200$, $W_{\min} = 70$, $W_{\max} = 150$, $h_m = 2$, and $w_m = 2$, giving $W_G \in [70, 150]$ and $H_G = 200$. To verify the filtering effect of the integrated model on WMS medical images, the AP loss values over five tests were compared with those of HOG features alone. To further determine the applicability of the recognition network, the YOLOv8, RetinaNet, EfficientDet, and DEtection TRansformer (DETR) models were also evaluated on the Ultrasound Dataset and the FASCICLE dataset. RetinaNet introduces focal loss to address class imbalance and performs well on small targets. DETR applies the Transformer architecture to object detection, providing an end-to-end detection approach. EfficientDet uses EfficientNet as its backbone and introduces a weighted bidirectional feature pyramid network for multi-scale feature fusion. The Ultrasound Dataset is used to assist diagnostic and classification tasks, while FASCICLE is used to analyze muscle weaknesses and prevent injuries. The comparison of AP loss values and mAP results is shown in Figure 9.
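With the settings above, the reference-region expansion of Eqs. (8)-(9) reduces to the small helper below; the 60 × 80 candidate box in the example is hypothetical.

```python
def reference_region_size(w, h, h_m=2, w_m=2, h_min=200, w_min=70, w_max=150):
    """Expand a candidate box of width w and height h into the reference
    (green) box, using Eq. (8) for the height and Eq. (9) for the width."""
    H_G = max(h * h_m, h_min)                 # Eq. (8)
    W_G = min(max(w * w_m, w_min), w_max)     # Eq. (9)
    return W_G, H_G

# Example: a hypothetical 60 x 80 candidate box expands to 120 x 200.
print(reference_region_size(60, 80))          # -> (120, 200)
```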

Figure 9: Comparison of AP loss values and mAP results. (a) Comparison of AP. (b) Comparison of different datasets.

In Figure 9(a), the AP of SW combined with HOG features over multiple detections on the image was 82.5%, indicating high accuracy in locating the position and shape of the target. Compared with HOG features alone, the missed detection rate of SW plus HOG features was relatively low, with an average of 0.122% of targets mistakenly judged as undetected. The measurements varied little at different frequencies and were relatively stable, so repeated detection with HOG features yielded consistent results with a certain reliability and stability. In Figure 9(b), the integrated model achieved the highest mAP on the Ultrasound Dataset and FASCICLE, at 93.6 and 94.5%, respectively. Compared with Yolov8, the SW of the integrated model compensates for Yolov3's blind spot in detecting small targets, and on both datasets the HOG features of the integrated model excel at capturing local texture and edge information. The YOLOv8 model is relatively advanced, but it requires a high device configuration and large memory, making it less suitable for detecting fine image details. The Yolov2 model had the smallest mAP on the Ultrasound Dataset, only 77.7%, while RetinaNet trained slowly and required long computation time. To achieve preliminary detection and classification of WMS, the integrated model was pre-trained on the dataset. Figure 10 shows the resulting training accuracy and loss curves.

Figure 10: Training accuracy and loss curves. (a) Training accuracy. (b) Training loss.

In Figure 10(a), the accuracy of the Yolov3-HOG model rose rapidly after five iterations as the model began to learn features from the data effectively. Because the model was still adapting to changes in data distribution and characteristics during this gradual optimization, the curve fluctuated within the first 50 iterations. After about 35 iterations, the accuracy increased steadily and eventually reached 99.80%; at that point, training was approaching convergence and the model could stably perform high-precision WMS detection. The loss curve in Figure 10(b) further validates the effectiveness of the model. In the early stages of training, the loss dropped rapidly, showing that the model could quickly optimize and reduce prediction errors. At 35 iterations, the loss converged to 0.006, indicating that the model had fully learned the features of the data and had strong generalization ability. To verify the effectiveness of the integrated model, the Yolov3-HOG detection and filtering network was compared with the Faster R-CNN, U-Net, and EfficientDet detection networks. The training and validation accuracies of the different models are compared in Figure 11.

Figure 11: Comparison of training and validation accuracy. (a) Training accuracy. (b) Validation accuracy.

In Figure 11(a), the training accuracy of the integrated model fluctuated little and was relatively stable, reaching 98.25%, the best of the four models; Faster R-CNN, U-Net, and EfficientDet reached 97.17, 97.58, and 96.73%, respectively. In Figure 11(b), the validation accuracy of the integrated model was 97.63%, indicating the best generalization performance. Faster R-CNN, by contrast, showed large fluctuations in validation accuracy due to excessive noise and remained unstable as the iterations increased. The validation accuracy curves of U-Net and EfficientDet were relatively stable, at 91.64 and 94.63%, respectively. Table 1 presents $L_{\mathrm{can}}$, $L_{\mathrm{conf}}$, and $L_{\mathrm{class}}$ for these four models during training.

Table 1

Comparison of various types of loss values for different models

Model | Integrated model | Faster R-CNN | U-Net | EfficientDet
$L_{\mathrm{can}}$ | 0.0270 | 0.0269 | 0.0271 | 0.0272
$L_{\mathrm{conf}}$ | 0.0338 | 0.0341 | 0.0344 | 0.0341
$L_{\mathrm{class}}$ | 0.0067 | 0.0070 | 0.0068 | 0.0069
Total loss | 0.0676 | 0.0681 | 0.0684 | 0.0683

In Table 1, the integrated model had the smallest $L_{\mathrm{conf}}$ and $L_{\mathrm{class}}$, at 0.0338 and 0.0067, respectively, and the lowest total loss, 0.0676. This indicates that the integrated model predicted both the presence of targets within the bounding boxes and their categories more accurately, giving better overall performance. The $L_{\mathrm{can}}$ of Faster R-CNN was the smallest, at 0.0269. To assess the generalization ability of the integrated model, the accuracy and loss on the two sets are compared in Figure 12.

Figure 12: Accuracy and loss results of the integrated model. (a) Accuracy comparison. (b) Loss comparison.

In Figure 12(a), the accuracy rose rapidly up to the 10th iteration, when the integrated model learned fastest; HOG features further enhanced the model's understanding of the shapes and textures of objects in the image. The accuracy fluctuated noticeably on the training set but was relatively smooth on the validation set, where the integrated model reached 97.63%. In Figure 12(b), the loss curve of the integrated model was relatively smooth on the validation set, converging to a final value of 0.16, indicating good generalization ability. The fluctuation on the training set was larger because the model was still learning and its parameters were constantly being adjusted to fit the training data; it gradually converged after 100 iterations to a final loss of 0.58.

The study collected 1,000 ultrasound images from different hospitals and devices, covering patients of different ages, genders, and body types. Before model training, the images underwent standardized processing steps such as resizing, denoising, and enhancement. To ensure data quality, all images were strictly screened and inspected, and low-quality images caused by equipment failure, blurring, or patient movement were removed; the remaining images all met the required standard, ensuring data consistency during training. The training and testing sets were divided in a 4:1 ratio. The proposed algorithm was then applied to practical medical assistance, and WMS diagnosis was performed on standard equipment, high-resolution equipment, and mobile B-ultrasound equipment. The resulting evaluation metrics are shown in Table 2.

Table 2

Test results

Equipment | Standard equipment | High-resolution equipment | Mobile B-ultrasound equipment
Accuracy | 0.952 | 0.961 | 0.938
Recall | 0.947 | 0.954 | 0.931
F1 score | 0.949 | 0.957 | 0.934

In Table 2, the accuracy of the proposed algorithm in recognizing B-ultrasound image data in different devices exceeds 90%, indicating that the algorithm exhibits good scalability and adaptability. Under high-resolution equipment, the proposed algorithm achieved the highest accuracy of 96.1%, indicating that high-quality imaging equipment can further enhance the recognition ability of the algorithm.
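The 4:1 split described before Table 2 can be reproduced with a standard utility such as scikit-learn's `train_test_split`; the function below is a sketch under that assumption, with `images` and `labels` standing in for the screened ultrasound data and their annotations.

```python
from sklearn.model_selection import train_test_split

def split_dataset(images, labels, seed=42):
    """Split the screened ultrasound images 4:1 into training and test sets."""
    return train_test_split(images, labels, test_size=0.2, random_state=seed)
```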

5 Conclusion

This study designed an intelligent recognition algorithm combining Yolov3 and HOG features to enhance the accuracy and efficiency of WMS medical image recognition. Among the results, the mAP of the Yolov3 detection stage was 0.4006, indicating a good ability to detect WMS regions. The proposed model performed well during training, with a training accuracy of 98.25%. The training accuracies of Faster R-CNN, U-Net, and EfficientDet were 97.17, 97.58, and 96.73%, respectively, all lower than that of the proposed model, which improved on them by 1.08, 0.67, and 1.52 percentage points, respectively. In terms of validation accuracy, the proposed model was again the highest, reaching 96.81%. This result is not only higher than the fluctuating validation accuracy of Faster R-CNN, but also clearly better than U-Net's 91.64% and EfficientDet's 94.63%. The proposed model has stronger generalization ability and stability when dealing with complex data, indicating improved accuracy and efficiency in WMS medical image recognition. The method has important clinical significance: it can provide medical professionals with a more efficient and accurate diagnostic tool, thereby shortening diagnosis time, reducing human error, and ultimately improving patient treatment outcomes. However, because of the short experimental period, the study did not use Transformer or graph neural network architectures to process complex image data, and real-time deployment of the model still faces challenges of data transmission and processing delays; in medical settings, timely diagnosis is crucial for treatment. In the future, combinations of different models can be considered to promote the development of precision medicine, and the data and processing pipelines can be optimized to reduce latency and improve response speed.

  1. Funding information: Authors state no funding involved.

  2. Author contributions: Wei Hu: Wei Hu is the primary author and corresponding author of this study. He contributed significantly to the conceptualization, methodology, and overall design of the research. He also took the lead in data analysis and manuscript writing. Hao Zhong: Hao Zhong contributed to the data collection and analysis process. He also assisted in the development of the research framework and in the writing and revision of the manuscript. Changyong Wang: Changyong Wang contributed to the literature review and helped refine the methodology. He also played a key role in reviewing and editing the manuscript for clarity and consistency. All authors have accepted responsibility for the entire content of this manuscript and approved its submission.

  3. Conflict of interest: Authors state no conflict of interest.

  4. Data availability statement: All data generated or analyzed during this study are included in this published article.

References

[1] Li YJ, Zhuang WS, Cai XG, Yang Y, Han MM, Zhang DW. Effect of acupuncture at "three points of iliolumbar" on lumbar function and pain in patients with iliopsoas muscle strain. Zhongguo Zhen Jiu = Chin Acupunct Moxibustion. 2019;39(12):1279–82.

[2] Khan D, Pandey A, Kumar PR, Khan S. Analysis of human detection system. Ann Rom Soc Cell Biol. 2021;25(6):7239–45.

[3] Talukdar RG, Mukhopadhyay KK, Dhara S, Gupta S. Numerical analysis of the mechanical behaviour of intact and implanted lumbar functional spinal units: Effects of loading and boundary conditions. Proc Inst Mech Eng Part H: J Eng Med. 2021;235(7):792–804. doi:10.1177/09544119211008343.

[4] Yang J, Zhu X, Liu Y, Jiang X, Fu J, Ren X, et al. TMIS: a new image-based software application for the measurement of tear meniscus height. Acta Ophthalmol. 2019;97(7):e973–80. doi:10.1111/aos.14107.

[5] Gu Y, Vyas K, Yang J, Yang GZ. Transfer recurrent feature learning for endomicroscopy image recognition. IEEE Trans Med Imaging. 2019;38(3):791–801. doi:10.1109/TMI.2018.2872473.

[6] Dourado CMJM, Silva SPPD, Nobrega RVMD, Filho PPR, Muhammad K, Albuquerque VHCD. An open IoHT-based deep learning framework for online medical image recognition. IEEE J Sel Areas Commun. 2021;39(2):541–8. doi:10.1109/JSAC.2020.3020598.

[7] Abbas A, Abdelsamea M, Gaber MM. Classification of COVID-19 in chest X-ray images using DeTraC deep convolutional neural network. Appl Intell. 2021;51(2):854–64. doi:10.1007/s10489-020-01829-7.

[8] Zhang J, Teng Z, Guan Q, He J, Abutaleb W, Patterson AJ, et al. Automatic segmentation of MR depicted carotid arterial boundary based on local priors and constrained global optimisation. IET Image Process. 2019;13(3):506–14. doi:10.1049/iet-ipr.2018.5330.

[9] Valkonen M, Isola J, Ylinen O, Muhonen V, Saxlin A, Tolonen T, et al. Cytokeratin-supervised deep learning for automatic recognition of epithelial cells in breast cancers stained for ER, PR, and Ki-67. IEEE Trans Med Imaging. 2020;39(2):534–42. doi:10.1109/TMI.2019.2933656.

[10] Wang K, Liu M. YOLOv3-MT: A YOLOv3 using multi-target tracking for vehicle visual detection. Appl Intell. 2022;52(2):2070–91. doi:10.1007/s10489-021-02491-3.

[11] Cao M, Wang J, Chen Y, Wang Y. Detection of microalgae objects based on the improved YOLOv3 model. Environ Sci: Process Impacts. 2021;23(10):1516–30. doi:10.1039/D1EM00159K.

[12] Jia D, He Z, Zhang C, Yin W, Wu N, Li Z. Detection of cervical cancer cells in complex situation based on improved YOLOv3 network. Multimed Tools Appl. 2022;81(6):8939–61. doi:10.1007/s11042-022-11954-9.

[13] Zou Y, Zhang Y, Yan J, Jiang X, Huang T, Fan H, et al. License plate detection and recognition based on YOLOv3 and ILPRNET. Signal Image Video Process. 2022;16(2):473–80. doi:10.1007/s11760-021-01981-8.

[14] Chun LZ, Dian L, Zhi JY, Jing W, Zhang C. YOLOv3: face detection in complex environments. Int J Comput Intell Syst. 2020;13(1):1153–60. doi:10.2991/ijcis.d.200805.002.

[15] Taheri Tajar A, Ramazani A, Mansoorizadeh M. A lightweight Tiny-YOLOv3 vehicle detection approach. J Real-Time Image Process. 2021;18(6):2389–401. doi:10.1007/s11554-021-01131-w.

[16] Hasanvand M, Nooshyar M, Moharamkhani E, Selyari A. Machine learning methodology for identifying vehicles using image processing. Artif Intell Appl. 2023;1(3):170–8. doi:10.47852/bonviewAIA3202833.

[17] Wang S, Hai X, Cao Y. Reflective safety clothes wearing detection in hydraulic engineering using YOLOv3-CCD. Asian J Res Comput Sci. 2023;15(2):11–24. doi:10.9734/ajrcos/2023/v15i2316.

[18] Zabihzadeh D, Tuama A, Karami-Mollaee A, Mousavirad SJ. Low-rank robust online distance/similarity learning based on the rescaled hinge loss. Appl Intell. 2023;53(1):634–57. doi:10.1007/s10489-022-03419-1.

[19] Wen Z, Curran JM, Wevers HGE. Classification of firing pin impressions using HOG-SVM. J Forensic Sci. 2023;68(6):1946–57. doi:10.1111/1556-4029.15377.

[20] Pu S, Zhang F, Fu SW. Microscopic image recognition of diatoms based on deep learning. J Phycol. 2023;59(6):1166–78. doi:10.1111/jpy.13390.

Received: 2024-09-09
Revised: 2025-02-20
Accepted: 2025-02-28
Published Online: 2025-08-21

© 2025 the author(s), published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.
