Analysis of the sports action recognition model based on the LSTM recurrent neural network

Ping Chen; Jiangui Peng

doi:10.1515/nleng-2024-0050

Open Access article

Published/Copyright: February 25, 2025

From the journal Nonlinear Engineering, Volume 14, Issue 1

Abstract

With the rapid growth of motion data, traditional motion recognition algorithms lack sufficient processing capacity. To solve this problem, a method based on gradient descent optimization (GDO) and long short-term memory (LSTM), GDO-LSTM, is proposed for sports action recognition. Extensive experiments on skipping rope, swimming, skating, and shot put were carried out on a sports data set of students at Hainan University, with 77, 94, 72, and 85 trials, respectively. The results show that the accuracies of GDO-LSTM in sports action recognition were 98.7, 100, 100, and 94.1%, respectively, which were superior to those of the three-axis gyroscope (80.5, 40.4, 23.6, and 100%) in three of the four actions. These results show that the algorithm can effectively improve the accuracy of sports action recognition and has wide application potential.

Keywords: temporal attention; gradient descent method; motion sensor; three-axis magnetometer; nuclear parametric minimization

Notations

CLSTM: convolutional long short-term memory
CNN: convolutional neural network
GDO: gradient descent optimization
KNM: kernel norm minimization
LSTM: long short-term memory
MSE: mean squared error
PSO: particle swarm optimization
SDPA: scaled dot-product attention
SVM: support vector machine
Tanh: hyperbolic tangent

1 Introduction

In recent years, with the widespread adoption of the Internet and smart devices, the rate at which motion data are generated has grown rapidly. People are paying increasing attention to health and fitness, and the demand for motion recognition technology is rising accordingly. Motion recognition can help individuals monitor their athletic performance, provide a scientific basis for fitness coaches, and play a key role in rehabilitation treatment and sports training [1,2]. Especially since the epidemic, the rise of home fitness and the popularity of intelligent sports equipment have strengthened the demand for sports data analysis. Although the number of motion recognition algorithms is growing, several problems remain. Because motion data are diverse and complex, many existing algorithms cannot process newly generated data effectively, which greatly reduces the value of the information. Traditional motion recognition methods often rely on manual feature extraction, which is easily affected by noise and external interference, so recognition accuracy is low. Faced with individual differences and different motion modes, existing models usually generalize poorly and cannot adapt to motion recognition in varied situations [3,4]. In this context, the long short-term memory (LSTM) network is widely used in motion recognition because of its strong time series modeling ability. However, LSTM still has limitations when dealing with noisy and unbalanced data. Therefore, to meet the demand for sports health management and remove the bottleneck of current action recognition algorithms, this study introduces the gradient descent optimization (GDO) method to improve the algorithm's ability to process abnormal data.
A GDO-LSTM algorithm is constructed by combining GDO with LSTM. It adds a weight-processing structure to the LSTM and strengthens the scaled dot-product attention (SDPA) in the temporal attention module, which helps to handle the problems that arise during motion. The accuracy of motion recognition is tested on four common movements. The aim is to use large amounts of data to train a model that judges movements accurately, which can help companies meet the needs of users in areas such as fitness. The main contribution of the research is that the advantages of GDO and LSTM networks are combined to improve motion recognition significantly. This provides a new idea for future motion detection and analysis systems in wearable devices and helps to promote intelligent fitness and health management. In future practical applications, the results could support smarter fitness applications that provide personalized exercise advice and feedback to help users reach their fitness goals more effectively. They could also help to improve training strategies, reduce the risk of injury, and raise the overall competitive level of athletes. Combined with social media functions, an interactive fitness platform based on movement recognition could promote communication among users and thus strengthen their motivation to keep exercising.

2 Related works

Human sports action recognition is an important branch of video processing technology and plays an important role in physical health and training. Xu and Yan analyzed a sports action recognition system based on clustering regression and an improved deep network in order to raise the recognition rate of athletes' actions. Based on a literature review, they chose a neural network as the basis of the algorithm, analyzed the shortcomings of the traditional neural network, and improved it to suit the recognition needs of athletes. A video library of athletes' movements was built from data collected online, and basketball events were recognized by feature judgment. The recognition rate for basketball actions improved greatly over the traditional algorithmic model, which validated the effectiveness of the improved deep network for human behavior recognition [5]. Sarabu and Santra proposed a dual-stream network with two convolutional neural networks (CNNs) and a convolutional long short-term memory (CLSTM). They used two CNNs pre-trained on ImageNet to extract spatio-temporal features. The outputs of the two CNNs were then combined and fed to the CLSTM to obtain the overall classification score. They explored how the fusion function applied to the feature maps of the two CNNs affects performance and derived the best fusion function and number of layers. To avoid overfitting, they used data augmentation. The experimental results show that their model improves substantially on current dual-stream methods [6]. de Albuquerque et al. proposed a bidirectional LSTM-based attention mechanism with an extended CNN that selectively focuses on the effective features in the input frames to recognize different human actions in video.
In this diverse network, they extracted significant distinguishing features and updated features that retain more information than shallow layers by using residual blocks. They fed these features into learning long-term dependencies, followed by attention mechanisms to improve performance and extract additional high-level selective action-related patterns and cues. The experiments were tested at UCF Sports with recognition rates of 98.3, 99.1, and 80.2%, an improvement of 1–3% over previous methods [7].

Hu et al. proposed an attention-based multi-level co-occurrence graph convolution, which is able to utilize body structure information from the skeleton and enhance multi-level co-occurrence feature learning by integrating a graph convolution network into the LSTM. They designed spatial attention modules for feature enhancement of key joints of the skeleton input. They designed multi-level co-occurrence memory units to automatically model the spatial relationships between joints while capturing co-occurrence features from different joints, people, and frames. Experimental results show that their model significantly outperforms mainstream methods on the interactive subset of the data set [8]. Zhu et al. designed a spatio-temporal dual-attention network, which mainly consists of feature extraction, attention, and fusion modules. Unlike the high-level fully connected layer features mainly used in previous studies, this work extracted the convolutional and fully connected layer features of CNNs to enrich the initial level of video representation. In addition, a temporal attention module and a joint temporal attention module were implemented to enhance the spatiotemporal attention capability. The potential was effectively mined and weighted by principal component analysis and feature fusion. The experimental results show that the method has better recognition performance compared with the existing methods [9]. Naveenkumar and Domnic’s recurrent neural network-based approach focused on the temporal evolution of body joints and ignored the geometric relationships between them. This led to the proposal of 11 quadrilaterals to capture the geometric relationships between joints for action recognition. An end-to-end three-layer bidirectional LSTM network was designed as the base net to learn the robust representation. 
Two subnets built on the base network were used to extract differentiated spatiotemporal features, the first using four spatial features and the second using two temporal features. Experimental results show that their method achieves state-of-the-art performance compared with recent methods [10]. Furnari and Farinella addressed the problem of egocentric action anticipation, i.e., predicting the actions that camera wearers will take and the objects they will interact with. They contributed the Rolling-Unrolling LSTM, a learning architecture for anticipating actions in egocentric videos. The approach models the two subtasks of summarizing the past and inferring the future, uses a sequence completion pre-training technique, and applies a modality attention mechanism to fuse the predictions obtained from video frames, optical flow fields, and object-based features. Experiments showed that the proposed architecture was state-of-the-art for egocentric video; the method also achieved competitive performance relative to methods that are not based on unsupervised pre-training [11].

Most of the algorithms reviewed above have limitations in motion recognition, such as strong dependence on feature extraction, poor generalization ability, and limited data processing ability. The method adopted in this study uses the LSTM network to extract effective features automatically from large amounts of data, and GDO strengthens the model's ability to handle unbalanced and noisy data. The proposed method is therefore intended to fill these gaps in the field of motion recognition.

3 LSTM recurrent neural network-based sports action information recognition processing system

3.1 Construction of LSTM-based human motion recognition information acquisition and processing system

Recognizing human sports actions requires the system to provide feedback and to deliver recognition results in a timely manner. The system studied here therefore differs from monitoring systems for health, falls, and similar purposes. Such systems typically propose a three-layer architecture, which makes the processing chain relatively complete [12]. Sports action recognition, by contrast, needs only two layers: because the data are already handled well during extraction, the data application layer is omitted, as shown in Figure 1.

Figure 1

Motion processing architecture.

In Figure 1, the data collection layer places motion sensors to collect human motion data, including acceleration and angular velocity. The collected data pass to data pre-processing, where initial cleaning produces the formal data. The data computation layer, which is the core of the system, creates the data records and stores them in a database [13]. Finally, at the computation layer, the data are processed further as required to output meaningful motion-type data, and the labels are stored again. When the model computing layer produces these data, the human body is sometimes outside its typical range of motion, which leads to large fluctuations and extreme values. Extreme values not only stress the instrument but also yield inaccurate results, so the data need to be normalized as in Eq. (1):

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} = \frac{x - \bar{x}}{\sigma}. \tag{1}$$
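As a minimal sketch of the normalization step, the following Python function implements the min-max form of Eq. (1), which is the form that maps data to the 0–1 range described in the text. The function name and the sample readings are hypothetical and chosen only for illustration; the handling of a constant signal is an added assumption, not part of the original method.

```python
def min_max_normalize(values):
    """Scale a sequence of readings to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant signal is mapped to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical single-axis sensor readings (illustrative values only).
readings = [-0.75, -0.25, 0.0, 0.25, 0.47]
scaled = min_max_normalize(readings)
print(scaled)
```

The smallest reading maps to 0 and the largest to 1, so one extreme sample stretches the scale for the whole window; this is why the extreme values discussed above matter.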

After the normalization in Eq. (1), the data are compressed to the range 0–1, which also limits the influence of outliers. Obtaining the transformed data, however, requires feature capture. The captured features include the maximum and minimum values as well as the standard deviation, calculated as in Eq. (2):

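$$\begin{aligned} \max &= \max(a_i), \quad i \in \{1, 2, 3, \ldots\}, \\ \mathrm{std} &= \sqrt{\frac{1}{n} \sum_{i=1}^{n} (a_i - \mathrm{mean})^2}, \\ \min &= \min(a_i), \quad i \in \{1, 2, 3, \ldots\}. \end{aligned} \tag{2}$$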
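The feature capture in Eq. (2) can be sketched as follows, assuming the usual population standard deviation (the square root of the mean squared deviation from the mean). The function name `window_features` and the sample window are hypothetical.

```python
import math


def window_features(samples):
    """Return the max, min, mean and population standard deviation of a window."""
    n = len(samples)
    mean = sum(samples) / n
    # Population standard deviation: sqrt of the mean squared deviation.
    std = math.sqrt(sum((a - mean) ** 2 for a in samples) / n)
    return {"max": max(samples), "min": min(samples), "mean": mean, "std": std}


# Hypothetical window of sensor samples (illustrative values only).
features = window_features([1.0, 2.0, 3.0, 4.0])
print(features)
```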

In Eq. (2), “mean” is the mean of the data, and the standard deviation “std” measures the dispersion of the data around the mean. The skewness and kurtosis of the data can also be derived from the quantities in Eq. (2), but the support vector machine (SVM) used here operates only on the Euclidean distance. The core of the SVM is to find a separating hyperplane, which avoids both the curse of dimensionality that often affects traditional algorithms and the problem of local minima [14]. Using Lagrange multipliers, the original (primal) problem is converted to its dual, and the Lagrangian function is as follows:

$$L(w, b, a) = 0.5\,\|w\|^2 - \sum_{i=1}^{n} a_i \left( y_i (w x_i + b) - 1 \right), \tag{3}$$

where $0.5\,\|w\|^2$ relates to the distance between the support planes, and the hyperplane is obtained when $w x_i + b = 0$. Only partial derivatives of this Lagrangian function can be taken. Setting these derivatives to 0 gives the intermediate state conditions, which can be substituted into Eq. (3) to obtain Eq. (4):

$$W(a) = \sum_{i=1}^{n} a_i - 0.5 \sum_{i,j=1}^{n} a_i a_j x_i x_j y_i y_j, \quad \text{s.t.}\ a_i \ge 0,\ i = 1, 2, 3, \ldots, n,\ \sum_{i=1}^{n} a_i y_i = 0. \tag{4}$$
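To make the dual problem in Eq. (4) concrete, the sketch below evaluates the dual objective and checks the two constraints for a given set of multipliers. It only evaluates the objective and does not solve the optimization; the function names and the two-point toy data set are hypothetical.

```python
import numpy as np


def dual_objective(alpha, X, y):
    """W(a) = sum_i a_i - 0.5 * sum_{i,j} a_i a_j y_i y_j <x_i, x_j>."""
    alpha = np.asarray(alpha, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    gram = X @ X.T            # pairwise inner products <x_i, x_j>
    weighted = alpha * y      # a_i * y_i
    return alpha.sum() - 0.5 * weighted @ gram @ weighted


def is_feasible(alpha, y, tol=1e-9):
    """Check a_i >= 0 and sum_i a_i y_i = 0 up to a tolerance."""
    alpha = np.asarray(alpha, dtype=float)
    y = np.asarray(y, dtype=float)
    return bool(np.all(alpha >= -tol) and abs(alpha @ y) <= tol)


# Toy problem: one point per class, placed symmetrically on the x-axis.
X = [[1.0, 0.0], [-1.0, 0.0]]
y = [1.0, -1.0]
alpha = [0.5, 0.5]
value = dual_objective(alpha, X, y)
print(value, is_feasible(alpha, y))
```

For this toy problem the multipliers (0.5, 0.5) maximize the dual, and the dual value 0.5 equals the primal value $0.5\,\|w\|^2$ with $w = (1, 0)$.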

However, Eq. (4) applies only to linearly separable cases. In practice, linearly non-separable cases must also be considered, and kernel norm minimization (KNM) can then be applied. KNM restores the original matrix from the observation matrix, as shown in Eq. (5):

$$\bar{\bar{X}} = \arg\min_{X}\; 0.5\,\|Y - F\|_F^2 + \lambda \|X\|_{*}, \tag{5}$$

where $\|X\|_{*}$ is the kernel norm to be minimized (commonly called the nuclear norm), the observation matrix is $X$, the original matrix of $X$ is $Y$, and the estimated approximation of $X$ is denoted $\bar{\bar{X}}$. The threshold value is denoted $\lambda$, and the diagonal elements are solved as in Eq. (6):

$$S_{\lambda}(\Sigma)_{ii} = \max(\Sigma_{ii} - \lambda,\ 0), \tag{6}$$

where $S_{\lambda}$ is the soft-threshold function and $\Sigma_{ii}$ are the diagonal elements obtained from $Y$ in Eq. (5); the result can be applied to the LSTM. The LSTM network structure has input gates, output gates, forget gates, and cell states. The cell state records the time series and is updated iteratively by summation, which limits the mutual impact between data and preserves the state across time steps [15]. Because of the gating, the gradient does not explode when the weights are updated, as shown in Figure 2.
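The soft-thresholding of Eqs. (5) and (6) can be sketched with singular value thresholding, under the assumption that Eq. (5) is the usual nuclear-norm-regularized least-squares problem and that the diagonal elements are the singular values of the matrix being denoised. The function name and the 2 × 2 example are hypothetical.

```python
import numpy as np


def singular_value_threshold(Y, lam):
    """Shrink each singular value of Y by lam and clip at zero, as in Eq. (6)."""
    U, s, Vt = np.linalg.svd(np.asarray(Y, dtype=float), full_matrices=False)
    s_shrunk = np.maximum(s - lam, 0.0)
    # Scale the columns of U by the shrunken singular values and rebuild.
    return (U * s_shrunk) @ Vt


# Hypothetical example: singular values 3 and 1 with threshold 1.5.
Y = np.diag([3.0, 1.0])
X_hat = singular_value_threshold(Y, lam=1.5)
print(X_hat)
```

The larger singular value is reduced to 1.5 and the smaller one is set to zero, so the estimate has lower rank than the input.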

Figure 2

LSTM flow chart.
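For orientation, one time step of a standard LSTM cell can be sketched as below. This is the textbook formulation with input, forget, and output gates, not the modified cell proposed in this study; the weight layout, dimensions, and random inputs are assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.

    W has shape (4H, D), U has shape (4H, H), b has shape (4H,);
    the four blocks hold the input, forget, output and candidate weights.
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])          # input gate
    f = sigmoid(z[H:2 * H])      # forget gate
    o = sigmoid(z[2 * H:3 * H])  # output gate
    g = np.tanh(z[3 * H:4 * H])  # candidate cell update
    c_t = f * c_prev + i * g     # additive cell-state update
    h_t = o * np.tanh(c_t)
    return h_t, c_t


# Hypothetical sizes: 3 input channels, 4 hidden units, 5 time steps.
rng = np.random.default_rng(0)
D, H = 3, 4
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(5, D)):
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h)
```

The cell state is updated by a gated sum rather than by repeated matrix multiplication, which is the summation-based update described in the text.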

From Figure 2, after the input data enter the input gate, the cell screens them strictly: unqualified data are forgotten and qualified data are output. Cell screening is the most important process, and its core lies in the decoder. The decoder focuses on the historical information aggregated over time by fading and predicting attention [16]. The decoder follows the approach used in machine translation, where the end of each decoding step is the beginning of the next. The decoding principle is given in Eq. (7):

$$p(y) = \prod_{t=1}^{n} p\left( y_t \mid \{ y_1, y_2, \ldots, y_{t-1} \}, C \right), \tag{7}$$

where the decoded output at step $t$ is $y_t$, conditioned on the previous outputs $\{ y_1, y_2, \ldots, y_{t-1} \}$ and on $C$. The form of Eq. (7) is largely similar to that of Eq. (3), and the two can be combined to obtain a better solution.
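The factorization in Eq. (7) multiplies the conditional probability of each decoding step. A minimal sketch, assuming the per-step conditional probabilities are already available, accumulates them in log space for numerical stability; the function name and the probability values are hypothetical.

```python
import math


def sequence_log_prob(step_probs):
    """Sum of log p(y_t | y_1..y_{t-1}, C) over all decoding steps."""
    return sum(math.log(p) for p in step_probs)


# Hypothetical conditional probabilities for a three-step output sequence.
step_probs = [0.9, 0.8, 0.5]
joint = math.exp(sequence_log_prob(step_probs))
print(joint)
```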

3.2 Improving LSTM by gradient descent in sports information acquisition system

Although LSTM went through a period of low interest in the last century, it gradually became clear that traditional algorithms require manual feature extraction and cannot be applied in certain fields. Alongside the gated recurrent unit (GRU), which improved computing efficiency, LSTM has become more widely used in image recognition because of its ability to mine time series data. An LSTM resembles the human nervous system in that it consists of many neurons. This research uses a variant of LSTM in which a weight-processing structure is added within each individual neuron. This avoids excessive weights during operation so that the LSTM can be exploited more fully, as shown in Figure 3.

Figure 3

Improved LSTM process.

From the process in Figure 3, at moment $t$, the hidden layer receives the node $M_t$ and the input value $X_t$. After the hidden-layer calculation, these are passed to the weight layer, where they are merged with the node $M_1$ and the input value $X_1$ that the weight layer received at moment $t_1$ and then processed. The processed weight information is passed to the memory unit, which identifies the elements that should be forgotten, the elements that should be output, and the remaining elements that can be recovered through the gates. One of the most widely used functions in the memory unit is the sigmoid function, given in Eq. (8):

$$\sigma(x) = \frac{1}{1 + e^{-x}}, \tag{8}$$

where $\sigma(x)$ is the sigmoid function. When the sigmoid is used as the activation function, its curve is characterized by small increases and decreases in value, with the output bounded between 0 and 1. Since the problem studied involves many classes, categorical cross-entropy is the preferred loss function, as in Eq. (9):

$$L_i = -\sum_{j} t_{i,j} \log(p_{i,j}), \tag{9}$$

where $t$ denotes the actual value and $p$ the predicted value. The indices $i$ and $j$ indicate the classification positions of the actual and predicted data, respectively. Alongside the categorical cross-entropy loss, the mean squared error (MSE) and the cross-entropy (H) are computed as in Eq. (10):

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( \vec{y}_i - y_i \right)^2, \qquad H(p, q) = \sum_{i} p(i) \times \log \frac{1}{q(i)}, \tag{10}$$

where $\vec{y}_i$ and $y_i$ are the predicted and actual values of the model, respectively. MSE and H are always greater than or equal to 0. The model becomes more accurate when they are driven close to 0 by repeated optimization with the GDO method. GDO optimizes MSE and H iteratively, as in Eq. (11):

$$\theta = \theta - \alpha \cdot \frac{\partial J(\theta)}{\partial \theta}, \tag{11}$$

where $\alpha$ is the step size, $\theta$ denotes the parameters being optimized, and $J(\theta)$ is the objective function. Passing MSE and H through the GDO process minimizes them and makes the weights more appropriate. In the data pre-processing module, sports differ from normal working life because of their elegance and beauty, but they are also accompanied by considerable risk, so the temporal attention module is optimized for them [17]. The temporal attention module focuses on the SDPA, with the following equation:

$$Q = X W_q, \quad K = X W_k, \quad V = X W_v, \tag{12}$$

where $Q$ is the query matrix used for modeling, and $K$ and $V$ are the matrices used for storage. Eq. (12) yields three feature matrices from the input $X$, which are then subjected to a scaling operation and normalized as in Eq. (13):

$$Y = \mathrm{Attention}(Q, K, V) = \mathrm{Softmax}\left( \frac{Q K^{T}}{\sqrt{d_m}} \right) V, \qquad Z = Y W_Z + X, \tag{13}$$

where $Y$ is the weighted feature matrix, and $Z$ is the feature expression of $X$ and $Y$ obtained by a residual connection. The aggregation and expression of historical information is divided into two parts, query and storage. By relating the present moment to a future moment, the two parts of information are fed into the vector representation, so the two features are no longer independent. Because the input sequence tends to lose part of its information, an attention mechanism is introduced, as in Eq. (14):

$$\mathrm{PE}(pos, i) = \begin{cases} \sin\left( pos / 10000^{\,i/d_m} \right), & i \text{ even}, \\ \cos\left( pos / 10000^{\,i/d_m} \right), & \text{otherwise}, \end{cases} \tag{14}$$

where the feature mapping is denoted as $pos$, and the position vector generated by $pos$ is $d_{pos}$. The data set generated with the attention mechanism preserves the temporal order and allows complex action data to be extracted from sports. This yields the improved LSTM (GDO-LSTM) based on the attention mechanism, as shown in Figure 4.

Figure 4

Improved sensor diagram.
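The temporal attention of Eqs. (12) and (13) and the position term of Eq. (14) can be sketched as follows. This is a minimal illustration assuming square projection matrices and the usual division by the square root of $d_m$; adding the position term to the input before attention is an assumption, since the text does not state how the two are combined. All names, sizes, and random weights are hypothetical.

```python
import numpy as np


def softmax(scores):
    """Row-wise softmax with max subtraction for numerical stability."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)


def positional_encoding(n_steps, d_m):
    """Eq. (14) as written: sine for even i, cosine otherwise."""
    pos = np.arange(n_steps)[:, None]
    i = np.arange(d_m)[None, :]
    angle = pos / np.power(10000.0, i / d_m)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))


def temporal_attention(X, W_q, W_k, W_v, W_z):
    """Scaled dot-product attention followed by a residual connection."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v           # Eq. (12)
    d_m = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_m))     # Eq. (13), attention weights
    Y = weights @ V
    return Y @ W_z + X, weights                   # Z = Y W_Z + X


# Hypothetical sizes: 6 time steps, 8 feature channels.
rng = np.random.default_rng(0)
T, d_m = 6, 8
X = rng.normal(size=(T, d_m)) + positional_encoding(T, d_m)
W_q, W_k, W_v, W_z = (rng.normal(scale=0.1, size=(d_m, d_m)) for _ in range(4))
Z, weights = temporal_attention(X, W_q, W_k, W_v, W_z)
print(Z.shape, weights.sum(axis=1))
```

Each row of the attention weights sums to 1, so every output step is a weighted average of the stored values across all time steps.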

In Figure 4, the sensor collects an assortment of data sets during its operation, using a random data set, which is fed into a modified LSTM input gate. After going through the LSTM process once, it will go through a specially designed cell structure. The data that have undergone the cell structure are only expanded for their Lagrangian function, and then a hyperbolic tangent (Tanh) process is performed to judge whether the result should be forgotten or not. If yes, they are temporarily forgotten and Lagrangianized once more; if not, they are transported to the output gate and output as a qualified result.
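The loss functions of Eqs. (9) and (10) and the update rule of Eq. (11) can be sketched on a one-parameter toy model. The model, the data, the step size, and the number of steps below are hypothetical and are not the settings used in the experiments.

```python
import math


def mse(pred, actual):
    """Mean squared error between predictions and targets."""
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)


def cross_entropy(target, predicted):
    """Categorical cross-entropy -sum_j t_j log(p_j) for one sample."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)


def gradient_descent(xs, ys, alpha=0.1, steps=200):
    """Fit y = theta * x by repeating theta <- theta - alpha * dJ/dtheta."""
    theta = 0.0
    n = len(xs)
    for _ in range(steps):
        # Derivative of the MSE objective with respect to theta.
        grad = (2.0 / n) * sum((theta * x - y) * x for x, y in zip(xs, ys))
        theta -= alpha * grad
    return theta


# Toy data generated from y = 2x (illustrative only).
xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
theta = gradient_descent(xs, ys)
loss = mse([theta * x for x in xs], ys)
ce = cross_entropy([0, 1, 0], [0.2, 0.7, 0.1])
print(theta, loss, ce)
```

With this step size each update shrinks the error in theta by a constant factor, so the MSE approaches 0 as the iterations proceed, which is the behavior the text attributes to GDO.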

4 Model creation based on the GDO-LSTM algorithm in sports action recognition

4.1 GDO-LSTM sports monitoring system development and model parameter determination

This study selected a data set from Hainan University and selected 1,000 data items as the training set to train the model. The performance of the model was judged by the indicators of accuracy and error rate. In the recognition of sports movements, a three-axis magnetometer was used to calibrate sports movements. In the experiment, the specific hardware and parameters are shown in Table 1.

Table 1

Experimental parameters

Item	Value
Graphics processing unit	NVIDIA Tesla M60
Internal storage	2T*2
Operating system	128Ubuntu 21.02.20
Internal storage	512 G
Flash memory	CUDA Toolkit 23.0
Operator	Python 6
Database	MySQL 5.20.2023
Operating system	Ubuntu 88.64
Web development framework	Django 1.22.3
Language	Easy Chinese
Display card	6.0 GHz
CPU	Intel penta-core

The collected data set required further processing before the algorithm could learn from it. GDO-LSTM was used for iterative optimization on the data set. To verify its accuracy, it was compared with the traditional particle swarm optimization (PSO) algorithm, which produced two plots, accuracy versus training set and error rate versus training set, as shown in Figure 5 [18].

Figure 5

Comparison of (a) accuracy-training set image and (b) error rate-training set image.

From Figure 5, up to 200 iterations the accuracy of GDO-LSTM was lower than that of PSO and its error rate was higher. Once the number of iterations reached 200 or more, however, the accuracy of GDO-LSTM reached 1.1–1.5 times that of PSO. Although more iterations increase the cost, accuracy was considered decisive for scientific rigor. After the GDO-LSTM algorithm had been trained, the wearing of the motion sensor during the test had to be considered. During the test, the motion sensor measured three sets of data: the acceleration of the object along the three axes of motion. The algorithm was applied to set up a three-axis magnetometer, which was calibrated using sitting and lying-flat postures; the calibration diagram is shown in Figure 6.

Figure 6

Calibration of the three-axis magnetometer. (a) Sitting correction of three-axis magnetometer and (b) flat-lying correction of triaxial magnetometer.

From Figure 6, after the change from sitting to lying position, the acceleration of the X- and Y-axis would decrease due to the change in position, and the acceleration of the Z-axis would increase instead of decrease because of the change in height. After zeroing, the three-axis magnetometer had to be tested for sensitivity, and it was clear that there was a significant difference between it at rest and during weight lifting, as shown in Figure 7 [19].

Figure 7

Three-axis acceleration of static weight lifting. (a) Static acceleration–time and (b) weight lifting acceleration–time.

From Figure 7, in the static case shown in Figure 7(a), the curves from top to bottom are the X-, Z-, and Y-axis accelerations. The Z-axis acceleration was very close to 0 because no change in height occurred. The Y-axis acceleration hardly changed from rest to the highest point of the lift, which indicated that the tester did not move forward or backward. However, the X- and Z-axis accelerations changed considerably, with the mean value of the Z-axis above that of the X-axis, because weight lifting changes the vertical acceleration more than the lateral acceleration. At 97 s, when the lift reached its highest point, the Z-axis acceleration reached its maximum, so curves of this kind were attributed to weight lifting. The data processed by the forget gate could be loaded into the three-axis magnetometer to generate a baseline image for reference, as shown in Figure 8.

Figure 8

The base image of the three-axis magnetometer.

4.2 Experimental validation of the GDO-LSTM algorithm for sports action recognition

The calibration and sensitivity testing of a three-axis gyroscope, which measures angular velocity and thereby estimates the state of motion, is similar to that of a three-axis magnetometer. To obtain objective results for comparing sensor accuracy, this study also compared the accuracy of the three-axis magnetometer and gyroscope tests. The comparison was first made for a simple motion, as shown in Figure 9.

Figure 9

Comparison between the three-axis magnetometer and gyroscope in skipping rope.

From Figure 9, for the three-axis magnetometer, the changes along the X- and Y-axes were very regular because rope skipping repeats the same action. The changes along the Z-axis were less obvious because the jumping height varied, which is consistent with how the human body skips rope. For the three-axis gyroscope, the angular velocity did not change much during rope skipping, so the measured data showed no obvious change in the image and the action was not recognized well. To study three more complex motions in the extensive experiments, the curve with the most obvious change in each magnetometer and gyroscope test was selected in Figure 10 to analyze the accuracy of the two instruments.

Figure 10

Extensive accuracy test of the three-axis magnetometer and gyroscope.

From the extensive accuracy tests of the three-axis magnetometer and gyroscope in Figure 10, both instruments performed very well in skating recognition. In contrast, in the recognition of swimming and shot put, the test accuracy of the three-axis magnetometer was obviously inferior to that of the three-axis gyroscope because the angular velocity changes in these two movements were not as obvious as the acceleration changes. To make the comparison more intuitive, the matrices shown in Figure 11 were established to show more clearly how the three-axis magnetometer and gyroscope perform in practical applications.

Figure 11

Three-axis magnetometer and gyroscope test matrix. (a) Extensive test matrix of the three-axis magnetometer and (b) extensive test matrix of the three-axis gyroscope.

The matrices in Figure 11 show the recognition ability of the three-axis magnetometer and gyroscope. The three-axis gyroscope did not perform well, with considerable confusion in the identification of swimming and skating. The three-axis magnetometer identified the four common sports accurately in most cases; there was some confusion between rope skipping and swimming and between shot put and skating, but the results were sufficient to meet the equipment accuracy requirements.
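As a worked check, the per-class accuracy is the number of correct recognitions divided by the number of trials. The sketch below uses trial totals from the Abstract and recognition counts reported in the Conclusion; the function and dictionary names are hypothetical.

```python
def accuracy_percent(correct, total):
    """Per-class accuracy in percent, rounded to one decimal place."""
    return round(100.0 * correct / total, 1)


# (correct, total) pairs taken from counts reported in the text.
reported = {
    "skating, 17 of 72": (17, 72),
    "swimming, 38 of 94": (38, 94),
    "shot put, 80 of 85": (80, 85),
    "shot put, 85 of 85": (85, 85),
}
rates = {name: accuracy_percent(*counts) for name, counts in reported.items()}
print(rates)
```

The resulting values of 23.6, 40.4, 94.1, and 100% match the accuracies quoted in the Abstract, where 80 of 85 corresponds to the five shot put trials identified as skating.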

5 Conclusion

To identify sports actions quickly and accurately and to cope with the growing volume of sports action data, this study developed a fusion algorithm, GDO-LSTM, based on GDO and LSTM. The MSE and H of the data were processed using GDO, and Lagrange multipliers and KNM were applied to linearly separable and non-separable data, respectively. Weight processing was added to the LSTM to prevent weight explosion. The comparison with PSO showed that at least 200 iterations of GDO-LSTM were needed. Zeroing and sensitivity tests were required before the three-axis magnetometer and gyroscope were used for detection. In the sensitivity test of the three-axis magnetometer, the Z-axis acceleration was 0 for sitting and standing, the X-axis acceleration was 0.25, and the Y-axis acceleration fluctuated from −0.25 to −0.75 because of gravity. Weight lifting was selected for sensitivity detection: the X-axis acceleration reached a maximum of 0.47 at 97 s, and the Z-axis acceleration reached a minimum of −0.25 at 45 s. The timing and values may vary, but the shape of the curves is characteristic. A three-axis gyroscope was used in the experiment to test the recognition accuracy of four common sports actions and was compared with a three-axis magnetometer. The experiments show that the three-axis gyroscope had the worst recognition result for skating: of 72 trials, it was recognized accurately only 17 times and was recognized as swimming 27 times, as shot put 24 times, and as rope skipping 4 times. Its best result was for shot put, where all 85 recognition results were accurate. Because of the change in angular velocity associated with the drop of the shot, the three-axis magnetometer showed a slight deviation, with five shot put trials identified as skating.
For the three-axis gyroscope, rope skipping was recognized as skating ten times, but the accuracy was still high. However, its recognition of swimming was poor: it was recognized accurately only 38 times and was recognized as rope skipping 20 times and as shot put 31 times. The recognition accuracies of the three-axis magnetometer for rope skipping, swimming, skating, and shot put were 98.7, 100, 100, and 94.1%, respectively, while those of the three-axis gyroscope were 80.5, 40.4, 23.6, and 100%, respectively. These results show that GDO-LSTM can identify multiple human movements effectively while maintaining a high level of accuracy. The algorithm handles data fluctuation more robustly and reduces the loss of accuracy caused by environmental factors. It has great potential in practical scenarios such as intelligent fitness and sports training monitoring. The proposed method can give users timely analysis of their sports performance, help them adjust their strategies and improve training results, and play an important role in the field of motion recognition. Although there are some deviations in the recognition of rope skipping and shot put, the method already meets user requirements. Research on sports movement data remains limited because such data are private and not suitable for wide dissemination. With more volunteers, future studies can be improved.

Funding information: The authors state no funding involved.
Author contributions: All authors have accepted responsibility for the entire content of this manuscript and consented to its submission to the journal, reviewed all the results and approved the final version of the manuscript. P.C.: wrote the original draft, participated in the literature search and analyses, evaluations and manuscript preparation, and wrote the paper. J.P.: conceived and designed the manuscript, interpreted the data, and participated in project administration, including resources, software, validation, visualization, conceptualization, investigation, and methodology.
Conflict of interest: The authors state no conflict of interest.
Data availability statement: Data sharing not applicable to this article as no datasets were generated or analysed during the current study.

References

[1] Alaoui B, Bari D, Ghabbar Y. Surface weather parameters forecasting using analog ensemble method over the main airports of Morocco. J Meteorol Res. 2022;36(6):866–81. doi:10.1007/s13351-022-2019-0

[2] Yang J, Yagiz S, Liu YJ, Laouafa F. Comprehensive evaluation of machine learning algorithms applied to TBM performance prediction. Undergr Space. 2022;7(1):37–49. doi:10.1016/j.undsp.2021.04.003

[3] Hasanpour R, Rostami J, Schmitt J, Ozcelik Y, Sohrabian B. Prediction of TBM jamming risk in squeezing grounds using Bayesian and artificial neural networks. J Rock Mech Geotech Eng. 2020;12:21–31. doi:10.1016/j.jrmge.2019.04.006

[4] Liu L, Zhou W, Gutierrez M. Effectiveness of predicting tunneling-induced ground settlements using machine learning methods with small datasets. J Rock Mech Geotech Eng. 2022;14(4):1028–41. doi:10.1016/j.jrmge.2021.08.018

[5] Xu H, Yan R. Research on sports action recognition system based on cluster regression and improved ISA deep network. J Intell Fuzzy Syst: Appl Eng Technol. 2020;39(4Pta2):5871–81. doi:10.3233/JIFS-189062

[6] Sarabu A, Santra AK. Human action recognition in videos using convolution long short-term memory network with spatio-temporal networks. Emerg Sci J. 2021;5(1):25–33. doi:10.28991/esj-2021-01254

[7] Muhammad K, Ullah A, Imran AS, Sajjad M, Kiran MS, Sannino G, et al. Human action recognition using attention-based LSTM network with dilated CNN features. Future Gener Comput Syst. 2021;125:820–30. doi:10.1016/j.future.2021.06.045

[8] Xu S, Rao H, Peng H, Jiang X, Hu B. Attention based multi-level co-occurrence graph convolutional LSTM for 3D action recognition. IEEE Internet Things J. 2020;8(21):15990–6001. doi:10.1109/JIOT.2020.3042986

[9] Zhang Z, Lv Z, Gan C, Zhu Q. Human action recognition using convolutional LSTM and fully-connected LSTM with different attentions. Neurocomputing. 2020;410:304–16. doi:10.1016/j.neucom.2020.06.032

[10] Naveenkumar M, Domnic S. Learning representations from quadrilateral based geometric features for skeleton-based action recognition using LSTM networks. Intell Decis Technol. 2020;14(1):47–54. doi:10.3233/IDT-190078

[11] Furnari A, Farinella G. Rolling-unrolling LSTMs for action anticipation from first-person video. IEEE Trans Pattern Anal Mach Intell. 2020;43(11):4021–36. doi:10.1109/TPAMI.2020.2992889

[12] Yu M, Ning C, Xue Y. Brain medical image fusion scheme based on shuffled frog leaping algorithm and adaptive pulse coupled neural network. Image Process. 2020;6(15):1203–9. doi:10.1049/ipr2.12092

[13] Li J, Zhang W, Diao W, Feng Y, Sun X, Fu K. CSF-Net: Color spectrum fusion network for semantic labeling of airborne laser scanning point cloud. IEEE J Sel Top Appl Earth Obs Remote Sens. 2022;15:339–52. doi:10.1109/JSTARS.2021.3133602

[14] Feng T, Wang C, Zhang J, Wang B, Jin YF. An improved artificial bee colony-random forest (IABC-RF) model for predicting the tunnel deformation due to an adjacent foundation pit excavation. Undergr Space. 2022;7(4):514–27. doi:10.1016/j.undsp.2021.11.004

[15] Singh S, Gupta D. Multistage multimodal medical image fusion model using feature‐adaptive pulse coupled neural network. Int J Imaging Syst Technol. 2020;31(2):981–1001. doi:10.1002/ima.22507

[16] Wang L, Zhang J, Liu Y, Mi J, Zhang J. Multimodal medical image fusion based on Gabor representation combination of multi-CNN and fuzzy neural network. IEEE Access. 2021;9:67634–47. doi:10.1109/ACCESS.2021.3075953

[17] Polinati S, Bavirisetti DP, Rajesh KN, Dhuli R. Multimodal medical image fusion based on content-based and PCA-sigmoid. Curr Med Imaging. 2022;18(5):546–62. doi:10.2174/1573405617666211004114726

[18] Hou S, Liu Y, Yang Q. Real-time prediction of rock mass classification based on TBM operation big data and stacking technique of ensemble learning. J Rock Mech Geotech Eng. 2022;14(1):123–43. doi:10.1016/j.jrmge.2021.05.004

[19] Wang D, Zhao H, Li Q. Medical brain image classification based on multi-feature fusion of convolutional neural network. J Intell Fuzzy Syst. 2020;38(1):127–37. doi:10.3233/IFS-179387

Received: 2024-08-02

Revised: 2024-10-10

Accepted: 2024-10-16

Published Online: 2025-02-25

This work is licensed under the Creative Commons Attribution 4.0 International License.

https://doi.org/10.1515/nleng-2024-0050
