
Study on recognition and classification of English accents using deep learning algorithms

Wenjuan Ke
Published/Copyright: December 31, 2023

Abstract

The recognition and classification of English accents have high practical value in areas such as security management and information retrieval. Based on deep learning techniques, this study introduced two English accent features, the filter bank (FBank) and the Mel-frequency cepstral coefficient (MFCC). It then combined a convolutional neural network (CNN), a bidirectional gated recurrent unit (BiGRU), and an attention mechanism to design a 1D CNN-BiGRU-Attention model for English accent recognition and classification. Experimental tests were conducted on the VoxForge dataset. The results showed that, compared to MFCC, FBank performed better in English accent recognition and classification, and 70FBank achieved the highest F1 value. Among the recurrent neural network (RNN), long short-term memory (LSTM), and related models, the BiGRU model performed best. The 1D CNN-BiGRU-Attention model achieved the highest average F1 value, 85.52%, with F1 values above 80% for all accents, indicating that the addition of the attention mechanism effectively improved the model's recognition and classification performance. These results demonstrate the reliability of the proposed method for English accent recognition and classification, making it suitable for practical application.

AMS Mathematics Subject Classification number: 68T07

1 Introduction

Speech recognition technology can enhance human–computer interaction and has significant applications in fields such as smart homes and medical rehabilitation [1]. In actual speech, the presence of accents can significantly impact recognition performance [2]. Accents refer to different pronunciations of words [3]. Within the same language, different accents can lead to variations in tone, duration, and other aspects [4]. Taking English accents as an example, American English tends to have more rolled sounds and a flat intonation, while British English avoids rolling the tongue and has greater pitch variation. Indian English is spoken at a fast pace, and French-accented English often substitutes "z" for "th." Furthermore, even within the same country, different regions have their own distinct accents; however, all of these variations still fall under the umbrella of the English language. By recognizing and classifying different English accents, it is possible to provide corresponding services for customers in self-service industries such as tourism and restaurants, and to assist in determining the origin of individuals in immigration control and crime investigation for security purposes.

Deep learning algorithms can automatically learn features from raw data and adopt an end-to-end learning approach, making model construction relatively simple. They possess excellent transfer learning capabilities and perform outstandingly on large-scale complex data, particularly in computer vision tasks such as object detection and facial recognition [5]. These algorithms find wide application in areas such as autonomous driving, video surveillance, and virtual reality [6]. Deep learning has also been successfully applied in natural language processing, for example in machine translation and sentiment analysis [7]. In the medical field, deep learning-based pathology image analysis and diagnosis contribute to improving diagnostic accuracy [8]. In speech recognition, deep learning is widely used in applications such as voice assistants and speech-to-text conversion [9].

Various deep learning methods have been extensively researched. Zhang et al. [10] designed two models, multi-layer cellular neural network-connectionist temporal classification (MCNN-CTC) and SENet (SE)-MCNN-CTC, for Chinese speech recognition. Through experiments, they found that the relative error rate of SE-MCNN-CTC decreased by 13.51%, resulting in a final error rate decrease of 22.21% and indicating higher generalization performance. Manohar and Logashanmugam [11] focused on speech emotion recognition and proposed a hybrid deep learning approach, which was found to outperform other models. Kumar et al. [12] investigated speech-to-text conversion for hearing-impaired students and utilized a deep learning-based model to extract features from audio and video, achieving a word error rate of 6.59%. Seki et al. [13] proposed a deep neural network that incorporates filter banks (FBanks); their experimental results showed a 5.8% reduction in word errors across ten utterances. This work designed a deep learning algorithm for English accent recognition and classification, and its reliability was demonstrated through experiments on different accent datasets.
This work provides a new and reliable method for practical applications involving English accent recognition and classification, such as public safety and language learning. The study takes into account the diversity of English accents, and the designed model performs well in recognizing and classifying different accents. It thus offers a new approach to the issues of accent diversity and adaptability, contributing to wider and more effective application of speech recognition and classification technology.

2 Extraction of English accent features

As an acoustic signal, an English accent recording needs to undergo preprocessing and feature extraction before it can be recognized and classified using deep learning algorithms. First, the energy of the high-frequency part of the signal is enhanced through pre-emphasis, which is achieved by applying a first-order digital filter. The corresponding equation is:

(1) $H(z) = 1 - az^{-1}$,

where a stands for the pre-emphasis coefficient.
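
As an illustration, a minimal NumPy sketch of this pre-emphasis step is given below; the coefficient value $a = 0.97$ is a common default and is an assumption here, since the article does not specify it.

    import numpy as np

    def pre_emphasis(signal: np.ndarray, a: float = 0.97) -> np.ndarray:
        # Apply H(z) = 1 - a*z^(-1), i.e. y[n] = x[n] - a*x[n-1],
        # which boosts the high-frequency energy of the signal.
        return np.append(signal[0], signal[1:] - a * signal[:-1])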

Then, by utilizing the short-term stationary characteristics of the signal, it is divided into multiple speech frames, each typically 10–30 ms long. Adjacent frames overlap, and the step between successive frames, known as the frame shift, is usually one-half to one-third of the frame length. After framing, a window function is applied to keep both ends of each speech frame smooth. Commonly used window functions are listed in Table 1.

Table 1

Commonly used window functions

Name | Equation | Feature
Rectangular window | $w(n) = 1$ for $0 \le n \le L-1$, $0$ otherwise | High sidelobes and severe spectral leakage
Hanning window | $w(n) = 0.5 - 0.5\cos[2\pi n/(L-1)]$ for $0 \le n \le L-1$, $0$ otherwise | Good suppression of spectral leakage, but low resolution
Hamming window | $w(n) = 0.54 - 0.46\cos[2\pi n/(L-1)]$ for $0 \le n \le L-1$, $0$ otherwise | Slower sidelobe attenuation than the Hanning window, better low-pass characteristics

According to Table 1, this study utilizes a Hamming window with a frame length of 25 ms and a frame shift of 10 ms when processing English accent signals. Finally, to preserve the effective segment of the signal, endpoint detection is performed using the commonly employed dual-threshold method [14]. Let the signal obtained after framing and windowing be $x_i(m)$. The steps of endpoint detection are as follows (a code sketch covering framing, windowing, and these steps appears after the list).

  1. The energy of each frame is calculated: $E_i = \sum_{m=1}^{N} x_i^2(m)$, where $N$ is the frame length.

  2. A small threshold $\sigma$ is set for center clipping: $x_i(m) = x_i(m)$ if $|x_i(m)| \ge \sigma$, and $x_i(m) = 0$ if $|x_i(m)| < \sigma$.

  3. The zero-crossing rate of each frame is calculated: $\mathrm{ZCR} = \frac{1}{2}\sum_{m=1}^{N} |\mathrm{sign}[x_i(m)] - \mathrm{sign}[x_i(m-1)]|$, where $\mathrm{sign}[x_i(m)] = 1$ if $x_i(m) \ge 0$ and $-1$ if $x_i(m) < 0$.

  4. The starting and ending points at which the two thresholds are triggered are determined; the continuous signal between them is considered the valid signal.
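
The following NumPy sketch covers framing, Hamming windowing, and steps 1–3 above. The frame parameters follow the article (25 ms length, 10 ms shift); the clipping threshold sigma is an assumed value, and the dual-threshold triggering of step 4 is omitted.

    import numpy as np

    def frame_signal(x, sr, frame_len_ms=25, frame_shift_ms=10):
        # Split the signal into overlapping frames and apply a Hamming window.
        flen = int(sr * frame_len_ms / 1000)
        fshift = int(sr * frame_shift_ms / 1000)
        n_frames = 1 + max(0, (len(x) - flen) // fshift)
        frames = np.stack([x[i * fshift:i * fshift + flen] for i in range(n_frames)])
        return frames * np.hamming(flen)

    def frame_energy(frames):
        # Step 1: short-time energy E_i = sum_m x_i(m)^2 of each frame.
        return np.sum(frames ** 2, axis=1)

    def zero_crossing_rate(frames, sigma=0.02):
        # Steps 2-3: center clipping with threshold sigma, then per-frame ZCR.
        clipped = np.where(np.abs(frames) >= sigma, frames, 0.0)
        signs = np.where(clipped >= 0, 1, -1)
        return 0.5 * np.sum(np.abs(np.diff(signs, axis=1)), axis=1)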

After the preprocessing of English accent signals is completed, feature extraction can be performed. Commonly used features include linear predictive cepstral coefficients [15], Mel-frequency cepstral coefficients (MFCC) [16], and filter bank (FBank) features. Among them, MFCC and FBank are closer to how the human ear processes audio; therefore, this study focuses on these two features. The extraction process is shown in Figure 1.

Figure 1: FBank and MFCC feature extraction.

According to Figure 1, after pre-emphasis and the other preprocessing steps, a fast Fourier transform is performed on the English accent signal to complete the time–frequency conversion:

(2) $X_t(k) = \sum_{n=0}^{K-1} x_t(n)\exp\left(-\dfrac{j2\pi nk}{K}\right), \quad k = 0, 1, \ldots, K-1$,

where $x_t(n)$ stands for the $n$-th sampling point of the $t$-th frame of the signal.

Then, the power spectrum of the t -th frame signal is calculated as follows:

(3) $P_t(k) = |X_t(k)|^2 / K, \quad k = 1, 2, \ldots, K/2 + 1$.

The actual frequency $f$ is mapped to the Mel frequency $f_{\mathrm{mel}}$ as follows:

(4) $f_{\mathrm{mel}} = 2595 \times \lg(1 + f/700)$.

Mel filters are then constructed. The frequency response of the $m$-th filter is as follows:

(5) $H_m(\delta) = \begin{cases} 0, & \delta < f_L(m) \\ \dfrac{\delta - f_L(m)}{f_C(m) - f_L(m)}, & f_L(m) < \delta \le f_C(m) \\ \dfrac{f_H(m) - \delta}{f_H(m) - f_C(m)}, & f_C(m) < \delta \le f_H(m) \\ 0, & \delta > f_H(m), \end{cases}$

where $f_C(m)$ refers to the central frequency of the filter, and $f_L(m)$ and $f_H(m)$ are the lower and upper cutoff frequencies, respectively.

The signal is passed through a group of Mel filters, and the logarithmic energy output of each filter can be written as follows:

(6) $S_t(m) = \ln\left[\sum_{k=0}^{K-1} |X_t(k)|^2 H_m(k)\right]$,

where $S_t(m)$ is the $m$-th dimension of the FBank feature extracted from the $t$-th frame of the signal.

Discrete cosine transform (DCT) is performed on $S_t(m)$ to obtain

(7) $C_t(n) = \sum_{m=1}^{M} S_t(m)\cos\left[\dfrac{\pi n}{M}(m - 0.5)\right], \quad n = 1, 2, \ldots, L$,

where $C_t(n)$ is the MFCC feature, $L$ is the number of orders, and $M$ is the number of filters.
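
A compact NumPy/SciPy sketch of the FBank and MFCC pipeline in Eqs. (2)–(7) is given below. The FFT size (512) and the 12 retained cepstral coefficients are assumptions; the 70 filters match the 70FBank setting adopted later. The article's full 39-dimensional MFCC also appends log energy and first- and second-order differences, which are omitted here.

    import numpy as np
    from scipy.fftpack import dct

    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)   # Eq. (4)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    def fbank_mfcc(frames, sr, n_fft=512, n_filters=70, n_ceps=12):
        # Power spectrum of each windowed frame, Eqs. (2)-(3)
        spec = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
        # Triangular Mel filter bank, Eq. (5)
        mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
        bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
        H = np.zeros((n_filters, n_fft // 2 + 1))
        for m in range(1, n_filters + 1):
            lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
            H[m - 1, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)
            H[m - 1, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)
        # Log filter-bank energies (FBank), Eq. (6)
        fbank = np.log(spec @ H.T + 1e-10)
        # DCT of the log energies yields the MFCC, Eq. (7);
        # the 0th coefficient is dropped by convention
        mfcc = dct(fbank, type=2, axis=1, norm='ortho')[:, 1:n_ceps + 1]
        return fbank, mfcc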

3 Recognition and classification using deep learning algorithms

Convolutional neural networks (CNNs) are commonly used deep learning algorithms [17] with wide applications in areas such as image recognition [18]. They excel at extracting local features from data, which is why this study chose a CNN to extract local features from English accent signals. In actual operation, a window of size $k$ slides over the input feature map to perform the convolution. It is assumed that the input feature map of the $(l-1)$-th layer is $x^{l-1}$. The output is obtained by multiplying $x^{l-1}$ with the convolutional kernel $k^l$ of the $l$-th layer, summing, adding the bias term $b^l$ of the current layer, and applying an activation function. For the $j$-th convolutional kernel, the convolution operation can be written as follows:

(8) $x_j^l = f\left(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^l + b_j^l\right)$,

where $M_j$ refers to the selection of input feature maps connected to the $j$-th convolutional kernel.

Pooling operations aim to reduce computational complexity and accelerate network convergence. First, the $j$-th feature map in the $(l-1)$-th layer, $x_j^{l-1}$, is downsampled. The result is then multiplied by $\beta_j^l$, the $j$-th parameter of the $l$-th layer, and the bias $b_j^l$ of the current layer is added. After applying the activation function, $x_j^l$ is output. The pooling operation is written as follows:

(9) $x_j^l = f[\beta_j^l\,\mathrm{downsample}(x_j^{l-1}) + b_j^l]$,

where $\mathrm{downsample}(\cdot)$ refers to the downsampling operation.
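
To make Eqs. (8) and (9) concrete, here is a minimal NumPy sketch of one 1D convolutional layer and one pooling layer; the choice of max pooling and the tanh activation are assumptions, since the article does not name the specific downsampling operation or activation function.

    import numpy as np

    def conv1d_layer(x, kernels, biases, f=np.tanh):
        # Eq. (8): valid 1D convolution of input maps x, shape (T, C_in),
        # with kernels of shape (k, C_in, C_out), plus bias and activation.
        k, _, c_out = kernels.shape
        T = x.shape[0] - k + 1
        out = np.empty((T, c_out))
        for t in range(T):
            out[t] = np.tensordot(x[t:t + k], kernels, axes=([0, 1], [0, 1])) + biases
        return f(out)

    def pool_layer(x, beta, b, size=2, f=np.tanh):
        # Eq. (9): downsample (max pooling), scale by beta, add bias, activate.
        T = (x.shape[0] // size) * size
        down = x[:T].reshape(-1, size, x.shape[1]).max(axis=1)
        return f(beta * down + b)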

However, CNNs perform poorly on sequential data. To further improve the effectiveness of English accent recognition and classification, the CNN is combined with a recurrent neural network (RNN). Among RNN variants, the gated recurrent unit (GRU) is an improved algorithm [19] that effectively alleviates the gradient vanishing problem of the RNN while offering a simple structure and high accuracy. The GRU has two gates: an update gate $z_t$ and a reset gate $r_t$. It is assumed that the current input is $x_t$. The GRU determines through $r_t$ how much hidden-layer information $h_{t-1}$ from the previous moment can be forgotten, which can be written as follows:

(10) $r_t = \sigma(W_r \cdot [h_{t-1}, x_t])$.

Then, $z_t$ is used to determine how much of $h_{t-1}$ and the candidate hidden-layer information $\tilde{h}_t$ can be retained, which can be written as follows:

(11) $z_t = \sigma(W_z \cdot [h_{t-1}, x_t])$,

(12) $\tilde{h}_t = \tanh(W_h \cdot [h_{t-1} \odot r_t, x_t])$.

Finally, the output $h_t$ of the GRU is composed of two parts, which can be written as follows:

(13) $h_t = (1 - z_t) \odot \tilde{h}_t + z_t \odot h_{t-1}$,

where $W_r$, $W_z$, and $W_h$ are weight matrices and $\sigma$ is the sigmoid activation function.

To capture information from both past and future moments, this article adopts the BiGRU, whose calculation formulas can be written as follows:

(14) $\overrightarrow{h_t} = \mathrm{GRU}(\overrightarrow{h_{t-1}}, x_t)$,

(15) $\overleftarrow{h_t} = \mathrm{GRU}(\overleftarrow{h_{t+1}}, x_t)$,

(16) $h_t = W_t \overrightarrow{h_t} + V_t \overleftarrow{h_t}$,

where $\overrightarrow{h_t}$ represents the output of the forward GRU, $\overleftarrow{h_t}$ represents the output of the backward GRU, and $W_t$ and $V_t$ are weight matrices.
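
A step-by-step NumPy sketch of Eqs. (10)–(16) follows. Bias terms are omitted, matching the equations above; the weight shapes, zero initial states, and the per-step combination weights are assumptions.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def gru_step(h_prev, x_t, Wr, Wz, Wh):
        # One GRU step following Eqs. (10)-(13).
        concat = np.concatenate([h_prev, x_t])
        r_t = sigmoid(Wr @ concat)                                  # reset gate, Eq. (10)
        z_t = sigmoid(Wz @ concat)                                  # update gate, Eq. (11)
        h_cand = np.tanh(Wh @ np.concatenate([h_prev * r_t, x_t]))  # Eq. (12)
        return (1.0 - z_t) * h_cand + z_t * h_prev                  # Eq. (13)

    def bigru(xs, params_f, params_b, W, V, h_dim):
        # Eqs. (14)-(16): run one GRU forward and one backward over the
        # sequence xs, then combine the two hidden states at each step.
        hf, hb = np.zeros(h_dim), np.zeros(h_dim)
        fwd, bwd = [], []
        for x in xs:
            hf = gru_step(hf, x, *params_f)
            fwd.append(hf)
        for x in reversed(xs):
            hb = gru_step(hb, x, *params_b)
            bwd.append(hb)
        bwd.reverse()
        return [W @ f + V @ b for f, b in zip(fwd, bwd)]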

In the recognition and classification of English accents, special attention needs to be paid to the unique pronunciation styles of different accents. The attention mechanism [20] can adjust weight coefficients to modify the importance of features; therefore, this study combines the attention mechanism with the CNN-BiGRU. A two-layer 1D CNN extracts local features from the English accent features described earlier. With a two-layer structure, the features extracted by the first convolutional layer are passed on to the second layer, enabling further learning of more complex and abstract features and thus deep-level feature learning. The BiGRU is used to extract global features, and an attention module is then added to adjust the feature weights. To convert the model's output into probabilities for each category, a softmax function is used in the last layer for the classification and recognition of the different English accents. The designed model structure is shown in Figure 2.

Figure 2: 1D CNN-BiGRU-Attention model.

As shown in Figure 2, the output sequence of the BiGRU is denoted as $s_t$. The target attention weight can be written as follows:

(17) $a_t = \tanh(s_t)$.

The probability weight vector generated by softmax is:

(18) $p_t = \exp(a_t) \Big/ \sum_{t=1}^{m} \exp(a_t)$.

Finally, the probability distribution over the different classes is calculated as follows:

(19) $P = \mathrm{softmax}(w_a v + b_a)$,

where $w_a$ and $b_a$ are the weight and bias, and $v$ is the weighted sum of $a_t$: $v = \sum_{t=1}^{m} a_t p_t$.
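
Since the article states that the model was built with Keras (Section 4.1), the sketch below assembles the 1D CNN-BiGRU-Attention structure of Figure 2, with a small custom layer implementing the attention of Eqs. (17)–(19). The layer widths (64/128 convolution filters, kernel size 3, 64 GRU units) are assumptions, as the article does not report them.

    import tensorflow as tf
    from tensorflow.keras import layers, models

    class Attention(layers.Layer):
        # Attention over the BiGRU output sequence, Eqs. (17)-(18):
        # a_t = tanh(s_t), p_t = softmax over time, v = sum_t a_t * p_t.
        def call(self, s):
            a = tf.tanh(s)                       # Eq. (17)
            p = tf.nn.softmax(a, axis=1)         # Eq. (18), softmax over time steps
            return tf.reduce_sum(a * p, axis=1)  # weighted vector v

    def build_model(n_fbank=70, n_classes=6):
        inp = layers.Input(shape=(None, n_fbank))  # (time, FBank dims)
        x = layers.Conv1D(64, 3, padding='same', activation='relu')(inp)
        x = layers.MaxPooling1D(2)(x)
        x = layers.Conv1D(128, 3, padding='same', activation='relu')(x)
        x = layers.MaxPooling1D(2)(x)
        x = layers.Bidirectional(layers.GRU(64, return_sequences=True))(x)
        v = Attention()(x)
        out = layers.Dense(n_classes, activation='softmax')(v)  # Eq. (19)
        return models.Model(inp, out)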

4 Results and analysis

4.1 Experimental setting

The experimental dataset was VoxForge [21], which includes processed data of various English accents. From this dataset, six accents with a relatively large amount of data were selected, as listed in Table 2.

Table 2

Experimental dataset

Accent Number
American English (AM) 8,306
British English (BR) 8,134
European English (EU) 7,842
Canadian English (CA) 3,405
Indian English (IN) 2,407
Australian English (AU) 2,323

80% of the data was used for training, and the remaining 20% was used for testing. The model was built with the Keras framework and programmed in Python 3.6 on an Ubuntu operating system. For the 1D CNN-BiGRU-Attention model, the learning rate was set to 0.001, the batch size to 64, and the number of training epochs to 120, with the Adam optimizer. The model's recognition and classification performance was evaluated with a confusion matrix (Table 3) and the following metrics (a training and evaluation sketch follows Table 3).

  1. Precision: P = TP / ( TP + FP )

  2. Recall rate: R = TP / ( TP + FN )

  3. F1 value: F1 = 2 PR / ( P + R )

Table 3

Confusion matrix

Recognition and classification result
Real class | Positive case | Negative case
Positive case | TP | FN
Negative case | FP | TN
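
A training and evaluation sketch under the stated settings (Adam, learning rate 0.001, batch size 64, 120 epochs) might look as follows. The loss function, the macro averaging over accent classes, and the placeholder arrays X_train, y_train, X_test, and y_test are assumptions; build_model is the sketch from Section 3.

    import tensorflow as tf
    from sklearn.metrics import precision_recall_fscore_support

    model = build_model(n_fbank=70, n_classes=6)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss='sparse_categorical_crossentropy',  # assumed loss
                  metrics=['accuracy'])
    model.fit(X_train, y_train, batch_size=64, epochs=120,
              validation_data=(X_test, y_test))

    # Precision, recall, and F1 value per the Table 3 definitions,
    # macro-averaged over the six accent classes
    y_pred = model.predict(X_test).argmax(axis=1)
    P, R, F1, _ = precision_recall_fscore_support(y_test, y_pred, average='macro')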

4.2 Result analysis

The MFCC feature commonly used for English accent feature extraction is 39-dimensional, consisting of 12 cepstral dimensions and 1 logarithmic-energy dimension together with their first- and second-order differences. The default number of filter banks for FBank was 23. First, the performance of FBank and MFCC features in the 1D CNN-BiGRU-Attention model was compared, with the number of FBank filters set to 23, 45, 70, 95, and 115. The recognition and classification results are presented in Figure 3.

Figure 3: Influence of English accent features on model recognition and classification results.

From Figure 3, it can be observed that when the MFCC was used as the feature, the model's F1 value in English accent recognition and classification was 78.91%, whereas using FBank as the feature brought a significant improvement. With the number of filter banks set to 23, FBank reached an F1 value of 81.26%, an increase of 2.35% over MFCC. Compared with FBank, MFCC loses some correlation details during DCT processing, resulting in inferior recognition and classification performance. As the number of filter banks increased, the model's F1 value first rose and then fell. The highest F1 value was achieved with 70FBank, 6.61% higher than MFCC. A continued increase in the number of filter banks may introduce more signal noise, making training more complex and degrading recognition and classification performance. Based on this, 70FBank was used as the feature in the subsequent experiments.

With the CNN and attention layers fixed, the influence of different RNNs on the model performance was compared, as shown in Table 4.

Table 4

Influence of different RNNs on model performance

Model Precision (%) Recall rate (%) F1 value (%)
1D CNN-RNN-Attention 71.26 68.77 69.99
1D CNN-LSTM-Attention 75.44 73.11 74.26
1D CNN-BiLSTM-Attention 78.21 74.25 76.18
1D CNN-GRU-Attention 84.33 81.26 82.77
1D CNN-BiGRU-Attention 86.72 84.36 85.52

As observed from Table 4, the choice of RNN had a significant impact on the model's recognition and classification results. With the plain RNN, the F1 value for English accent recognition was only 69.99%, the lowest. Replacing the RNN with the long short-term memory (LSTM) network raised the F1 value to 74.26%, an improvement of 4.27%. Replacing it with the GRU led to an even higher F1 value of 82.77%, an improvement of 12.78% over the RNN and of 8.51% over the LSTM, indicating that the GRU performed best. Comparing unidirectional and bidirectional networks, the BiLSTM had an F1 value of 76.18%, a 1.92% improvement over the LSTM, while the BiGRU had an F1 value of 85.52%, a 2.75% improvement over the GRU. This suggests that models using bidirectional networks can learn more features and thereby improve their performance in English accent recognition and classification.

The influence of the attention mechanism on the model performance was compared, and the results are shown in Table 5.

Table 5

Influence of the attention mechanism on the model performance

Model Precision (%) Recall rate (%) F1 value (%)
1D CNN-BiLSTM 76.01 74.31 75.15
1D CNN-BiLSTM-Attention 78.21 74.25 76.18
1D CNN-BiGRU 85.22 82.41 83.79
1D CNN-BiGRU-Attention 86.72 84.36 85.52

As observed from Table 5, the presence or absence of the attention mechanism also affected the model's recognition and classification results. Without the attention mechanism, the F1 value of the 1D CNN-BiLSTM approach was 75.15%, 1.03% lower than that of the 1D CNN-BiLSTM-Attention approach. Similarly, the F1 value of the 1D CNN-BiGRU approach was 83.79%, 1.73% lower than that of the 1D CNN-BiGRU-Attention approach. These results indicate that the attention mechanism's ability to reweight features further enhances the model's performance in English accent recognition, demonstrating the effectiveness of the method proposed in this work.

Finally, the F1 value of different approaches for the identification and classification of different English accents was compared, and the results are presented in Table 6.

Table 6

F1 value of different approaches for the identification and classification of different English accents (unit: %)

Model AM BR EU CA IN AU
1D CNN-LSTM-Attention 77.33 78.46 76.59 60.12 80.76 72.30
1D CNN-BiLSTM-Attention 81.26 83.54 70.12 65.42 81.26 75.48
1D CNN-GRU-Attention 75.32 91.27 87.26 75.61 81.22 85.94
1D CNN-BiGRU-Attention 92.36 92.16 80.33 80.12 87.33 80.82

From Table 6, it can be seen that all methods achieved good recognition and classification results on the Indian accent (IN), with F1 values above 80%, possibly because IN differs markedly from the other accents and has distinctive features. Comparing the approaches, the 1D CNN-LSTM-Attention approach achieved its highest F1 value, 80.76%, on IN but performed worst on Canadian English (CA), with an F1 value of only 60.12%. The 1D CNN-BiLSTM-Attention approach showed its best recognition and classification performance on BR but performed poorly on Australian English (AU), possibly because of the small sample size of AU. The 1D CNN-GRU-Attention approach reached an F1 value of 91.27% on British English (BR). The 1D CNN-BiGRU-Attention approach exceeded 90% on both American English (AM) and BR, although its weakest result was on CA, with an F1 value of only 80.12%. The different methods produced different results on the accent recognition and classification task, indicating that they differ in feature extraction and classification ability. Overall, however, the 1D CNN-BiGRU-Attention approach obtained an F1 value above 80% for every accent and the highest average F1 value, 85.52%, which demonstrates the reliability of this method.

5 Conclusion

This article proposed a deep learning-based 1D CNN-BiGRU-Attention model for recognizing and classifying different English accents. The experiments showed that FBank, as a feature, exhibited better recognition and classification performance than MFCC. Compared with other methods, the designed 1D CNN-BiGRU-Attention model demonstrated excellent performance in recognizing and classifying different accents, achieving a superior average F1 value of 85.52%. The proposed method is therefore feasible and can be applied in practice. However, this study also has some limitations. In terms of features, only the performance of MFCC and FBank was compared, and the selection of models did not consider deeper and more complex CNN structures. Future work therefore needs to delve deeper into feature research to explore the performance of different features in English accent recognition and classification, and to analyze the effectiveness of more complex deep learning methods in model construction.

  1. Funding information: The author states no funding involved.

  2. Author contributions: Conceptualization – Wenjuan Ke; methodology – Wenjuan Ke; formal analysis – Wenjuan Ke; investigation – Wenjuan Ke; resources – Wenjuan Ke; writing – original draft preparation – Wenjuan Ke; writing – review and editing – Wenjuan Ke; visualization – Wenjuan Ke. The author has read and agreed to the published version of the manuscript.

  3. Conflict of interest: No conflict of interest was declared by the author.

  4. Data availability statement: The datasets analysed during the current study are available from the corresponding author on reasonable request.

References

[1] Jat DS, Limbo A, Singh C. Speech-based automation system for the patient in orthopedic trauma ward. Smart Biosens Med Care. 2020;201–14. doi:10.1016/B978-0-12-820781-9.00011-5.

[2] Berjon P, Nag A, Dev S. Analysis of French phonetic idiosyncrasies for accent recognition. Soft Comput Lett. 2021;3:1–7. doi:10.1016/j.socl.2021.100018.

[3] Lazaro JB, Po MCP, Ramones LM, Tolidanes PML. Real-time speech recognition engine for accent correction using hidden Markov model. AIP Conference Proceedings (Bandung, Indonesia); 2018 July 27–28. p. 1–6. doi:10.1063/1.5080882.

[4] Barkana BD, Patel A. Analysis of vowel production in Mandarin/Hindi/American-accented English for accent recognition systems. Appl Acoust. 2020;162:1–13. doi:10.1016/j.apacoust.2019.107203.

[5] Xiao B, Kang SC. Development of an image data set of construction machines for deep learning object detection. J Comput Civ Eng. 2021;35:1–18. doi:10.1061/(ASCE)CP.1943-5487.0000945.

[6] Pang K. A decision-making method for self-driving based on deep reinforcement learning. J Phys: Conf Ser. 2020;1576:1–8. doi:10.1088/1742-6596/1576/1/012025.

[7] Nahar KMO, Almomani A, Shatnawi N, Alauthman M. A robust model for translating Arabic sign language into spoken Arabic using deep learning. Intell Autom Soft Comput. 2023;37:2037–57. doi:10.32604/iasc.2023.038235.

[8] Jiang YQ, Xiong JH, Li HY, Yang XH, Yu WT, Gao M, et al. Using smartphone and deep learning technology to help diagnose skin cancer. Br J Dermatol. 2020;182:e95. doi:10.1111/bjd.18826.

[9] Khanam F, Munmun FA, Ritu NA, Saha AK, Mridha MF. Text to speech synthesis: a systematic review, deep learning based architecture and future research direction. J Adv Inf Technol. 2022;13:398–412. doi:10.12720/jait.13.5.398-412.

[10] Zhang W, Zhai M, Huang Z, Li W, Cao Y. Towards end-to-end speech recognition for Chinese Mandarin using SE-MCNN-CTC. J Appl Acoust. 2020;39:223–30.

[11] Manohar K, Logashanmugam E. Hybrid deep learning with optimal feature selection for speech emotion recognition using improved meta-heuristic algorithm. Knowl Syst. 2022;246:1–22. doi:10.1016/j.knosys.2022.108659.

[12] Kumar LA, Renuka DK, Rose SL, Shunmuga Priya MC, Wartana IM. Deep learning based assistive technology on audio visual speech recognition for hearing impaired. Int J Cognit Comput Eng. 2022;3:24–30. doi:10.1016/j.ijcce.2022.01.003.

[13] Seki H, Yamamoto K, Akiba T, Nakagawa S. Discriminative learning of filterbank layer within deep neural network based speech recognition for speaker adaptation. IEICE Trans Inf Syst. 2019;102:364–74. doi:10.1587/transinf.2018EDP7252.

[14] Gan Z, Hou M, Hou H, Yang H. Savitzky-Golay filtering and improved energy entropy for speech endpoint detection under low SNR. J Phys: Conf Ser. 2020;1617:1–9. doi:10.1088/1742-6596/1617/1/012070.

[15] Syiem B, Dutta SK, Binong J, Singh LJ. Comparison of Khasi speech representations with different spectral features and hidden Markov states. J Electron Sci Technol. 2021;19:155–62. doi:10.1016/j.jnlest.2020.100079.

[16] Heriyanto H, Wahyuningrum T, Fitriana GF. Classification of Javanese script hanacara voice using Mel frequency cepstral coefficient MFCC and selection of dominant weight features. J Infotel. 2021;13:84–93. doi:10.20895/infotel.v13i2.657.

[17] Huang Z, Kurotori T, Pini R, Benson SM, Zahasky C. Three-dimensional permeability inversion using convolutional neural networks and positron emission tomography. Water Resour Res. 2022;58:1–21. doi:10.1029/2021WR031554.

[18] Pally RJ, Samadi S. Application of image processing and convolutional neural networks for flood image classification and semantic segmentation. Environ Model Softw. 2022;148:1–15. doi:10.1016/j.envsoft.2021.105285.

[19] Yevnin Y, Chorev S, Dukan I, Toledo Y. Short-term wave forecasts using gated recurrent unit model. Ocean Eng. 2023;268:1–8. doi:10.1016/j.oceaneng.2022.113389.

[20] Shobana J, Murali M. An improved self attention mechanism based on optimized BERT-BiLSTM model for accurate polarity prediction. Comput J. 2023;66:1279–94. doi:10.1093/comjnl/bxac013.

[21] Maesa A, Garzia F, Scarpiniti M, Cusani R. Text independent automatic speaker recognition system using mel-frequency cepstrum coefficient and Gaussian mixture models. J Inf Secur. 2012;3:335–40. doi:10.4236/jis.2012.34041.

Received: 2023-09-19
Revised: 2023-11-23
Accepted: 2023-12-05
Published Online: 2023-12-31

© 2023 the author(s), published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.
