Abstract
Because existing face-detection methods extract insufficient information and features, and because computing power is limited, designing a high-precision, efficient face-detection algorithm remains an open challenge. To address this, we propose an improved face-detection algorithm. First, a depthwise separable residual network is introduced into the YOLO-v4 network: a 1 × 1 common convolution block (CBL) expands the channels for feature extraction, depthwise separable convolution reduces the amount of model computation, and a second CBL reduces the dimension, improving the efficiency of the subsequent network. Second, an improved attention mechanism is used to splice the high-level features, and the high-level features are fused with the shallow features to obtain feature vectors that contain more information and are therefore richer and more representative. Finally, experimental results show that, compared with the other methods evaluated, our method achieves the best results on public face datasets, and its performance in personal face detection is significantly better than that of the other methods.
1 Introduction
Text, photos, videos, and other types of information are becoming increasingly common in everyday life [1,2]. Pictures and videos, the main research objects of computer vision, are everywhere in our lives, for example in movies, animations, face-scan authentication, and sign-in systems, making our lives richer and more convenient [3,4,5,6]. Computer vision uses modern artificial intelligence technology to detect, recognize, and track targets in the video and pictures collected by monitoring equipment. Face recognition, an important component of computer vision, is the main method of identity recognition in daily life. Because it is convenient and fast, it is widely used in public places where identity recognition is required, bringing great convenience to staff, consumers, and other personnel [7,8,9,10]. Face detection is one of the important applications in the field of computer vision and supports the automatic recognition and verification of individual identity information. Its research objects are mainly divided into static pictures and video streams [11,12,13]. Video streams are more real-time and contain richer information than static pictures, which increases the difficulty of face detection research for dynamic video [14,15,16,17].
At present, most of the images used in research are in a standard format. However, images in real life do not necessarily contain standard faces, and existing face detection technologies have robustness problems [18,19,20,21]. This remains a challenge in face detection. Deep learning is a branch of machine learning that is based on neural networks and automatically extracts features by training on large amounts of data, thereby recognizing and understanding complex patterns [22]. Current deep learning-based face detection algorithms have drawbacks such as sensitivity to environmental factors [23]. They may fail under the influence of lighting, shadows, viewing angle, occlusion, and similar factors, leaving the algorithm unable to detect faces accurately [24]. Deep learning algorithms also rely heavily on training data, so collecting and processing a large amount of high-quality facial image data is a major challenge [25]. In addition, deep learning algorithms require large amounts of computing resources and time for training and inference, which limits their adoption in practical applications. Nevertheless, deep learning algorithms still have broad development prospects in the field of face detection [26]. In the future, deep learning-based face detection algorithms will become more intelligent, real-time, and efficient, and more innovative application scenarios will emerge [27]. Based on this, the model proposed in this article makes two modifications to the YOLO-v4 network: it introduces an improved depthwise (DW) separable residual network, and it adds an attention mechanism.
In this article, the YOLO-v4 backbone network is modified to improve detection efficiency. The high-level features are spliced using the improved attention mechanism and then fused with the shallow features. In this way, feature vectors containing more information are obtained, which improves their richness and representativeness.
This article first introduces the definition of face detection, the research status at home and abroad, and the development process of CNN, then analyzes the difficulties of face detection technology, and finally introduces the innovation work of this article. Second, the DW separable residual network structure and attention mechanism are introduced, and experiments are carried out on the WiderFace dataset and MAFA occluded face dataset. Finally, the article summarizes the full text and points out the shortcomings of this work and the research prospects based on this method.
2 Related research
The purpose of face detection is to determine whether an image contains faces and, if so, to locate them. Face detection algorithms mainly include knowledge-based methods, statistics-based methods, and deep learning methods. Farfade et al. [28] proposed a deep-learning network model, the Deep Dense Face Detector. This model does not need pose or key point annotation and can capture faces in all orientations with a single model. It is robust to a variety of poses, but the greater the deflection angle and pose change, the lower the accuracy. Li et al. [29] constructed a cascade structure, based on a deep convolutional network (Cascade CNN), that detects facial features from coarse to fine. Jiang and Learned-Miller [30] added the center loss function to the original R-CNN structure, which can better detect small-size images. Yu et al. [31] proposed a new intersection over union loss function to replace the commonly used L2 loss function and improve the accuracy of face detection. Woo et al. [32] proposed a lightweight convolutional block attention module for object recognition. Peng et al. [33] proposed an object-part attention model for fine-grained image classification. This model combines an attention mechanism with a residual network module and has been applied to fine-grained image classification with good performance. Zeng et al. [34] used the Mish function to improve the original residual network and introduced an attention mechanism to obtain the final network model, making the extracted facial features more discriminative. Yan et al. [35] combined L2 loss and triplet loss into a single loss function and improved the face residual network by using depthwise separable convolution to reduce the number of network parameters.
However, the above methods can only mine shallow information. The extracted features are weak in the expression of human faces, often resulting in low accuracy of face recognition. To overcome the above problems, the innovation of the proposed method lies in the following:
An improved DW separable residual network is introduced to improve YOLO-v4, and a 1 × 1 common convolution block (CBL) is used to expand the channel for feature extraction. The DW separable convolution is introduced to further reduce the model computation, and CBL is used for dimensionality reduction to improve the computational efficiency of subsequent networks.
The improved attention mechanism is used to fuse advanced features and shallow features to obtain feature vectors containing more information.
3 Proposed method
3.1 Network framework
A complex network structure often makes a detection model slow to run and difficult to train, and makes real-time detection hard to achieve on equipment with low computing power. Lightweight face detection methods are fast and can meet the speed requirements of practical applications. However, their detection precision is not sufficient and the predicted face positions are not accurate, especially for complex faces. Therefore, we propose a lightweight and efficient face-detection algorithm. The backbone network introduces a more balanced and efficient DW separable residual network based on a lightweight network. In addition, our method integrates an attention mechanism. The method is mainly composed of three parts: a lightweight feature extraction network, an improved attention module, and an output layer. Figure 1 depicts the entire network framework.

Figure 1: Network framework.
3.2 Depthwise separable residual network
The YOLO-v4 backbone network adopts a residual network structure. The specific process is as follows. The input data first pass through the 2 × 2 basic convolution layer and are then divided into two parts. One part serves as the main branch (Resblock) and is iterated in a loop to learn the relationship between the weights and the input data. The other part forms an independent residual edge, on which the input data are output directly after a small amount of processing. Finally, the outputs of the two parts are added across layers, and the sum is used as the output of this layer. By separating the gradient flow, this structure lets gradients propagate along different paths, so that the network can learn more varied gradient information. At the same time, reducing the amount of computation in the stacked loop lowers computational cost and improves computing speed and network learning ability. In this article, the residual network structure in the YOLO-v4 backbone network is replaced with a depthwise separable residual network. Figure 2 depicts the specific structure. This structure retains the YOLO-v4 idea of separating the gradient flow: part of the input data continues to be iterated while the other part skips to the end. The ordinary convolution residual block in the original loop is replaced with a depthwise separable residual block. First, a 1 × 1 CBL expands the channels for feature extraction; then DW convolution is applied to further reduce the amount of model computation; finally, a 1 × 1 CBL reduces the dimension to improve the efficiency of the subsequent network.

Figure 2: Depthwise separable residual structure.
In the DW separable residual network structure of this article, the process of convolution is as follows: assuming that the number of input image channels is
To sum up, the number of output channels of the DW separable residual network modules used in the backbone network can be controlled. The backbone is composed of 1 CBL and 17 DW separable residual network structures (I-Resblock) with different strides. This design reduces the overall computational load of the model, thus improving the speed of face detection in natural environments, and is conducive to rapid real-time face detection.
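The computational saving described above can be illustrated with a rough count of multiply-accumulate operations (MACs). The sketch below uses the standard cost formulas for ordinary and depthwise separable convolutions, not figures from this article. The 3 × 3 kernel, the 64-channel 52 × 52 feature map, and the expansion factor of 2 are hypothetical values chosen only for illustration, and batch normalization and activation costs are ignored.

```python
def conv_macs(k, c_in, c_out, h, w):
    """MACs of a standard k x k convolution producing an h x w output map."""
    return k * k * c_in * c_out * h * w


def depthwise_macs(k, c, h, w):
    """MACs of a k x k depthwise convolution (one filter per channel)."""
    return k * k * c * h * w


def dw_separable_macs(k, c_in, c_out, h, w):
    """Depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    return depthwise_macs(k, c_in, h, w) + conv_macs(1, c_in, c_out, h, w)


def dw_residual_block_macs(k, c_in, c_out, h, w, expansion):
    """1 x 1 expand -> k x k depthwise -> 1 x 1 reduce, as in the block described above.

    `expansion` is a hypothetical channel multiplier; the article does not state one.
    """
    c_mid = c_in * expansion
    expand = conv_macs(1, c_in, c_mid, h, w)
    depthwise = depthwise_macs(k, c_mid, h, w)
    reduce = conv_macs(1, c_mid, c_out, h, w)
    return expand + depthwise + reduce


if __name__ == "__main__":
    # Illustrative layer shape (assumption, not taken from the article).
    k, c, h, w = 3, 64, 52, 52
    standard = conv_macs(k, c, c, h, w)
    separable = dw_separable_macs(k, c, c, h, w)
    block = dw_residual_block_macs(k, c, c, h, w, expansion=2)
    print(f"standard 3x3 conv : {standard:,} MACs")
    print(f"DW separable conv : {separable:,} MACs ({separable / standard:.3f}x)")
    print(f"expand-DW-reduce  : {block:,} MACs ({block / standard:.3f}x)")
```

For a k × k kernel and C output channels, the ratio of depthwise separable to standard cost is 1/C + 1/k², which is why the saving grows with the channel count.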
3.3 Improved attention module
Attention maps for visual interpretation are an important topic in image recognition. In the unsupervised learning mode, previous attention models use only the response values of the convolution layer to extract the attention weights during feedforward propagation. The attention mechanism in this article is shown in Figure 3 and includes the feature extractor, the enhanced attention module, and the perception module.

Figure 3: Improved attention mechanism structure chart.
The construction of an enhanced attention module and perception module further extracts the deep features of the picture. The enhanced attention module pays attention to important features, strengthens training on important features, and extracts deep features. The enhanced attention module has a
The perception branch has the same structure as traditional image classification models such as VGGNet and ResNet. The attention map is applied to the feature map through the attention mechanism [32]. The specific calculations are as follows:
where
In general, the number of parameters in a fully connected (FC) layer is quite large, so directly adding multiple fully connected layers increases the parameter count and complexity, making model training slow and prone to over-fitting. Mirror and Crop are commonly selected for data conversion; that is, large images are cut into small images at a fixed scale and input into the convolution network. However, this approach still involves too many parameters and over-fits easily. Other methods, such as dropout, can reduce over-fitting, but they are also computationally expensive and not easy to implement. Therefore, replace the FC layer in the network with
4 Experiment and analysis
4.1 Experimental environment and evaluation index
The designed system involves a lot of calculations during operation, so it has certain requirements for the hardware environment. Table 1 shows the detailed configuration of our system.
Table 1: Experimental configuration
Project | Specific information |
---|---|
Operating system | Windows |
Graphics card | GTX 1080Ti |
Memory | 64 GB |
Language | Python 3.5 |
Development platform | PyTorch |
TensorFlow | 2.12 |
Video camera | 720p HD |
Solid state drive | 1 TB |
To verify the precision of our method, the face detection methods of Zeng et al. [34] and Yan et al. [35] are selected for comparative experiments. During training, all convolution layer parameters are randomly initialized. The model is optimized with stochastic gradient descent. The batch size is set to 32, the weight decay to 0.0005, and the momentum to 0.9. The maximum number of iterations is 12 × 10⁴. The learning rate is 10⁻³ for the first 9 × 10⁴ iterations and 10⁻⁴ for the last 3 × 10⁴ iterations.
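The training schedule above can be written as a small step function. This is a minimal sketch rather than the authors' training code: the names `TRAIN_CONFIG` and `learning_rate` are hypothetical, and the code assumes a zero-based iteration counter with the learning rate dropping exactly at iteration 9 × 10⁴.

```python
# Hyperparameters as stated in the text; the dictionary layout is illustrative.
TRAIN_CONFIG = {
    "batch_size": 32,
    "weight_decay": 0.0005,
    "momentum": 0.9,
    "max_iters": 12 * 10**4,
    "lr_drop_iter": 9 * 10**4,
    "lr_initial": 1e-3,
    "lr_final": 1e-4,
}


def learning_rate(iteration, config=TRAIN_CONFIG):
    """Return the learning rate for a zero-based iteration index (assumed convention)."""
    if not 0 <= iteration < config["max_iters"]:
        raise ValueError("iteration outside the training schedule")
    if iteration < config["lr_drop_iter"]:
        return config["lr_initial"]
    return config["lr_final"]


if __name__ == "__main__":
    for it in (0, 89_999, 90_000, 119_999):
        print(it, learning_rate(it))
```

The values of `momentum` and `weight_decay` would be passed to the optimizer, for example a stochastic gradient descent optimizer in the chosen framework, while `learning_rate` would be queried once per iteration.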
The performance of the method was evaluated using accuracy, recall, and average precision (AP) [37]:
Here, TP denotes the tally of positive samples correctly identified as positive by the classifier, TN represents the count of negative samples correctly identified as negative by the classifier, FP signifies the count of negative samples misclassified as positive by the classifier, and FN signifies the count of positive samples misclassified as negative by the classifier.
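For reference, the sketch below shows how such metrics are commonly computed from the TP, TN, FP, and FN counts defined above. These are the standard textbook definitions and are not necessarily the exact formulas used in this article. The `average_precision` function assumes all-point interpolation of the precision-recall curve over detections sorted by descending confidence, which is one of several common protocols. All function names are illustrative.

```python
def precision(tp, fp):
    """Fraction of predicted positives that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of actual positives that are detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that are classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def average_precision(is_tp, num_gt):
    """Area under the interpolated precision-recall curve.

    is_tp  : booleans for detections sorted by descending confidence,
             True if the detection matches a ground-truth face.
    num_gt : number of ground-truth faces.
    """
    if num_gt == 0:
        return 0.0
    tp = fp = 0
    points = []  # (recall, precision) after each detection
    for hit in is_tp:
        if hit:
            tp += 1
        else:
            fp += 1
        points.append((tp / num_gt, tp / (tp + fp)))
    ap = 0.0
    prev_recall = 0.0
    for i, (rec, _) in enumerate(points):
        # Interpolated precision: best precision at this or any higher recall.
        p_interp = max(p for _, p in points[i:])
        ap += (rec - prev_recall) * p_interp
        prev_recall = rec
    return ap


if __name__ == "__main__":
    print(precision(8, 2), recall(8, 8), accuracy(8, 80, 2, 10))
    print(average_precision([True, False, True], num_gt=2))
```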
4.2 Training process
Figure 4 shows the training and validation loss curves for different improved networks. The final fluctuation of the training and validation loss curves of each network is small, which indicates that the network stability is good. It can be seen from Figure 4(a) that the original YOLO-v4 algorithm has a small degree of loss reduction and a slow speed during the training process. As can be seen from Figure 4(b), the proposed algorithm has a large degree of loss reduction and a fast speed during the training process. The loss curve of this method no longer decreases at the 80th iteration, and the model convergence is completed. Therefore, the attention mechanism can improve the overall accuracy and convergence rate of our model.

Figure 4: Training and validation loss curves: (a) YOLO-v4 and (b) proposed method.
4.3 WiderFace dataset
The WiderFace dataset was first released as a benchmark dataset for human face detection in 2015. It includes 32,203 images and 393,703 labeled faces. Each subset contains three detection difficulty levels: easy, medium, and hard.
To demonstrate the effectiveness of our method, this section compares it with the face detection methods of Zeng et al. [34] and Yan et al. [35]. Table 2 shows that the proposed method achieves average precision values of 0.953, 0.928, and 0.910 on the easy, medium, and hard subsets of WiderFace, all higher than those of the reference methods. The methods of Zeng et al. [34] and Yan et al. [35] have weak feature extraction ability and can use only limited facial features, resulting in average precision of only 0.923 and 0.926 on the easy subset. The proposed method uses an improved depthwise separable residual network, in which DW convolution reduces the computational load of the model. An improved attention mechanism is used to concatenate high-level features, which are then fused with shallow features to obtain feature vectors that contain more information. Therefore, the proposed method achieves high detection performance (Figure 5).
Table 2: Comparison between the proposed method and the comparison methods on WiderFace
Model | Easy (AP) | Medium (AP) | Hard (AP) |
---|---|---|---|
Zeng et al. [34] | 0.923 | 0.912 | 0.886 |
Yan et al. [35] | 0.926 | 0.906 | 0.872 |
Proposed method | 0.953 | 0.928 | 0.910 |

Figure 5: Test results of different difficulty subsets: (a) easy, (b) medium, and (c) hard.
4.4 MAFA occluded face dataset
The MAFA dataset contains 35,806 annotated face bounding boxes in 30,811 face images covering multiple occlusion scenes and mask types. To verify the effectiveness of our method in detecting occluded faces, we conducted experimental comparisons on the MAFA dataset under the same evaluation criteria.
To demonstrate the effectiveness of our method, this section compares it with the face detection methods of Zeng et al. [34] and Yan et al. [35]. Table 3 shows that our method has the highest average precision on the MAFA dataset. In Table 3, the first five attributes correspond to five specific orientations of the face. As the face deflection angle increases, the detection accuracy in the five directions decreases, but our method still achieves the highest detection accuracy in each. Taking into account the detection results under all attributes, the average precision of this method is 0.891 (Figure 6).
Table 3: Comparison between the proposed method and the comparison methods on the MAFA dataset
Model | Zeng et al. [34] | Yan et al. [35] | Proposed method |
---|---|---|---|
Left | 0.886 | 0.912 | 0.923 |
Left‑front | 0.872 | 0.906 | 0.926 |
Front | 0.910 | 0.928 | 0.953 |
Right‑front | 0.802 | 0.821 | 0.834 |
Right | 0.785 | 0.876 | 0.883 |
Simple | 0.615 | 0.714 | 0.897 |
Complex | 0.602 | 0.703 | 0.881 |
Body | 0.596 | 0.698 | 0.864 |
Hybrid | 0.587 | 0.676 | 0.856 |

Figure 6: Some test results of the MAFA test set.
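As a quick arithmetic check, the overall figure of 0.891 quoted for the proposed method appears consistent with the unweighted mean of its nine per-attribute values in Table 3. The sketch below reproduces that calculation. Treating the overall score as a simple mean is an assumption, since the article does not state how the attributes are weighted, and the variable names are illustrative.

```python
from statistics import mean

# Per-attribute AP from Table 3: (Zeng et al. [34], Yan et al. [35], proposed).
# The Left-front value for Yan et al. is taken as 0.906.
TABLE3 = {
    "Left": (0.886, 0.912, 0.923),
    "Left-front": (0.872, 0.906, 0.926),
    "Front": (0.910, 0.928, 0.953),
    "Right-front": (0.802, 0.821, 0.834),
    "Right": (0.785, 0.876, 0.883),
    "Simple": (0.615, 0.714, 0.897),
    "Complex": (0.602, 0.703, 0.881),
    "Body": (0.596, 0.698, 0.864),
    "Hybrid": (0.587, 0.676, 0.856),
}
METHODS = ("Zeng et al. [34]", "Yan et al. [35]", "Proposed method")


def column_means(table, methods):
    """Unweighted mean AP over all attributes, one value per method."""
    return {
        name: mean(row[i] for row in table.values())
        for i, name in enumerate(methods)
    }


if __name__ == "__main__":
    for name, value in column_means(TABLE3, METHODS).items():
        print(f"{name}: {value:.3f}")
```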
To further demonstrate the accuracy and effectiveness of our method, self-comparison experiments were conducted on the MAFA dataset. The experiment takes YOLO-v4 as the baseline method. YOLO-v4 + Idsrn means that the improved DW separable residual network (Idsrn) is added on the basis of YOLO-v4. YOLO-v4 + Idsrn + Am indicates that the improved DW separable residual network and attention mechanism are added on the basis of YOLO-v4. The results of the self-comparison experiment of the above methods in the MAFA test set are described in Table 4. It can be seen that introducing attention networks can significantly improve the detection accuracy of multi-scale occluded faces.
Table 4: Self-comparison experiment results
Method | AP |
---|---|
YOLO-v4 | 0.814 |
YOLO-v4 + Idsrn | 0.853 |
YOLO-v4 + Idsrn + Am | 0.902 |
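The contribution of each component in Table 4 can be read off as the difference between consecutive rows. The short sketch below computes these incremental gains from the reported AP values. The function name is illustrative, and the gains are plain differences of the reported numbers with no statistical testing implied.

```python
# AP values from Table 4, in the order the components are added.
ABLATION_AP = [
    ("YOLO-v4", 0.814),
    ("YOLO-v4 + Idsrn", 0.853),
    ("YOLO-v4 + Idsrn + Am", 0.902),
]


def incremental_gains(results):
    """Absolute AP gain of each configuration over the previous one."""
    gains = []
    for (_, prev_ap), (name, ap) in zip(results, results[1:]):
        gains.append((name, round(ap - prev_ap, 3)))
    return gains


if __name__ == "__main__":
    for name, gain in incremental_gains(ABLATION_AP):
        print(f"{name}: +{gain:.3f} AP")
    total = round(ABLATION_AP[-1][1] - ABLATION_AP[0][1], 3)
    print(f"total gain over baseline: +{total:.3f} AP")
```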
4.5 Limitations and discussion
To verify the effectiveness of our method more intuitively, visual analysis was conducted using the GradCAM tool. Figure 7 shows the visualizations for SENet and our method. The redder the color, the higher the attention of the model. Figure 7 shows that the red area produced by our method is more concentrated, with darker colors around the eyes and nose, indicating that the model pays more attention to the facial area. This supports the claim that our method focuses more effectively on important facial regions.

Figure 7: Different attention visualization results: (a) original image, (b) SENet, and (c) proposed method.
Although the model proposed in this article has improved the detection accuracy, we also noticed some areas that need to be improved. First, the number of parameters in the new model has increased compared to the original model, resulting in an increase in the size of the model, increasing the complexity of the calculation and the need for storage. The next step is to use model compression and model pruning methods to reduce the number of parameters, so as to better balance model size and detection accuracy. Second, there is room for improvement in the detection speed of our model. Compared with other methods, our model does not achieve the fastest detection speed. In order to improve the real-time performance of the model, we will investigate the use of more efficient feature extraction and matching methods or adopt more optimized hardware acceleration techniques. The experimental test in this article only verifies the generalized complex face detection. Although we have achieved some results in the experiment, our next step is to investigate the specific challenges in harsh conditions.
5 Conclusion
This article proposes a face detection method based on an improved DW separable residual network and an attention mechanism. The improved DW separable residual network is introduced to improve YOLO-v4, and an improved attention mechanism is introduced to obtain feature vectors containing more information. Our model achieves average precision values of 0.953, 0.928, and 0.910 on the three difficulty subsets of WiderFace, all of which are higher than those of the comparison algorithms. Our method also has the highest average precision on the MAFA dataset. The experiments show that our method achieves higher accuracy in face detection and outperforms the comparison methods.
However, this article does not consider possible data imbalance in the datasets, which may affect the detection accuracy of the face detection algorithm. In addition, this article does not consider the influence of shadow, angle, occlusion, and other factors on the algorithm, which will also affect the detection accuracy. At this stage, face detection research essentially first extracts face-related features from images and then classifies those features to detect faces. Therefore, whether a more novel network structure that does not rely on classification can be designed is worth further study. Our face detection method can also be applied to CPUs and embedded devices. However, due to time and device limitations, testing has not yet been conducted on mobile terminals and embedded devices. Future work can study the method further on embedded devices and mobile terminals.
Funding information: This work was supported by the Planning subject for the 14th five-year plan of Shanxi Education Sciences (No. GH21105 and No. GH-220335) and Educational Reform and Innovation Project of Higher Education in Shanxi Province (No. J20221040).
Author contributions: Yue Qi (Writing - original draft, Project administration, Writing - review & editing, Conceptualization, Resources), Yiqin Wang (Writing - original draft, Project administration, Writing - review & editing, Data curation, Software), Yunyun Dong (Formal Analysis, Supervision).
Conflict of interest: Authors state no conflict of interest.
Data availability statement: The data used to support the findings of this study are included within the article. All image data come from the public datasets http://shuoyang1213.me/WIDERFACE/ and https://github.com/cabani/MaskedFace-Net.
References
[1] Ahonen T, Hadid A, Pietikainen M. Face description with local binary patterns: Application to face recognition. IEEE Trans Pattern Anal Mach Intell. 2006 Oct;28(12):2037–41. doi:10.1109/TPAMI.2006.244
[2] Zhao W, Chellappa R, Phillips PJ, Rosenfeld A. Face recognition: A literature survey. ACM Comput Surv (CSUR). 2003 Dec;35(4):399–458. doi:10.1145/954339.954342
[3] Phillips PJ, Moon H, Rizvi SA, Rauss PJ. The FERET evaluation methodology for face-recognition algorithms. IEEE Trans Pattern Anal Mach Intell. 2000 Oct;22(10):1090–104. doi:10.1109/34.879790
[4] Wang Y, Zhang C, Lu J, Bai L, Zhao Z, Han J. Weld reinforcement analysis based on long-term prediction of molten pool image in additive manufacturing. IEEE Access. 2020 Apr;8:69908–18. doi:10.1109/ACCESS.2020.2986130
[5] Phillips PJ, Wechsler H, Huang J, Rauss PJ. The FERET database and evaluation procedure for face-recognition algorithms. Image Vis Comput. 1998 Apr;16(5):295–306. doi:10.1016/S0262-8856(97)00070-X
[6] Yu H, Yang J. A direct LDA algorithm for high-dimensional data – with application to face recognition. Pattern Recognit. 2001 Oct;34(10):2067–70. doi:10.1016/S0031-3203(00)00162-X
[7] Liu C, Wechsler H. Gabor feature based classification using the enhanced fisher linear discriminant model for face recognition. IEEE Trans Image Process. 2002 Apr;11(4):467–76. doi:10.1109/TIP.2002.999679
[8] Lawrence S, Giles CL, Tsoi AC, Back AD. Face recognition: A convolutional neural-network approach. IEEE Trans Neural Netw. 1997 Jan;8(1):98–113. doi:10.1109/72.554195
[9] Pankaj P, Bharti PK, Kumar B. A new design of occlusion-invariant face recognition using optimal pattern extraction and CNN with GRU-based architecture. Int J Image Graph. 2023 Jul;23(4):2350029. doi:10.1142/S0219467823500298
[10] Qiu H, Chen X, Liu W, Zhou G, Wang Y, Lai J. A fast ℓ1-solver and its applications to robust face recognition. J Ind Manag Optim (JIMO). 2012;8:163–78. doi:10.3934/jimo.2012.8.163
[11] Morton J, Johnson MH. CONSPEC and CONLERN: a two-process theory of infant face recognition. Psychol Rev. 1991 Apr;98(2):164. doi:10.1037//0033-295X.98.2.164
[12] Chien JT, Wu CC. Discriminant waveletfaces and nearest feature classifiers for face recognition. IEEE Trans Pattern Anal Mach Intell. 2002 Dec;24(12):1644–9. doi:10.1109/TPAMI.2002.1114855
[13] Liao S, Zhu X, Lei Z, Zhang L, Li SZ. Learning multi-scale block local binary patterns for face recognition. In Advances in Biometrics: International Conference, ICB 2007, Seoul, Korea, August 27–29, 2007. Proceedings 2007. Springer Berlin Heidelberg; p. 828–37. doi:10.1007/978-3-540-74549-5_87
[14] Nelson CA. The development and neural bases of face recognition. Infant Child Dev: An Int J Res Pract. 2001 Mar;10(1–2):3–18. doi:10.1002/icd.239
[15] Deng W, Hu J, Guo J. Extended SRC: Undersampled face recognition via intraclass variant dictionary. IEEE Trans Pattern Anal Mach Intell. 2012 Jan;34(9):1864–70. doi:10.1109/TPAMI.2012.30
[16] Schroff F, Kalenichenko D, Philbin J. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2015. p. 815–23. doi:10.1109/CVPR.2015.7298682
[17] Chen LF, Liao HY, Ko MT, Lin JC, Yu GJ. A new LDA-based face recognition system which can solve the small sample size problem. Pattern Recognit. 2000 Oct;33(10):1713–26. doi:10.1016/S0031-3203(99)00139-9
[18] Zhang L, Yang M, Feng X. Sparse representation or collaborative representation: Which helps face recognition? In 2011 International Conference on Computer Vision. IEEE; 2011. p. 471–8. doi:10.1109/ICCV.2011.6126277
[19] Nguyen HV, Bai L, Shen L. Local gabor binary pattern whitened pca: A novel approach for face recognition from single image per person. In Advances in Biometrics: Third International Conference, ICB 2009, Alghero, Italy, June 2–5, 2009. Proceedings 3 2009. Springer Berlin Heidelberg; p. 269–78. doi:10.1007/978-3-642-01793-3_28
[20] Pankaj P, Bharti PK, Kumar B. A new design of occlusion invariant face recognition using optimal pattern extraction and CNN with GRU-based architecture. Int J Inf Secur Priv (IJISP). 2022 Jan;16(1):1–25. doi:10.4018/IJISP.305222
[21] Koley S, Roy H, Dhar S, Bhattacharjee D. Illumination invariant face recognition using fused cross lattice pattern of phase congruency (FCLPPC). Inf Sci. 2022 Jan;584:633–48. doi:10.1016/j.ins.2021.10.059
[22] Kasongo SM. A deep learning technique for intrusion detection system using a Recurrent Neural Networks based framework. Comput Commun. 2023 Feb;199:113–25. doi:10.1016/j.comcom.2022.12.010
[23] Krishnaraj M, Raj RJ. Video frame-based deep learning face detection-a review. In 2021 3rd International Conference on Signal Processing and Communication (ICPSC); 2021. IEEE; p. 207–13. doi:10.1109/ICSPC51351.2021.9451782
[24] Castelblanco A, Rivera E, Solano J, Tengana L, Lopez C, Ochoa M. Dynamic face authentication systems: Deep learning verification for camera close-Up and head rotation paradigms. Comput Secur. 2022 Apr;115:102629. doi:10.1016/j.cose.2022.102629
[25] Tayyab M, Marjani M, Jhanjhi NZ, Hashem IA, Usmani RS, Qamar F. A comprehensive review on deep learning algorithms: Security and privacy issues. Comput Secur. 2023;131:103297. doi:10.1016/j.cose.2023.103297
[26] Sathyamoorthy B, Snehalatha U, Rajalakshmi T. Facial emotion detection of thermal and digital images based on machine learning techniques. Biomed Eng: Appl Basis Commun. 2023 Feb;35(1):2250052. doi:10.4015/S1016237222500521
[27] Ge H, Zhu Z, Dai Y, Wang B, Wu X. Facial expression recognition based on deep learning. Comput Methods Prog Biomed. 2022 Mar;215:106621. doi:10.1016/j.cmpb.2022.106621
[28] Farfade SS, Saberian MJ, Li LJ. Multi-view face detection using deep convolutional neural networks. In Proceedings of the 5th ACM on International Conference on Multimedia Retrieval; 2015. p. 643–50. doi:10.1145/2671188.2749408
[29] Li H, Lin Z, Shen X, Brandt J, Hua G. A convolutional neural network cascade for face detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2015. p. 5325–34. doi:10.1109/CVPR.2015.7299170
[30] Jiang H, Learned-Miller E. Face detection with the faster R-CNN. In 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017); 2017. IEEE; p. 650–7. doi:10.1109/FG.2017.82
[31] Yu J, Jiang Y, Wang Z, Cao Z, Huang T. Unitbox: An advanced object detection network. In Proceedings of the 24th ACM International Conference on Multimedia; 2016. p. 516–20. doi:10.1145/2964284.2967274
[32] Woo S, Park J, Lee JY, Kweon IS. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV); 2018. p. 3–19. doi:10.1007/978-3-030-01234-2_1
[33] Peng Y, He X, Zhao J. Object-part attention model for fine-grained image classification. IEEE Trans Image Process. 2017 Nov;27(3):1487–500. doi:10.1109/TIP.2017.2774041
[34] Zeng J, Li J, Feng L. Face recognition based on Improved residual network and channel attention. Autom Control Comput Sci. 2022 Oct;56(5):383–92. doi:10.3103/S0146411622050108
[35] Yan W, Liu T, Liu S, Geng Y, Sun Z. A lightweight face recognition method based on depthwise separable convolution and triplet loss. In 2020 39th Chinese Control Conference (CCC); 2020. IEEE; p. 7570–5. doi:10.23919/CCC50068.2020.9189491
[36] Chollet F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2017. p. 1251–8. doi:10.1109/CVPR.2017.195
[37] Powers DM. Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. arXiv preprint arXiv:2010.16061; 2020 Oct.
© 2024 the author(s), published by De Gruyter
This work is licensed under the Creative Commons Attribution 4.0 International License.