Robust and energy-efficient expression recognition based on improved deep ResNets

  • Yunhua Chen, Jin Du, Qian Liu, Ling Zhang and Yanjun Zeng
Published/Copyright: February 26, 2019

Abstract

To improve the robustness and reduce the energy consumption of facial expression recognition, this study proposes a facial expression recognition method based on improved deep residual networks (ResNets). Residual learning solves the degradation problem of deep convolutional neural networks (CNNs); therefore, in theory, a ResNet can consist of an arbitrarily large number of layers. On the one hand, ResNets achieve better performance on artificial intelligence (AI) tasks thanks to their deeper network structure; on the other hand, they face a severe energy-consumption problem, especially on mobile devices. Hence, this study employs a novel activation function, the Noisy Softplus (NSP), in place of rectified linear units (ReLU) to obtain improved ResNets. NSP is a biologically plausible activation function that was first proposed for training spiking neural networks (SNNs); thus, NSP-trained models can be implemented directly on ultra-low-power neuromorphic hardware. We built an 18-layer ResNet using NSP to perform facial expression recognition on the extended Cohn-Kanade (CK+), Karolinska Directed Emotional Faces (KDEF) and GENKI-4K datasets. The resulting network achieved better noise robustness than a ResNet using the ReLU activation function and showed low energy consumption when run on neuromorphic hardware. This study not only contributes a solution for robust facial expression recognition, but also demonstrates the low energy cost of its implementation on neuromorphic devices, which could pave the way for high-performance, noise-robust and energy-efficient vision applications on mobile hardware.
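
For readers who want a concrete picture of the substitution described above, the following is a minimal PyTorch sketch (not the authors' code) of an NSP activation and a residual basic block that uses it in place of ReLU. The functional form y = kσ·log(1 + exp(x/(kσ))) follows the Noisy Softplus of Liu and Furber; the framework choice, the default values of k and σ, and the exact block layout are illustrative assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F


class NoisySoftplus(nn.Module):
    """Noisy Softplus (NSP): a scaled softplus, y = k*sigma * log(1 + exp(x / (k*sigma))),
    which approximates the mean firing response of a noisy leaky
    integrate-and-fire neuron. The defaults for k and sigma below are
    illustrative placeholders, not values taken from the paper."""

    def __init__(self, k: float = 0.3, sigma: float = 1.0):
        super().__init__()
        self.scale = k * sigma  # k: fitted curvature factor; sigma: noise level

    def forward(self, x):
        # scale * softplus(x / scale) == k*sigma * log(1 + exp(x / (k*sigma)))
        return self.scale * F.softplus(x / self.scale)


class BasicBlock(nn.Module):
    """ResNet-18-style basic block with NSP substituted for ReLU."""

    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.act = NoisySoftplus()
        # 1x1 projection on the shortcut when the shape changes, as in He et al.
        self.shortcut = nn.Sequential()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        out = self.act(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.act(out + self.shortcut(x))
```

Stacking such blocks in the usual [2, 2, 2, 2] configuration, plus the stem convolution and the classifier layer, yields an 18-layer network of the kind the abstract describes.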

Acknowledgments

The research leading to these results has received funding from the Natural Science Foundation of Guangdong Province, China (No: 2016A030313713, No: 2014A030310169), Production and Research Cooperation Special Project of Guangdong Province, China (No: 2014B090904080), and Science and Technology Projects of Guangdong Provincial Transportation Department, China (Science and Technology-2016-02-030).

Author Statement

Research funding: The Natural Science Foundation of Guangdong Province, China.

Conflict of interest: Authors state no conflict of interest.

Informed consent: Informed consent is not applicable.

Ethical approval: The conducted research is not related to either human or animal use.


Received: 2017-09-13
Accepted: 2018-09-21
Published Online: 2019-02-26
Published in Print: 2019-09-25

©2019 Walter de Gruyter GmbH, Berlin/Boston
