Abstract
The aim of this study was to develop a gesture-driven facial model with speech synthesis capability. A two-dimensional facial model was developed and animated based on the Facial Action Coding System (FACS). The emotions “happy”, “sad”, “anger”, and “fear” were simulated and visualized through combinations of eight action units. A speech synthesizer for the Tamil language was built using a syllable-based concatenation approach. The results indicated that the synthetic speech was rated, on average, 85%–90% as natural as human speech. Moreover, 75%–85% of the words were articulated well and correctly identified by the children. The ultimate goal of the system is to assist children with vocal and hearing disabilities in their language learning.
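The emotion visualization described above follows the standard FACS idea of composing an expression from action-unit (AU) activations. The Python sketch below illustrates this scheme only: the AU numbers and weights are common associations from the FACS literature, not the eight AUs actually used by the authors, which the abstract does not enumerate.

```python
# Illustrative sketch of FACS-style emotion synthesis: each emotion is a set
# of action-unit (AU) activations that drive the facial model. The AU choices
# and weights are textbook associations, not the authors' actual inventory.

EMOTION_TO_AUS = {
    "happy": {6: 1.0, 12: 1.0},                         # cheek raiser, lip corner puller
    "sad":   {1: 1.0, 4: 0.6, 15: 1.0},                 # inner brow raiser, brow lowerer, lip corner depressor
    "anger": {4: 1.0, 5: 0.7, 7: 0.7, 23: 1.0},         # brow lowerer, lid raiser/tightener, lip tightener
    "fear":  {1: 1.0, 2: 1.0, 4: 0.5, 5: 1.0, 20: 0.8}, # raised brows and lids, stretched lips
}

def au_activations(emotion: str, intensity: float = 1.0) -> dict:
    """Scale an emotion's AU weights by a global intensity in [0, 1]."""
    return {au: w * intensity for au, w in EMOTION_TO_AUS[emotion].items()}

print(au_activations("happy", 0.5))  # -> {6: 0.5, 12: 0.5}
```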
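Likewise, syllable-based concatenative synthesis can be sketched as looking up prerecorded syllable units and joining their waveforms. In the minimal sketch below, the unit directory, file naming, and absence of boundary smoothing or prosody modification are all simplifying assumptions; the paper's actual unit inventory and joining method are not described in the abstract.

```python
import wave

def synthesize(syllables, unit_dir="units", out_path="utterance.wav"):
    """Concatenate prerecorded syllable WAV files into one utterance.

    Assumes all unit files share the same sample rate and sample format.
    """
    frames, params = [], None
    for syl in syllables:
        with wave.open(f"{unit_dir}/{syl}.wav", "rb") as unit:
            if params is None:
                params = unit.getparams()
            frames.append(unit.readframes(unit.getnframes()))
    with wave.open(out_path, "wb") as out:
        out.setparams(params)
        for chunk in frames:
            out.writeframes(chunk)

# Hypothetical usage with a Tamil greeting split into syllable units:
# synthesize(["va", "nak", "kam"])   # "vanakkam"
```

A production system would additionally smooth the unit boundaries and impose sentence-level prosody on the concatenated waveform.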
Acknowledgments
This research was financially supported by Dr. Mahalingam College of Engineering and Technology, Pollachi, South India. We also thank the members of the college management.
Conflict of interest statement
Authors’ conflict of interest disclosure: The authors stated that there are no conflicts of interest regarding the publication of this article. Research support played no role in the study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the report for publication.
Research funding: None declared.
Employment or leadership: None declared.
Honorarium: None declared.