Abstract
Statistical methods currently used for speaker identification from dynamic formant frequencies often rely on classic multivariate analyses, which must meet a number of criteria to be considered trustworthy. The authors propose more advanced classification techniques, including artificial neural networks. Owing to iterative learning algorithms, neural networks can be trained to detect highly complex, nonlinear relations hidden in input data. This study specifically considers feed-forward multilayer perceptron and radial basis function network models. The investigation involves an analysis of the Polish vowel (stressed or unstressed) in selected contexts described by the four lowest formant frequencies. Results indicate that neural networks are a highly accurate speaker identification tool, with accuracy reaching up to 100%. In addition, the authors found that classification accuracy is similar whether it is based on a single context or on input data aggregated over several different contexts.
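To make the idea of classifying speakers from the four lowest formant frequencies more concrete, the following is a minimal sketch only, not the authors' models or software. It assumes three made-up speakers, synthetic formant values (the means and spreads are invented for illustration), and a deliberately simplified radial-basis classifier with one Gaussian unit per speaker and no trained output layer. A full radial basis function network or multilayer perceptron, as considered in the study, would learn its weights iteratively.

```python
import math
import random

# Hypothetical mean formant frequencies (F1-F4, in Hz) for three made-up speakers.
# These numbers are illustrative assumptions and are not taken from the study.
SPEAKER_MEANS = {
    "speaker_A": [520.0, 1750.0, 2500.0, 3500.0],
    "speaker_B": [600.0, 1900.0, 2700.0, 3800.0],
    "speaker_C": [450.0, 1600.0, 2350.0, 3300.0],
}
# Assumed within-speaker standard deviation of each formant, in Hz.
SPREAD_HZ = [20.0, 40.0, 50.0, 60.0]


def simulate(rng, n_per_speaker):
    """Draw synthetic (formants, speaker) pairs around each speaker's mean."""
    data = []
    for speaker, means in SPEAKER_MEANS.items():
        for _ in range(n_per_speaker):
            sample = [rng.gauss(m, s) for m, s in zip(means, SPREAD_HZ)]
            data.append((sample, speaker))
    return data


def fit_centres(train):
    """Place one radial unit per speaker at the mean of that speaker's samples."""
    sums, counts = {}, {}
    for x, speaker in train:
        acc = sums.setdefault(speaker, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[speaker] = counts.get(speaker, 0) + 1
    return {s: [v / counts[s] for v in acc] for s, acc in sums.items()}


def rbf_activation(x, centre, widths):
    """Gaussian radial basis function of the width-scaled distance to a centre."""
    sq_dist = sum(((a - c) / w) ** 2 for a, c, w in zip(x, centre, widths))
    return math.exp(-0.5 * sq_dist)


def classify(x, centres, widths):
    """Return the speaker whose radial unit responds most strongly to x."""
    return max(centres, key=lambda s: rbf_activation(x, centres[s], widths))


rng = random.Random(0)            # fixed seed so the sketch is repeatable
train = simulate(rng, 30)         # samples used to place the centres
test = simulate(rng, 10)          # held-out samples used only for scoring
centres = fit_centres(train)
accuracy = sum(classify(x, centres, SPREAD_HZ) == y for x, y in test) / len(test)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

Because the synthetic speakers are well separated by construction, the accuracy printed here says nothing about real recordings; the sketch only shows how four formant values per vowel token can serve as the input vector of a speaker classifier.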
Acknowledgments
This work was supported by a scientific grant from the Jagiellonian University Medical College (K/ZDS/003959). Special thanks go to colleagues from the Institute of Forensic Research for sharing voice recordings.
Conflict of interest statement
Authors’ conflict of interest disclosure: The authors stated that there are no conflicts of interest regarding the publication of this article. Research support played no role in the study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the report for publication.
Research funding: None declared.
Employment or leadership: None declared.
Honorarium: None declared.
©2014 by Walter de Gruyter Berlin/Boston