
Speaker identification based on artificial neural networks. Case study: the Polish vowel (pilot study)

  • Kinga Salapa, Agata Trawińska, Irena Roterman and Ryszard Tadeusiewicz
Published/Copyright: May 24, 2014

Abstract

Current statistical methods and technologies used for speaker identification via dynamic formant frequencies often involve classic multivariate analyses, which must meet a number of criteria in order to be considered trustworthy. The authors propose more advanced classification techniques, including artificial neural networks. Owing to iterative learning algorithms, neural networks can be trained to detect highly complex, nonlinear relations hidden in input data. This study specifically considers feed-forward multilayer perceptron and radial basis function network models. The investigation involves an analysis of the Polish vowel (stressed or unstressed) in selected contexts described by the four lowest formant frequencies. The results indicate that neural networks are a highly accurate speaker identification tool, with classification accuracy reaching up to 100%. In addition, the authors have determined that classification accuracy based on a single context is similar to that obtained when input data are aggregated over several different contexts.
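The idea of identifying a speaker from the four lowest formant frequencies can be illustrated with a heavily simplified sketch. It is not the authors' model: it uses one Gaussian radial unit per speaker, centred on that speaker's mean formant vector, and no trained output layer, so it is closer to a nearest-centroid classifier than to a full radial basis function network. The speaker names, the formant values, and the `width` parameter are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical formant measurements (F1..F4 in Hz) for two made-up speakers.
# These numbers are illustrative only and are not taken from the study.
TRAIN = {
    "speaker_A": [
        [520.0, 1750.0, 2500.0, 3500.0],
        [540.0, 1800.0, 2550.0, 3550.0],
        [500.0, 1700.0, 2450.0, 3450.0],
    ],
    "speaker_B": [
        [620.0, 1950.0, 2750.0, 3800.0],
        [640.0, 2000.0, 2800.0, 3850.0],
        [600.0, 1900.0, 2700.0, 3750.0],
    ],
}


def centroid(rows):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def rbf_activation(x, center, width):
    """Gaussian radial basis function of the Euclidean distance to center."""
    dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-dist_sq / (2.0 * width ** 2))


def fit(train):
    """Place one radial unit per speaker at that speaker's mean formants."""
    return {name: centroid(rows) for name, rows in train.items()}


def identify(x, centers, width=200.0):
    """Return the speaker with the strongest radial response and all scores.

    Scores are the activations normalised to sum to 1; if every activation
    underflows to zero, the raw (zero) activations are returned unchanged.
    """
    acts = {name: rbf_activation(x, c, width) for name, c in centers.items()}
    total = sum(acts.values()) or 1.0
    scores = {name: a / total for name, a in acts.items()}
    return max(scores, key=scores.get), scores


centers = fit(TRAIN)
label, scores = identify([530.0, 1760.0, 2520.0, 3510.0], centers)
print(label, scores)
```

A multilayer perceptron or a full radial basis function network would replace the fixed centroids with parameters learned iteratively from the training data, which is what allows such models to capture nonlinear relations between formants and speaker identity.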


Corresponding author: Kinga Salapa, Department of Bioinformatics and Telemedicine, Jagiellonian University Medical College, Lazarza 16, Cracow 31-530, Poland, E-mail:

Acknowledgments

This work is supported by the scientific grant of the Jagiellonian University Medical College (K/ZDS/003959). Special thanks to colleagues from the Institute of Forensic Research for sharing voice recordings.

Conflict of interest statement

Authors’ conflict of interest disclosure: The authors stated that there are no conflicts of interest regarding the publication of this article. Research support played no role in the study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the report for publication.

Research funding: None declared.

Employment or leadership: None declared.

Honorarium: None declared.


Received: 2014-3-21
Accepted: 2014-4-24
Published Online: 2014-5-24
Published in Print: 2014-6-30

©2014 by Walter de Gruyter Berlin/Boston

Downloaded on 7.12.2025 from https://www.degruyterbrill.com/document/doi/10.1515/bams-2014-0006/pdf