Adolescent age estimation using voice features

  • Marcin D. Bugdol, Monika N. Bugdol, Maria J. Bieńkowska, Anna Lipowicz, Agata M. Wijata and Andrzej W. Mitas
Published/Copyright: January 14, 2020

Abstract

This paper presents a method for estimating the chronological age of adolescents from their voice signal. For every examined child, the vowels a, e, i, o and u were recorded in extended phonation, and 60 voice parameters were extracted from each recording. The voice recordings were supplemented with a height measurement to check whether it could improve the accuracy of the proposed solution. Predictors were selected using the LASSO (least absolute shrinkage and selection operator) algorithm. Age was estimated with random forest (RF) regression, which was tested using 10-fold cross-validation. The lowest absolute error (0.37 ± 0.28 years) was obtained for boys only, when all selected features were included in the prediction. In all cases, the accuracy was higher for boys than for girls, which results from the voice changing more with age in men than in women. The results suggest that the presented approach can be used for accurate age estimation during the period of rapid development in children.
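The evaluation protocol named in the abstract, 10-fold cross-validation reported as a mean absolute error ± standard deviation, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, the single pitch-like feature is hypothetical, and a one-predictor least-squares line stands in for the LASSO-selected features and the random forest regressor so that the example needs only the Python standard library.

```python
import random
import statistics


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """One-predictor least squares; a stand-in for the random forest (assumption)."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cross_validated_abs_errors(xs, ys, k=10, seed=0):
    """Return one absolute error per sample, each from a model that never saw it."""
    errors = [0.0] * len(xs)
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            errors[i] = abs(slope * xs[i] + intercept - ys[i])
    return errors


# Synthetic, hypothetical data: age in years and a pitch-like feature in Hz.
rng = random.Random(42)
ages = [rng.uniform(10.0, 18.0) for _ in range(200)]
pitch = [300.0 - 12.0 * age + rng.gauss(0.0, 8.0) for age in ages]

abs_errors = cross_validated_abs_errors(pitch, ages, k=10)
mae = statistics.fmean(abs_errors)
sd = statistics.stdev(abs_errors)
print(f"absolute error: {mae:.2f} years ± {sd:.2f}")
```

Each sample is predicted exactly once by a model fitted on the other nine folds, so the reported error reflects unseen children rather than training fit. With real data, any feature selection step would need to be repeated inside each training fold to keep the held-out fold untouched.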

Acknowledgement

We would like to thank Bruce Turner for the English language corrections.

Author Statement

  1. Research funding: Authors state no funding involved.

  2. Conflict of interest: Authors declare no conflict of interest.

  3. Informed consent: Informed consent is not applicable.

  4. Ethical approval: The conducted research is not related to either human or animal use.

Received: 2018-05-19
Accepted: 2019-11-11
Published Online: 2020-01-14
Published in Print: 2020-08-27

©2020 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on September 23, 2025 from https://www.degruyterbrill.com/document/doi/10.1515/bmt-2018-0082/html