Adolescent age estimation using voice features

Marcin D. Bugdol; Monika N. Bugdol; Maria J. Bieńkowska; Anna Lipowicz; Agata M. Wijata; Andrzej W. Mitas

doi:10.1515/bmt-2018-0082


Published/Copyright: January 14, 2020


From the journal Biomedical Engineering / Biomedizinische Technik Volume 65 Issue 4

Abstract

In this paper, a method for estimating the chronological age of adolescents from their voice signal is presented. For every examined child, the vowels a, e, i, o and u were recorded in extended phonation. Sixty voice parameters were extracted from each recording. The voice recordings were supplemented with a height measurement to check whether it could improve the accuracy of the proposed solution. Predictor selection was performed using the LASSO (least absolute shrinkage and selection operator) algorithm. For age estimation, random forest (RF) regression was employed and tested using 10-fold cross-validation. The lowest absolute error (0.37 ± 0.28 years) was obtained for the boys-only group when all selected features were included in the prediction. In all cases, the achieved accuracy was higher for boys than for girls, which results from the fact that the voice changes more with age in males than in females. The results suggest that the presented approach can be employed for accurate age estimation during the period of rapid development in children.

Keywords: age estimation; random forest; voice analysis
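
The processing chain named in the abstract (LASSO predictor selection, random forest regression, 10-fold cross-validation) can be sketched as follows. This is a minimal illustration using scikit-learn, not the authors' implementation. The data are synthetic stand-ins generated in the script, and the standardization step, the hyperparameters, and the choice to run the LASSO selection inside each fold are all assumptions that the abstract does not specify.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the study's recordings):
# 200 hypothetical children, 60 hypothetical voice parameters each.
n_children, n_features = 200, 60
X = rng.normal(size=(n_children, n_features))

# Hypothetical ages that depend on the first five parameters plus noise.
weights = np.array([1.0, -0.8, 0.6, 0.5, -0.4])
age = 13.0 + X[:, :5] @ weights + rng.normal(scale=0.3, size=n_children)

# Assumed pipeline: scale, keep predictors with non-zero LASSO
# coefficients, then fit a random forest regressor on what remains.
model = make_pipeline(
    StandardScaler(),
    SelectFromModel(LassoCV(cv=5, random_state=0)),
    RandomForestRegressor(n_estimators=100, random_state=0),
)

# 10-fold cross-validation: every child is predicted by a model
# that never saw that child during fitting or feature selection.
cv = KFold(n_splits=10, shuffle=True, random_state=0)
predicted_age = cross_val_predict(model, X, age, cv=cv)

# Report the absolute error as mean ± standard deviation, in years.
abs_err = np.abs(predicted_age - age)
print(f"absolute error: {abs_err.mean():.2f} ± {abs_err.std():.2f} years")
```

Wrapping the selection step in the pipeline means it is refitted on the training part of every fold, which keeps the held-out children from influencing which predictors are chosen. To mirror the comparison described in the abstract, the same pipeline could be run separately on boys, on girls, and with height appended as an extra column of `X`.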

Acknowledgement

We would like to thank Bruce Turner for the English language corrections.

Author Statement
Research funding: Authors state no funding involved.
Conflict of interest: Authors declare no conflict of interest.
Informed consent: Informed consent is not applicable.
Ethical approval: The conducted research is not related to either human or animal use.


Received: 2018-05-19

Accepted: 2019-11-11

Published Online: 2020-01-14

Published in Print: 2020-08-27

