
2. Unsupervised auditory filterbank learning for infant cry classification

  • Hardik B. Sailor and Hemant A. Patil
A chapter from the book Acoustic Analysis of Pathologies

Abstract

Infant cry classification is a socially relevant problem in which the task is to distinguish normal from pathological cry signals. Because cry signals differ substantially from speech signals, infant cries need a better feature representation. Representation learning has recently become popular in many areas of signal processing, including the medical domain. In this chapter, we propose unsupervised auditory filterbank learning using a convolutional restricted Boltzmann machine (ConvRBM). Analysis of the learned subband filters shows that they are very distinct from subband filters learned from speech signals. Various cry models were analyzed using the ConvRBM spectrogram for normal and pathological cry signals. Infant cry classification experiments were performed on two databases, DA-IICT Infant Cry and Baby Chillanto. The experimental results show that the proposed features outperform the standard mel-frequency cepstral coefficients (MFCC) on various statistically meaningful performance measures. In particular, the proposed ConvRBM-based features obtained an absolute improvement in classification accuracy of 2% on the DA-IICT Infant Cry database and 0.58% on the Baby Chillanto database. Because the auditory filterbanks are learned from the infant cry signals themselves, they are well suited to representing the statistical structure of those signals. Hence, they perform better than standard handcrafted feature sets such as MFCC.


Downloaded on 1 October 2025 from https://www.degruyterbrill.com/document/doi/10.1515/9781501513138-002/html