2. Unsupervised auditory filterbank learning for infant cry classification
Hardik B. Sailor
Abstract
Infant cry classification is a socially relevant problem in which the task is to distinguish normal from pathological cry signals. Because cry signals differ markedly from speech signals, they need a better-suited feature representation. Representation learning has recently become popular in many signal processing areas, including the medical domain. In this chapter, we propose unsupervised auditory filterbank learning using a convolutional restricted Boltzmann machine (ConvRBM). Analysis of the learned subband filters shows that they are very distinct from the subband filters learned from speech signals. Various cry models were analyzed using the ConvRBM spectrogram for normal and pathological cry signals. The infant cry classification experiments were performed on two databases, namely, DA-IICT Infant Cry and Baby Chillanto. The experimental results show that the proposed features perform better than the standard mel-frequency cepstral coefficients (MFCC) across several statistically meaningful performance measures. In particular, the proposed ConvRBM-based features obtained an absolute improvement in classification accuracy of 2% on the DA-IICT Infant Cry database and 0.58% on the Baby Chillanto database. Because the auditory filterbanks are learned from the infant cry signals themselves, they are well suited to representing the statistical structure of those signals. Hence, they perform better than standard handcrafted feature sets such as the MFCC.
Chapters in this book
- Frontmatter
- Acknowledgments
- Computers hearing children’s cries and pathologies – a foreword
- Contents
- List of contributors
- Editors’ introduction
- 1. Understanding infant cry analysis for pathology classification
- 2. Unsupervised auditory filterbank learning for infant cry classification
- 3. Acoustic and prosodic analysis of vocalizations of 18-month-old toddlers with autism spectrum disorder
- 4. Computer-aided speech therapy for dysarthric speakers: Statistical acoustic modeling for automated verification of pronunciation accuracy
- 5. Communication improves when human or computer listeners adapt to dysarthria
- 6. Role of music on infant developments