5. Non-audible murmur to audible speech conversion
Nirmesh J. Shah
Abstract
In the absence of vocal fold vibrations, the movement of the articulators that produces the respiratory sound can be captured through the soft tissue of the head using a nonaudible murmur (NAM) microphone. NAM is a silent speech interface technique that can be used by patients suffering from vocal fold-related disorders. Although the NAM microphone can capture the very quiet murmur produced by such patients, the captured signal suffers from degraded quality due to the lack of the radiation effect at the lips and the lowpass nature of the soft tissue, which attenuates high-frequency information. Hence, it is mostly unintelligible. In this chapter, we propose deep learning-based techniques to improve the intelligibility of the NAM signal. In particular, we propose a deep neural network-based conversion technique with the rectified linear unit as the nonlinear activation function. The proposed system converts the less intelligible NAM signal into an audible speech signal. Specifically, we develop a two-stage model in which the first stage converts the NAM signal to a whispered speech signal and the second stage converts this whispered speech into normal audible speech. We compare the performance of the proposed system with the state-of-the-art Gaussian mixture model (GMM)-based method. From the objective evaluation, we found a 25% relative improvement over the GMM-based NAM-to-whisper (NAM2WHSP) system. In addition, our second-stage speaker-independent whisper-to-speech conversion system further helps the NAM-to-speech conversion system to extract the linguistic message present in the NAM signal. In particular, the subjective test shows a 4.15% absolute decrease in word error rate compared to the NAM2WHSP system.
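To make the first conversion stage concrete, the sketch below shows a minimal feed-forward deep neural network with rectified linear unit (ReLU) activations that regresses per-frame NAM spectral features onto whispered-speech spectral features, in the spirit of the NAM2WHSP mapping described above. This is an illustrative sketch under stated assumptions, not the configuration reported in the chapter: the feature dimensions, layer sizes, loss, optimizer settings, and the use of PyTorch are all assumptions.

```python
import torch
import torch.nn as nn

# Illustrative sketch only (hypothetical sizes, not the authors' setup):
# a feed-forward DNN with ReLU activations mapping per-frame NAM spectral
# features (e.g., mel-cepstral coefficients) to whispered-speech features.
NAM_DIM = 25   # assumed NAM feature dimension per frame
WHSP_DIM = 25  # assumed whisper feature dimension per frame

nam2whsp = nn.Sequential(
    nn.Linear(NAM_DIM, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, WHSP_DIM),
)

optimizer = torch.optim.Adam(nam2whsp.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()  # frame-wise regression loss between predicted and target features

# Dummy tensors standing in for time-aligned parallel NAM/whisper training frames.
nam_feats = torch.randn(1000, NAM_DIM)
whsp_feats = torch.randn(1000, WHSP_DIM)

for epoch in range(10):
    optimizer.zero_grad()
    pred = nam2whsp(nam_feats)      # predict whisper features from NAM features
    loss = loss_fn(pred, whsp_feats)
    loss.backward()
    optimizer.step()
```

The second stage (whisper-to-speech) could follow the same regression pattern on spectral features, with an additional predictor for the excitation information (e.g., F0 and voicing) that a whispered signal lacks; again, this is a sketch of the general approach rather than the exact architecture used in the chapter.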
Chapters in this book
- Frontmatter
- Foreword
- Acknowledgments
- Contents
- List of contributors
- Introduction

Part I: Comparative analysis of methods for speaker identification, speech recognition, and intelligibility modification in the dysarthric speaker population
- 1. State-of-the-art speaker recognition methods applied to speakers with dysarthria
- 2. Enhancement of continuous dysarthric speech
- 3. Assessment and intelligibility modification for dysarthric speech

Part II: New approaches to speech reconstruction and enhancement via conversion of non-acoustic signals
- 4. Analysis and quality conversion of nonacoustic signals: the physiological microphone (PMIC)
- 5. Non-audible murmur to audible speech conversion

Part III: Use of novel speech diagnostic and therapeutic intervention software for speech enhancement and rehabilitation
- 6. Application of speech signal processing for assessment and treatment of voice and speech disorders
- 7. A mobile phone-based platform for asynchronous speech therapy