Chapter

5. Non-audible murmur to audible speech conversion

  • Nirmesh J. Shah and Hemant A. Patil

Abstract

In the absence of vocal fold vibrations, the respiratory sound produced by the movement of the articulators can be captured at the soft tissue of the head using a non-audible murmur (NAM) microphone. NAM is a silent speech interface technique that can be used by patients suffering from vocal fold-related disorders. Although the NAM microphone can capture the very quiet murmur produced by such patients, the captured signal suffers from quality degradation due to the lack of the radiation effect at the lips and the lowpass nature of the soft tissue, which attenuates high-frequency information. Hence, NAM is mostly unintelligible. In this chapter, we propose deep learning-based techniques to improve the intelligibility of the NAM signal. In particular, we use a deep neural network (DNN)-based conversion technique with the rectified linear unit (ReLU) as the nonlinear activation function. The proposed system converts the less intelligible NAM signal into an audible speech signal. Specifically, we develop a two-stage model in which the first stage converts the NAM signal into a whispered speech signal, and the second stage converts this whispered speech into normal audible speech. We compared the performance of the proposed system with the state-of-the-art Gaussian mixture model (GMM)-based method. The objective evaluation shows a 25% relative improvement over the GMM-based NAM-to-whisper (NAM2WHSP) system. In addition, our second-stage speaker-independent whisper-to-speech conversion system further helps the NAM-to-speech conversion system extract the linguistic message present in the NAM signal. In particular, the subjective test shows a 4.15% absolute decrease in word error rate compared to the NAM2WHSP system.
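The two-stage cascade described above (NAM-to-whisper, then whisper-to-speech) can be sketched as a pair of feedforward regression networks with ReLU hidden layers, each mapping a frame of source spectral features to target features. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the 25-dimensional feature frames, layer sizes, and random placeholder weights are hypothetical; a real system would train both stages on parallel NAM/whisper and whisper/speech data.

```python
import numpy as np

def relu(x):
    """Rectified linear unit, the nonlinear activation used in the chapter."""
    return np.maximum(0.0, x)

class FeatureMapper:
    """Minimal one-hidden-layer feedforward network mapping source spectral
    features (e.g., a mel-cepstral frame) to target features.
    Weights are random placeholders for illustration only."""

    def __init__(self, dim_in, dim_hidden, dim_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = 0.1 * rng.standard_normal((dim_in, dim_hidden))
        self.b1 = np.zeros(dim_hidden)
        self.W2 = 0.1 * rng.standard_normal((dim_hidden, dim_out))
        self.b2 = np.zeros(dim_out)

    def forward(self, x):
        h = relu(x @ self.W1 + self.b1)  # ReLU hidden layer
        return h @ self.W2 + self.b2     # linear output for regression

# Two-stage cascade: NAM features -> whisper features -> speech features.
# Dimensions (25) and hidden sizes (64) are assumed, not from the chapter.
nam2whsp = FeatureMapper(dim_in=25, dim_hidden=64, dim_out=25, seed=1)
whsp2sp = FeatureMapper(dim_in=25, dim_hidden=64, dim_out=25, seed=2)

nam_frame = np.zeros(25)  # placeholder NAM feature frame
speech_frame = whsp2sp.forward(nam2whsp.forward(nam_frame))
```

Chaining the two stages frame by frame mirrors the chapter's design choice: the intermediate whispered representation lets the second stage be trained speaker-independently, since whisper-to-speech data is easier to collect than NAM recordings.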


Chapters in this book

  1. Frontmatter
  2. Foreword
  3. Acknowledgments
  4. Contents
  5. List of contributors
  6. Introduction
  7. Part I: Comparative analysis of methods for speaker identification, speech recognition, and intelligibility modification in the dysarthric speaker population
  8. 1. State-of-the-art speaker recognition methods applied to speakers with dysarthria
  9. 2. Enhancement of continuous dysarthric speech
  10. 3. Assessment and intelligibility modification for dysarthric speech
  11. Part II: New approaches to speech reconstruction and enhancement via conversion of non-acoustic signals
  12. 4. Analysis and quality conversion of nonacoustic signals: the physiological microphone (PMIC)
  13. 5. Non-audible murmur to audible speech conversion
  14. Part III: Use of novel speech diagnostic and therapeutic intervention software for speech enhancement and rehabilitation
  15. 6. Application of speech signal processing for assessment and treatment of voice and speech disorders
  16. 7. A mobile phone-based platform for asynchronous speech therapy
Downloaded on 3.10.2025 from https://www.degruyterbrill.com/document/doi/10.1515/9781501501265-006/html