
5. Non-audible murmur to audible speech conversion

  • Nirmesh J. Shah and Hemant A. Patil

Abstract

In the absence of vocal fold vibrations, the movement of the articulators that produces the respiratory sound can be captured through the soft tissue of the head using a non-audible murmur (NAM) microphone. NAM is one of the silent speech interface techniques and can be used by patients suffering from vocal fold-related disorders. Although the NAM microphone is able to capture the very quiet murmur produced by such patients, the captured signal suffers from degraded quality due to the lack of the radiation effect at the lips and the lowpass nature of the soft tissue, which attenuates high-frequency information. Hence, it is mostly unintelligible. In this chapter, we propose deep learning-based techniques to improve the intelligibility of the NAM signal. In particular, we propose a deep neural network-based conversion technique with the rectified linear unit (ReLU) as the nonlinear activation function. The proposed system converts the less intelligible NAM signal into an audible speech signal. Specifically, we develop a two-stage model in which the first stage converts the NAM signal into a whispered speech signal and the second stage converts this whispered speech into normal audible speech. We compared the performance of our proposed system with the state-of-the-art Gaussian mixture model (GMM)-based method. From the objective evaluation, we found a 25% relative improvement over the GMM-based NAM-to-whisper (NAM2WHSP) system. In addition, our second-stage speaker-independent whisper-to-speech conversion system further helps the NAM-to-speech conversion system extract the linguistic message present in the NAM signal. In particular, the subjective test showed an absolute decrease of 4.15% in word error rate compared to the NAM2WHSP system.
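The first-stage conversion described above can be sketched as a frame-wise feed-forward network with ReLU hidden activations that maps NAM feature frames to whispered-speech feature frames. The sketch below is a minimal illustration under assumed dimensions and an untrained random initialization; the chapter's actual features, layer sizes, and training procedure (time-aligned NAM/whisper pairs, backpropagation) are not reproduced here.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

class FeatureMappingDNN:
    """Feed-forward net mapping source feature frames to target frames.

    Hypothetical sketch of a NAM-to-whisper stage: each NAM frame
    (e.g., mel-cepstral coefficients) is passed through fully connected
    ReLU layers and a linear output layer. Weights here are random;
    in practice they would be trained on aligned NAM/whisper frames.
    """

    def __init__(self, dims, seed=0):
        rng = np.random.default_rng(seed)
        # Small random weights per layer; dims = [in, hidden..., out].
        self.weights = [rng.normal(0.0, 0.1, (m, n))
                        for m, n in zip(dims[:-1], dims[1:])]
        self.biases = [np.zeros(n) for n in dims[1:]]

    def forward(self, frames):
        h = frames
        for W, b in zip(self.weights[:-1], self.biases[:-1]):
            h = relu(h @ W + b)                    # ReLU hidden layers
        return h @ self.weights[-1] + self.biases[-1]  # linear output

# Usage: map 25-dim NAM frames to 25-dim whisper frames (illustrative sizes).
net = FeatureMappingDNN([25, 512, 512, 25])
nam_frames = np.random.default_rng(1).normal(size=(100, 25))
whisper_frames = net.forward(nam_frames)
print(whisper_frames.shape)  # (100, 25)
```

A second network of the same shape, trained on whisper/normal-speech pairs, would play the role of the speaker-independent second stage.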

Chapters in this book

  1. Frontmatter
  2. Foreword
  3. Acknowledgments
  4. Contents
  5. List of contributors
  6. Introduction
  7. Part I: Comparative analysis of methods for speaker identification, speech recognition, and intelligibility modification in the dysarthric speaker population
  8. 1. State-of-the-art speaker recognition methods applied to speakers with dysarthria
  9. 2. Enhancement of continuous dysarthric speech
  10. 3. Assessment and intelligibility modification for dysarthric speech
  11. Part II: New approaches to speech reconstruction and enhancement via conversion of non-acoustic signals
  12. 4. Analysis and quality conversion of nonacoustic signals: the physiological microphone (PMIC)
  13. 5. Non-audible murmur to audible speech conversion
  14. Part III: Use of novel speech diagnostic and therapeutic intervention software for speech enhancement and rehabilitation
  15. 6. Application of speech signal processing for assessment and treatment of voice and speech disorders
  16. 7. A mobile phone-based platform for asynchronous speech therapy
Downloaded on 25.9.2025 from https://www.degruyterbrill.com/document/doi/10.1515/9781501501265-006/html