1. State-of-the-art speaker recognition methods applied to speakers with dysarthria
Mohammed Senoussaoui
Abstract
Speech-based biometrics is one of the most effective means of identity management and one of the methods preferred by users and companies, given its flexibility, speed and reduced cost. Current state-of-the-art speaker recognition systems are known to depend strongly on the condition of the speech material provided as input and can be affected by unexpected variability encountered during testing, such as environmental noise, changes in vocal effort or pathological speech due to speech and/or voice disorders. In this chapter, we are particularly interested in understanding the effects of dysarthric speech on automatic speaker identification performance. We explore several state-of-the-art feature representations, including i-vectors, bottleneck neural-network-based features, and a covariance-based feature representation. High-level features, such as i-vectors and covariance-based features, are built on top of four different low-level representations of dysarthric/controlled speech signals. When evaluated on the TORGO and NEMOURS databases, our best single system achieved an accuracy of 98.7%, outperforming results previously reported for these databases.
Chapters in this book
- Frontmatter
- Foreword
- Acknowledgments
- Contents
- List of contributors
- Introduction
Part I: Comparative analysis of methods for speaker identification, speech recognition, and intelligibility modification in the dysarthric speaker population
- 1. State-of-the-art speaker recognition methods applied to speakers with dysarthria
- 2. Enhancement of continuous dysarthric speech
- 3. Assessment and intelligibility modification for dysarthric speech
Part II: New approaches to speech reconstruction and enhancement via conversion of non-acoustic signals
- 4. Analysis and quality conversion of nonacoustic signals: the physiological microphone (PMIC)
- 5. Non-audible murmur to audible speech conversion
Part III: Use of novel speech diagnostic and therapeutic intervention software for speech enhancement and rehabilitation
- 6. Application of speech signal processing for assessment and treatment of voice and speech disorders
- 7. A mobile phone-based platform for asynchronous speech therapy