Methodological issues for spontaneous speech corpora compilation
Heliana Ribeiro De Mello
Abstract
Spontaneous speech corpus compilation has grown considerably over the past 20 years. This growth is due mainly to technological advances that allow highly accurate in vivo recording, to new insights from empirically based linguistic theory, to concern for the documentation of threatened languages, and to the high relevance of findings for speech recognition applications. This paper discusses methodologies associated with spontaneous speech corpus compilation that shed light on specific aspects relevant to understanding linguistic phenomena that pertain to spoken language. The compilation process of C-ORAL-BRASIL I, a corpus of informal spontaneous Brazilian Portuguese speech, is used, among other examples, as the basis for the discussion.
Chapters in this book
- Prelim pages i
- Table of contents v
- Acknowledgements vii
- Introduction: Spoken corpora and linguistic studies 1
Section I: Experiences and requirements of spoken corpora compilation
- Methodological issues for spontaneous speech corpora compilation 27
- A multilingual speech corpus of North-Germanic languages 69
- Methodological considerations for the development and use of sign language acquisition corpora 84
Section II: Multilevel corpus annotation
- The grammatical annotation of speech corpora 105
- The IPIC resource and a cross-linguistic analysis of information structure in Italian and Brazilian Portuguese 129
- The variation of action verbs in multilingual spontaneous speech corpora 152
Section III: Prosody and its functional levels
- Speech and corpora 191
- Corpus design for studying the expression of emotion in speech 210
- Illocution, attitudes and prosody 233
- Exploring the prosody of stance 271
Section IV: Syntax and Information Structure
- Prosody and information structure 297
- The notion of sentence and other discourse units in corpus annotation 331
- Syntactic properties of spontaneous speech in the Language into Act Theory 365
- Prosodic constraints for discourse markers 411
- Appendix 468
- Index 496