The grammatical annotation of speech corpora: Techniques and perspectives

Eckhard Bick

Published by John Benjamins Publishing Company

This chapter is in the book Spoken Corpora and Linguistic Studies

Abstract

This chapter discusses the grammatical annotation of speech corpora (C-ORAL-Brasil, NURC) on the one hand and of speech-like text (e-mail, chat, TV news, parliamentary discussions) on the other, drawing on Portuguese data for the former and English data for the latter. We try to identify and compare linguistic orality markers (“speechlikeness”) in different genres, and argue that broad-coverage Constraint Grammar parsers such as PALAVRAS and EngGram can be adapted to these features and used across the text-speech divide. Special topics include emoticons, phonetic variation and syntactic features. For ordinary speech corpora we propose a system of two-level annotation: overlaps, retractions and phonetic variation are maintained as meta-tagging, while an orthographically normalized textual layer receives conventional annotation. In the absence of punctuation, syntactic segmentation can be achieved by exploiting prosodic breaks as delimiters in parsing rules. With the exception of chat data, our modified “oral” CG parsers perform reasonably close to their written-language counterparts, even for true transcribed speech, achieving accuracy rates (F-scores) above 98% for PoS tags and 93–95% for syntactic function.
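The two-level idea and the use of prosodic breaks as delimiters can be illustrated with a minimal Python sketch. It is not the chapter's implementation and does not reproduce PALAVRAS or EngGram. The break symbols (`//` for a terminal break, `/` for a non-terminal one), the `Token` class, the `segment` function, the tag names and the small variant table are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumed break symbols; real corpora define their own transcription conventions.
TERMINAL_BREAK = "//"
NON_TERMINAL_BREAK = "/"

# Illustrative (hypothetical) table mapping oral variants to normalized spelling.
VARIANTS: Dict[str, str] = {"tá": "está", "pra": "para"}


@dataclass
class Token:
    surface: str      # form as transcribed (phonetic variation preserved)
    normalized: str   # orthographically normalized form handed to the parser
    meta: List[str] = field(default_factory=list)  # meta-tags kept beside the text


def segment(stream: List[str], variants: Dict[str, str]) -> List[List[Token]]:
    """Split a transcribed stream into parsing units at terminal breaks."""
    units: List[List[Token]] = []
    current: List[Token] = []
    for item in stream:
        if item == TERMINAL_BREAK:
            # A terminal break closes the unit, standing in for punctuation.
            if current:
                units.append(current)
                current = []
        elif item == NON_TERMINAL_BREAK:
            # Keep the weaker break as a meta-tag on the preceding token.
            if current:
                current[-1].meta.append("nonterminal-break")
        else:
            normalized = variants.get(item, item)
            token = Token(surface=item, normalized=normalized)
            if normalized != item:
                token.meta.append("phon-variant")
            current.append(token)
    if current:
        units.append(current)  # flush a trailing unit with no closing break
    return units


if __name__ == "__main__":
    stream = ["tá", "bom", "/", "vamos", "pra", "casa", "//", "sim", "//"]
    for unit in segment(stream, VARIANTS):
        print([(t.surface, t.normalized, t.meta) for t in unit])
```

In this sketch a conventional tagger would only see the `normalized` layer, while the `surface` forms and `meta` tags stay available for speech-oriented queries.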

Chapters in this book

Prelim pages i
Table of contents v
Acknowledgements vii
Introduction: Spoken corpora and linguistic studies 1
Section I: Experiences and requirements of spoken corpora compilation
Methodological issues for spontaneous speech corpora compilation 27
A multilingual speech corpus of North-Germanic languages 69
Methodological considerations for the development and use of sign language acquisition corpora 84
Section II: Multilevel corpus annotation
The grammatical annotation of speech corpora 105
The IPIC resource and a cross-linguistic analysis of information structure in Italian and Brazilian Portuguese 129
The variation of action verbs in multilingual spontaneous speech corpora 152
Section III: Prosody and its functional levels
Speech and corpora 191
Corpus design for studying the expression of emotion in speech 210
Illocution, attitudes and prosody 233
Exploring the prosody of stance 271
Section IV: Syntax and Information Structure
Prosody and information structure 297
The notion of sentence and other discourse units in corpus annotation 331
Syntactic properties of spontaneous speech in the Language into Act Theory 365
Prosodic constraints for discourse markers 411
Appendix 468
Index 496

https://doi.org/10.1075/scl.61.04bic
