Contrastive corpus annotation in the CONTRANOT project
Julia Lavid-López
Abstract
This paper outlines issues that arise in the contrastive human-coded annotation of certain semantic and discourse categories within the CONTRANOT project, which aims to create and validate contrastive functional descriptions through corpus analysis and annotation. Human-coded corpus annotation is a preliminary step in training computer algorithms that automate the annotation of large corpora, but it can also serve as a mechanism for testing aspects of linguistic theories empirically, such as theory formation and theory redefinition, and for enriching theories with quantitative information. The work reported here focuses on the annotation of two categories, Thematisation and Modality, to illustrate the challenges researchers face when developing well-designed and reliable annotation procedures for complex linguistic phenomena in a contrastive manner. We describe the annotation tasks and procedures developed so far, which include designing annotation schemas on the basis of available linguistic theories and testing their reliability through agreement studies. We also evaluate and discuss the annotation results in terms of their relevance for the theoretical characterisation of the investigated phenomena. We expect our work to have an impact on contrastive textual analysis and to pave the way for automated annotation systems for computational applications.
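As an illustration of what an agreement study of this kind might compute, the following is a minimal sketch of Cohen's kappa for two annotators. The abstract does not state which agreement measure the project uses, so the choice of kappa is an assumption, and the function name `cohen_kappa`, the label set ("marked" / "unmarked") and the sample annotations are all hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items.

    Assumption: kappa is used here only as a common chance-corrected
    agreement measure; the source does not name the measure it used.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")

    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's label proportions, summed over labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical annotations of six clauses for Theme markedness (invented data).
annotator_1 = ["unmarked", "marked", "unmarked", "unmarked", "marked", "unmarked"]
annotator_2 = ["unmarked", "marked", "marked", "unmarked", "marked", "unmarked"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

In this invented example the annotators agree on five of six clauses, and chance agreement is 0.5, so kappa is lower than the raw agreement rate. Correcting for chance in this way is one reason agreement studies usually report a coefficient rather than a simple percentage.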
Chapters in this book
- Prelim pages i
- Table of contents v
- Contributors vii

Introduction
- On the relatedness of functionalism and pragmatics 1

I. Methods in the analysis of language and discourse
- Developing comprehensive criteria of adequacy 19
- A method of analysing recontextualisation in the communication of science 37
- Contrastive corpus annotation in the CONTRANOT project 57
- Form and function in evaluative language 87
- Life before Nation 111

II. Pragmatics and grammar
- A lexico-paradigmatic approach to English setting-constructions 133
- How did we think? 149
- The adverb truly in Present-Day English 169

III. Current trends in pragmatics and discourse analysis
- Nominal reference and the dynamics of discourse 189
- ‘Pragmatic punting’ and prosody 209
- Besides as a connective 223
- Searle and Sinclair on communicative acts 243
- Strategies of (in)directness in Spanish speakers’ production of complaints and disagreements in English and Spanish 261
- Name index 285
- Term index 289