John Benjamins Publishing Company
Competing target hypotheses in the Falko corpus
Abstract
Error annotation is a key feature of modern learner corpora. Error identification is always based on some kind of reconstructed learner utterance (target hypothesis). Since a single target hypothesis can only cover a certain amount of linguistic information while ignoring other aspects, the need for multiple target hypotheses becomes apparent. Using the German learner corpus Falko as an example, we therefore argue for a flexible multi-layer stand-off corpus architecture where competing target hypotheses can be coded in parallel. Surface differences between the learner text and the target hypotheses can then be exploited for automatic error annotation.
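The last point of the abstract, deriving error annotations from surface differences between the learner text and each target hypothesis, can be illustrated with a minimal sketch. It is not the Falko implementation: the function name `diff_tags`, the tag labels `CHA`, `DEL` and `INS`, the layer names, and the English example sentence are all assumptions made for illustration, and the comparison simply uses Python's standard `difflib` token alignment.

```python
from difflib import SequenceMatcher

# Hypothetical tag labels for illustration only; the corpus's real tagset may differ.
TAGS = {"replace": "CHA", "delete": "DEL", "insert": "INS"}


def diff_tags(learner_tokens, target_tokens):
    """Return (tag, learner_span, target_span) for every region where the
    learner text and one target hypothesis differ on the surface."""
    matcher = SequenceMatcher(a=learner_tokens, b=target_tokens, autojunk=False)
    edits = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            continue  # identical stretches carry no error annotation
        edits.append((TAGS[op], learner_tokens[i1:i2], target_tokens[j1:j2]))
    return edits


# Invented example (not taken from the corpus): one learner utterance with two
# competing target hypotheses stored as parallel layers.
learner = "He go to school yesterday".split()
hypotheses = {
    "minimal": "He went to school yesterday".split(),
    "extended": "Yesterday he went to school".split(),
}

# Each hypothesis yields its own, independent set of difference tags.
annotations = {name: diff_tags(learner, tokens) for name, tokens in hypotheses.items()}
```

Because each hypothesis is compared with the learner text separately, adding a further competing hypothesis only adds one more layer and leaves the existing annotations untouched, which is the property the stand-off architecture is meant to provide.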
Chapters in this book
- Prelim pages i
- Table of contents v
Section 1. Introduction
- Introduction 3
- Learner corpora 9
Section 2. Compilation, annotation and exchangeability of learner corpus data
- Developing corpus interoperability for phonetic investigation of learner corpora 33
- Learner corpora and second language acquisition 65
- Competing target hypotheses in the Falko corpus 101
Section 3. Automatic approaches to the identification of learner language features in learner corpus data
- Using learner corpora for automatic error detection and correction 127
- Automatic suprasegmental parameter extraction in learner corpora 151
- Criterial feature extraction using parallel learner corpora and machine learning 169
Section 4. Analysis of learner corpus data
- Phonological acquisition in the French-English interlanguage 207
- Prosody in a contrastive learner corpus 227
- A corpus-based comparison of syntactic complexity in NNS and NS university students’ writing 249
- Analysing coherence in upper-intermediate learner writing 265
- Statistical tests for the analysis of learner corpus data 287
- Index 311