Using learner corpora for automatic error detection and correction
Michael Gamon
Abstract
In this chapter we discuss the use and importance of learner corpora for the development and evaluation of automatic systems for learner error detection and correction. We argue that learner corpora are crucial in three main areas of this process. First, these corpora play an important role in identifying and quantifying common error types, which helps prioritize the development of error-specific algorithms. Second, learner corpora provide valuable training data for machine-learned approaches, which are dominant in the field of natural language processing today. Finally, the evaluation of error detection and correction systems is most reliable and realistic when performed on real learner data.
Chapters in this book
- Prelim pages
- Table of contents
Section 1. Introduction
- Introduction
- Learner corpora
Section 2. Compilation, annotation and exchangeability of learner corpus data
- Developing corpus interoperability for phonetic investigation of learner corpora
- Learner corpora and second language acquisition
- Competing target hypotheses in the Falko corpus
Section 3. Automatic approaches to the identification of learner language features in learner corpus data
- Using learner corpora for automatic error detection and correction
- Automatic suprasegmental parameter extraction in learner corpora
- Criterial feature extraction using parallel learner corpora and machine learning
Section 4. Analysis of learner corpus data
- Phonological acquisition in the French-English interlanguage
- Prosody in a contrastive learner corpus
- A corpus-based comparison of syntactic complexity in NNS and NS university students’ writing
- Analysing coherence in upper-intermediate learner writing
- Statistical tests for the analysis of learner corpus data
- Index