Automatic error tagging of spelling mistakes in learner corpora
Paul Rayson
Abstract
Manual error tagging of learner corpus data is time-consuming and creates a bottleneck in the analysis of learner corpora. This has led researchers to apply techniques from natural language processing to assist in the automatic analysis of such data. This chapter presents a novel application of a hybrid approach to the detection of spelling errors in learner data. The Variant Detector (VARD) software was originally developed to match historical spelling variants to their modern equivalents, with the aim of improving the accuracy and robustness of corpus linguistic techniques applied to historical corpora. Here, we describe its application to the detection of spelling errors in written learner corpora comprising 50,000 words from each of three learner backgrounds (French, German and Spanish).
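The general idea of a hybrid spelling-error tagger can be illustrated with a minimal sketch. It is not VARD's implementation, and it assumes three combined components:

- a lexicon lookup that flags out-of-lexicon tokens;
- a hypothetical list of known learner misspellings (`known_errors`), which is consulted first;
- Levenshtein edit-distance ranking over the lexicon as a fallback.

The function names, the toy lexicon, and the `max_dist` threshold are illustrative assumptions only.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def tag_spelling(tokens, lexicon, known_errors, max_dist=2):
    """Hypothetical hybrid tagger: return (token, suggestion) pairs for
    tokens not found in the lexicon. Suggestion is None if nothing is close."""
    flagged = []
    for tok in tokens:
        low = tok.lower()
        if low in lexicon:
            continue  # recognised word: not tagged
        if low in known_errors:
            flagged.append((tok, known_errors[low]))  # known-variant match
            continue
        # Fallback: nearest lexicon entry by edit distance (ties broken alphabetically)
        best = min(lexicon, key=lambda w: (levenshtein(low, w), w))
        suggestion = best if levenshtein(low, best) <= max_dist else None
        flagged.append((tok, suggestion))
    return flagged


# Toy example (invented data, not from the learner corpora described here)
lexicon = {"the", "student", "wrote", "an", "interesting",
           "essay", "about", "their", "environment"}
known_errors = {"thier": "their"}
tokens = ["The", "studant", "wrote", "an", "intresting",
          "essay", "about", "thier", "enviroment"]
print(tag_spelling(tokens, lexicon, known_errors))
# → [('studant', 'student'), ('intresting', 'interesting'), ('thier', 'their'), ('enviroment', 'environment')]
```

The known-error list matters for this design. For "thier", plain edit distance ties "the" and "their" at distance 2, so a curated variant list resolves cases that string similarity alone gets wrong.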
Chapters in this book
- Prelim pages i
- Table of contents v
- Acknowledgements ix
- List of contributors xi
- Preface xiii
- Putting corpora to good uses 1
- Frequency, corpora and language learning 7
- Learner corpora and contrastive interlanguage analysis 33
- The use of small corpora for tracing the development of academic literacies 63
- Revisiting apprentice texts 85
- Automatic error tagging of spelling mistakes in learner corpora 109
- Data mining with learner corpora 127
- Learners and users – Who do we want corpus data from? 155
- Learner knowledge of phrasal verbs 173
- Corpora and the new Englishes 209
- Towards a new generation of corpus-derived lexical resources for language learning 237
- Automating the creation of dictionaries 257
- Addendum: Select list of publications by Sylviane Granger 283
- Subject index 289
- Name index 293