Challenges in the compilation, annotation, and analysis of learner corpus data
Marcus Callies
Abstract
This chapter highlights and discusses the special characteristics of learner corpus data and the challenges they may present for corpus compilation, annotation, and analysis. Because learner corpus and SLA researchers use their data to study L2 production and development, it is of utmost importance that the data are valid, that is, that they represent “authentic” L2 production, which means that the data must stem from the studied learners’ own language production. I discuss challenges in three areas: (1) multilingual practices and metalinguistic language use, (2) lexical and constructional bias, often brought about by the wording of task instructions or writing prompts that learners are asked to respond to, and (3) learner corpus annotation in view of the “discourse of deficit” in SLA. For each of these challenges, I offer solutions for how it can be met.
Chapters in this book
- 日本言語政策学会 / Japan Association for Language Policy. 言語政策 / Language Policy 10. 2014
- Table of contents
- Acknowledgements
- From fallacies and pitfalls to solutions and future directions
- Engaging with bad (meta)data in historical corpus linguistics
- Named entities as potentially problematic items in corpora
- Challenges in the compilation, annotation, and analysis of learner corpus data
- Early newspapers as data for corpus linguistics (and Digital Humanities)
- Open Corpus Linguistics – or How to overcome common problems in dealing with corpus data by adopting open research practices
- Text length and short texts
- Corpus genre categories
- Modeling fine-grained sociolinguistic variation
- Subject index