What is missing in learner corpus design?
Yukio Tono
Abstract
This chapter discusses what is missing in learner corpus design. Learner corpus researchers are sometimes not fully aware of the basic principles of corpus design and collection that most corpus linguists are expected to know. I will first discuss theoretical and methodological issues related to learner corpus design and collection, focusing on sampling, representativeness, and corpus size. Then, I will review three relevant studies (Biber 1993; Tomasello & Stahl 2004; Mukherjee & Rohrbach 2006) in order to better understand corpus design issues such as the parameters of corpus sampling, the effects of sample size, and variation in learner corpus design. Finally, the chapter concludes with a critical assessment of design and data collection in learner corpus research and a discussion of future directions.
Chapters in this book
- Prelim pages i
- Table of contents v
Section 1. Introduction
- Spanish learner corpus research 3
- What is missing in learner corpus design? 33
Section 2. Compilation, annotation and exploitation of learner corpus data
- Learner Spanish on computer 55
- PoS-tagging a Spanish oral learner corpus 89
- The LANGSNAP longitudinal learner corpus 117
- The Aprescrilov corpus, or broadening the horizon of Spanish language learning in Flanders 143
- Spanish Corpus Proficiency Level Training Website and Corpus 169
Section 3. Analysis of learner corpus data
- Factors that can have an impact on the processes of perceiving Spanish/L2 199
- Pragmatic principles in anaphora resolution at the syntax-discourse interface 235
- Discourse markers in CEDEL2 and SPLLOC corpora of learner Spanish 267
- A corpus study of Spanish as a Foreign Language learners’ collocation production 299
- Index 333