PoS-tagging a Spanish oral learner corpus
Leonardo Campillos Llanos
Abstract
This chapter explains the methodology followed to part-of-speech (PoS) tag the Spanish oral learner corpus CORELE (Corpus Oral de Español como Lengua Extranjera; Campillos Llanos 2014). The data consist of forty interviews with lower-intermediate learners from more than nine mother tongue (L1) backgrounds, and four interviews with native speakers (control group). The annotation was performed with the GRAMPAL tagger (Moreno & Guirao 2006). The learner corpus amounts to 52,759 lexical units (LUs) and the native corpus to 8,643 LUs. The corpus interface is available online and allows the user to explore learners’ interlanguage by searching the data by word form, lemma, L1, and/or proficiency level. I present a sample study on learners’ production of articles following the Contrastive Interlanguage Analysis approach (Granger 1996).
Chapters in this book
- Prelim pages i
- Table of contents v
Section 1. Introduction
- Spanish learner corpus research 3
- What is missing in learner corpus design? 33
Section 2. Compilation, annotation and exploitation of learner corpus data
- Learner Spanish on computer 55
- PoS-tagging a Spanish oral learner corpus 89
- The LANGSNAP longitudinal learner corpus 117
- The Aprescrilov corpus, or broadening the horizon of Spanish language learning in Flanders 143
- Spanish Corpus Proficiency Level Training Website and Corpus 169
Section 3. Analysis of learner corpus data
- Factors that can have an impact on the processes of perceiving Spanish/L2 199
- Pragmatic principles in anaphora resolution at the syntax-discourse interface 235
- Discourse markers in CEDEL2 and SPLLOC corpora of learner Spanish 267
- A corpus study of Spanish as a Foreign Language learners’ collocation production 299
- Index 333