Statistical tests for the analysis of learner corpus data
Stefan Th. Gries
Abstract
This paper gives an overview of several basic statistical tools for corpus-based SLA research. I first discuss a few issues relevant to the analysis of learner corpus data. Then, I illustrate a few widespread quantitative techniques and statistical visualizations and exemplify them on the basis of corpus data on the genitive alternation (the of-genitive vs. the s-genitive) from German learners and native speakers of English. The statistical methods discussed include a test for differences between frequencies (the chi-squared test), tests for differences between means/medians (the U-test), and a more advanced multifactorial extension, binary logistic regression.
Chapters in this book
- Prelim pages i
- Table of contents v
Section 1. Introduction
- Introduction 3
- Learner corpora 9
Section 2. Compilation, annotation and exchangeability of learner corpus data
- Developing corpus interoperability for phonetic investigation of learner corpora 33
- Learner corpora and second language acquisition 65
- Competing target hypotheses in the Falko corpus 101
Section 3. Automatic approaches to the identification of learner language features in learner corpus data
- Using learner corpora for automatic error detection and correction 127
- Automatic suprasegmental parameter extraction in learner corpora 151
- Criterial feature extraction using parallel learner corpora and machine learning 169
Section 4. Analysis of learner corpus data
- Phonological acquisition in the French-English interlanguage 207
- Prosody in a contrastive learner corpus 227
- A corpus-based comparison of syntactic complexity in NNS and NS university students’ writing 249
- Analysing coherence in upper-intermediate learner writing 265
- Statistical tests for the analysis of learner corpus data 287
- Index 311