Statistische Modellierung [Statistical Modeling]
Stefan Th. Gries
Abstract
This paper provides an overview of central aspects of the statistical modeling of linguistic data. Starting from a general definition of the term ‘model’, the paper discusses the goals of modeling as well as several issues bearing on how statistical models are formulated and defined. It then surveys model selection and the choice of ‘the best model’, as well as three fundamental notions that affect the interpretation of models. Finally, the paper turns to validation and replicability and addresses a range of challenges researchers face during the modeling process.
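The abstract names model selection and validation as central steps. As a minimal illustration, and not the procedure the article itself describes, the Python sketch below compares two candidate models (an intercept-only model and a simple least-squares line) by their k-fold cross-validated prediction error and picks the one with the lower error. The function names, the synthetic data, and the deterministic interleaved fold assignment are all assumptions made for illustration; in practice folds are usually assigned at random.

```python
from statistics import mean


def fit_intercept_only(xs, ys):
    """Model 0: predict the mean of y for every x (ignores the predictor)."""
    m = mean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Model 1: ordinary least-squares line y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def cv_mse(fit, xs, ys, k=5):
    """Mean squared prediction error on held-out folds (k-fold cross-validation)."""
    n = len(xs)
    errors = []
    for fold in range(k):
        # Assumption: deterministic interleaved folds (every k-th observation), not random ones.
        test_idx = set(range(fold, n, k))
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        model = fit(train_x, train_y)  # fit only on the training part
        errors.extend((ys[i] - model(xs[i])) ** 2 for i in test_idx)
    return mean(errors)


def select_model(candidates, xs, ys, k=5):
    """Return the name of the candidate with the lowest cross-validated error, plus all scores."""
    scores = {name: cv_mse(fit, xs, ys, k) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical synthetic data: y is roughly 1 + 2x with small alternating noise.
    xs = [float(i) for i in range(20)]
    ys = [1 + 2 * x + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]
    best, scores = select_model(
        {"intercept only": fit_intercept_only, "linear": fit_linear}, xs, ys
    )
    print(best, scores)
```

Because each model is scored only on observations it was not fitted to, the comparison rewards predictive performance rather than mere fit to the data at hand, which is one common way of operationalizing both ‘the best model’ and validation.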
Acknowledgments
I thank Stefanie Wulff, Anke Lüdeling, and two reviewers for their input on earlier stages of this article.
© 2012 Walter de Gruyter GmbH & Co. KG, Berlin/Boston