Statistische Modellierung [Statistical Modeling]

  • Stefan Th. Gries
Published/Copyright: January 1, 2012

Abstract

This paper provides an overview of central aspects of the statistical modeling of linguistic data. Starting from a general definition of the term ‘model’, the paper discusses the goals of modeling as well as a variety of issues bearing on the formulation and definition of statistical models. It then surveys model selection, the choice of ‘the best model’, and three fundamental notions affecting the interpretation of models. Finally, the paper turns to validation and replicability and addresses a variety of challenges that researchers face during the modeling process.
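The following Python sketch is an illustration only. It does not reproduce the procedure the article itself describes. It shows two of the steps the abstract names: choosing among candidate models by an information criterion and validating the result by k-fold cross-validation. Using AIC as the criterion is an assumption, since the article may discuss other criteria. The toy data, the three candidate models, and all function names are hypothetical. Each model has at most one predictor, so the sketch needs only the standard library.

```python
import math
import random


def fit_mean(xs, ys):
    """Intercept-only model: ignores x and always predicts the mean of y."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_simple_ols(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def rss(model, xs, ys):
    """Residual sum of squares of a fitted model on the given data."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys))


def gaussian_aic(rss_value, n, k):
    """AIC of a Gaussian linear model up to an additive constant shared by
    all models on the same data: n*log(RSS/n) + 2k, k = number of coefficients."""
    return n * math.log(rss_value / n) + 2 * k


def kfold_mse(fit, xs, ys, k=5, seed=1):
    """k-fold cross-validation: fit on k-1 folds, score on the held-out fold."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    sq_err, count = 0.0, 0
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        sq_err += sum((ys[i] - model(xs[i])) ** 2 for i in fold)
        count += len(fold)
    return sq_err / count


# Hypothetical, deterministic toy data (not from the article): y depends on x1 only.
n = 40
x1 = [i / 10 for i in range(n)]
x2 = [((i * 7) % 11) / 10 for i in range(n)]
y = [2 + 3 * a + 0.1 * (-1) ** i for i, a in enumerate(x1)]

# name -> (fitting function, predictor column, number of coefficients)
candidates = {
    "intercept only": (fit_mean, x1, 1),  # fit_mean ignores the x column
    "y ~ x1": (fit_simple_ols, x1, 2),
    "y ~ x2": (fit_simple_ols, x2, 2),
}

# Model selection: lowest AIC wins.
scores = {}
for name, (fit, xs, k) in candidates.items():
    scores[name] = gaussian_aic(rss(fit(xs, y), xs, y), n, k)
best = min(scores, key=scores.get)

# Validation: out-of-sample mean squared error for every candidate.
cv = {name: kfold_mse(fit, xs, y) for name, (fit, xs, _) in candidates.items()}

print(best, {name: round(s, 1) for name, s in scores.items()})
print({name: round(e, 3) for name, e in cv.items()})
```

Both criteria favor the same model here only because the toy data were constructed so that y depends on x1 alone. With real linguistic data, an information criterion and cross-validated error can point to different models.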

Acknowledgments

I thank Stefanie Wulff, Anke Lüdeling, and two reviewers for their input on earlier stages of this article.

Published online: 2012
Published in print: 2012

© 2012 Walter de Gruyter GmbH & Co. KG, Berlin/Boston

DOI: 10.1515/zgl-2012-0004