Chapter

Text length and short texts

An overview of the problem
  • Aatu Liimatta
Published by John Benjamins Publishing Company
This chapter is in the book Challenges in Corpus Linguistics

Abstract

Variation in text length is an unavoidable confounder in quantitative text-analytic corpus-linguistic studies. Texts can be difficult to compare across text lengths, particularly if many of them are short, due to the difficulty of calculating meaningful frequencies for the lexical items and linguistic features of interest. Traditionally, this has been less of an issue, since texts in many of the genres typically studied in linguistics have been relatively long. However, the rise of social media has brought the issue to the forefront. In this chapter, I describe the problem of text length and short texts together with a number of solutions and workarounds to this and related problems.

Downloaded on 5.3.2026 from https://www.degruyterbrill.com/document/doi/10.1075/scl.118.07lii/html