Chapter

Describing a translational corpus

  • Michael P. Oakes
Published by John Benjamins Publishing Company

Abstract

A single corpus can be described in several ways. We consider how the frequencies of linguistic features may be quantified: in terms of their “average” occurrence, their dispersion among text segments, and whether they follow the familiar “bell curve” characteristic of a normal distribution. We describe how to determine the corpus size needed to measure these quantities with a required degree of confidence. We consider “aboutness”: the extent to which individual linguistic features characterise the corpus as a whole. We also describe vocabulary richness, the extent to which the author of a text constantly brings in new vocabulary, and collocations: groups of words which are found together more often than one would expect by chance.
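
The abstract does not say which measures the chapter uses, so the following is only a minimal sketch of two of the ideas it names. It assumes the type-token ratio as a simple stand-in for vocabulary richness and pointwise mutual information (PMI) as one common way to score word pairs that co-occur more often than chance would predict. The function names and the toy sentence are hypothetical and are not taken from the chapter.

```python
import math
from collections import Counter


def type_token_ratio(tokens):
    """Distinct word forms divided by running words (assumed richness measure)."""
    return len(set(tokens)) / len(tokens)


def bigram_pmi(tokens):
    """PMI for each adjacent word pair (assumed collocation measure).

    PMI = log2( P(w1, w2) / (P(w1) * P(w2)) ); a positive score means the
    pair occurs together more often than independence would predict.
    """
    n_tokens = len(tokens)
    n_bigrams = n_tokens - 1
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_bigrams
        p_w1 = unigrams[w1] / n_tokens
        p_w2 = unigrams[w2] / n_tokens
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Toy input, purely illustrative; a real corpus would be far larger.
tokens = "the cat sat on the mat and the cat slept".split()
ttr = type_token_ratio(tokens)
pmi = bigram_pmi(tokens)
```

On a text this small the scores are not meaningful. PMI in particular is known to overrate rare pairs, so a minimum frequency threshold is usually applied before ranking candidates.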


Downloaded on 11.9.2025 from https://www.degruyterbrill.com/document/doi/10.1075/scl.51.05oak/html