How Random is a Corpus? The Library Metaphor

Stefan Evert

doi:10.1515/zaa-2006-0208

Published/Copyright: April 1, 2006

From the journal Zeitschrift für Anglistik und Amerikanistik Volume 54 Issue 2

Abstract

There is a stark contrast between the random sample model underlying the statistical analysis of corpus frequency data and our intuitive knowledge that sentences are more than random bags of words. The 'library metaphor' illustrates how randomness results from the selection of a corpus as the basis for a linguistic study. At the same time it reveals two reasons why corpus data do not fully meet the assumptions of the random sample model. Finally, practicable methods for identifying and quantifying non-randomness are introduced and demonstrated on the example of passive verb forms.

Published online: 2006-4-1

Published in print: 2006-4-1

https://doi.org/10.1515/zaa-2006-0208