Agile corpus creation

Holger Voormann; Ulrike Gut

doi:10.1515/CLLT.2008.010

Article

Published/Copyright: December 9, 2008

From the journal Corpus Linguistics and Linguistic Theory, Volume 4, Issue 2

Abstract

In the past decades language corpora have become indispensable tools for linguistic research and the development of linguistic theory. However, it is not yet widely acknowledged that the quality of corpus-based research and theories depends crucially on the quality of the corpora, not only in terms of their content and size but especially as far as the accuracy and richness of the annotations are concerned. Neither has much systematic thought gone into the effectiveness of the traditional corpus creation process regarding this problem. This paper proposes a novel approach to corpus creation – agile corpus creation – that addresses the problem of simultaneously maximizing corpus size as well as the quality and quantity of manual and automatic annotations while minimizing the time and cost involved in corpus creation. The central aspects of agile corpus creation lie in the reorganization of the traditional linear and separate phases of corpus design, data collection, data annotation and corpus analysis and in the recognition of potential sources of errors during corpus creation.

Keywords: corpus creation process; cyclic process model; corpus design; quality of corpora

Published Online: 2008-12-09

Published in Print: November 2008

https://doi.org/10.1515/CLLT.2008.010
