John Benjamins Publishing Company

Chapter

Working with parallel corpora

Usefulness and usability

Abstract

Although parallel corpora are vital for cross-linguistic and natural language processing (NLP) research, most have been designed for a single purpose, which may unnecessarily restrict their usefulness and usability. I argue that existing parallel corpora become considerably more useful when the data they yield are combined with data from comparable and/or monolingual corpora. Usability criteria, such as the choice of processing tools and adherence to international standards, also affect corpus usefulness. This chapter proposes courses of action to improve the recycling and reprocessing of available resources. It also presents a corpus-based application for post-editing and quality assessment as an illustration of the many uses parallel corpora may serve.

Downloaded on 14.4.2026 from https://www.degruyterbrill.com/document/doi/10.1075/scl.90.04rab/html