Chapter

Innovations in parallel corpus alignment and retrieval

  • Martin Volk
Published by John Benjamins Publishing Company

Abstract

In this chapter, we give an overview of parallel corpus annotation, alignment and retrieval. We present standard annotation methods such as part-of-speech tagging, lemmatization and dependency parsing, and we also introduce language-specific methods, for example for handling split verbs or truncated compounds in German. Our corpus annotation includes the identification of code-switching within sentences as a special case of language identification. We argue for careful sentence and word alignment of parallel corpora. Finally, we explain how word alignment serves as the basis for a wide range of applications, from translation variant ranking to lemma disambiguation.
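To illustrate how word alignment can underpin translation variant ranking, here is a minimal sketch. It is not the chapter's method. It assumes word alignments in the common `i-j` index-pair format written by aligners such as fast_align, and it counts surface forms rather than lemmas. The helper names `parse_alignment` and `rank_translation_variants` and the toy German-English sentences are hypothetical and serve only as illustration.

```python
from collections import Counter

# Hypothetical helper (illustration only): parse an alignment string such as
# "0-0 1-2" into (source_index, target_index) pairs.
def parse_alignment(line):
    pairs = []
    for item in line.split():
        src, tgt = item.split("-")
        pairs.append((int(src), int(tgt)))
    return pairs

# Hypothetical helper (illustration only): rank the target-side variants aligned
# to `source_word` by frequency. `corpus` holds (source_tokens, target_tokens,
# alignment_string) triples. All target tokens linked to one source occurrence
# are joined in target order, so a 1:n link (e.g. a German compound aligned to
# an English two-word noun) counts as a single variant.
def rank_translation_variants(corpus, source_word):
    counts = Counter()
    for src_tokens, tgt_tokens, alignment in corpus:
        links = parse_alignment(alignment)
        for i, token in enumerate(src_tokens):
            if token != source_word:
                continue
            targets = sorted(j for s, j in links if s == i)
            if targets:  # unaligned occurrences are skipped
                counts[" ".join(tgt_tokens[j] for j in targets)] += 1
    return counts.most_common()

# Toy, hand-made sentence pairs (not data from the chapter)
corpus = [
    (["der", "Bergweg", "ist", "steil"],
     ["the", "mountain", "path", "is", "steep"], "0-0 1-1 1-2 2-3 3-4"),
    (["ein", "Bergweg"], ["a", "mountain", "trail"], "0-0 1-1 1-2"),
    (["der", "Bergweg"], ["the", "mountain", "path"], "0-0 1-1 1-2"),
]
print(rank_translation_variants(corpus, "Bergweg"))
# [('mountain path', 2), ('mountain trail', 1)]
```

In an annotated corpus, one would more likely count aligned lemmas than surface tokens, so that inflected forms of the same translation are merged into one variant.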

Downloaded on 12.2.2026 from https://www.degruyterbrill.com/document/doi/10.1075/scl.90.05vol/html