InterCorp: A parallel corpus of 40 languages

Petr Čermák

Published by John Benjamins Publishing Company

This chapter is in the book Parallel Corpora for Contrastive and Translation Studies

Abstract

This chapter presents the current version of InterCorp, a parallel corpus created at the Faculty of Arts, Charles University, Prague. The corpus consists of Czech texts aligned with their versions in one or more other languages. In total it covers 40 languages: Czech and 39 others. The chapter describes the structure and technical parameters of the corpus, as well as the tools used with it: KonText, a corpus query interface, and InterText, a parallel text alignment editor developed specifically for the project. It also discusses Treq (Translation Equivalents Database), a collection of bilingual Czech-foreign language dictionaries built automatically from InterCorp. The final section considers how the corpus can be exploited in methodological and linguistic research.
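The abstract says only that Treq is built automatically from InterCorp; the actual pipeline is described in the chapter itself. As a rough illustration of the general idea, the sketch below derives a frequency-ranked list of translation equivalents from word-aligned sentence pairs. It is not Treq's implementation. The input format, the function names `build_equivalents` and `lookup`, and the toy Czech-English data are all hypothetical, and the word-level links are assumed to come from some external word aligner.

```python
from collections import Counter, defaultdict

# Hypothetical input: (czech_tokens, foreign_tokens, links), where each link
# (i, j) says that czech_tokens[i] is aligned to foreign_tokens[j].
Pair = tuple[list[str], list[str], list[tuple[int, int]]]


def build_equivalents(pairs: list[Pair]) -> dict[str, Counter]:
    """Count how often each Czech word is aligned to each foreign word."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for src, tgt, links in pairs:
        for i, j in links:
            counts[src[i].lower()][tgt[j].lower()] += 1
    return counts


def lookup(counts: dict[str, Counter], word: str, top: int = 5):
    """Return (equivalent, frequency, share in %) pairs, most frequent first."""
    candidates = counts.get(word.lower(), Counter())
    total = sum(candidates.values())
    return [
        (equiv, n, round(100 * n / total, 1))
        for equiv, n in candidates.most_common(top)
    ]


# Toy, made-up aligned sentences (not InterCorp data).
sample: list[Pair] = [
    (["pes", "štěká"], ["the", "dog", "barks"], [(0, 1), (1, 2)]),
    (["pes", "spí"], ["the", "dog", "sleeps"], [(0, 1), (1, 2)]),
    (["pes", "běží"], ["a", "hound", "runs"], [(0, 1), (1, 2)]),
]

counts = build_equivalents(sample)
print(lookup(counts, "pes"))  # [('dog', 2, 66.7), ('hound', 1, 33.3)]
```

Ranking candidates by raw alignment frequency, with each candidate's share of all links for the query word, keeps the sketch simple. A real dictionary built this way would also need lemmatization and some filtering of noisy alignments.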

Chapters in this book

Prelim pages
Table of contents
Acknowledgments
Parallel corpora in focus
Part I. Parallel corpora
Comparable parallel corpora
Living with parallel corpora
Working with parallel corpora
Innovations in parallel corpus alignment and retrieval
Part II. Parallel corpora
InterCorp
Corpus PaGeS
Building EPTIC
Enriching parallel corpora with multimedia and lexical semantics
Discourse annotation in the MULTINOT corpus
PEST
Indexation and analysis of a parallel corpus using CQPweb
P-ACTRES 2.0
An overview of Basque corpora and the extraction of certain multi-word expressions from a translational corpus
Part III. Parallel corpora
Strategies for building high quality bilingual lexicons from comparable corpora
Discovering bilingual collocations in parallel corpora
Normalization of shorthand forms in French text messages using word embedding and machine translation
Index

https://doi.org/10.1075/scl.90.06cer
