John Benjamins Publishing Company

Chapter

Enriching parallel corpora with multimedia and lexical semantics

From the CLUVI Corpus to WordNet and SemCor

Abstract

In this chapter, I present the main characteristics of the CLUVI Corpus, an open collection of sentence-level aligned parallel corpora with over 44 million words in nine specialised domains (fiction, computing, popular science, biblical texts, law, consumer information, economy, tourism, and film subtitling) and in different language combinations including Galician, Spanish, English, French, Portuguese, Catalan, Italian, Basque and Latin. I then present the methodology developed for extending the film subtitles section of the CLUVI Corpus with multimedia data. Finally, I discuss the resources and methods used to build the SensoGal Corpus, a SemCor-based English-Galician parallel corpus that is semantically annotated with respect to WordNet and aligned at the sentence and word levels.

Chapters in this book

Prelim pages i
Table of contents v
Acknowledgments ix
Parallel corpora in focus 1
Part I. Parallel corpora
Comparable parallel corpora 19
Living with parallel corpora 39
Working with parallel corpora 57
Innovations in parallel corpus alignment and retrieval 79
Part II. Parallel corpora
InterCorp 93
Corpus PaGeS 103
Building EPTIC 123
Enriching parallel corpora with multimedia and lexical semantics 141
Discourse annotation in the MULTINOT corpus 159
PEST 183
Indexation and analysis of a parallel corpus using CQPweb 197
P-ACTRES 2.0 215
An overview of Basque corpora and the extraction of certain multi-word expressions from a translational corpus 233
Part III. Parallel corpora
Strategies for building high quality bilingual lexicons from comparable corpora 251
Discovering bilingual collocations in parallel corpora 267
Normalization of shorthand forms in French text messages using word embedding and machine translation 281
Index 299

This chapter is in the book Parallel Corpora for Contrastive and Translation Studies

https://doi.org/10.1075/scl.90.09gom

Downloaded on 7.4.2026 from https://www.degruyterbrill.com/document/doi/10.1075/scl.90.09gom/html