P-ACTRES 2.0: A parallel corpus for cross-linguistic research

Hugo Sanjurjo-González; Marlén Izquierdo

Published by

John Benjamins Publishing Company

This chapter is in the book Parallel Corpora for Contrastive and Translation Studies

Abstract

This chapter describes an updated version of the ACTRES Parallel Corpus (P-ACTRES 2.0), an English-Spanish bidirectional corpus of over 4 million words. It details the composition of the corpus: the number of words in each direction, the types of texts included, and the linguistic variants that users will find in it. That composition is shaped by research purposes as well as by availability issues. The chapter also explains the computerization process, covering text processing, alignment and tagging. It concludes with a brief demonstration of the usefulness and usability of P-ACTRES 2.0 in cross-linguistic research, whether in contrastive linguistics or translation studies, independently or, most importantly, jointly.

Chapters in this book

Prelim pages i
Table of contents v
Acknowledgments ix
Parallel corpora in focus 1
Part I. Parallel corpora
Comparable parallel corpora 19
Living with parallel corpora 39
Working with parallel corpora 57
Innovations in parallel corpus alignment and retrieval 79
Part II. Parallel corpora
InterCorp 93
Corpus PaGeS 103
Building EPTIC 123
Enriching parallel corpora with multimedia and lexical semantics 141
Discourse annotation in the MULTINOT corpus 159
PEST 183
Indexation and analysis of a parallel corpus using CQPweb 197
P-ACTRES 2.0 215
An overview of Basque corpora and the extraction of certain multi-word expressions from a translational corpus 233
Part III. Parallel corpora
Strategies for building high quality bilingual lexicons from comparable corpora 251
Discovering bilingual collocations in parallel corpora 267
Normalization of shorthand forms in French text messages using word embedding and machine translation 281
Index 299

https://doi.org/10.1075/scl.90.13san
