P-ACTRES 2.0
Hugo Sanjurjo-González
Abstract
This chapter describes an updated version of the ACTRES Parallel Corpus (P-ACTRES 2.0), an English-Spanish bidirectional corpus that contains over 4 million words. It outlines the composition of the corpus: the number of words in each direction, the types of texts included, and the linguistic variants that users will find. This composition is shaped by research purposes as well as by availability issues. The chapter also explains the computerization process, covering text processing, alignment and tagging. It concludes with a brief demonstration of the usefulness and usability of P-ACTRES 2.0 in cross-linguistic research, whether in contrastive linguistics or in translation studies, pursued independently or, most importantly, jointly.
Chapters in this book
- Prelim pages i
- Table of contents v
- Acknowledgments ix
- Parallel corpora in focus 1
- Part I. Parallel corpora
- Comparable parallel corpora 19
- Living with parallel corpora 39
- Working with parallel corpora 57
- Innovations in parallel corpus alignment and retrieval 79
- Part II. Parallel corpora
- InterCorp 93
- Corpus PaGeS 103
- Building EPTIC 123
- Enriching parallel corpora with multimedia and lexical semantics 141
- Discourse annotation in the MULTINOT corpus 159
- PEST 183
- Indexation and analysis of a parallel corpus using CQPweb 197
- P-ACTRES 2.0 215
- An overview of Basque corpora and the extraction of certain multi-word expressions from a translational corpus 233
- Part III. Parallel corpora
- Strategies for building high quality bilingual lexicons from comparable corpora 251
- Discovering bilingual collocations in parallel corpora 267
- Normalization of shorthand forms in French text messages using word embedding and machine translation 281
- Index 299