Comparable parallel corpora
Lidun Hareide
Abstract
Are papers presented in corpus-based translation studies truly scientific? Such studies are normally carried out on only one language pair, often on purpose-made parallel corpora, and normally cannot be replicated. Their value is therefore limited in a strictly scientific sense. The use of comparable parallel corpora allows both the replication of studies and the testing of complex hypotheses such as Halverson’s Gravitational Pull hypothesis. This chapter defines and discusses the concept of comparable parallel corpora and exemplifies their value by illustrating their use. The chapter also presents hopes for the future, as groundbreaking new technology that will allow the linguist to create her own parallel corpora without the aid of computer scientists is currently being launched at the University of León in Spain.
Chapters in this book
- Prelim pages
- Table of contents
- Acknowledgments
- Parallel corpora in focus
Part I. Parallel corpora
- Comparable parallel corpora
- Living with parallel corpora
- Working with parallel corpora
- Innovations in parallel corpus alignment and retrieval
Part II. Parallel corpora
- InterCorp
- Corpus PaGeS
- Building EPTIC
- Enriching parallel corpora with multimedia and lexical semantics
- Discourse annotation in the MULTINOT corpus
- PEST
- Indexation and analysis of a parallel corpus using CQPweb
- P-ACTRES 2.0
- An overview of Basque corpora and the extraction of certain multi-word expressions from a translational corpus
Part III. Parallel corpora
- Strategies for building high quality bilingual lexicons from comparable corpora
- Discovering bilingual collocations in parallel corpora
- Normalization of shorthand forms in French text messages using word embedding and machine translation
- Index