PEST
Mikhail Mikhailov, Miia Santalahti and Julia Souma
Abstract
This chapter introduces the Parallel Electronic corpus of State Treaties (PEST). The current plan is to compile a parallel corpus, which will include treaties concluded between Russia and Finland, Finland and Sweden, and Sweden and Russia. In addition, there will be a subcorpus of international conventions in all three languages plus English, to be used as reference data. The chapter describes the structure of the subcorpora (number of documents, their chronological distribution and topics featured), and it also addresses the challenges of balancing such a corpus. In the future, this material can be used for studies ranging from lexicon and semantics to grammar, style, discourse, translation studies, and language for special purposes.
Chapters in this book
- Prelim pages i
- Table of contents v
- Acknowledgments ix
- Parallel corpora in focus 1
Part I. Parallel corpora
- Comparable parallel corpora 19
- Living with parallel corpora 39
- Working with parallel corpora 57
- Innovations in parallel corpus alignment and retrieval 79
Part II. Parallel corpora
- InterCorp 93
- Corpus PaGeS 103
- Building EPTIC 123
- Enriching parallel corpora with multimedia and lexical semantics 141
- Discourse annotation in the MULTINOT corpus 159
- PEST 183
- Indexation and analysis of a parallel corpus using CQPweb 197
- P-ACTRES 2.0 215
- An overview of Basque corpora and the extraction of certain multi-word expressions from a translational corpus 233
Part III. Parallel corpora
- Strategies for building high quality bilingual lexicons from comparable corpora 251
- Discovering bilingual collocations in parallel corpora 267
- Normalization of shorthand forms in French text messages using word embedding and machine translation 281
- Index 299