Chapter 2. ZHEN
Yi Gu and Ana Frankenberg-Garcia
Abstract
Most Chinese-English parallel corpora consist of English source texts translated into Chinese. This chapter discusses the need for corpora representative of the under-resourced Chinese-to-English translation direction. After a brief overview of the current Chinese-English translation scenario and an analysis of existing parallel corpora for this language pair, we discuss problems in mining contemporary Chinese-to-English translations and issues in Chinese-to-English parallel text alignment. We then introduce ZHEN, a corpus of approximately one million characters of contemporary simplified Chinese source texts from a range of text types, aligned with authentic translations into English. Its aim is to contribute to our understanding of Chinese-to-English translation norms and of the features of English translated from Chinese.
Chapters in this book
- Prelim pages i
- Table of contents v
- Corpus resources and tools 1
Part I. Corpus resources and tools
- Chapter 1. Now what? 23
- Chapter 2. ZHEN 49
- Chapter 3. Word alignment in a parallel corpus of Old English prose 75
- Chapter 4. Semantic textual similarity based on deep learning 101
- Chapter 5. TAligner 3.0 125
- Chapter 6. Developing a corpus-informed tool for Spanish professionals writing specialised texts in English 147
Part II. Corpus-based studies and explorations
- Chapter 7. English and Spanish discourse markers in translation 177
- Chapter 8. The discourse markers well and so and their equivalents in the Portuguese and Turkish subparts of the TED-MDB corpus 209
- Chapter 9. Variation of evidential values in discourse domains 233
- Chapter 10. The translation for dubbing of Westerns in Spain 257
- Chapter 11. Generic analysis of mobile application reviews in English and Spanish 283
- Chapter 12. Exploring variation in translation with probabilistic language models 307
- Chapter 13. Binomial adverbs in Germanic and Romance Languages 325
- Index 343