Treebanks in translation studies
Oliver Čulo
Abstract
The CroCo Dependency Treebank comprises a collection of parallel texts: English and German originals from eight different registers, together with their German and English translations respectively. In addition to the original multi-layer annotation and alignment of the CroCo Corpus (part-of-speech and phrase structure), we added treebank information (dependencies) to a sample of the parallel texts and aligned the nodes of the trees. This deep annotation and alignment allows us to query the corpus for crossing edges (e.g. an aligned word pair that realizes different syntactic functions in the source and target text) as well as dropped leaves and cut branches (e.g. words or phrases that have no aligned counterpart or only an incomplete alignment). On this basis, translation shifts on various linguistic levels, and combinations thereof, can be extracted and classified automatically. Patterns of this kind are examined. The factors named so far as possible triggers of shifts (register, grammatical contrast and typical translation strategies), as well as commonalities and differences in valence across English and German, are discussed in the light of a possible dimension for the categorisation of shifts.
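The two query types mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the CroCo tooling or its query language: the `Token` class, the function names, the dependency labels and the example sentence pair are all hypothetical and were invented for illustration. The sketch assumes that each token carries one syntactic function label and that the word alignment is a list of (source index, target index) pairs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    idx: int        # position in the sentence (1-based)
    form: str       # word form
    function: str   # syntactic function label (hypothetical label set)


def crossing_edges(src, tgt, alignment):
    """Return aligned token pairs whose syntactic functions differ."""
    src_by_idx = {t.idx: t for t in src}
    tgt_by_idx = {t.idx: t for t in tgt}
    return [
        (src_by_idx[s], tgt_by_idx[t])
        for s, t in alignment
        if src_by_idx[s].function != tgt_by_idx[t].function
    ]


def dropped_leaves(src, tgt, alignment):
    """Return the source and target tokens that have no aligned counterpart."""
    aligned_src = {s for s, _ in alignment}
    aligned_tgt = {t for _, t in alignment}
    return (
        [t for t in src if t.idx not in aligned_src],
        [t for t in tgt if t.idx not in aligned_tgt],
    )


# Invented sentence pair, not taken from the corpus:
# EN "I really like the book" / DE "Das Buch gefällt mir"
EN = [Token(1, "I", "subj"), Token(2, "really", "adv"), Token(3, "like", "root"),
      Token(4, "the", "det"), Token(5, "book", "obj")]
DE = [Token(1, "Das", "det"), Token(2, "Buch", "subj"),
      Token(3, "gefällt", "root"), Token(4, "mir", "obj")]
ALIGNMENT = [(1, 4), (3, 3), (4, 1), (5, 2)]

shifted = [(s.form, t.form) for s, t in crossing_edges(EN, DE, ALIGNMENT)]
unaligned_en, unaligned_de = dropped_leaves(EN, DE, ALIGNMENT)
```

In this toy pair, the subject "I" is aligned with the object "mir" and the object "book" with the subject "Buch", so both pairs count as crossing edges, while "really" has no counterpart and counts as a dropped leaf. A real treebank query would also need to handle phrase-level nodes and incomplete alignments (cut branches), which this sketch leaves out.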
Chapters in this book
- Prelim pages i
- Table of contents v
- Introduction xi
-
Section 1. Learner and attrition corpora
- The LeaP corpus 3
- Technological and methodological challenges in creating, annotating and sharing a learner corpus of spoken German 25
- Creation and analysis of a reading comprehension exercise corpus 47
- The ALeSKo learner corpus 71
- Corpora of spoken Spanish by simultaneous and successive German-Spanish bilingual and Spanish monolingual children 97
- Monolingual and bilingual phonoprosodic corpora of child German and child Spanish 107
- Pragmatic corpus analysis, exemplified by Turkish-German bilingual and monolingual data 123
- Corpus of Polish spoken in Germany 153
- The HABLA-corpus (German-French and German-Italian) 163
-
Section 2. Language contact corpora
- The Hamburg Corpus of Argentinean Spanish (HaCASpa) 183
- Ad hoc contact phenomena or established features of a contact variety? 199
- Phonoprosodic corpus of spoken Catalan (PhonCAT) 215
- Researching the intelligibility of a (German) dialect 231
- Annotating ambiguity 245
-
Section 3. Interpreting corpora
- Sharing community interpreting corpora 275
- CoSi – A Corpus of Consecutive and Simultaneous Interpreting 295
- The corpus “Interpreting in Hospitals” 305
-
Section 4. Comparable and parallel corpora
- The GeWiss corpus 319
- Korpus C4 339
- Treebanks in translation studies 347
-
Section 5. Corpus tools
- Multilingual phonological corpus analysis 365
- Finding the balance between strict defaults and total openness 383
- General index 401
- Corpora index 405
- Language index 407