Treebanks in translation studies: The CroCo Dependency Treebank

Oliver Čulo; Silvia Hansen-Schirra

Chapter

Published by

John Benjamins Publishing Company

This chapter is in the book Multilingual Corpora and Multilingual Corpus Analysis

Abstract

The CroCo Dependency Treebank comprises parallel texts: English and German originals from eight different registers, together with their German and English translations respectively. In addition to the original multi-layer annotation and alignment of the CroCo Corpus (part-of-speech and phrase structure), we added treebank information (dependencies) to a sample of the parallel texts and aligned the nodes of the trees. This deep annotation and alignment allows us to query the corpus for crossing edges (e.g. an aligned word pair whose members realize different syntactic functions in the source and target text) as well as dropped leaves and cut branches (e.g. words or phrases that have no aligned counterparts, or incomplete alignments). On this basis, translation shifts on various linguistic levels, and combinations thereof, can be extracted and classified automatically. Such patterns are examined, and the factors named so far as possible triggers of shifts (register, grammatical contrast and typical translation strategies), as well as commonalities and differences in valence across English and German, are discussed in the light of a possible dimension for the categorisation of shifts.
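The two query types named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Token` class, the function labels (`subj`, `obj`, `iobj`, ...), the sentence pair and its alignment are invented, and none of it reflects the actual CroCo annotation scheme or query tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    idx: int   # position in the sentence
    form: str  # surface word
    func: str  # dependency function label (hypothetical tag set)


def crossing_edges(src, tgt, alignment):
    """Return aligned token pairs whose syntactic functions differ."""
    return [
        (src[i].form, src[i].func, tgt[j].form, tgt[j].func)
        for i, j in alignment
        if src[i].func != tgt[j].func
    ]


def dropped_leaves(tokens, aligned_indices):
    """Return the forms of tokens that have no aligned counterpart."""
    return [t.form for t in tokens if t.idx not in aligned_indices]


# Invented toy pair: "I really like this book" / "Dieses Buch gefällt mir"
src = [
    Token(0, "I", "subj"),
    Token(1, "really", "adv"),
    Token(2, "like", "root"),
    Token(3, "this", "det"),
    Token(4, "book", "obj"),
]
tgt = [
    Token(0, "Dieses", "det"),
    Token(1, "Buch", "subj"),
    Token(2, "gefällt", "root"),
    Token(3, "mir", "iobj"),
]
# (source index, target index) pairs; "really" is left unaligned on purpose
alignment = [(0, 3), (2, 2), (3, 0), (4, 1)]

shifts = crossing_edges(src, tgt, alignment)
unaligned_src = dropped_leaves(src, {i for i, _ in alignment})
unaligned_tgt = dropped_leaves(tgt, {j for _, j in alignment})
```

In this toy pair, `shifts` holds the two pairs whose function changes (subject to indirect object, object to subject), and `unaligned_src` holds the adverb that has no German counterpart. A real treebank query would operate on full dependency trees rather than flat token lists.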

Chapters in this book

Prelim pages i
Table of contents v
Introduction xi
Section 1. Learner and attrition corpora
The LeaP corpus 3
Technological and methodological challenges in creating, annotating and sharing a learner corpus of spoken German 25
Creation and analysis of a reading comprehension exercise corpus 47
The ALeSKo learner corpus 71
Corpora of spoken Spanish by simultaneous and successive German-Spanish bilingual and Spanish monolingual children 97
Monolingual and bilingual phonoprosodic corpora of child German and child Spanish 107
Pragmatic corpus analysis, exemplified by Turkish-German bilingual and monolingual data 123
Corpus of Polish spoken in Germany 153
The HABLA-corpus (German-French and German-Italian) 163
Section 2. Language contact corpora
The Hamburg Corpus of Argentinean Spanish (HaCASpa) 183
Ad hoc contact phenomena or established features of a contact variety? 199
Phonoprosodic corpus of spoken Catalan (PhonCAT) 215
Researching the intelligibility of a (German) dialect 231
Annotating ambiguity 245
Section 3. Interpreting corpora
Sharing community interpreting corpora 275
CoSi – A Corpus of Consecutive and Simultaneous Interpreting 295
The corpus “Interpreting in Hospitals” 305
Section 4. Comparable and parallel corpora
The GeWiss corpus 319
Korpus C4 339
Treebanks in translation studies 347
Section 5. Corpus tools
Multilingual phonological corpus analysis 365
Finding the balance between strict defaults and total openness 383
General index 401
Corpora index 405
Language index 407

https://doi.org/10.1075/hsm.14.25cul
