Finding the balance between strict defaults and total openness: Collecting and managing metadata for spoken language corpora with the EXMARaLDA Corpus Manager

Kai Wörner

Published by

John Benjamins Publishing Company

A chapter from the book Multilingual Corpora and Multilingual Corpus Analysis

Abstract

This paper presents the metadata model of the EXMARaLDA system and its implementations. It first reviews existing metadata schemes for transcriptions of spoken language as well as for written texts and highlights their advantages and disadvantages. It then justifies the decisions against adopting existing models, which led to a new data model that prescribes few metadata items and relies on XML files. It concludes with a brief outlook on ongoing efforts to standardize metadata.

Chapters in this book

Prelim pages i
Table of contents v
Introduction xi
Section 1. Learner and attrition corpora
The LeaP corpus 3
Technological and methodological challenges in creating, annotating and sharing a learner corpus of spoken German 25
Creation and analysis of a reading comprehension exercise corpus 47
The ALeSKo learner corpus 71
Corpora of spoken Spanish by simultaneous and successive German-Spanish bilingual and Spanish monolingual children 97
Monolingual and bilingual phonoprosodic corpora of child German and child Spanish 107
Pragmatic corpus analysis, exemplified by Turkish-German bilingual and monolingual data 123
Corpus of Polish spoken in Germany 153
The HABLA-corpus (German-French and German-Italian) 163
Section 2. Language contact corpora
The Hamburg Corpus of Argentinean Spanish (HaCASpa) 183
Ad hoc contact phenomena or established features of a contact variety? 199
Phonoprosodic corpus of spoken Catalan (PhonCAT) 215
Researching the intelligibility of a (German) dialect 231
Annotating ambiguity 245
Section 3. Interpreting corpora
Sharing community interpreting corpora 275
CoSi – A Corpus of Consecutive and Simultaneous Interpreting 295
The corpus “Interpreting in Hospitals” 305
Section 4. Comparable and parallel corpora
The GeWiss corpus 319
Korpus C4 339
Treebanks in translation studies 347
Section 5. Corpus tools
Multilingual phonological corpus analysis 365
Finding the balance between strict defaults and total openness 383
General index 401
Corpora index 405
Language index 407

https://doi.org/10.1075/hsm.14.28wor
