The GeWiss corpus
Christian Fandrych
Abstract
Research on academic language, including academic German, has flourished in recent years. The corpus resources available for larger, empirically based research projects remain limited, however, even for written academic language, and they are practically non-existent for spoken academic language. Yet a detailed, empirical analysis of the linguistic conventions and formulaic language used in (oral) academic communication is all the more important at a time when academic landscapes are becoming ever more internationalised. GeWiss aims to lay a foundation for such research: we are currently constructing a parallel corpus of spoken academic language data from German, English, and Polish. Our contribution outlines the corpus design and the methodological procedures used for the construction of GeWiss. The project focuses, at least initially, on two key genres: academic papers / student presentations and oral examinations. These are recorded, transcribed and stored in a searchable database. The corpus will comprise native speaker data from Polish, English, and German academics and students, as well as German as a Foreign Language (GFL) data from non-native speakers of German. These data are recorded at all three partner institutions. GeWiss will therefore be the first corpus comprising GFL learner data. The paper focuses on corpus design as well as data collection and transcription. It also discusses selected research questions for which GeWiss can serve as an empirical basis.
Chapters in this book
- Prelim pages i
- Table of contents v
- Introduction xi
Section 1. Learner and attrition corpora
- The LeaP corpus 3
- Technological and methodological challenges in creating, annotating and sharing a learner corpus of spoken German 25
- Creation and analysis of a reading comprehension exercise corpus 47
- The ALeSKo learner corpus 71
- Corpora of spoken Spanish by simultaneous and successive German-Spanish bilingual and Spanish monolingual children 97
- Monolingual and bilingual phonoprosodic corpora of child German and child Spanish 107
- Pragmatic corpus analysis, exemplified by Turkish-German bilingual and monolingual data 123
- Corpus of Polish spoken in Germany 153
- The HABLA-corpus (German-French and German-Italian) 163
Section 2. Language contact corpora
- The Hamburg Corpus of Argentinean Spanish (HaCASpa) 183
- Ad hoc contact phenomena or established features of a contact variety? 199
- Phonoprosodic corpus of spoken Catalan (PhonCAT) 215
- Researching the intelligibility of a (German) dialect 231
- Annotating ambiguity 245
Section 3. Interpreting corpora
- Sharing community interpreting corpora 275
- CoSi – A Corpus of Consecutive and Simultaneous Interpreting 295
- The corpus “Interpreting in Hospitals” 305
Section 4. Comparable and parallel corpora
- The GeWiss corpus 319
- Korpus C4 339
- Treebanks in translation studies 347
Section 5. Corpus tools
- Multilingual phonological corpus analysis 365
- Finding the balance between strict defaults and total openness 383
- General index 401
- Corpora index 405
- Language index 407