Corpora as primary sources of Tourlex
Carolina Flinz
Abstract
Corpora entered lexicography long ago and serve as the primary sources of many contemporary dictionaries. Their use in the lexicographic process has opened up a range of new possibilities (Lemnitzer/Zinsmeister 2015: 170) that were unthinkable with traditional collections of documents. Corpora are also the primary lexicographic source of Tourlex, a newly conceived bilingual, wiki-based resource with a special focus on collocations, designed to support future employees of the tourism industry (Flinz 2019). Specifically, different partial corpora are used for different lexicographic work steps (Wolf 2010: 23): a small specialized comparison corpus (Lemnitzer/Zinsmeister 2015: 138), created from texts of the type "General Terms and Conditions of Travel", and large corpora such as the reference corpus DeReKo and German Web 2013. In addition, documents and contexts were retrieved from the Internet. This paper reflects on the corpora used as the data basis for Tourlex in order to show that both small and large corpora can serve different lexicographic purposes, provided that the respective objectives are defined in advance. After an overview of the use of corpora in the data collection phase of the lexicographic process of dictionary projects (§ 2), the primary sources of Tourlex (cf. Wiegand 1998: 140) and the approach used are presented (§ 3). The fourth section describes in detail the extraction of the lemma candidate list and the identification of equivalence relations for both individual lexemes and collocations. The model used is systematized, and its application is illustrated with examples.
Chapters in this book
- Frontmatter I
- Table of contents V
- Corpora in lexicography and phraseology VII
- Expanding the use of corpora in the lexicographic process of online dictionaries 1
- On the complexity of phraseological meaning. Lexicographic aspects 21
- Lexicographic treatment of selected non-lemmatized German idioms 35
- Corpora as primary sources of Tourlex 57
- Access to corpus citations in German monolingual online dictionaries from the perspective of German as a foreign language 85
- On the recording of phraseologisms in dictionaries from Middle High German to the "Deutsches Wörterbuch" of Jacob and Wilhelm Grimm. Digitized historical dictionaries as text corpora 105
- German winged words of literary provenance in dictionaries and encyclopedias 123
- Information spectrum and classes of items in the lexicographic resource on spoken German: LeGeDe 141
- Quantitative and qualitative approaches to headword candidates for the lexicographic resource on spoken German: LeGeDe 175
- German football language in live commentaries 197
- On the history of the colloquium series on lexicography and dictionary research in Southeastern and Eastern Europe (2000–2018). Founded by H. E. Wiegand and P. Petkov 225