Chapter 1. Collecting data for the Rhapsodie treebank

Corpus design and ethical issues
  • Anne Lacheret-Dujour, Paola Pietrandrea, Olivier Baude, Nicolas Obin, Anne-Catherine Simon and Atanas Tchobanov
Published by John Benjamins Publishing Company
A chapter from the book Rhapsodie

Abstract

This chapter is devoted to the development of the Rhapsodie repository. We describe how the data to be annotated were selected and the principles used to document them, and we discuss the theoretical assumptions underlying the Rhapsodie project. The aim was to provide a corpus for studying the interface between discourse, syntax, and prosody in French, as well as the variation of intonosyntactic features across discourse genres in the marking of information structure and expressivity in unelicited speech. When the Rhapsodie project began, such data were under-represented, and the need for spoken French corpora of this type was strongly felt. Several challenges therefore had to be addressed. First, we discuss the obstacles and open questions we faced in building a well-balanced corpus of different discourse genres produced in different speech situations, such as the nature of the data and the type of information to include in the metadata. We then present the sources from which the samples were extracted, the legal and ethical issues involved, and the methodology adopted to encode the metadata.

Downloaded on 21 September 2025 from https://www.degruyterbrill.com/document/doi/10.1075/scl.89.02lac/html