Reliability vs. granularity in discourse annotation: What is the trade-off?

  • Ludivine Crible and Liesbeth Degand
Published/Copyright: March 23, 2017

Abstract

We report the results of an annotation experiment comparing naïve and expert coders in a sense disambiguation task that consists of assigning function labels to discourse markers (e.g. well, but, I mean) in spoken French and English, using a taxonomy specifically designed for speech. Our qualitative-quantitative assessment of the taxonomy's reliability led us to suggest fundamental revisions of its structure, striving for a better balance between reliability and granularity. The resulting model articulates two independent levels of annotation (domains and functions) which, once combined, provide a robust tool for the analysis of discourse markers and relate them to more general functions of spoken language.

Funding statement: Fédération Wallonie-Bruxelles (Grant/Award Number: 12/17-044).

Published Online: 2017-03-23
Published in Print: 2019-05-27

© 2019 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 28.10.2025 from https://www.degruyterbrill.com/document/doi/10.1515/cllt-2016-0046/html