Tongue, Language or Noise? Word Sense Disambiguation in Ancient Greek with Corpus-Based Methods
Wouter Mercelis
Wouter Mercelis (1997) studied Classics, Linguistics and Artificial Intelligence at KU Leuven. He currently works on an industrial PhD project at KU Leuven and at Brepols Publishers. In this project, he investigates the use of artificial intelligence to create a multi-layered, annotated, interactive interface of the various classical texts in Brepols' database. His main research interests are thus morphological and syntactical tagging in both Latin and Ancient Greek, as well as word sense disambiguation, and word and sentence alignment.
Toon van Hal (1981) is professor at the University of Leuven, where he teaches courses in Ancient Greek linguistics and the history of linguistics. He holds degrees in classics, oriental studies and history from Leuven, Louvain-la-Neuve, Antwerp and Oslo. His research centers on premodern views on languages and linguistic thought, and on the use of digital technology to approach classical languages.
Alek Keersmaekers studied Greek and English Linguistics at the University of Leuven (Belgium). He obtained a Master’s degree in Linguistics and Literature in 2015 and Master’s degrees in General Linguistics and Artificial Intelligence in 2016. He joined the research group QLVL (Quantitative Lexicology and Variational Linguistics) at the University of Leuven in 2016, after acquiring a fellowship from Research Foundation Flanders (FWO). In 2020 he defended his PhD on corpus linguistics in the Greek papyri, supervised by Dirk Speelman and co-supervised by Toon Van Hal and Mark Depauw. Since 2021 he has been working as a post-doctoral researcher on projects on Greek derivational morphology and computational semantics. He created the GLAUx corpus of Ancient Greek (https://glaux.be/).
Abstract
Corpus-based methods are underutilized in both intellectual history and the history of linguistics. This paper endeavors to demonstrate the potential for an automatically annotated corpus of Ancient Greek to enrich our understanding of intellectual history. It focuses on disambiguating the meaning of Ancient Greek words related to the concept of language by using corpus and natural language processing (NLP) methods. We adopt both a semasiological (from word to meaning) and an onomasiological (from concept to word) approach, with a primary focus on the terms γλῶττα and φωνή. To differentiate between their primary meanings, we employ both supervised and unsupervised techniques, relying on an ELECTRA model tailored to the Ancient Greek language. The results of our supervised approach indicate that a sample size of 150 sentences is sufficient to achieve stable precision and recall (around 0.90) in distinguishing the two main meanings of both γλῶττα and φωνή. Our initial attempt at using unsupervised techniques failed to clearly distinguish the two meanings of γλῶττα: the clusters formed were based on formal and morphological criteria rather than on meaning. However, by applying a transformation to the original sentences, we were eventually able to plot fairly clear clusters based on meaning. This study is only a small step forward in the application of corpus-based methods to intellectual history. Further progress in unsupervised methods is necessary to explore onomasiological approaches more fully; such approaches offer promising perspectives for corpus-based investigations into intellectual history.
Chapters in this book
- frontmatter I
- Preface V
- Contents IX
- List of Figures and Diagrams XV
- List of Tables XVII
- Abbreviations XXI
Part I History of the Greek Language, Phonetics, Morphology
- Linguistic Variation and the Study of Ancient Greek Dialects 3
- Open Questions in Ancient Greek Phonology: Some New Evidence from Enclitics 33
- Post-Nasal Deaspiration in Ancient Greek: Mirage or Reality? 65
- Greek Verbs in -βω: A Survey 81
- Action Nouns in -τις/-σις as Second Members of Nominal Compounds in Greek 93
- The Syntax and Semantics of ([N+V]V) Verbal Compounds in Ancient Greek 107
- Σκορακίζω: ‘Curse (by Saying ἐς κόρακας)’. About Delocutive Derivation in Ancient Greek and Performative 127
Part II Lexicon, Semantics
- Nature-based Metaphors as Body-part Terms in Ancient Greek: On καρπός ‘Wrist’ and ἀστράγαλος ‘Ankle(Bone)’ 147
- Grammaticalization of Adverbs in Ancient Greek: The Case of Homeric μάλα 159
- Smells like Metonymy 179
- Cultural Reconstruction through Linguistic Analysis: The Case of AG ταρχύω and ταριχεύω 195
- On Hom. ἐπίφρων and πρόφρων in View of Homeric Human Physiology 211
- Analytical Constructions and Synthetic Encoding of Complex Predicates at the Semantics-Pragmatics Interface 225
Part III Syntax 1: Clause
- Number Agreement of a Predicate in Singular with Two or More Coordinated Noun Phrases in Nominative in Homer 247
- The Construction of the Verb μιμνήσκομαι in the Homeric Language 261
- On a Double Case Construction in Ancient Greek: The Whole-Part Construction in Homeric Greek 279
- Taking Stock of Greek Support-Verb Constructions: Synchronic and Diachronic Variability in the Documentary Papyri 297
- Hyperbaton in Herodotus: A Functional Discourse Grammar Perspective 315
- Adverb Placement in Demosthenes’ First Philippic 335
- Case Attraction in Infinitive Clauses: A Distributive Account 351
Part IV Syntax 2: Verb and Modality
- Dangling between Diachrony, Register and Atticism: A Language Ecology Approach to Modal Morphosyntax in Post-Classical Greek 373
- Information Source and Epistemic Modality in the Classical Usage of ἀνάγκη and ἀναγκαῖον 389
- The Preverb ἀντι- in Ancient Greek: From Space to Reciprocity 409
Part V Syntax 3: Coordination and Subordination
- Null-Subject Genitive Absolute and Co-Referentiality in 5th Cent. BCE Ionic and Attic Prose 435
- On the Oblique Optative in Ionic and Attic Prose Completive Sentences with ὥς and ὅτι: Remarks Towards a Comparative Study 451
- Relativization of Syntactic and Semantic Functions in Classical Greek: A Case Study Based on Sophocles’ Heptad 467
- Backgrounding, Theticals and Periphrastic τυγχάνειν 483
- βούλει/-εσθε, θέλεις/-ετε Plus Subjunctive in Classical Greek: Subordination or Coordination? 499
- Relative Clauses in Septuagint Greek: Some Preliminary Remarks 515
- Addition Clauses in Ancient Greek 535
- Pragmatic and Discursive Functions of Non-Canonical Conditional Sentences 553
- The Mixed Pattern and the Other Conjunctive Strategies in Herodotus’ Greek: An Analysis from a Typological Perspective 569
- Participle Constructions in Post-Classical Greek: The Example of the “Confessions” of Asia Minor 587
Part VI Pragmatics and Discourse
- Caesurae, Cola, and Discourse Acts: A Functional Discourse Grammar Approach to Homeric Colometry 609
- Vocative and ‘Terms of Address’ in the Odyssey 627
- The Pragmatics of Rhetorical Questions in Sophocles’ Tragedies: An Analysis of Antigone and Electra 641
- Verbal Impoliteness in Greek Oratory: The Case of οὗτος 659
- From Disjunct to Connective: The Particle οὖν in Herodotus’ Histories and its Association with Anaphoric Elements 671
- On the Use of the Interjection ὦ in the Dialogues of the Odyssey: An Analysis of (ὦ) γέρον, (ὦ) γύναι, and (ὦ) ξεῖνε 683
- Structure and Function in Catalogic Discourse: The Case of Iliadic Androktasíai 701
Part VII Digital Research
- Linguistic Annotation for a Catalog of Ancient Greek Authors and Works 721
- Formulaic Networks as Prototypical Categories: Combining the Ancient Greek Dependency Treebank with the Ancient Greek WordNet for a Pilot Study on the Iliad 737
- Linguistic Complexity in Ancient Greek: Sentence Complexity and Digital Treebanks 759
- Representing Semantic Roles in Greek Treebanks 777
- “Proleptic” Arguments in the Greek Treebanks 795
- Tongue, Language or Noise? Word Sense Disambiguation in Ancient Greek with Corpus-Based Methods 813
- List of Contributors 829
- Index Locorum
- Index Rerum