Chapter, publicly available

Tongue, Language or Noise? Word Sense Disambiguation in Ancient Greek with Corpus-Based Methods

  • Wouter Mercelis

    Wouter Mercelis (1997) studied Classics, Linguistics and Artificial Intelligence at KU Leuven. He currently works on an industrial PhD project at KU Leuven and at Brepols Publishers. In this project, he investigates the use of artificial intelligence to create a multi-layered, annotated, interactive interface of the various classical texts in Brepols' database. His main research interests are thus morphological and syntactical tagging in both Latin and Ancient Greek, as well as word sense disambiguation, and word and sentence alignment.

  • Toon Van Hal

    Toon Van Hal (1981) is professor at the University of Leuven, where he teaches courses in Ancient Greek linguistics and the history of linguistics. He holds degrees in classics, oriental studies and history from Leuven, Louvain-la-Neuve, Antwerp and Oslo. His research centers on premodern views on languages and linguistic thought, and on the use of digital technology to approach classical languages.

  • Alek Keersmaekers

    Alek Keersmaekers studied Greek and English Linguistics at the University of Leuven (Belgium). He obtained a Master’s degree in Linguistics and Literature in 2015 and Master’s degrees in General Linguistics and Artificial Intelligence in 2016. He joined the research group QLVL (Quantitative Lexicology and Variational Linguistics) at the University of Leuven in 2016, after acquiring a fellowship from Research Foundation Flanders (FWO). In 2020 he defended his PhD on corpus linguistics in the Greek papyri, supervised by Dirk Speelman and co-supervised by Toon Van Hal and Mark Depauw. Since 2021 he has been working as a post-doctoral researcher on projects on Greek derivational morphology and computational semantics. He created the GLAUx corpus of Ancient Greek (https://glaux.be/).

Abstract

Corpus-based methods are underutilized in both intellectual history and the history of linguistics. This paper endeavors to demonstrate the potential for an automatically annotated corpus of Ancient Greek to enrich our understanding of intellectual history. It focuses on disambiguating the meaning of Ancient Greek words related to the concept of language by using corpus and natural language processing (NLP) methods. We adopt both a semasiological (meaning-focused) and onomasiological (word-focused) approach, with a primary focus on the terms γλῶττα and φωνή. To differentiate between their primary meanings, we employ both supervised and unsupervised techniques, relying on an ELECTRA model tailored to the Ancient Greek language. The results of our supervised approach indicate that a sample size of 150 sentences is sufficient to achieve stable precision and recall (around 0.90) in distinguishing the two main meanings of both γλῶττα and φωνή. Our initial attempt at using unsupervised techniques failed to clearly distinguish the two meanings of γλῶττα: the clusters formed were based on formal and morphological criteria, rather than semantic meaning. However, by applying a transformation to the original sentences, we were eventually able to plot fairly clear clusters based on meaning. This study is only a small step forward in the application of corpus-based methods to intellectual history. Further progress in unsupervised methods is necessary to further explore onomasiological approaches, offering promising perspectives for corpus-based investigations into intellectual history.
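
To make the reported evaluation figures concrete, the following minimal sketch shows how precision and recall are computed for one sense of an ambiguous word. The sense labels ("tongue", "language") and the ten toy annotations are hypothetical illustrations, not data or code from the paper, and the function name `precision_recall` is likewise an assumption introduced here.

```python
from typing import Sequence, Tuple


def precision_recall(gold: Sequence[str], predicted: Sequence[str],
                     target: str) -> Tuple[float, float]:
    """Precision and recall for one sense, treating `target` as the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == target and p == target)  # correctly predicted
    fp = sum(1 for g, p in pairs if g != target and p == target)  # wrongly predicted
    fn = sum(1 for g, p in pairs if g == target and p != target)  # missed occurrences
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy annotations for ten occurrences of γλῶττα (not from the paper).
gold = ["tongue", "tongue", "language", "language", "language",
        "tongue", "language", "tongue", "language", "language"]
predicted = ["tongue", "language", "language", "language", "tongue",
             "tongue", "language", "tongue", "language", "language"]

for sense in ("tongue", "language"):
    p, r = precision_recall(gold, predicted, sense)
    print(f"{sense}: precision={p:.2f} recall={r:.2f}")
```

Precision is the share of predictions for a sense that are correct, and recall is the share of that sense's true occurrences that were found. A score of around 0.90 on both therefore means that roughly one in ten predictions for a sense is wrong and roughly one in ten true occurrences is missed.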

Chapters in this book

  1. Frontmatter I
  2. Preface V
  3. Contents IX
  4. List of Figures and Diagrams XV
  5. List of Tables XVII
  6. Abbreviations XXI
  7. Part I History of the Greek Language, Phonetics, Morphology
  8. Linguistic Variation and the Study of Ancient Greek Dialects 3
  9. Open Questions in Ancient Greek Phonology: Some New Evidence from Enclitics 33
  10. Post-Nasal Deaspiration in Ancient Greek: Mirage or Reality? 65
  11. Greek Verbs in -βω: A Survey 81
  12. Action Nouns in -τις/-σις as Second Members of Nominal Compounds in Greek 93
  13. The Syntax and Semantics of ([N+V]V) Verbal Compounds in Ancient Greek 107
  14. Σκορακίζω: ‘Curse (by Saying ἐς κόρακας)’. About Delocutive Derivation in Ancient Greek and Performative 127
  15. Part II Lexicon, Semantics
  16. Nature-based Metaphors as Body-part Terms in Ancient Greek: On καρπός ‘Wrist’ and ἀστράγαλος ‘Ankle(Bone)’ 147
  17. Grammaticalization of Adverbs in Ancient Greek: The Case of Homeric μάλα 159
  18. Smells like Metonymy 179
  19. Cultural Reconstruction through Linguistic Analysis: The Case of AG ταρχύω and ταριχεύω 195
  20. On Hom. ἐπίφρων and πρόφρων in View of Homeric Human Physiology 211
  21. Analytical Constructions and Synthetic Encoding of Complex Predicates at the Semantics-Pragmatics Interface 225
  22. Part III Syntax 1: Clause
  23. Number Agreement of a Predicate in Singular with Two or More Coordinated Noun Phrases in Nominative in Homer 247
  24. The Construction of the Verb μιμνήσκομαι in the Homeric Language 261
  25. On a Double Case Construction in Ancient Greek: The Whole-Part Construction in Homeric Greek 279
  26. Taking Stock of Greek Support-Verb Constructions: Synchronic and Diachronic Variability in the Documentary Papyri 297
  27. Hyperbaton in Herodotus: A Functional Discourse Grammar Perspective 315
  28. Adverb Placement in Demosthenes’ First Philippic 335
  29. Case Attraction in Infinitive Clauses: A Distributive Account 351
  30. Part IV Syntax 2: Verb and Modality
  31. Dangling between Diachrony, Register and Atticism: A Language Ecology Approach to Modal Morphosyntax in Post-Classical Greek 373
  32. Information Source and Epistemic Modality in the Classical Usage of ἀνάγκη and ἀναγκαῖον 389
  33. The Preverb ἀντι- in Ancient Greek: From Space to Reciprocity 409
  34. Part V Syntax 3: Coordination and Subordination
  35. Null-Subject Genitive Absolute and Co-Referentiality in 5th Cent. BCE Ionic and Attic Prose 435
  36. On the Oblique Optative in Ionic and Attic Prose Completive Sentences with ὥς and ὅτι: Remarks Towards a Comparative Study 451
  37. Relativization of Syntactic and Semantic Functions in Classical Greek: A Case Study Based on Sophocles’ Heptad 467
  38. Backgrounding, Theticals and Periphrastic τυγχάνειν 483
  39. βούλει/-εσθε, θέλεις/-ετε Plus Subjunctive in Classical Greek: Subordination or Coordination? 499
  40. Relative Clauses in Septuagint Greek: Some Preliminary Remarks 515
  41. Addition Clauses in Ancient Greek 535
  42. Pragmatic and Discursive Functions of Non-Canonical Conditional Sentences 553
  43. The Mixed Pattern and the Other Conjunctive Strategies in Herodotus’ Greek: An Analysis from a Typological Perspective 569
  44. Participle Constructions in Post-Classical Greek: The Example of the “Confessions” of Asia Minor 587
  45. Part VI Pragmatics and Discourse
  46. Caesurae, Cola, and Discourse Acts: A Functional Discourse Grammar Approach to Homeric Colometry 609
  47. Vocative and ‘Terms of Address’ in the Odyssey 627
  48. The Pragmatics of Rhetorical Questions in Sophocles’ Tragedies: An Analysis of Antigone and Electra 641
  49. Verbal Impoliteness in Greek Oratory: The Case of οὗτος 659
  50. From Disjunct to Connective: The Particle οὖν in Herodotus’ Histories and its Association with Anaphoric Elements 671
  51. On the Use of the Interjection ὦ in the Dialogues of the Odyssey: An Analysis of (ὦ) γέρον, (ὦ) γύναι, and (ὦ) ξεῖνε 683
  52. Structure and Function in Catalogic Discourse: The Case of Iliadic Androktasíai 701
  53. Part VII Digital Research
  54. Linguistic Annotation for a Catalog of Ancient Greek Authors and Works 721
  55. Formulaic Networks as Prototypical Categories: Combining the Ancient Greek Dependency Treebank with the Ancient Greek WordNet for a Pilot Study on the Iliad 737
  56. Linguistic Complexity in Ancient Greek: Sentence Complexity and Digital Treebanks 759
  57. Representing Semantic Roles in Greek Treebanks 777
  58. “Proleptic” Arguments in the Greek Treebanks 795
  59. Tongue, Language or Noise? Word Sense Disambiguation in Ancient Greek with Corpus-Based Methods 813
  60. List of Contributors 829
  61. Index Locorum
  62. Index Rerum
Downloaded on 17 October 2025 from https://www.degruyterbrill.com/document/doi/10.1515/9783111648644-046/html