Exploring BERT’s contextualized word embeddings: a suitable method for a lexicography-oriented analysis of argument structures?
Fritz Kliche and Laura Giacomini
Abstract
According to the approach at the core of the PhraseBase project (DiMuccio-Failla/Giacomini 2022, 2017), each sense of a verb corresponds to a specific argument structure filled with specific semantic types and roles. In a case study on the English verb follow, we use contextualized word embeddings generated by BERT (Devlin et al. 2019) to disambiguate word senses. We retrieve instances of follow together with the NPs and PPs that come after it from the British National Corpus. Using k-means, we cluster the vector representations of follow according to their contexts. We find clusters with both semantic and syntactic features, and discuss whether these clusters can be mapped onto the lexical units distinguished in the entry for follow in the project-related Phrase-based Active Dictionary (PAD) model. We find clusters with a coherent interpretation, as well as collocations not recorded in the PAD entry for follow, and argue that contextualized word embeddings can provide useful semantic information for lexicography.
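The clustering step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the three-dimensional toy vectors are hypothetical stand-ins for BERT's contextualized vectors of follow, the example phrases in the comments are invented for illustration, and the farthest-point initialization is an assumption made only to keep the run deterministic.

```python
import math


def kmeans(vectors, k, iterations=10):
    """Plain k-means (Lloyd's algorithm); returns one cluster label per vector."""
    # Assumed initialization: start from the first vector, then repeatedly add
    # the vector that lies farthest from the centroids chosen so far.
    centroids = [tuple(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors, key=lambda v: min(math.dist(v, c) for c in centroids)
        )
        centroids.append(tuple(farthest))

    labels = []
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return labels


# Hypothetical toy vectors; real BERT vectors have hundreds of dimensions.
toy_vectors = [
    (0.9, 0.1, 0.0),  # e.g. "follow the path" (invented example)
    (1.0, 0.2, 0.1),
    (0.8, 0.0, 0.1),
    (0.1, 0.9, 1.0),  # e.g. "follow the instructions" (invented example)
    (0.0, 1.0, 0.9),
    (0.2, 0.8, 1.0),
]

labels = kmeans(toy_vectors, k=2)
print(labels)
```

In this toy setting the first three vectors receive one label and the last three the other. Whether clusters obtained from real corpus data line up with dictionary senses is the question the chapter investigates.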
Chapters in this book
- Frontmatter
- Contents
- Patterns of meaning in lexicography and lexicology
Section 1: Lexicographical issues: The phraseological dimension of language in learner’s lexicography and the PhraseBase project
- Introduction to the PhraseBase project
- A theory for a usage-based cognitive lexicography
- Exploring BERT’s contextualized word embeddings: a suitable method for a lexicography-oriented analysis of argument structures?
- Towards a phrase-based active dictionary
Section 2: Theoretical issues
- Verb senses and argument semantics: From linguistic theory to lexicographic practice
- Valency vs. Patterns: What do corpora tell us about argument structure?
- Layer upon layer, mistake after mistake – a case for learner’s dictionaries?
- Patterns of meanings between syntax and lexicon. A lexicological and lexicographic overview of Italian partially lexically specified constructions
- A carry-coals-to-Newcastle exercise: The nature of phraseological units and their place in a constructicon of English
Section 3: Methodological issues
- Language awareness as a prerequisite for a successful use of lexicographic resources
- Regular polysemy in Spanish nouns: corpus analysis and some implications for lexicography
- No word is an island: The phraseological nature of lemma in interlingual comparison
- Analysing, compiling, and representing argument pattern structures: From form to meaning and back
- Index