Chapter

Exploring BERT’s contextualized word embeddings: a suitable method for a lexicography-oriented analysis of argument structures?

  • Fritz Kliche and Laura Giacomini

Abstract

According to the approach at the core of the PhraseBase project (DiMuccio-Failla/Giacomini 2022, 2017), each sense of a verb corresponds to a specific argument structure filled with specific semantic types and roles. In a case study on the English verb follow, we use contextualized word embeddings generated by BERT (Devlin et al. 2019) to disambiguate word senses. We retrieve instances of follow and subsequent NPs and PPs from the British National Corpus. Using k-means, we cluster the vector representations of follow according to their contexts. We find clusters with both semantic and syntactic features, and discuss whether these clusters can be mapped onto the lexical units distinguished in the entry for follow in the project-related Phrase-based Active Dictionary (PAD) model. We find clusters with a coherent interpretation, as well as collocations not recorded in the PAD entry for follow, and argue that contextualized word embeddings can provide useful semantic information for lexicography.
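The clustering step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not state which BERT layer, which number of clusters, or which k-means implementation was used. The two-dimensional vectors below are invented stand-ins for the contextualized embeddings of follow, the example contexts are made up for illustration, and the function name `kmeans` and its farthest-point initialization are assumptions of this sketch.

```python
import math


def kmeans(vectors, k, iterations=50):
    """Plain Lloyd's k-means; returns one cluster label per input vector."""
    # Deterministic farthest-point initialization (an assumption of this
    # sketch, chosen so that the toy example is reproducible).
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors,
            key=lambda v: min(math.dist(v, c) for c in centroids),
        )
        centroids.append(list(farthest))

    labels = None
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical contexts of "follow" with invented 2-D vectors; real BERT
# embeddings would have several hundred dimensions.
toy_contexts = [
    "follow the instructions",
    "follow the rules",
    "follow him down the street",
    "follow her into the room",
]
toy_vectors = [(0.9, 0.1), (1.0, 0.2), (0.1, 0.9), (0.2, 1.0)]

labels = kmeans(toy_vectors, k=2)
for context, label in zip(toy_contexts, labels):
    print(label, context)
```

In this toy setting the two "comply with" contexts and the two "move behind" contexts end up in separate clusters, which mirrors the kind of grouping a lexicographer would then compare against the lexical units of a dictionary entry.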


Downloaded on 23.1.2026 from https://www.degruyterbrill.com/document/doi/10.1515/9783111545943-004/html