Computational extraction of formulaic sequences from corpora: Two case studies of a new extraction algorithm

Alexander Wahl; Stefan Th. Gries

Chapter

Published by

John Benjamins Publishing Company

This chapter is in the book Computational Phraseology

Abstract

We describe a new algorithm for the extraction of formulaic language from corpora. Entitled MERGE (Multi-word Expressions from the Recursive Grouping of Elements), it iteratively combines adjacent bigrams into progressively longer sequences based on lexical association strengths. We then provide empirical evidence for this approach via two case studies. First, we compare the performance of MERGE to that of another algorithm by examining the outputs of the approaches compared with manually annotated formulaic sequences from the spoken component of the British National Corpus. Second, we employ two child language corpora to examine whether MERGE can predict the formulas that the children learn based on caregiver input. Ultimately, we show that MERGE indeed performs well, offering a powerful approach for the extraction of formulas.
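The iterative merging idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the association score (frequency-weighted pointwise mutual information), the `min_count` threshold, the fixed iteration limit, and the function names `best_bigram`, `merge_pair`, and `merge_sketch` are all assumptions chosen for illustration. The abstract only states that adjacent bigrams are combined into progressively longer sequences according to lexical association strength.

```python
import math
from collections import Counter


def best_bigram(tokens, min_count=2):
    """Return (score, pair) for the strongest adjacent pair, or None.

    The score is frequency-weighted PMI, an illustrative stand-in
    for whatever association measure the real algorithm uses.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    best = None
    for (a, b), count in bigrams.items():
        if count < min_count:
            continue
        score = count * math.log((count * n) / (unigrams[a] * unigrams[b]))
        if best is None or score > best[0]:
            best = (score, (a, b))
    return best


def merge_pair(tokens, pair):
    """Replace each non-overlapping occurrence of `pair` with one merged element."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + " " + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def merge_sketch(tokens, iterations=10, min_count=2):
    """Repeatedly merge the strongest adjacent pair (hypothetical stopping rule).

    Because merged elements re-enter the token stream as single units,
    later iterations can combine them into longer sequences.
    """
    lexicon = []
    for _ in range(iterations):
        best = best_bigram(tokens, min_count)
        if best is None or best[0] <= 0:
            break
        _, pair = best
        lexicon.append(pair[0] + " " + pair[1])
        tokens = merge_pair(tokens, pair)
    return lexicon, tokens


if __name__ == "__main__":
    text = ("you know what i mean you know what i mean "
            "i think you know what she said")
    lexicon, remaining = merge_sketch(text.split())
    print(lexicon)
    print(remaining)
```

On the toy input, shorter pairs are merged first and then serve as building blocks for longer sequences, which is the recursive grouping behaviour the abstract describes. A realistic version would need a corpus-scale frequency table and a principled stopping criterion.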

https://doi.org/10.1075/ivitra.24.05wah
