Chapter 5. Constance and variability
Antonio Pinna
Abstract
This paper describes the use of a corpus-driven methodology, the retrieval of part-of-speech-grams (PoS-grams), which is extremely effective for the discovery of phraseologies that might otherwise remain hidden. The PoS-gram is a string of part-of-speech categories (Stubbs 2007: 91), the tokens of which are strings of words that have been annotated with these PoS tags. A list of PoS-grams retrieved from a sample corpus can be compared with that from a reference corpus. Statistically significant items are further analysed to identify recurrent patterns and potential phraseologies. The utility of PoS-grams will be illustrated through an analysis of the Sassari Newspaper Article Corpus (SNAC), a one-million-token corpus composed of texts from ten sections of The Guardian.
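The procedure outlined in the abstract can be illustrated with a minimal sketch. It is not the chapter's own implementation: the abstract does not name the tagger, the tagset, the PoS-gram length or the significance test, so the Penn-style tags, the three-item window and the log-likelihood (G2) statistic below are all assumptions made for illustration, and the two tiny tagged "corpora" are invented toy data, not SNAC material.

```python
import math
from collections import Counter, defaultdict


def pos_grams(tagged, n):
    """Count PoS-grams of length n in a list of (word, tag) pairs.

    Returns (counts, tokens): counts maps each tag tuple to its frequency,
    tokens maps each tag tuple to a Counter of the word strings realising it.
    """
    counts = Counter()
    tokens = defaultdict(Counter)
    for i in range(len(tagged) - n + 1):
        window = tagged[i:i + n]
        gram = tuple(tag for _, tag in window)
        words = " ".join(word.lower() for word, _ in window)
        counts[gram] += 1
        tokens[gram][words] += 1
    return counts, tokens


def log_likelihood(a, b, total_a, total_b):
    """Log-likelihood (G2) for a frequency a in one corpus vs b in another.

    Assumption: the chapter's actual significance test is not stated in the
    abstract; G2 is used here only as a common keyness measure.
    """
    expected_a = total_a * (a + b) / (total_a + total_b)
    expected_b = total_b * (a + b) / (total_a + total_b)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / expected_a)
    if b > 0:
        ll += b * math.log(b / expected_b)
    return 2 * ll


def keyness(sample_counts, ref_counts):
    """Rank the sample's PoS-grams by G2 against the reference counts."""
    total_s = sum(sample_counts.values())
    total_r = sum(ref_counts.values())
    scores = {
        gram: log_likelihood(freq, ref_counts.get(gram, 0), total_s, total_r)
        for gram, freq in sample_counts.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented toy data with hand-assigned, Penn-style tags (illustrative only).
sample = [("the", "DT"), ("end", "NN"), ("of", "IN"), ("the", "DT"),
          ("season", "NN"), ("the", "DT"), ("start", "NN"), ("of", "IN"),
          ("the", "DT"), ("match", "NN")]
reference = [("he", "PRP"), ("said", "VBD"), ("that", "IN"), ("the", "DT"),
             ("end", "NN"), ("of", "IN"), ("it", "PRP"), ("was", "VBD"),
             ("near", "JJ")]

sample_counts, sample_tokens = pos_grams(sample, 3)
ref_counts, _ = pos_grams(reference, 3)

for gram, score in keyness(sample_counts, ref_counts):
    # The word strings behind each PoS-gram are what would be inspected
    # for recurrent patterns and potential phraseologies.
    print(" ".join(gram), round(score, 3), dict(sample_tokens[gram]))
```

In this sketch the ranked list plays the role of the "statistically significant items", and the word strings stored under each PoS-gram are the tokens that would then be examined by hand. Note that G2 alone does not indicate whether an item is over- or under-represented in the sample; relative frequencies would have to be compared for that.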
Chapters in this book
- Prelim pages
- Table of contents
- Acknowledgements
- Chapter 1. Present applications and future directions in pattern-driven approaches to corpus linguistics
Part I. Methodological explorations
- Chapter 2. From lexical bundles to surprisal and language models
- Chapter 3. Fine-tuning lexical bundles
- Chapter 4. Lexical obsolescence and loss in English: 1700–2000
Part II. Patterns in utilitarian texts
- Chapter 5. Constance and variability
- Chapter 6. Between corpus-based and corpus-driven approaches to textual recurrence
- Chapter 7. Lexical bundles in Early Modern and Present-day English Acts of Parliament
Part III. Patterns in online texts
- Chapter 8. Lexical bundles in Wikipedia articles and related texts
- Chapter 9. Join us for this
- Chapter 10. I don’t want to and don’t get me wrong
- Chapter 11. Blogging around the world
- Index