Modeling fine-grained sociolinguistic variation
Filip Miletic, Anne Przewozny-Desriaux and Ludovic Tanguy
Abstract
This chapter examines the use of recent data sources and computational methods to study fine-grained sociolinguistic phenomena. We deploy a custom-built corpus of tweets (Miletić et al. 2020) and neural word embeddings to investigate the use of contact-induced semantic shifts in Quebec English. Drawing on an analysis of 40 lexical items, we show that our approach is beneficial in facilitating manual inspection of vast amounts of data and establishing fine-grained patterns of language variation. While it is affected by a range of noise-related issues, which we describe in detail, coarse-grained annotation provides an efficient way of circumventing them. We use the results filtered in this way to conduct a quantitative analysis of sociolinguistic constraints on contact-induced semantic shifts, further confirming the relevance of our approach.
Chapters in this book
- 日本言語政策学会 / Japan Association for Language Policy. 言語政策 / Language Policy 10. 2014
- Table of contents
- Acknowledgements
- From fallacies and pitfalls to solutions and future directions
- Engaging with bad (meta)data in historical corpus linguistics
- Named entities as potentially problematic items in corpora
- Challenges in the compilation, annotation, and analysis of learner corpus data
- Early newspapers as data for corpus linguistics (and Digital Humanities)
- Open Corpus Linguistics – or How to overcome common problems in dealing with corpus data by adopting open research practices
- Text length and short texts
- Corpus genre categories
- Modeling fine-grained sociolinguistic variation
- Subject index