
Modeling fine-grained sociolinguistic variation

The promises and pitfalls of Twitter corpora and neural word embeddings
  • Filip Miletic, Anne Przewozny-Desriaux and Ludovic Tanguy
View more publications by John Benjamins Publishing Company
This chapter is in the book Challenges in Corpus Linguistics

Abstract

This chapter examines the use of recent data sources and computational methods to study fine-grained sociolinguistic phenomena. We deploy a custom-built corpus of tweets (Miletić et al. 2020) and neural word embeddings to investigate the use of contact-induced semantic shifts in Quebec English. Drawing on an analysis of 40 lexical items, we show that our approach is beneficial in facilitating manual inspection of vast amounts of data and establishing fine-grained patterns of language variation. While it is affected by a range of noise-related issues, which we describe in detail, coarse-grained annotation provides an efficient way of circumventing them. We use the results filtered in this way to conduct a quantitative analysis of sociolinguistic constraints on contact-induced semantic shifts, further confirming the relevance of our approach.

Downloaded on 16.2.2026 from https://www.degruyterbrill.com/document/doi/10.1075/scl.118.09mil/html