Modeling fine-grained sociolinguistic variation: The promises and pitfalls of Twitter corpora and neural word embeddings

Filip Miletic; Anne Przewozny-Desriaux; Ludovic Tanguy

Published by John Benjamins Publishing Company

This chapter is in the book Challenges in Corpus Linguistics

Abstract

This chapter examines the use of recent data sources and computational methods to study fine-grained sociolinguistic phenomena. We deploy a custom-built corpus of tweets (Miletić et al. 2020) and neural word embeddings to investigate the use of contact-induced semantic shifts in Quebec English. Drawing on an analysis of 40 lexical items, we show that our approach facilitates manual inspection of vast amounts of data and helps establish fine-grained patterns of language variation. While the approach is affected by a range of noise-related issues, which we describe in detail, coarse-grained annotation provides an efficient way of circumventing them. We use the results filtered in this way to conduct a quantitative analysis of sociolinguistic constraints on contact-induced semantic shifts, further confirming the relevance of our approach.


https://doi.org/10.1075/scl.118.09mil
