Chapter 7. Constructing corpora from images and text
Alex Christiansen
Abstract
The neglect of visual analysis is a significant oversight in the corpus literature, and one that may lead to unintended omissions, particularly in analyses of social media. In this chapter we introduce Visual Constituent Analysis (VCA), a method of multimodal corpus construction that allows researchers to incorporate and analyse the visual aspects of online media in large-scale corpora. The chapter addresses the shortcomings of a purely textual approach to discourse analysis of social media texts and offers a solution based on computer-vision image annotation (in our case, Google Cloud Vision). Finally, we demonstrate the approach on a sample of 150,000 micro-blog posts from Twitter and show how levels of user interaction differ between posts that combine image and text and language-only posts.
Chapters in this book
- Prelim pages
- Table of contents
- Introduction. The expanding landscape of corpus-based studies of social media language
Part I. Using corpus methods to investigate communities on social media
- Chapter 1. Towards a digital sociolinguistics
- Chapter 2. The control and censorship of linguistic resources in an online Community of Practice
- Chapter 3. Talking about women
Part II. Linguistic variation in short social media texts
- Chapter 4. Patterns of intra-individual variation in a Swiss WhatsApp corpus
- Chapter 5. Using lengthwise scaling to compare feature frequencies across text lengths on Reddit
- Chapter 6. Double trouble
Part III. The role of images
- Chapter 7. Constructing corpora from images and text
- Chapter 8. Working with images and emoji in the 🦆 Dukki Facebook Corpus
Part IV. Discussion
- Chapter 9. New developments in corpus approaches to social media
- Index