Cluster analysis: Finding structure in linguistic data

Dagmar Divjak; Nick Fieller

Published by John Benjamins Publishing Company

This chapter is in the book Corpus Methods for Semantics

Abstract

Cluster analysis is an exploratory data analysis technique that encompasses a number of different algorithms and methods for sorting objects into groups. It requires the analyst to make choices about dissimilarity measures, grouping algorithms, and other settings, and these choices are difficult to make without understanding their theoretical implications and knowing the data very well. This chapter introduces the distance measures and clustering algorithms most commonly used in cluster analytic work. Unlike Baayen (2008), Johnson (2008), and Gries (2009), its main aim is to give the researcher at least a basic understanding of what happens behind the scenes when a dataset is explored with a particular cluster analytic technique.
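The two choices named in the abstract, a dissimilarity measure and a grouping algorithm, can be illustrated with a minimal sketch. It assumes Euclidean distance and average-linkage agglomerative (hierarchical) clustering as one common pairing; this is not necessarily the chapter's own procedure. The verb names and feature proportions are hypothetical, and only the Python standard library is used. In practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` would normally do this work.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_matrix(rows, metric=euclidean):
    """Symmetric matrix of pairwise dissimilarities (zeros on the diagonal)."""
    n = len(rows)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = metric(rows[i], rows[j])
    return d

def average_linkage(d, ci, cj):
    """Mean dissimilarity over all cross-cluster pairs of objects."""
    return sum(d[a][b] for a in ci for b in cj) / (len(ci) * len(cj))

def agglomerate(d, k, linkage=average_linkage):
    """Start from singletons; repeatedly merge the closest pair until k clusters remain."""
    clusters = [[i] for i in range(len(d))]
    merges = []  # (cluster, cluster, linkage distance) in merge order
    while len(clusters) > k:
        candidates = [
            (linkage(d, clusters[p], clusters[q]), p, q)
            for p, q in combinations(range(len(clusters)), 2)
        ]
        dist, p, q = min(candidates)  # ties in distance broken by index order
        merges.append((clusters[p], clusters[q], dist))
        merged = clusters[p] + clusters[q]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (p, q)]
        clusters.append(merged)
    return clusters, merges

# Hypothetical data: each verb is a vector of proportions of three made-up context features.
profiles = {
    "verb_a": [0.7, 0.2, 0.1],
    "verb_b": [0.6, 0.3, 0.1],
    "verb_c": [0.1, 0.2, 0.7],
    "verb_d": [0.2, 0.1, 0.7],
}
names = list(profiles)
d = distance_matrix([profiles[n] for n in names])
clusters, merges = agglomerate(d, k=2)
for c in clusters:
    print([names[i] for i in c])
```

With k=2 the sketch groups verb_a with verb_b and verb_c with verb_d. Replacing `average_linkage` with a function that takes the minimum (single linkage) or maximum (complete linkage) over cross-cluster pairs changes only the merge criterion, and on less clear-cut data that change alone can produce different groupings.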

Chapters in this book

Prelim pages i
Table of contents v
Contributors vii
Outline 1
Section 1. Polysemy and synonymy
Polysemy and synonymy 7
Competing ‘transfer’ constructions in Dutch 39
Rethinking constructional polysemy 61
Quantifying polysemy in Cognitive Sociolinguistics 87
The many uses of run 117
Visualizing distances in a set of near-synonyms 145
A case for the multifactorial assessment of learner language 179
Dutch causative constructions 205
The semasiological structure of Polish myśleć ‘to think’ 223
A multifactorial corpus analysis of grammatical synonymy 253
A diachronic corpus-based multivariate analysis of “I think that” vs. “I think zero” 279
Section 2. Statistical techniques
Techniques and tools 307
Statistics in R 343
Frequency tables 365
Collostructional analysis 391
Cluster analysis 405
Correspondence analysis 443
Logistic regression 487
Name index 535
Subject index 541

https://doi.org/10.1075/hcp.43.16div
