Cluster analysis
Dagmar Divjak
Abstract
Cluster analysis is an exploratory data analysis technique that encompasses a number of different algorithms and methods for sorting objects into groups. It requires the analyst to make choices about dissimilarity measures, grouping algorithms, and so on, and these choices are difficult to make without understanding their theoretical implications and without a thorough understanding of the data. This chapter introduces the distance measures and clustering algorithms most commonly used in cluster analytic work. Unlike Baayen (2008), Johnson (2008) and Gries (2009), it aims primarily to equip the researcher with at least a basic understanding of what happens behind the scenes when a dataset is explored with a particular cluster analytic technique.
Chapters in this book
- Prelim pages i
- Table of contents v
- Contributors vii
- Outline 1
Section 1. Polysemy and synonymy
- Polysemy and synonymy 7
- Competing ‘transfer’ constructions in Dutch 39
- Rethinking constructional polysemy 61
- Quantifying polysemy in Cognitive Sociolinguistics 87
- The many uses of run 117
- Visualizing distances in a set of near-synonyms 145
- A case for the multifactorial assessment of learner language 179
- Dutch causative constructions 205
- The semasiological structure of Polish myśleć ‘to think’ 223
- A multifactorial corpus analysis of grammatical synonymy 253
- A diachronic corpus-based multivariate analysis of “I think that” vs. “I think zero” 279
Section 2. Statistical techniques
- Techniques and tools 307
- Statistics in R 343
- Frequency tables 365
- Collostructional analysis 391
- Cluster analysis 405
- Correspondence analysis 443
- Logistic regression 487
- Name index 535
- Subject index 541