What matters more: The size of the corpora or their quality?
Ruslan Mitkov
Abstract
This study investigates (and compares) the impact of the size and the similarity/quality of comparable corpora on the specific task of extracting translation equivalents of verb-noun collocations from such corpora. The comprehensive evaluation of different configurations of English and Spanish corpora sheds some light on the more general and perennial question: what matters more – the quantity or quality of corpora?
Chapters in this book
- Prelim pages i
- Table of contents v
- Foreword vii
- Introduction 1
- Monocollocable words 9
- Translation asymmetries of multiword expressions in machine translation 23
- German constructional phrasemes and their Russian counterparts 43
- Computational phraseology and translation studies 65
- Computational extraction of formulaic sequences from corpora 83
- Computational phraseology discovery in corpora with the mwetoolkit 111
- Multiword expressions in comparable corpora 135
- Collecting collocations from general and specialised corpora 151
- What matters more: The size of the corpora or their quality? 177
- Statistical significance for measures of collocation strength 189
- Verbal collocations and pronominalisation 207
- Empirical variability of Italian multiword expressions as a useful feature for their categorisation 225
- Too big to fail but big enough to pay for their mistakes 247
- Multi-word patterns and networks 273
- How context determines meaning 297
- Detecting semantic difference 311
- Index 325