Multiword expressions in comparable corpora
Peter Ďurčo
Abstract
Using the Aranea Gigaword Web corpora, a family of comparable corpora intended for contrastive linguistic research, multilingual lexicography, language teaching and translation studies, we discuss the pros and cons of comparable corpora, as opposed to monolingual and parallel corpora, for the analysis of multiword expressions (MWEs). We demonstrate that, given large corpora for two or more languages that consist of unrelated texts yet were built in a comparable manner, parallel language structures and phenomena such as MWEs can be identified if appropriate tools are employed. For the Aranea corpora, one such tool is the “bilingual sketch” functionality of the Sketch Engine, which offers a new approach to analysing similarities of (or differences between) the collocation profiles (word sketches) of words and their translation equivalents.
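To make the idea of comparing collocation profiles across comparable corpora concrete, the sketch below scores collocates with logDice, the association measure used in Sketch Engine word sketches, and then aligns the two profiles through a bilingual dictionary. It is a minimal illustration under stated assumptions, not the Sketch Engine's bilingual sketch implementation: the function names `log_dice` and `compare_profiles`, the toy dictionary and the scores are all hypothetical and invented for illustration.

```python
import math


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice association score: 14 + log2(2 * f_xy / (f_x + f_y)).

    f_xy: co-occurrence frequency of node and collocate
    f_x, f_y: individual frequencies of node and collocate
    The theoretical maximum is 14 (when f_xy == f_x == f_y).
    """
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def compare_profiles(profile_a, profile_b, dictionary):
    """Align two collocation profiles via a bilingual dictionary (hypothetical helper).

    profile_a: {collocate in language A: score}
    profile_b: {collocate in language B: score}
    dictionary: {collocate in A: translation equivalent in B}
    Returns (shared, only_a, only_b), where shared maps (a, b) pairs
    to their (score_a, score_b) tuple.
    """
    shared = {}
    only_a = []
    for coll_a, score_a in profile_a.items():
        coll_b = dictionary.get(coll_a)
        # A collocate is "shared" only if its equivalent also occurs in profile B.
        if coll_b is not None and coll_b in profile_b:
            shared[(coll_a, coll_b)] = (score_a, profile_b[coll_b])
        else:
            only_a.append(coll_a)
    matched_b = {b for (_, b) in shared}
    only_b = [c for c in profile_b if c not in matched_b]
    return shared, sorted(only_a), sorted(only_b)


if __name__ == "__main__":
    # Toy verb collocates of English "decision" and German "Entscheidung";
    # the scores are invented, not taken from the Aranea corpora.
    en = {"make": 11.2, "reach": 9.8, "appeal": 7.5}
    de = {"treffen": 11.5, "fällen": 9.1, "revidieren": 6.0}
    en_de = {"make": "treffen", "reach": "fällen", "appeal": "anfechten"}
    shared, only_en, only_de = compare_profiles(en, de, en_de)
    print(shared)
    print(only_en, only_de)
```

Collocates whose equivalents appear in both profiles point to parallel structures, while the leftovers on either side highlight asymmetries worth inspecting in context. Because the corpora are comparable rather than parallel, the alignment depends entirely on the dictionary, which is an assumption of this sketch.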
Chapters in this book
- Prelim pages i
- Table of contents v
- Foreword vii
- Introduction 1
- Monocollocable words 9
- Translation asymmetries of multiword expressions in machine translation 23
- German constructional phrasemes and their Russian counterparts 43
- Computational phraseology and translation studies 65
- Computational extraction of formulaic sequences from corpora 83
- Computational phraseology discovery in corpora with the mwetoolkit 111
- Multiword expressions in comparable corpora 135
- Collecting collocations from general and specialised corpora 151
- What matters more: The size of the corpora or their quality? 177
- Statistical significance for measures of collocation strength 189
- Verbal collocations and pronominalisation 207
- Empirical variability of Italian multiword expressions as a useful feature for their categorisation 225
- Too big to fail but big enough to pay for their mistakes 247
- Multi-word patterns and networks 273
- How context determines meaning 297
- Detecting semantic difference 311
- Index 325