Computational phraseology and translation studies
Jean-Pierre Colson
Abstract
The notion of phraseology is now used across a wide range of linguistic disciplines, but it is conspicuously absent from most work in Translation Studies (e.g. Delisle, 2003; Baker and Saldanha, 2011). The paradox is that many practical difficulties encountered by translators and interpreters are directly related to phraseology in the broad sense (Colson, 2008, 2013). This is also evident in the failure of machine translation systems to deal efficiently with the translation of phraseological units (PUs).
We argue that phraseology and translation studies have much to gain from cross-fertilisation, because both disciplines are regularly criticised for lacking a coherent terminological description and for involving too few reproducible experiments.
Decoding phraseology in the source text is far from easy for translators and interpreters, all the more so as they are usually not native speakers of the source language. Finding a natural formulation in the target language and avoiding translationese requires an excellent mastery of the phraseology of the target language. Even experienced professionals sometimes fail to detect the fixed or semi-fixed character of a source text construction. We argue that algorithms derived from text mining and information retrieval techniques offer an efficient and computationally cost-effective way to build up unfiltered collections of recurrent fixed or semi-fixed phrases, from which translators could gain information about the number of PUs in the source text. Such an algorithm was proposed in Colson (2016) and has been implemented in a web application that enables translators and language professionals to automatically retrieve most PUs from a source text. Other tools should be developed to bridge the gap between the findings of computational phraseology and the practice of translation and interpreting.
Chapters in this book
- Prelim pages i
- Table of contents v
- Foreword vii
- Introduction 1
- Monocollocable words 9
- Translation asymmetries of multiword expressions in machine translation 23
- German constructional phrasemes and their Russian counterparts 43
- Computational phraseology and translation studies 65
- Computational extraction of formulaic sequences from corpora 83
- Computational phraseology discovery in corpora with the mwetoolkit 111
- Multiword expressions in comparable corpora 135
- Collecting collocations from general and specialised corpora 151
- What matters more: The size of the corpora or their quality? 177
- Statistical significance for measures of collocation strength 189
- Verbal collocations and pronominalisation 207
- Empirical variability of Italian multiword expressions as a useful feature for their categorisation 225
- Too big to fail but big enough to pay for their mistakes 247
- Multi-word patterns and networks 273
- How context determines meaning 297
- Detecting semantic difference 311
- Index 325