Chapter 1. Multi-word units in neural machine translation
Jean-Pierre Colson
Abstract
Neural machine translation (NMT) has recently made significant progress in improving the quality of the texts it produces. New features of NMT include the fluency of its translations and the successful handling of multi-word units. In this paper we first report the results of an automated evaluation of the percentage of phraseology in the translations produced by Google Translate and DeepL. A corpus-based approach makes it possible to estimate that both NMT systems succeed in producing an average percentage of phraseology that is quite reasonable and sometimes even higher than in natural language production by native speakers. However, a closer look at some problematic cases shows that the ability of NMT systems to handle phraseological units can be deceptive, as they are often unable to cope with contextual complexity and low-frequency idioms.
Chapters in this book
- Prelim pages
- Table of contents
- Preface

Section 1. Computational treatment of multiword units
- Chapter 1. Multi-word units in neural machine translation
- Chapter 2. ReGap
- Chapter 3. Evaluating the Italian-English machine translation quality of MWUs in the domain of archaeology
- Chapter 4. Post-editing neural machine translation in specialised languages
- Chapter 5. Evaluating a bracketing protocol for multiword terms

Section 2. Corpus-based and linguistic studies in phraseology
- Chapter 6. Suggestions for a new model of functional phraseme categorization for applied purposes
- Chapter 7. Verb collocations and their semantics in the specialized language of science
- Chapter 8. Negative–positive adjective pairing in travel journalism in English, Italian, and Polish
- Chapter 9. The middle construction and some machine translation issues
- Chapter 10. Semantic annotation of named rivers and its application for the prediction of multiword-term bracketing
- Chapter 11. Irony in American-English tweets
- Chapter 12. A comprehensive Japanese MWE lexicon
- Chapter 13. Ontology-based formalisation of Italian clitic verbal MWEs
- Index