Chapter 1. Multi-word units in neural machine translation: Why the tip of the iceberg remains problematic

Jean-Pierre Colson

Published by John Benjamins Publishing Company

This chapter is in the book Recent Advances in Multiword Units in Machine Translation and Translation Technology

Abstract

Neural machine translation (NMT) has recently made significant progress in improving the quality of the texts it produces. New features of NMT include the fluidity of translations and the successful handling of multi-word units. In this paper we first report the results of an automated evaluation of the percentage of phraseology in the translations produced by Google Translate and DeepL. A corpus-based approach makes it possible to estimate that both NMT systems succeed in producing an average percentage of phraseology that is quite reasonable and sometimes even higher than in natural language production by native speakers. However, a closer look at some problematic cases shows that the ability of NMT systems to treat phraseological units can be deceptive, as they are often unable to cope with contextual complexity and low-frequency idioms.

Chapters in this book

Prelim pages
Table of contents
Preface
Section 1. Computational treatment of multiword units
Chapter 1. Multi-word units in neural machine translation
Chapter 2. ReGap
Chapter 3. Evaluating the Italian-English machine translation quality of MWUs in the domain of archaeology
Chapter 4. Post-editing neural machine translation in specialised languages
Chapter 5. Evaluating a bracketing protocol for multiword terms
Section 2. Corpus-based and linguistic studies in phraseology
Chapter 6. Suggestions for a new model of functional phraseme categorization for applied purposes
Chapter 7. Verb collocations and their semantics in the specialized language of science
Chapter 8. Negative–positive adjective pairing in travel journalism in English, Italian, and Polish
Chapter 9. The middle construction and some machine translation issues
Chapter 10. Semantic annotation of named rivers and its application for the prediction of multiword-term bracketing
Chapter 11. Irony in American-English tweets
Chapter 12. A comprehensive Japanese MWE lexicon
Chapter 13. Ontology-based formalisation of Italian clitic verbal MWEs
Index

https://doi.org/10.1075/cilt.366.01col
