Chapter 3. Evaluating the Italian-English machine translation quality of MWUs in the domain of archaeology
Giulia Speranza
Abstract
Multiword units (MWUs) pose a persistent challenge in Natural Language Processing (NLP) because of their idiosyncratic nature. This paper investigates the quality of Neural Machine Translation (NMT) output for MWUs in the domain of archaeology. As a case study, a dataset of 100 MWUs is used to evaluate out-of-context and in-context translations produced by three state-of-the-art NMT systems for the Italian-English language pair: Google Translate, DeepL, and Microsoft Bing Translator. The MT outputs are manually evaluated against a Gold Standard, namely out-of-context and in-context human English translations of the selected 100 MWUs. Results show that terminology is still a problematic category for MT quality and that the translation of MWUs may vary, and sometimes even improve, when further context is provided.
Chapters in this book
- Prelim pages i
- Table of contents v
- Preface vii
Section 1. Computational treatment of multiword units
- Chapter 1. Multi-word units in neural machine translation 2
- Chapter 2. ReGap 18
- Chapter 3. Evaluating the Italian-English machine translation quality of MWUs in the domain of archaeology 40
- Chapter 4. Post-editing neural machine translation in specialised languages 57
- Chapter 5. Evaluating a bracketing protocol for multiword terms 79
Section 2. Corpus-based and linguistic studies in phraseology
- Chapter 6. Suggestions for a new model of functional phraseme categorization for applied purposes 104
- Chapter 7. Verb collocations and their semantics in the specialized language of science 124
- Chapter 8. Negative–positive adjective pairing in travel journalism in English, Italian, and Polish 141
- Chapter 9. The middle construction and some machine translation issues 156
- Chapter 10. Semantic annotation of named rivers and its application for the prediction of multiword-term bracketing 173
- Chapter 11. Irony in American-English tweets 197
- Chapter 12. A comprehensive Japanese MWE lexicon 218
- Chapter 13. Ontology-based formalisation of Italian clitic verbal MWEs 243
- Index 263