Chapter 2. ReGap: A text-preprocessing algorithm to enhance MWE‑aware neural machine translation systems

Carlos Manuel Hidalgo-Ternero; Gloria Corpas Pastor

Published by

John Benjamins Publishing Company

This chapter is in the book Recent Advances in Multiword Units in Machine Translation and Translation Technology

Abstract

This research presents ReGap, a text-preprocessing algorithm for the automatic token-based identification and conversion of discontinuous multiword expressions (MWEs) into their canonical state, i.e., their continuous form, as a means to optimise neural machine translation (NMT) systems. To this end, an experiment with flexible verb-noun idiomatic constructions (VNICs) is conducted in order to assess to what extent ReGap can enhance the performance of the most robust NMT system to date, DeepL, under the challenge of MWE discontinuity in the Spanish-into-English and the Spanish-into-German directionalities. In this regard, the promising results yielded for VNICs will shed some light on new avenues for enhancing MWE‑aware NMT systems.
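The abstract does not describe ReGap's internals, so the following is only a minimal sketch of the general idea it names: a token-based pass that spots a discontinuous verb-noun idiom and moves the intervening tokens after the noun phrase, yielding the continuous form before the text is sent to an NMT system. The function name `close_gap`, the `TOY_LEXICON` entry, the `max_gap` window, and the example sentence are all hypothetical illustrations, not the authors' implementation or data.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical toy lexicon (an assumption for illustration only):
# each entry pairs a set of inflected verb forms with the fixed noun-phrase tokens.
TOY_LEXICON: List[Tuple[Set[str], Tuple[str, ...]]] = [
    ({"tomar", "toma", "tomó", "tomaba"}, ("el", "pelo")),  # "tomar el pelo"
]


def close_gap(
    tokens: Sequence[str],
    lexicon: List[Tuple[Set[str], Tuple[str, ...]]] = TOY_LEXICON,
    max_gap: int = 4,
) -> List[str]:
    """Return tokens with the first discontinuous idiom made continuous.

    If a verb form is followed, after 1..max_gap intervening tokens, by its
    noun phrase, the intervening tokens are moved to just after the noun phrase.
    Sequences without such a gap are returned unchanged (as a new list).
    """
    lowered = [t.lower() for t in tokens]
    for verb_forms, noun_phrase in lexicon:
        n = len(noun_phrase)
        for i, tok in enumerate(lowered):
            if tok not in verb_forms:
                continue
            # j is the candidate start of the noun phrase; j = i + 1 would
            # mean the idiom is already continuous, so start at i + 2.
            last_start = min(i + 1 + max_gap, len(tokens) - n)
            for j in range(i + 2, last_start + 1):
                if tuple(lowered[j:j + n]) == noun_phrase:
                    gap = list(tokens[i + 1:j])
                    return (
                        list(tokens[:i + 1])      # everything up to the verb
                        + list(tokens[j:j + n])   # noun phrase moved next to the verb
                        + gap                     # formerly intervening tokens
                        + list(tokens[j + n:])    # rest of the sentence
                    )
    return list(tokens)


if __name__ == "__main__":
    sentence = ["Le", "tomó", "a", "su", "hermano", "el", "pelo", "."]
    print(" ".join(close_gap(sentence)))
```

A real system would need lemmatisation or morphological analysis in place of the hand-listed verb forms, and some way to decide whether an occurrence is idiomatic at all; the sketch ignores both and only shows the reordering step.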

Chapters in this book

Prelim pages i
Table of contents v
Preface vii
Section 1. Computational treatment of multiword units
Chapter 1. Multi-word units in neural machine translation 2
Chapter 2. ReGap 18
Chapter 3. Evaluating the Italian-English machine translation quality of MWUs in the domain of archaeology 40
Chapter 4. Post-editing neural machine translation in specialised languages 57
Chapter 5. Evaluating a bracketing protocol for multiword terms 79
Section 2. Corpus-based and linguistic studies in phraseology
Chapter 6. Suggestions for a new model of functional phraseme categorization for applied purposes 104
Chapter 7. Verb collocations and their semantics in the specialized language of science 124
Chapter 8. Negative–positive adjective pairing in travel journalism in English, Italian, and Polish 141
Chapter 9. The middle construction and some machine translation issues 156
Chapter 10. Semantic annotation of named rivers and its application for the prediction of multiword-term bracketing 173
Chapter 11. Irony in American-English tweets 197
Chapter 12. A comprehensive Japanese MWE lexicon 218
Chapter 13. Ontology-based formalisation of Italian clitic verbal MWEs 243
Index 263

https://doi.org/10.1075/cilt.366.02hid
