John Benjamins Publishing Company

Chapter 2. ReGap

A text-preprocessing algorithm to enhance MWE‑aware neural machine translation systems

Abstract

This research presents ReGap, a text-preprocessing algorithm that automatically identifies discontinuous multiword expressions (MWEs) on a token basis and converts them into their canonical state, i.e., their continuous form, as a means of optimising neural machine translation (NMT) systems. To this end, an experiment with flexible verb-noun idiomatic constructions (VNICs) is conducted to assess to what extent ReGap can enhance the performance of the most robust NMT system to date, DeepL, when faced with MWE discontinuity in the Spanish-into-English and Spanish-into-German directions. The promising results obtained for VNICs shed light on new avenues for enhancing MWE‑aware NMT systems.
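
To make the idea of converting a discontinuous MWE into its continuous form concrete, the following is a minimal illustrative sketch and not the ReGap implementation, whose details are not given in this abstract. It assumes a hypothetical function `close_gap`, a hand-written set of surface verb forms, and a fixed noun phrase for the Spanish VNIC "tomar el pelo"; it further assumes that the intervening tokens are simply moved to the position right after the expression, which may differ from what ReGap actually does.

```python
from typing import List, Sequence, Set


def close_gap(tokens: List[str], verb_forms: Set[str],
              noun_phrase: Sequence[str], max_gap: int = 3) -> List[str]:
    """Make the first discontinuous verb + noun-phrase match contiguous.

    Hypothetical helper: matching is purely token-based (lower-cased surface
    forms), and gap tokens are relocated to just after the noun phrase.
    """
    lowered = [t.lower() for t in tokens]
    target = [t.lower() for t in noun_phrase]
    np_len = len(target)

    for i, tok in enumerate(lowered):
        if tok not in verb_forms:
            continue
        # Look for the noun phrase 1..max_gap tokens after the verb.
        for gap in range(1, max_gap + 1):
            start = i + 1 + gap
            if lowered[start:start + np_len] == target:
                gap_tokens = tokens[i + 1:start]
                return (tokens[:i + 1]                    # up to and including the verb
                        + tokens[start:start + np_len]    # noun phrase pulled forward
                        + gap_tokens                      # intervening material moved after it
                        + tokens[start + np_len:])        # rest of the sentence
    # No discontinuous occurrence found: return the tokens unchanged.
    return list(tokens)


# Assumed toy lexicon entry for "tomar el pelo" (a few inflected forms only).
TOMAR_FORMS = {"tomar", "toma", "tomó", "tomaron", "tomaba"}

sentence = "Le tomaron mucho el pelo ayer".split()
normalised = close_gap(sentence, TOMAR_FORMS, ["el", "pelo"])
print(" ".join(normalised))
```

In this toy setting the normalised sentence would then be passed to the NMT system in place of the original; a realistic pipeline would need lemmatisation or a fuller inventory of verb forms rather than a hand-written set.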

Downloaded on 27.4.2026 from https://www.degruyterbrill.com/document/doi/10.1075/cilt.366.02hid/html?lang=en