Chapter 2. ReGap

A text-preprocessing algorithm to enhance MWE‑aware neural machine translation systems
  • Carlos Manuel Hidalgo-Ternero and Gloria Corpas Pastor
Publisher: John Benjamins Publishing Company

Abstract

This research presents ReGap, a text-preprocessing algorithm for the automatic token-based identification and conversion of discontinuous multiword expressions (MWEs) into their canonical state, i.e., their continuous form, as a means to optimise neural machine translation (NMT) systems. To this end, an experiment with flexible verb-noun idiomatic constructions (VNICs) is conducted in order to assess to what extent ReGap can enhance the performance of the most robust NMT system to date, DeepL, under the challenge of MWE discontinuity in the Spanish-into-English and the Spanish-into-German directionalities. In this regard, the promising results yielded for VNICs will shed some light on new avenues for enhancing MWE‑aware NMT systems.
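
The core idea of converting a discontinuous MWE into its continuous form can be illustrated with a toy sketch. This is not the authors' implementation: the function name `regap_sketch`, the `max_gap` window, the hard-coded Spanish idiom "tomar el pelo" ("to pull someone's leg"), and the choice to move the intervening tokens to the position right after the noun phrase are all assumptions made for illustration. The abstract states only that identification is token-based and that the output is the continuous form.

```python
def regap_sketch(sentence, verb_forms, noun_phrase, max_gap=3):
    """Toy illustration: close the gap in a discontinuous verb-noun idiom.

    Hypothetical helper, not the published ReGap algorithm.
    verb_forms  : set of lowercase inflected forms of the idiom's verb
    noun_phrase : the idiom's nominal part, e.g. "el pelo"
    max_gap     : assumed maximum number of tokens allowed between the two parts
    """
    tokens = sentence.split()
    np_tokens = noun_phrase.lower().split()
    n = len(np_tokens)

    for i, tok in enumerate(tokens):
        if tok.lower() not in verb_forms:
            continue
        # Look for the noun phrase with 1..max_gap tokens between it and the verb.
        last_start = min(i + 1 + max_gap, len(tokens) - n)
        for j in range(i + 2, last_start + 1):
            if [t.lower() for t in tokens[j:j + n]] == np_tokens:
                gap = tokens[i + 1:j]
                # Assumption: the intervening material is placed after the idiom.
                rebuilt = tokens[:i + 1] + tokens[j:j + n] + gap + tokens[j + n:]
                return " ".join(rebuilt)
    # No discontinuous occurrence found: leave the sentence untouched.
    return sentence


verbs = {"tomó", "toma", "tomaba"}
print(regap_sketch("Ella tomó constantemente el pelo a su hermano.", verbs, "el pelo"))
```

In this sketch a sentence whose idiom is already continuous is returned unchanged, so the step could in principle be applied to every sentence before it is passed to an NMT system. A realistic version would need lemmatisation, punctuation-aware tokenisation, and a way to tell idiomatic from literal uses, none of which is modelled here.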

Downloaded on 29 December 2025 from https://www.degruyterbrill.com/document/doi/10.1075/cilt.366.02hid/html?lang=de