Chapter

Aligning verb + noun collocations to improve a French-Romanian FSMT system

Amalia Todiraşcu and Mirabela Navlea

Published by John Benjamins Publishing Company

This chapter is in the book Multiword Units in Machine Translation and Translation Technology

Abstract

We present several methods for integrating Verb + Noun collocations using linguistic information, aiming to improve the results of a French-Romanian factored statistical machine translation (FSMT) system. The system uses lemmatised, tagged and sentence-aligned legal parallel corpora. Verb + Noun collocations are frequent word associations, sometimes discontinuous, related by syntactic links and with a non-compositional sense (Gledhill, 2007). Our first method extracts collocations from monolingual corpora, using a hybrid approach which combines morphosyntactic properties and frequency criteria. The second method applies a bilingual collocation dictionary to identify collocations. Both of these methods transform collocations into single tokens before alignment. The third method applies a specific alignment algorithm for collocations. We evaluate the influence of these collocation alignment methods on the results of the lexical alignment and of the FSMT system.
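The abstract states that the first two methods turn each collocation into a single token before alignment. The sketch below illustrates that general idea only; it is not the authors' implementation. The function name `merge_collocations`, the tag labels `VERB` and `NOUN`, the `max_gap` window for discontinuous collocations, and the example sentence are all assumptions made for illustration.

```python
from typing import List, Set, Tuple


def merge_collocations(
    lemmas: List[str],
    tags: List[str],
    collocations: Set[Tuple[str, str]],
    max_gap: int = 3,
) -> List[str]:
    """Join known verb + noun pairs into one token (hypothetical helper).

    `collocations` holds (verb_lemma, noun_lemma) pairs, e.g. taken from a
    dictionary. The noun may follow the verb by up to `max_gap` positions,
    which covers simple discontinuous cases such as an intervening determiner.
    """
    merged: List[str] = []
    consumed = set()  # indices of nouns already absorbed into a collocation
    for i, (lemma, tag) in enumerate(zip(lemmas, tags)):
        if i in consumed:
            continue
        if tag == "VERB":
            # Look ahead for a noun that forms a known collocation.
            for j in range(i + 1, min(i + 1 + max_gap, len(lemmas))):
                if (
                    j not in consumed
                    and tags[j] == "NOUN"
                    and (lemma, lemmas[j]) in collocations
                ):
                    merged.append(f"{lemma}_{lemmas[j]}")
                    consumed.add(j)
                    break
            else:
                merged.append(lemma)  # no collocation found for this verb
        else:
            merged.append(lemma)
    return merged


# Illustrative lemmatised French input (not taken from the chapter's corpus).
lemmas = ["le", "commission", "prendre", "un", "décision"]
tags = ["DET", "NOUN", "VERB", "DET", "NOUN"]
result = merge_collocations(lemmas, tags, {("prendre", "décision")})
print(result)
```

In this sketch, tokens between the verb and the noun (here the determiner) are kept after the merged token; a real system would need to decide how to handle them before alignment.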

Chapters in this book

Prelim pages
Table of contents
About the editors
Multiword units in machine translation and translation technology
Part 1. Multiword units in machine translation
Analysing linguistic information about word combinations for a Spanish-Basque rule-based machine translation system
How do students cope with machine translation output of multiword units? An exploratory study
Aligning verb + noun collocations to improve a French-Romanian FSMT system
Part 2. Multiword units in multilingual NLP applications
Multiword expressions in multilingual information extraction
A multilingual gold standard for translation spotting of German compounds and their corresponding multiword units in English, French, Italian and Spanish
Dutch compound splitting for bilingual terminology extraction
Part 3. Identification and translation of multiword units
A flexible framework for collocation retrieval and translation from parallel and comparable corpora
On identification of bilingual lexical bundles for translation purposes
The quest for Croatian idioms as multiword units
Corpus analysis of Croatian constructions with the verb doći ‘to come’
Anaphora resolution, collocations and translation
Index

https://doi.org/10.1075/cilt.341.04tod

Downloaded on 29.12.2025 from https://www.degruyterbrill.com/document/doi/10.1075/cilt.341.04tod/html