
Chapter 11. Word alignment in the Russian-Chinese parallel corpus

  • Anastasia Politova, Olga Bonetskaya, Dmitry Dolgov, Maria Frolova and Anna Pyrkova
Published by John Benjamins Publishing Company
This chapter is in the book Corpus Use in Cross-linguistic Research

Abstract

The Russian-Chinese parallel corpus (RuZhCorp) was created in 2016 by sinologists and computational linguists. So far, it has accumulated 1,074 texts and over 4.6 million words, all aligned at the sentence level. To produce word alignment for the entire corpus, we used deep neural networks trained both on the whole of RuZhCorp and on a gold dataset manually aligned at the word level. Building on the principles presented in previous publications, we compiled the first word-to-word alignment guideline for the Russian-Chinese language pair, which makes the manual alignment process less ambiguous and more consistent. Joint fine-tuning of the LaBSE deep learning model on RuZhCorp and the gold dataset achieved the best alignment error rate (AER) of 18.9%.
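
For readers unfamiliar with the metric, the sketch below shows how AER is commonly computed from a set of predicted links A, a set of sure gold links S, and a set of possible gold links P (with S a subset of P): AER = 1 − (|A ∩ S| + |A ∩ P|) / (|A| + |S|). This is an illustration only. The function name, the toy link sets, and the sure/possible distinction are assumptions; the abstract does not state how the gold dataset is annotated or how the authors computed the score.

```python
def alignment_error_rate(predicted, sure, possible=None):
    """Standard AER: 1 - (|A & S| + |A & P|) / (|A| + |S|), with S a subset of P."""
    a = set(predicted)
    s = set(sure)
    p = s | set(possible or ())  # possible links always include the sure ones
    if not a and not s:
        return 0.0  # nothing predicted and nothing to find: no error
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))


# Hypothetical toy data: each link is a (russian_token_index, chinese_token_index) pair.
gold_sure = {(0, 0), (1, 2), (2, 1)}
gold_possible = {(3, 3)}
predicted = {(0, 0), (1, 2), (3, 3), (2, 3)}

aer = alignment_error_rate(predicted, gold_sure, gold_possible)
print(f"AER = {aer:.3f}")  # AER = 0.286
```

Lower values are better: a prediction that reproduces the sure links exactly scores 0, so the reported 18.9% can be read as roughly one link in five being missed or spurious under this definition.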

Downloaded on 13.9.2025 from https://www.degruyterbrill.com/document/doi/10.1075/scl.113.11pol/html