Chapter 11. Word alignment in the Russian-Chinese parallel corpus

Anastasia Politova; Olga Bonetskaya; Dmitry Dolgov; Maria Frolova; Anna Pyrkova

John Benjamins Publishing Company

Abstract

The Russian-Chinese parallel corpus (RuZhCorp) was created in 2016 by sinologists and computational linguists. So far, it has accumulated 1,074 texts and over 4.6 million words, aligned at the sentence level. To produce word alignment for the entire corpus, we used deep neural networks trained both on the whole of RuZhCorp and on a gold dataset manually aligned at the word level. Building on the principles presented in previous publications, we compiled the first word-to-word alignment guidelines for the Russian-Chinese language pair, which make the manual alignment process less ambiguous and more consistent. Joint fine-tuning of the LaBSE deep learning model on RuZhCorp and the gold dataset achieved the best (lowest) alignment error rate (AER), 18.9%.
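For readers unfamiliar with the metric, the following is a minimal sketch of how AER is commonly computed in the word alignment literature: AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), where A is the set of predicted links, S the set of sure gold links, and P the set of possible gold links (with S contained in P). The abstract does not say whether the chapter's gold dataset distinguishes sure from possible links, so that distinction, the function name `alignment_error_rate`, and the toy index pairs below are assumptions made only for illustration.

```python
def alignment_error_rate(predicted, sure, possible=None):
    """Compute AER from sets of (source_index, target_index) links.

    `possible` is optional; when omitted, every gold link is treated as sure.
    This is an illustrative sketch, not the chapter's evaluation code.
    """
    predicted = set(predicted)
    sure = set(sure)
    possible = set(possible) if possible is not None else set()
    possible |= sure  # sure links also count as possible links

    if not predicted and not sure:
        return 0.0  # nothing to predict and nothing predicted

    matched = len(predicted & sure) + len(predicted & possible)
    return 1.0 - matched / (len(predicted) + len(sure))


# Hypothetical toy sentence pair: (Russian token index, Chinese token index)
gold_sure = {(0, 0), (1, 2), (2, 1)}
gold_possible = {(3, 3)}
predicted = {(0, 0), (1, 2), (2, 3), (3, 3)}

aer = alignment_error_rate(predicted, gold_sure, gold_possible)
print(f"AER = {aer:.3f}")  # prints "AER = 0.286"
```

Lower values are better: a prediction identical to the sure links scores 0.0, and an empty prediction against a non-empty gold set scores 1.0.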

This chapter is in the book Corpus Use in Cross-linguistic Research

https://doi.org/10.1075/scl.113.11pol
