
Discovering bilingual collocations in parallel corpora

A first attempt at using distributional semantics
  • Marcos Garcia, Marcos García-Salido and Margarita Alonso-Ramos
Publisher: John Benjamins Publishing Company

Abstract

This chapter presents a method that exploits parallel corpora to extract bilingual collocation equivalents automatically. First, we use dependency parsing and statistical measures to identify collocation candidates in the corpora. Next, we use the parallel corpora to derive bilingual word embeddings. Finally, we use these distributional models as probabilistic dictionaries to identify bilingual collocation equivalents. To evaluate the strategy, we carry out a set of experiments in Portuguese and Spanish that focus on verb-object collocations, for example “atingir a maturidade” in Portuguese and “alcanzar la madurez” in Spanish (literally “reach the maturity”). The results show that the method can automatically identify thousands of bilingual collocation equivalents, with a precision of 86%.
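The final step, using a bilingual embedding space as a probabilistic dictionary, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`cosine`, `nearest`, `align`), the cutoff `k`, the averaged score, and the three-dimensional toy vectors are all assumptions made for illustration. The sketch assumes that verb-object candidates have already been extracted for each language and that both vocabularies live in one shared vector space.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest(word, src_vecs, tgt_vecs, k=2):
    """Return (similarity, target_word) for the k target words closest to `word`."""
    if word not in src_vecs:
        return []
    scored = [(cosine(src_vecs[word], vec), tgt) for tgt, vec in tgt_vecs.items()]
    return sorted(scored, reverse=True)[:k]

def align(src_collocs, tgt_collocs, src_vecs, tgt_vecs, k=2):
    """Pair each source (verb, noun) with target candidates whose verb and noun
    are both among the k nearest cross-lingual neighbours of the source words."""
    tgt_set = set(tgt_collocs)
    pairs = []
    for verb, noun in src_collocs:
        for v_sim, t_verb in nearest(verb, src_vecs, tgt_vecs, k):
            for n_sim, t_noun in nearest(noun, src_vecs, tgt_vecs, k):
                if (t_verb, t_noun) in tgt_set:
                    # Hypothetical score: mean of the two word similarities.
                    pairs.append(((verb, noun), (t_verb, t_noun), (v_sim + n_sim) / 2))
    return sorted(pairs, key=lambda p: p[2], reverse=True)

# Toy vectors invented for this example; real models would be learned from parallel corpora.
PT_VECS = {"atingir": [0.9, 0.1, 0.0], "maturidade": [0.1, 0.9, 0.1]}
ES_VECS = {
    "alcanzar": [0.88, 0.12, 0.02],
    "madurez": [0.12, 0.88, 0.1],
    "comer": [0.0, 0.1, 0.95],
}

PT_CANDIDATES = [("atingir", "maturidade")]
ES_CANDIDATES = [("alcanzar", "madurez"), ("comer", "madurez")]

for src, tgt, score in align(PT_CANDIDATES, ES_CANDIDATES, PT_VECS, ES_VECS):
    print(src, "->", tgt, round(score, 3))
```

Restricting matches to collocations already found in the target corpus means the embeddings only have to rank plausible word translations, not generate the collocation itself.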


Downloaded on 2 October 2025 from https://www.degruyterbrill.com/document/doi/10.1075/scl.90.16gon/html