Syntactic annotation in the Reference Corpus for the Processing of Basque (EPEC): Theoretical and practical issues
Izaskun Aldezabal, Maria Jesus Aranzabe, Jose Mari Arriola and Arantza Diaz de Ilarraza
Abstract
In this paper, we describe some theoretical and practical issues raised during the construction of the Basque Dependency Treebank (BDT), the syntactic annotation of EPEC (Reference Corpus for the Processing of Basque). EPEC is a 300,000-word corpus of standard written Basque, intended as a training corpus for developing and improving several NLP (Natural Language Processing) tools for Basque. BDT will be the first corpus of Basque tagged at the syntactic level. We also present the dependency-based annotation hierarchy that we have established for syntactic tagging. The decisions made in designing this hierarchy are based on the description of Basque grammar by Euskaltzaindia (Academy for the Basque Language). When describing dependency relations, we treat lexical units as syntactic heads. This choice opens the way for future work on semantics.
© 2009 by Walter de Gruyter GmbH & Co. KG, D-10785 Berlin
Articles in this issue
- Using a Chinese treebank to measure dependency distance
- Combining corpus linguistic and psychological data on word co-occurrences: Corpus collocates versus word associations
- Does branching direction determine prominence assignment? An empirical investigation of triconstituent compounds in English
- Syntactic annotation in the Reference Corpus for the Processing of Basque (EPEC): Theoretical and practical issues
- Contents Volume 5 (2009)