John Benjamins Publishing Company
Hybrid approaches for automatic segmentation and annotation of a Chinese text corpus
Abstract
This paper describes hybrid approaches to the automatic segmentation and annotation of a Chinese text corpus and gives some experimental results. The hybrid approaches combine rule-based, statistical, and automatic learning methods. This combination can noticeably improve the precision of segmentation and annotation of a Chinese text corpus.
Chapters in this book
- Prelim pages i
- Table of contents v
- Preface vii
- Automatic extraction of terminological translation lexicon from Czech-English parallel texts 1
- Words from Bononia Legal Corpus 11
- Hybrid approaches for automatic segmentation and annotation of a Chinese text corpus 31
- Distance between languages as measured by the minimal-entropy model 39
- The importance of the syntagmatic dimension in the multilingual lexical database 49
- Compiling parallel text corpora 59
- Data-derived multilingual lexicons 69
- Bridge dictionaries as bridges between languages 83
- Procedures in building the Croatian-English parallel corpus 93
- Corpus linguistics and lexicography* 109
- Analysing the fluency of translators 135
- Equivalence and non-equivalence in parallel corpora* 147
- Index 157