Corpus-based extension of a terminological semantic lexicon
A. Nazarenko
Abstract
This paper addresses the problem of extending and tuning a terminological semantic lexicon to new domains and corpora. We argue that by relying on both a sublanguage corpus and a core semantic lexicon, it is possible to give an adequate description of the words that occur in the corpus. Our tuning method explores the corpus and gathers words that are likely to have similar meanings on the basis of their dependency relationships in the corpus. The aim of the present work is to assess the potential for classifying words based on the semantic categories of their “neighbors”. The tagging procedure is tested and parameterized on a rather small French corpus dealing with coronary diseases (85,000 word units). The method is systematically evaluated by creating and categorizing artificial unknown words. Although semantic categorization of words cannot be fully automated, the results show that our tagging procedure is a valuable aid in accounting for new words and new word uses in a sublanguage.
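The general idea of categorizing a word from the categories of its "neighbors" can be illustrated with a minimal sketch. This is not the chapter's actual procedure: the voting scheme, the function names, the category labels, and the toy English dependency triples below are all assumptions made for illustration. A word with no lexicon entry receives the category that is most common among known words sharing its dependency contexts, and the evaluation idea is mimicked by hiding known words and checking whether their category is recovered.

```python
from collections import Counter, defaultdict


def build_contexts(dependencies):
    """Map each word to the set of dependency contexts it occurs in."""
    contexts = defaultdict(set)
    for head, relation, dependent in dependencies:
        contexts[head].add((relation, "head_of", dependent))
        contexts[dependent].add((relation, "dep_of", head))
    return contexts


def guess_category(word, contexts, lexicon):
    """Vote among categorized words that share contexts with `word`.

    Hypothetical scheme: each shared context counts as one vote for the
    neighbor's category. Returns None when no neighbor is found.
    """
    own = contexts.get(word, set())
    votes = Counter()
    for other, other_contexts in contexts.items():
        if other == word or other not in lexicon:
            continue
        shared = len(own & other_contexts)
        if shared:
            votes[lexicon[other]] += shared
    return votes.most_common(1)[0][0] if votes else None


def leave_one_out_accuracy(contexts, lexicon):
    """Treat each known word as an artificial unknown word in turn."""
    hits = 0
    for word, category in lexicon.items():
        reduced = {w: c for w, c in lexicon.items() if w != word}
        if guess_category(word, contexts, reduced) == category:
            hits += 1
    return hits / len(lexicon)


# Toy data (invented): (head, relation, dependent) triples.
dependencies = [
    ("stenosis", "mod", "severe"),
    ("occlusion", "mod", "severe"),
    ("lesion", "mod", "severe"),
    ("stenosis", "of", "artery"),
    ("occlusion", "of", "artery"),
    ("lesion", "of", "artery"),
    ("angioplasty", "of", "artery"),
    ("bypass", "of", "artery"),
]

# Toy core lexicon (invented category labels); "lesion" is unknown.
lexicon = {
    "stenosis": "Disorder",
    "occlusion": "Disorder",
    "angioplasty": "Procedure",
    "bypass": "Procedure",
}

contexts = build_contexts(dependencies)
print(guess_category("lesion", contexts, lexicon))
print(leave_one_out_accuracy(contexts, lexicon))
```

In this toy setting "lesion" shares two contexts with each of the Disorder words and only one with each of the Procedure words, so the vote favors Disorder. A real system would need weighting, thresholds, and human validation, consistent with the abstract's remark that categorization cannot be fully automated.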
Chapters in this book
- Prelim pages i
- Table of contents vi
- Introduction viii
- A graph-based approach to the automatic generation of multilingual keyword clusters 1
- The automatic construction of faceted terminological feedback for interactive document retrieval 29
- Automatic term detection 53
- Incremental extraction of domain-specific terms from online text resources 89
- Knowledge-based terminology management in medicine 111
- Searching for and identifying conceptual relationships via a corpus-based approach to a Terminological Knowledge Base (CTKB) 127
- Qualitative terminology extraction 149
- General considerations on bilingual terminology extraction 167
- Detection of synonymy links between terms 185
- Extracting useful terms from parenthetical expressions by combining simple rules and statistical measures 209
- Software tools to support the construction of bilingual terminology lexicons 225
- Determining semantic equivalence of terms in information retrieval 245
- Term extraction using a similarity-based approach 261
- Extracting knowledge-rich contexts for terminography 279
- Experimental evaluation of ranking and selection methods in term extraction 303
- Corpus-based extension of a terminological semantic lexicon 327
- Term extraction for automatic abstracting 353
- About the contributors 371
- Subject Index 377