Term extraction using a similarity-based approach
Diana Maynard and Sophia Ananiadou
Abstract
Traditional approaches to multi-word term extraction have used hybrid methods combining linguistic and statistical information. The linguistic part of these applications is often underexploited and consists of very shallow knowledge in the form of a simple syntactic filter. In most cases no interpretation of terms is undertaken, and recognition does not involve distinguishing between different senses of terms, although ambiguity can be a serious problem for applications such as ontology building and machine translation. The approach described here uses both statistical and linguistic information, combining syntax and semantics to identify, rank and disambiguate terms. We describe a new thesaurus-based similarity measure, which uses semantic information to calculate the importance of different parts of the context in relation to the term. Results show that making use of semantic information is beneficial for both theoretical and practical aspects of terminology.
Chapters in this book
- Prelim pages i
- Table of contents vi
- Introduction viii
- A graph-based approach to the automatic generation of multilingual keyword clusters 1
- The automatic construction of faceted terminological feedback for interactive document retrieval 29
- Automatic term detection 53
- Incremental extraction of domain-specific terms from online text resources 89
- Knowledge-based terminology management in medicine 111
- Searching for and identifying conceptual relationships via a corpus-based approach to a Terminological Knowledge Base (CTKB) 127
- Qualitative terminology extraction 149
- General considerations on bilingual terminology extraction 167
- Detection of synonymy links between terms 185
- Extracting useful terms from parenthetical expressions by combining simple rules and statistical measures 209
- Software tools to support the construction of bilingual terminology lexicons 225
- Determining semantic equivalence of terms in information retrieval 245
- Term extraction using a similarity-based approach 261
- Extracting knowledge-rich contexts for terminography 279
- Experimental evaluation of ranking and selection methods in term extraction 303
- Corpus-based extension of a terminological semantic lexicon 327
- Term extraction for automatic abstracting 353
- About the contributors 371
- Subject Index 377