Detection of synonymy links between terms
Thierry Hamon
Abstract
This paper presents a new approach to evaluating the detection of synonymy relations between terms. Our goal is to support terminology structuring. The approach exploits synonymy relationships extracted from lexical resources to infer synonymy links between complex terms. The inferred links are then validated by a human expert in the context of a terminological application. In a previous evaluation on documents dealing with electric power plants, the expert stressed that the most important point is to increase recall, even if precision is low and some links are mistyped. This paper reports new experiments that help to understand how this synonymy detection approach should be used. Various lexical resources, from general-language dictionaries to very specialized semantic information, are exploited and compared as bootstrapping knowledge. Results show the complementarity of the different sources.
The first evaluation relied on traditional recall and precision measures. However, those scores do not reflect the usefulness of the inferred links for terminology structuring. From the terminologist’s point of view, erroneous links are quick to eliminate, and they may even suggest good ones. Above all, the system points out relations between terms that are generally not found manually. We thus aim to propose a new evaluation criterion that better reflects the expert’s and terminologist’s point of view in the application context. This score reflects the quality of the results and the validation cost rather than the proportion of validated links. We have designed an evaluation score that takes into account the productivity of the dictionary links. It can be viewed as a normalization of precision.
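As a rough illustration of the inference step described in the abstract, the sketch below proposes candidate synonymy links between two-word terms from a small set of word-level synonym pairs. The abstract does not state the inference rule, so the compositional rule used here (two terms are linked when their modifiers and their heads are each identical or synonymous) is an assumption, as are the names `SEED_SYNONYMS`, `same_or_synonym`, `infer_links` and the toy term list. The output is only a list of candidates, which would still go to a human expert for validation.

```python
from itertools import combinations

# Hypothetical word-level synonym pairs, standing in for links
# extracted from a lexical resource (illustrative data only).
SEED_SYNONYMS = {
    frozenset({"plant", "station"}),
    frozenset({"electric", "electrical"}),
}


def same_or_synonym(a, b, synonyms):
    """True when two words are identical or listed as a synonym pair."""
    return a == b or frozenset({a, b}) in synonyms


def infer_links(terms, synonyms):
    """Propose candidate synonymy links between (modifier, head) terms.

    Assumed rule: two distinct terms are linked when their modifiers
    and their heads are each identical or synonymous.
    """
    links = []
    for t1, t2 in combinations(terms, 2):
        if t1 == t2:
            continue  # identical terms carry no new link
        if same_or_synonym(t1[0], t2[0], synonyms) and same_or_synonym(
            t1[1], t2[1], synonyms
        ):
            links.append((t1, t2))
    return links


# Toy complex terms written as (modifier, head) pairs.
TERMS = [
    ("power", "plant"),
    ("power", "station"),
    ("electric", "plant"),
    ("electrical", "station"),
    ("cooling", "pump"),
]

CANDIDATES = infer_links(TERMS, SEED_SYNONYMS)
for left, right in CANDIDATES:
    print(" ".join(left), "<->", " ".join(right))
```

Representing each word-level link as a `frozenset` keeps the lookup symmetric, so the order in which a resource lists the two words does not matter.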
Chapters in this book
- Prelim pages i
- Table of contents vi
- Introduction viii
- A graph-based approach to the automatic generation of multilingual keyword clusters 1
- The automatic construction of faceted terminological feedback for interactive document retrieval 29
- Automatic term detection 53
- Incremental extraction of domain-specific terms from online text resources 89
- Knowledge-based terminology management in medicine 111
- Searching for and identifying conceptual relationships via a corpus-based approach to a Terminological Knowledge Base (CTKB) 127
- Qualitative terminology extraction 149
- General considerations on bilingual terminology extraction 167
- Detection of synonymy links between terms 185
- Extracting useful terms from parenthetical expressions by combining simple rules and statistical measures 209
- Software tools to support the construction of bilingual terminology lexicons 225
- Determining semantic equivalence of terms in information retrieval 245
- Term extraction using a similarity-based approach 261
- Extracting knowledge-rich contexts for terminography 279
- Experimental evaluation of ranking and selection methods in term extraction 303
- Corpus-based extension of a terminological semantic lexicon 327
- Term extraction for automatic abstracting 353
- About the contributors 371
- Subject Index 377