Chapter

Determining semantic equivalence of terms in information retrieval

An approach based on context distance and morphology
  • Hongyan Jing and Evelyne Tzoukermann
Publisher: John Benjamins Publishing Company

Abstract

An important issue in Information Retrieval is determining the semantic equivalence between terms in a query and terms in a document. We propose an approach based on context distance and morphology. Context distance is a measure we use to assess the closeness of word meanings. The context distance model compares the contexts in which a word appears, using both local document information and global lexical co-occurrence information derived from the entire set of documents to be retrieved. We integrate this model with morphological analysis when determining the semantic equivalence of terms, so that the two operations can enhance each other. Using the standard vector-space model, we evaluated the proposed method on a subset of the TREC-4 corpus (the AP88 and AP90 collections, 158,240 documents, 49 queries). Results show that this method improves the 11-point average precision by 8.6%.
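For readers unfamiliar with the evaluation measure named above, the following is a minimal sketch of 11-point interpolated average precision for a single query, as the measure is commonly defined in IR evaluation. It is not taken from the chapter: the function name `eleven_point_average_precision` and the document identifiers are hypothetical, and the authors' exact evaluation tooling is not stated in the abstract.

```python
from typing import List, Sequence, Set, Tuple


def eleven_point_average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Mean interpolated precision at recall levels 0.0, 0.1, ..., 1.0 (one query).

    Hypothetical helper for illustration; not the chapter's own code.
    """
    if not relevant:
        raise ValueError("the query must have at least one relevant document")

    # Record (recall, precision) each time a relevant document is retrieved.
    points: List[Tuple[float, float]] = []
    hits = 0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))

    total = 0.0
    for i in range(11):
        level = i / 10
        # Interpolated precision: the best precision at any recall >= level.
        # The small tolerance guards against floating-point rounding.
        candidates = [p for r, p in points if r >= level - 1e-12]
        total += max(candidates, default=0.0)
    return total / 11


# Usage: two relevant documents, retrieved at ranks 1 and 3.
score = eleven_point_average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"})
```

In a multi-query evaluation such as the 49 queries mentioned above, a per-query score like this would typically be averaged across queries; whether the chapter averages in exactly this way is an assumption.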

Downloaded on 15 September 2025 from https://www.degruyterbrill.com/document/doi/10.1075/nlp.2.13jin/html