ESLMT: a new clustering method for biomedical document retrieval

  • MohammadReza Keyvanpour and Fatemeh Serpush
Published/Copyright: June 14, 2019

Abstract

MEDLINE is a rapidly growing database. To use it, practitioners and biomedical researchers face tedious and time-consuming tasks such as discovering, searching, reading and evaluating biomedical documents. Labeling a group of biomedical documents is expensive and requires a complicated procedure. In addition, compound words, polysemy and synonymy can affect searches in MEDLINE. An efficient way of organizing information and sharing knowledge is therefore essential if information retrieval systems are to provide good results. Different strategies are used for the retrieval of biomedical documents (RBD), yet the RBD process still returns a number of results unrelated to the user’s query. Studies have shown that retrieval based on well-defined clusters performs more efficiently than document-based retrieval. Accordingly, the present study proposes Expanding Statistical Language Modeling and Thesaurus (ESLMT) for clustering and retrieving biomedical documents. The results showed that Clustering with ESLM Similarity and Thesaurus (CESLMST) scored higher than the other compared methods on all criteria considered in this study. The results also indicated that, on the Text REtrieval Conference (TREC) data set, the mean average precision (MAP) of the Clusters’ Retrieval Derived from ESLM Similarity-Query (CRDESLMS-QET) method improved over that of previous methods.
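For readers unfamiliar with the evaluation metric named above, the following is a minimal sketch of how mean average precision is conventionally computed over a set of queries. It is not taken from the article: the function names, the document and query identifiers, and the toy relevance judgments are all hypothetical, and the article's own evaluation setup on the TREC data set may differ in detail.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision for one query: the mean of precision@k taken at
    each rank k where a relevant document appears, divided by the total
    number of relevant documents for that query."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """MAP: the arithmetic mean of average precision over all queries."""
    if not runs:
        return 0.0
    total = sum(
        average_precision(ranked, qrels.get(query_id, set()))
        for query_id, ranked in runs.items()
    )
    return total / len(runs)


# Hypothetical toy data: ranked results per query and relevance judgments.
runs = {"q1": ["d1", "d2", "d3"], "q2": ["d4", "d5"]}
qrels = {"q1": {"d1", "d3"}, "q2": {"d5"}}
map_score = mean_average_precision(runs, qrels)
```

In this toy example, query q1 has relevant documents at ranks 1 and 3 and query q2 has one at rank 2, so relevant documents ranked earlier raise the score and those ranked later lower it.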

Acknowledgments

This work was supported by the Islamic Azad University, Qazvin Branch, Qazvin, Iran.

Author Statement

  • Research funding: Authors state no funding involved.

  • Conflict of interest: Authors state no conflict of interest.

  • Informed consent: Informed consent is not applicable.

  • Ethical approval: The conducted research is not related to either human or animal use.

Received: 2018-05-04
Accepted: 2019-02-26
Published Online: 2019-06-14
Published in Print: 2019-12-18

© 2019 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 29.9.2025 from https://www.degruyterbrill.com/document/doi/10.1515/bmt-2018-0068/html