Abstract
The relationship between the structure of a compound and its function is central to chemoinformatics. The similarity principle states that “structurally similar molecules tend to have similar properties and similar molecules exert similar biological activities”. Molecular similarity can be studied either at the structure level or at the descriptor (property) level. In general, the objective of chemical similarity measures is to improve the prediction of the biological activities of molecules. This article provides an overview of methods used to compare metabolite structures, including two-dimensional (2D) and three-dimensional (3D) approaches. The focus is on describing the methods, e.g. fingerprint-based similarity, in which the molecules under study are first fragmented and their fingerprints are computed; 2D structural similarity assessed by comparing Tanimoto coefficients and Euclidean distances; and similarity methods based on physicochemical property descriptors. The similarity between molecules can also be measured using data mining (clustering) techniques, e.g. virtual screening (VS)-based similarity methods, in which molecules with the desired descriptors and/or structures are screened from large databases. Lastly, SMILES-based chemical similarity searching is an important method for exact structure search, substructure search and descriptor similarity. The choice of method depends on the requirements of the researcher.
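As an illustration of the fingerprint comparison named in the abstract, the following minimal sketch computes a Tanimoto coefficient and a Euclidean distance for two binary fingerprints. It assumes each fingerprint is represented as a set of on-bit indices; the function names and the example bit positions are hypothetical and are not taken from the article or from any particular toolkit.

```python
from math import sqrt
from typing import Set


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto coefficient c / (a + b - c) for two sets of on-bit indices.

    a and b are the numbers of bits set in each fingerprint and c is the
    number of bits set in both.
    """
    common = len(fp_a & fp_b)
    union = len(fp_a) + len(fp_b) - common
    # Assumption: two empty fingerprints are scored 0.0 (an arbitrary convention).
    return common / union if union else 0.0


def euclidean_distance(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Euclidean distance between two binary fingerprints.

    For 0/1 vectors the squared distance equals the number of differing bits,
    i.e. the size of the symmetric difference of the on-bit sets.
    """
    return sqrt(len(fp_a ^ fp_b))


# Hypothetical on-bit positions for two molecules (illustrative only).
fp_a = {1, 4, 7, 9}
fp_b = {1, 4, 8, 9, 12}

print(f"Tanimoto:  {tanimoto(fp_a, fp_b):.2f}")            # 3 shared / 6 in union = 0.50
print(f"Euclidean: {euclidean_distance(fp_a, fp_b):.2f}")  # sqrt(3 differing bits) = 1.73
```

The Tanimoto coefficient is a similarity (1.0 for identical non-empty fingerprints), whereas the Euclidean value is a distance (0.0 for identical fingerprints), so the two rank molecule pairs in opposite directions.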
Acknowledgements
FNK acknowledges funding from the European Structural and Investment Funds, through the OP RDE-funded project “ChemJets” (Award No. CZ.02.2.69/0.0/0.0/16_027/0008351). FNK also received an equipment donation from the Alexander von Humboldt Foundation, Germany. The technical support of Mme. Bokeng and Mr. Eseme is acknowledged. The reviewers are thanked for their constructive comments, which helped to improve the final manuscript.
© 2020 Walter de Gruyter GmbH, Berlin/Boston
Articles in the same Issue
- Frontmatter
- Editorial: Advanced chemoinformatics applications at the service of natural product discovery
- Combinatorial library design and virtual screening of cryptolepine derivatives against topoisomerase IIA by molecular docking and DFT studies
- Chemical similarity methods for analyzing secondary metabolite structures
- Carbonyl pigments: miscellaneous types
- Diketopyrrolopyrrole (DPP) pigments