Identification of similarities and clusters of bread baking recipes based on ingredient data
Stefan Anlauf, Sebastian Dorl, Theresa Hirz
Abstract
We define the similarity of bakery recipes using different distance calculations and identify groups of similar recipes using different clustering algorithms. Our analyses are based on the relative amounts of ingredients in the recipes. We compare three clustering algorithms (k-means, k-medoids, and hierarchical clustering) and determine the optimal number of clusters. Besides the standard distance calculation (Euclidean distance), we test three further distance measures (Hamming distance, Manhattan distance, and cosine similarity). Additionally, we reduce the impact of raw materials used in large quantities by applying two data transformations, namely taking the logarithm of the original data and binarizing it. Clustering recipes based on their ingredients can improve the search for similar recipes and thereby support the time-consuming process of developing new recipes. Using hierarchical clustering on the log-transformed data, we separate 704 recipes into three clusters, achieving a silhouette score of 0.531. We visualize our results via dendrograms representing the recipes’ hierarchical separation into groups and sub-groups.
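The following Python snippet is a minimal sketch of the pipeline described above, assuming log-transformed relative ingredient amounts, hierarchical clustering cut at three clusters, and silhouette-score evaluation. The random ingredient matrix and the Ward linkage criterion are illustrative assumptions; the actual dataset and the linkage settings used by the authors are not given in this abstract.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import silhouette_score

# Hypothetical stand-in for the 704-recipe dataset (not public):
# rows = recipes, columns = relative ingredient amounts.
rng = np.random.default_rng(0)
ingredients = rng.random((704, 20))

# The log transform dampens the influence of ingredients used in large
# quantities (e.g. flour or water); log1p keeps zero amounts defined.
X = np.log1p(ingredients)

# Hierarchical (agglomerative) clustering; Ward linkage is an assumption.
Z = linkage(X, method="ward")
labels = fcluster(Z, t=3, criterion="maxclust")  # cut into three clusters

# Silhouette score for this synthetic data, not the paper's 0.531.
print("silhouette score:", silhouette_score(X, labels))

# scipy.cluster.hierarchy.dendrogram(Z) would plot the hierarchical
# separation into groups and sub-groups that the paper visualizes.
```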
Funding source: Österreichische Forschungsförderungsgesellschaft
Award Identifier / Grant number: INTEGRATE – 892418
Funding source: The research reported in this paper has been funded by BMK, BMDW, and the State of Upper Austria within the framework of the COMET Programme managed by FFG
Acknowledgment
The project is a cooperation between the University of Applied Sciences Upper Austria, the Software Competence Center Hagenberg, and backaldrin International The Kornspitz Company GmbH. The data was provided by backaldrin International The Kornspitz Company GmbH.
- Research ethics: Not applicable.
- Author contributions: The authors have accepted responsibility for the entire content of this manuscript and approved its submission.
- Competing interests: The authors state no conflict of interest.
- Research funding: The research reported in this paper has been funded by BMK, BMDW, and the State of Upper Austria within the framework of the COMET Programme managed by FFG. This research was also funded by the Österreichische Forschungsförderungsgesellschaft (INTEGRATE – 892418).
- Data availability: Not applicable.