
Adopting attention-mechanisms for Neural Logic Rule Layers

  • Jan Niclas Reimann
  • Andreas Schwung
  • Steven X. Ding

Published/Copyright: March 11, 2022

Abstract

In previous works we found that rule-based systems suffer severely in performance as the number of rules increases. To increase the number of possible Boolean relations while keeping the number of rules fixed, we employ ideas from the well-known Spatial Transformer and Self-Attention networks: our learned rules are no longer static but are dynamically adjusted to the input data by training a separate rule-prediction network, which predicts the parameter matrices used in the Neural Logic Rule Layers. We show that these networks, termed Adaptive Neural Logic Rule Layers, outperform their static counterparts in final performance as well as in training stability and excitability during the early stages of training.
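To make the idea concrete, below is a minimal PyTorch sketch of such an input-conditioned (adaptive) logic rule layer: a small rule-prediction network maps each input to negation and membership parameter matrices, which then drive a soft conjunction over the inputs. The class name AdaptiveNLRL, the particular soft-logic operators (soft NOT and a product-based AND), and all layer sizes are illustrative assumptions, not the exact formulation used in the paper.

import torch
import torch.nn as nn


class AdaptiveNLRL(nn.Module):
    # Sketch of an adaptive Neural Logic Rule Layer: a small rule-prediction
    # network maps each input to the parameter matrices of the logic layer,
    # so the learned rules adapt to the input instead of being static weights.

    def __init__(self, in_features: int, n_rules: int, hidden: int = 64):
        super().__init__()
        self.in_features = in_features
        self.n_rules = n_rules
        # Rule-prediction network: emits a negation matrix and a membership
        # matrix (flattened) conditioned on the current input.
        self.predictor = nn.Sequential(
            nn.Linear(in_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2 * in_features * n_rules),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x is assumed to hold fuzzy truth values in [0, 1] (e.g. after a sigmoid).
        batch = x.shape[0]
        params = self.predictor(x).view(batch, 2, self.in_features, self.n_rules)
        negate = torch.sigmoid(params[:, 0])   # how strongly each input is negated per rule
        member = torch.sigmoid(params[:, 1])   # how strongly each input takes part in each rule
        x_rule = x.unsqueeze(-1)               # (batch, in_features, 1)
        # Soft negation: blend x and (1 - x) per rule.
        literals = negate * (1.0 - x_rule) + (1.0 - negate) * x_rule
        # Soft AND over the inputs: inputs that do not belong to a rule are pushed
        # towards 1 so they do not influence the product.
        conjunction = torch.prod(literals * member + (1.0 - member), dim=1)
        return conjunction                     # (batch, n_rules) fuzzy rule activations


if __name__ == "__main__":
    layer = AdaptiveNLRL(in_features=8, n_rules=4)
    x = torch.rand(16, 8)                      # toy batch of fuzzy truth values
    print(layer(x).shape)                      # torch.Size([16, 4])

In this sketch the rule parameters are re-predicted for every input, which is the adaptive mechanism described in the abstract; a static layer would instead learn the negation and membership matrices directly as fixed weights.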

Zusammenfassung

In previous works we showed that neural rule-based systems run into significant problems when a large number of rules has to be learned. To allow more combinations without increasing the number of rules, we apply core ideas from other works, such as Spatial Transformer and Self-Attention networks, to our own purpose-built networks. Our rules are thus no longer static but variable, adapting dynamically to the network's input. This is realized by additional, parallel networks that predict the weight matrices which would otherwise be learned directly. Compared with the static approach, we not only achieve equivalent classification results but also considerably improve stability and excitability during early training.


Dedicated to the 60th birthday of Prof. Dr.-Ing. Jürgen Adamy.


About the authors

Jan Niclas Reimann

Jan Niclas Reimann received his B.Eng. in electrical engineering and his M.Sc. in systems engineering from the University of Applied Sciences Soest, Germany. Since late 2018 he has been working as a research assistant and Ph.D. student in the Department of Automation Technology and Learning Systems at the University of Applied Sciences Soest, Germany. His main research falls into the field of interpretability and explainability analysis with regard to deep learning, especially supervised learning.

Andreas Schwung

Andreas Schwung received the Ph.D. degree in electrical engineering from the Technische Universität Darmstadt, Darmstadt, Germany, in 2011. From 2011 to 2015, he was an R&D engineer with MAN Diesel & Turbo SE, Oberhausen, Germany. Since 2015, he has been a Professor of Automation Technology at the South Westphalia University of Applied Sciences, Soest, Germany. His research interests include model-based control, networked automation systems, and intelligent data analytics with applications in the manufacturing and process industries.

Steven X. Ding

Steven X. Ding received the Ph.D. degree in electrical engineering from the Gerhard-Mercator University of Duisburg, Germany, in 1992. From 1992 to 1994, he was an R&D engineer at Rheinmetall GmbH. From 1995 to 2001, he was a professor of control engineering at the University of Applied Sciences Lausitz in Senftenberg, Germany, and served as vice president of the university from 1998 to 2000. Since 2001, he has been a chair professor of control engineering and the head of the Institute for Automatic Control and Complex Systems (AKS) at the University of Duisburg-Essen, Germany. His research interests are model-based and data-driven fault diagnosis, control and fault-tolerant systems, as well as their applications in industry, with a focus on automotive systems, chemical processes and renewable energy systems.

Received: 2021-09-20
Accepted: 2022-01-29
Published Online: 2022-03-11
Published in Print: 2022-03-28

© 2022 Walter de Gruyter GmbH, Berlin/Boston
