
Adopting attention-mechanisms for Neural Logic Rule Layers

  • Jan Niclas Reimann
  • Andreas Schwung
  • Steven X. Ding

Published/Copyright: March 11, 2022

Abstract

In previous works we found that rule-based systems suffer severe performance degradation as the number of rules increases. To increase the number of possible Boolean relations while keeping the number of rules fixed, we employ ideas from the well-known Spatial Transformer and Self-Attention networks: our learned rules are no longer static but are adjusted dynamically to fit the input data by training a separate rule-prediction system, which predicts the parameter matrices used in Neural Logic Rule Layers. We show that these networks, termed Adaptive Neural Logic Rule Layers, outperform their static counterparts both in terms of final performance and in terms of training stability and excitability during the early stages of training.
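The mechanism described above lends itself to a hypernetwork-style reading: a rule-prediction network maps the input to the parameter matrices of a soft-logic rule layer, so the rules themselves become input-dependent. The following is a minimal, hypothetical PyTorch sketch of that idea; the soft NOT/AND parameterization, the module names and all shapes are illustrative assumptions and do not reproduce the exact NLRL formulation of the paper.

# Hypothetical sketch: a separate predictor network produces the parameter
# matrices of a soft-logic rule layer instead of learning them as static weights.
import torch
import torch.nn as nn

class SoftLogicRules(nn.Module):
    """Applies soft NOT/AND rules using externally supplied parameter matrices."""
    def forward(self, x, neg_params, sel_params):
        # x: (batch, n_in) with values in [0, 1]
        neg = torch.sigmoid(neg_params)   # (batch, n_rules, n_in) soft negation gates
        sel = torch.sigmoid(sel_params)   # (batch, n_rules, n_in) soft input selection
        x = x.unsqueeze(1)                # (batch, 1, n_in), broadcast over rules
        literals = neg * (1.0 - x) + (1.0 - neg) * x   # soft NOT
        terms = sel * literals + (1.0 - sel)           # unselected inputs act as neutral 1
        return terms.prod(dim=-1)         # soft AND over inputs -> (batch, n_rules)

class AdaptiveNLRL(nn.Module):
    """Rule-prediction network plus rule layer: rule parameters depend on the input."""
    def __init__(self, n_in, n_rules, hidden=32):
        super().__init__()
        self.n_in, self.n_rules = n_in, n_rules
        self.predictor = nn.Sequential(
            nn.Linear(n_in, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * n_rules * n_in),
        )
        self.rules = SoftLogicRules()

    def forward(self, x):
        params = self.predictor(x).view(-1, 2, self.n_rules, self.n_in)
        return self.rules(x, params[:, 0], params[:, 1])

if __name__ == "__main__":
    model = AdaptiveNLRL(n_in=4, n_rules=3)
    out = model(torch.rand(8, 4))
    print(out.shape)  # torch.Size([8, 3])

In this reading, the predictor plays the role of the localization or attention network in Spatial Transformer and Self-Attention architectures: it does not output the rule result itself but the parameters that the rule layer then applies to the same input.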

Zusammenfassung

In previous works we showed that neural rule-based systems run into considerable problems when a large number of rules has to be learned. To allow more combinations without increasing the number of rules, we apply fundamental ideas from other works, such as Spatial Transformer and Self-Attention networks, to our own architectures. Our rules are thus no longer static but variable, adapting dynamically to the input of the network. This is realized by additional networks, connected in parallel, which predict our otherwise learned weight matrices instead of training them directly. Compared to the static approach, we not only achieve equivalent classification results but also clearly improve stability and excitability during early training.


Dedicated to the 60th birthday of Prof. Dr.-Ing. Jürgen Adamy.


About the authors

Jan Niclas Reimann

Jan Niclas Reimann received his B.Eng. in electrical engineering and his M.Sc. in systems engineering from the South Westphalia University of Applied Sciences in Soest, Germany. Since late 2018 he has been working as a research assistant and Ph.D. student in the Department of Automation Technology and Learning Systems at the South Westphalia University of Applied Sciences in Soest, Germany. His main research falls into the field of interpretability and explainability analysis for deep learning, especially supervised learning.

Andreas Schwung

Andreas Schwung received his Ph.D. degree in electrical engineering from the Technische Universität Darmstadt, Darmstadt, Germany, in 2011. From 2011 to 2015, he was an R&D engineer with MAN Diesel & Turbo SE, Oberhausen, Germany. Since 2015, he has been a Professor of Automation Technology at the South Westphalia University of Applied Sciences, Soest, Germany. His research interests include model-based control, networked automation systems, and intelligent data analytics with applications in the manufacturing and process industries.

Steven X. Ding

Steven X. Ding received his Ph.D. degree in electrical engineering from the Gerhard-Mercator University of Duisburg, Germany, in 1992. From 1992 to 1994, he was an R&D engineer at Rheinmetall GmbH. From 1995 to 2001, he was a professor of control engineering at the University of Applied Sciences Lausitz in Senftenberg, Germany, and served as vice president of the university from 1998 to 2000. Since 2001, he has been a chair professor of control engineering and head of the Institute for Automatic Control and Complex Systems (AKS) at the University of Duisburg-Essen, Germany. His research interests are model-based and data-driven fault diagnosis, control and fault-tolerant systems, and their applications in industry, with a focus on automotive systems, chemical processes and renewable energy systems.


Received: 2021-09-20
Accepted: 2022-01-29
Published Online: 2022-03-11
Published in Print: 2022-03-28

© 2022 Walter de Gruyter GmbH, Berlin/Boston
