Detection of extremist messages in web resources in the Kazakh language

Milana Bolatbek; Shynar Mussiraliyeva

doi:10.1515/lpp-2023-0020

Article

Published/Copyright: December 12, 2023

From the journal Lodz Papers in Pragmatics Volume 19 Issue 2

Abstract

The Internet has become an integral part of human life. People use social networks such as Twitter, VKontakte, and Facebook to establish global contacts, exchange opinions, and gain knowledge. Because information organizations as well as individual users participate actively in the global information space, measures that keep pace with current trends in information and communication technologies are needed to ensure national security, in particular measures to counter the spread of extremist and terrorist ideas.

Countering the spread of aggressive information on the global network is an urgent problem for society and government agencies. The task is currently addressed by filtering unwanted Internet resources. However, terrorist and extremist groups make deliberate use of web technologies for a range of purposes, including information dissemination, propaganda, fundraising, and extremist missions. In such a situation, the Internet poses a threat to national security.

In this paper, we investigate the creation of semantic analysis models for identifying extremist messages in the Kazakh language. For the study, we assembled our own text corpus and propose models based on bigrams and word input methods. In our experiments, the proposed model achieves the highest scores on the evaluation metrics among the machine learning methods compared.
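To illustrate what bigram-based features look like, here is a minimal sketch in Python. It is not the authors' implementation: it assumes plain whitespace tokenization and raw counts, and the function names and sample sentences are hypothetical placeholders rather than material from the paper's corpus. The preprocessing actually used for Kazakh text is not described in this abstract.

```python
from collections import Counter


def word_bigrams(text):
    """Return adjacent word pairs from a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


def bigram_counts(text):
    """Count how often each word bigram occurs in the text."""
    return Counter(word_bigrams(text))


def vectorize(texts):
    """Build a sorted bigram vocabulary and one count vector per text."""
    vocabulary = sorted({bg for t in texts for bg in word_bigrams(t)})
    index = {bg: i for i, bg in enumerate(vocabulary)}
    vectors = []
    for t in texts:
        row = [0] * len(vocabulary)
        for bg, n in bigram_counts(t).items():
            row[index[bg]] = n
        vectors.append(row)
    return vocabulary, vectors


# Placeholder sentences (assumption), not taken from the authors' corpus.
samples = ["news from the city", "the city news from the city"]
vocabulary, vectors = vectorize(samples)
```

Each row of `vectors` can then be passed to any standard classifier. Raw counts are used here only for simplicity; a weighting scheme such as TF-IDF is a common alternative.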

Keywords: web-resources; social network; cybersecurity; extremism; text classification

About the authors

Milana Bolatbek

Milana Bolatbek holds a PhD in Information Security Systems. She is a senior lecturer at Al-Farabi Kazakh National University in Almaty (Kazakhstan). She also supervises the project “Development of models and methods to identify youth extremism and ensure the safety of youth in the modern information space”. Her research interests include information security, natural language processing, semantic analysis, and social media analysis.

Shynar Mussiraliyeva

Shynar Mussiraliyeva is a Candidate of Physical and Mathematical Sciences. She is head of the Department of Information Systems at Al-Farabi Kazakh National University in Almaty (Kazakhstan). She also supervises the project “Multi-ideology Cyber Extremism Classification in the Kazakh language using Artificial Intelligence”. Her research interests include information security, cryptography, semantic analysis, and social media analysis.

Published Online: 2023-12-12

Published in Print: 2023-12-15
