Offensive language in user-generated comments in Lithuanian

Giedrė Valūnaitė-Oleškevičienė, Linas Selmistraitis, Andrius Utka and Dangis Gudelis
Published/Copyright: December 12, 2023

Abstract

This study investigates the feasibility of identifying offensive language in Lithuanian using the Simplified Offensive Language Taxonomy (SOLT). The taxonomy is designed to complement existing offensive language ontologies and tagset systems, with the ultimate goal of integrating it into publicly accessible Linguistic Linked Open Data (LLOD) resources. The dataset is a publicly available corpus of user-generated comments collected from a Lithuanian portal (Amilevičius et al. 2016). The study found that offensive language predominantly targets groups rather than individuals. The most common category relates to physical and mental disabilities, followed by ideological offences, xenophobic and sexist remarks, and less frequent categories such as ageism, classism, homophobia, and religious discrimination. These results highlight the diverse range of offensive language online and underscore the need to combat discrimination and promote respectful discourse, particularly concerning marginalised groups.

About the authors

Giedrė Valūnaitė-Oleškevičienė

Giedrė Valūnaitė Oleškevičienė is Vice-Dean for Scientific Research of the Faculty of Public Governance and Business and a professor at the Institute of Humanities, Mykolas Romeris University. In the humanities, her research interests include discourse analysis, professional English, legal English, linguistics and translation research; in the social sciences, they include social research methodology, modern education, philosophical issues, creativity development in the modern education system, and second language teaching and learning. She has coordinated international research projects funded by the EU, publishes scientific articles, and presents at scientific conferences.

Linas Selmistraitis

Linas Selmistraitis has over 24 years of experience in higher education, specifically in developing and implementing quality assurance systems for higher education institutions. He earned his PhD in the Humanities. Currently, Professor Dr. Linas Selmistraitis is Vice-Dean for Studies at the Faculty of Human and Social Studies and a professor at the Institute of Humanities, both at Mykolas Romeris University. His research interests are semantics, morphology, cognitive linguistics and corpus linguistics. He publishes research articles and gives presentations at conferences.

Andrius Utka

Andrius Utka is an associate professor at the Department of Lithuanian Studies and a senior researcher at the Institute of Digital Resources and Interdisciplinary Research (SITTI), Vytautas Magnus University (Kaunas). He defended his doctoral dissertation, Statistical Identification of Text Functions, in 2004 (VMU, Kaunas). He headed the Centre of Computational Linguistics from 2010 to 2022 and has coordinated a number of national and international research projects. His research interests include statistical text analysis, language resources, computer-assisted translation, automatic summarisation, terminology extraction, and the language of disinformation.

Dangis Gudelis

Dangis Gudelis is a professor at Mykolas Romeris University, specializing in public administration and governance. He earned his PhD in Social Sciences, focusing on performance measurement in Lithuanian municipalities. Gudelis has led and contributed to various national and international research projects, particularly in public governance and public policy. His current research interests include applications of big data and AI technologies in the public sector. He is a prolific writer, with numerous publications in scientific journals and presentations at conferences. He teaches courses at both undergraduate and graduate levels. Additionally, he has played a role in policy analysis and consultancy, advising governmental and non-governmental organizations on strategic development and public sector innovation.

References

Barrow, Robin. 2005. On the duty of not taking offence. Journal of Moral Education 34(3). 265–275. https://doi.org/10.1080/03057240500211600

Basile, Valerio, Cristina Bosco, Elisabetta Fersini, Debora Nozza, Viviana Patti, Francisco Manuel Rangel Pardo, Paolo Rosso & Manuela Sanguinetti. 2019. SemEval-2019 task 5: Multilingual detection of hate speech against immigrants and women in Twitter. In Proceedings of the 13th international workshop on semantic evaluation, 54–63. Minneapolis: Association for Computational Linguistics. https://doi.org/10.18653/v1/S19-2007

Bassignana, Elisa, Valerio Basile & Viviana Patti. 2018. Hurtlex: A multilingual lexicon of words to hurt. In CEUR workshop proceedings. Vol. 2253. CEUR-WS, 1–6. Torino: Accademia University Press. https://doi.org/10.4000/books.aaccademia.3085

Culpeper, Jonathan. 2011. Impoliteness: Using language to cause offence. Vol. 28. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511975752

Culpeper, Jonathan. 2016. Impoliteness strategies. In Alessandro Capone & Jacob Mey (eds.), Interdisciplinary studies in pragmatics, culture and society, 421–445. Cham: Springer. https://doi.org/10.1007/978-3-319-12616-6_16

Durant, Alan. 2010. Meaning in the media: Discourse, controversy and debate. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511810848

Gomez, Raul, Jaume Gibert, Lluis Gomez & Dimosthenis Karatzas. 2020. Exploring hate speech detection in multimodal publications. In Proceedings of the IEEE/CVF winter conference on applications of computer vision, 1470–1478. Ithaca: Cornell University. https://doi.org/10.1109/WACV45572.2020.9093414

Günthner, Susanne. 1995. Exemplary stories: The cooperative construction of moral indignation. VS 70–71. 147–175.

Hatzis, Nicholas. 2021. Offensive speech, religion, and the limits of the law. Oxford: Oxford University Press. https://doi.org/10.1093/oso/9780198758440.001.0001

Haugh, Michael & Valeria Sinkevičiūtė. 2019. Offence and conflict talk. In Matthew Evans, Lesley Jeffries & Jim O'Driscoll (eds.), Routledge handbook of language in conflict, 196–214. London: Routledge. https://doi.org/10.4324/9780429058011-12

Lewandowska-Tomaszczyk, Barbara, Slavko Žitnik, Anna Bączkowska, Chaya Liebeskind, Jelena Mitrović & Giedrė Valūnaitė Oleškevičienė. 2021. LOD-connected offensive language ontology and tagset enrichment. In Sara Carvalho & Renato Rocha Souza (eds.), Proceedings of the workshops and tutorials held at LDK 2021 co-located with the 3rd Language, Data and Knowledge Conference, 135–150. CEUR Workshop Proceedings. Wadern: Dagstuhl Publishing.

Lewandowska-Tomaszczyk, Barbara. 2022. A simplified taxonomy of offensive language (SOL) for computational applications. Konin Language Studies 10. 213–227.

Liebeskind, Chaya & Shmuel Liebeskind. 2018. Identifying abusive comments in Hebrew Facebook. In 2018 IEEE International conference on the science of electrical engineering in Israel (ICSEE), 1–5. https://doi.org/10.1109/ICSEE.2018.8646190

Liu, Ping, Wen Li & Liang Zou. 2019. NULI at SemEval-2019 task 6: Transfer learning for offensive language detection using bidirectional transformers. In Proceedings of the 13th international workshop on semantic evaluation, 87–91. Minneapolis: Association for Computational Linguistics. https://doi.org/10.18653/v1/S19-2011

Mitrović, Jelena, Bastian Birkeneder & Michael Granitzer. 2019. nlpUP at SemEval-2019 task 6: A deep neural language model for offensive language detection. In Proceedings of the 13th international workshop on semantic evaluation, 722–726. Minneapolis: Association for Computational Linguistics. https://doi.org/10.18653/v1/S19-2127

Moulinou, Iphigenia. 2014. Striving to make the difference: Linguistic devices of moral indignation. Journal of Language Aggression and Conflict 2(1). 74–98. https://doi.org/10.1075/jlac.2.1.03mou

O'Driscoll, Jim. 2020. Offensive language: Taboo, offence and social control. London: Bloomsbury. https://doi.org/10.5040/9781350169708

Pitenis, Zeses, Marcos Zampieri & Tharindu Ranasinghe. 2020. Offensive language identification in Greek. In Proceedings of the twelfth language resources and evaluation conference, 5113–5119. Marseille, France: European Language Resources Association.

Qian, Jing, Anna Bethke, Yinyin Liu, Elizabeth Belding & William Yang Wang. 2019. A benchmark dataset for learning to intervene in online hate speech. arXiv preprint arXiv:1909.04251. https://doi.org/10.18653/v1/D19-1482

Risch, Julian, Robin Ruff & Ralf Krestel. 2020. Offensive language detection explained. In Proceedings of the second workshop on trolling, aggression and cyberbullying, 137–143. Marseille, France: European Language Resources Association (ELRA).

Ruzaitė, Jūratė. 2018. In search of hate speech in Lithuanian public discourse: A corpus-assisted analysis of online comments. Lodz Papers in Pragmatics 14(1). 93–116. https://doi.org/10.1515/lpp-2018-0005

Stollznow, Karen. 2020. On the offensive: Prejudice in language past and present. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781108866637

Swamy, Steve Durairaj, Anupam Jamatia & Björn Gambäck. 2019. Studying generalisability across abusive language detection datasets. In Proceedings of the 23rd conference on computational natural language learning (CoNLL), 940–950. Hong Kong: Association for Computational Linguistics. https://doi.org/10.18653/v1/K19-1088

Zampieri, Marcos, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra & Ritesh Kumar. 2019. SemEval-2019 task 6: Identifying and categorizing offensive language in social media (OffensEval). arXiv preprint arXiv:1903.08983. https://doi.org/10.18653/v1/S19-2010

Zampieri, Marcos, Preslav Nakov, Sara Rosenthal, Pepa Atanasova, Georgi Karadzhov, Hamdy Mubarak, Leon Derczynski, Zeses Pitenis & Çağrı Çöltekin. 2020. SemEval-2020 task 12: Multilingual offensive language identification in social media (OffensEval 2020). arXiv preprint arXiv:2006.07235. https://doi.org/10.18653/v1/2020.semeval-1.188

Wulczyn, Ellery, Nithum Thain & Lucas Dixon. 2017. Ex machina: Personal attacks seen at scale. In Proceedings of the 26th international conference on World Wide Web, 1391–1399. Ithaca: Cornell University. https://doi.org/10.1145/3038912.3052591

Published Online: 2023-12-12
Published in Print: 2023-12-15

© 2023 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 12.9.2025 from https://www.degruyterbrill.com/document/doi/10.1515/lpp-2023-0013/html