An Unsupervised Learning Study on International Media Responses Bias to the War in Ukraine

Qinghao Guan; Melanie Nicole Lawi

doi:10.1515/csh-2023-0010

Article Open Access

Published/Copyright: December 21, 2023

From the journal Corpus-based Studies across Humanities Volume 1 Issue 1

Abstract

Newspapers, as an important social medium, are often considered to be full of biased opinions. Whether newspapers in a neutral state are themselves neutral is therefore an interesting question. This research uses topic modeling to probe this question on the basis of the Russia–Ukraine war. Considering the results derived from both LDA and Mallet, we found that America and Switzerland reported more about their respective responses to the invasion and the countries involved in the war, whereas China tended to focus more on its own country, negotiations, and the effect on its citizens. Our results support the notion that international relations between countries affect the way that the media of the respective countries write about each other. Further research could use larger datasets to improve comparability.

Keywords: topic modeling; social media bias; Ukraine war; international relations

1 Introduction

When historical events take place, the media of the time cover them as they transpire; media published during such momentous events can therefore offer researchers an interesting insight into the public perception of a specific event. Furthermore, gathering a large sample of the media published in the same country during a certain period of time can provide information about general attitudes present in that country as well as give an indication of international relations. This notion that media responses and international relations are closely linked is supported by Savrum and Miller’s paper, which demonstrated “the impact that communication technology has on global affairs”, and by Coban’s paper, which showed the interdependence of the media and politics (Coban 2016; Savrum and Miller 2015).

In this research paper the focus is placed on a recent historical event, namely the war in Ukraine that began with the Russian full-scale attack on the country. The precise aim of this paper is to investigate newspaper responses to this event from three different countries: China, Switzerland, and the USA. These countries were selected because of their different relationships with Russia, which range from more friendly, to relatively neutral, to complex and rather strained.

Previous research on international media responses to the “changing business of news” found that organizations within the same medium, newspapers for example, react similarly to changes and that these reactions are surprisingly similar across different countries (Cornia, Sehl, and Kleis Nielsen 2019). Inspired by that study, this paper focuses on one type of news media, news articles available online, and investigates whether Cornia et al.’s finding, namely that responses from the same type of media do not vary significantly across countries, also applies to the articles written about the war in Ukraine.

Cornia et al. selected six countries (Finland, France, Germany, Italy, Poland, and the UK) to examine the reactions of legacy media organizations. All of these countries are in Europe, so their cultures are potentially similar. Given the differences between Swiss, Chinese, and American cultures and the very different relationships these countries have with Russia, we anticipate that their media responses will differ in the topics they address and in how they talk about the war in Ukraine. This research hypothesis will be tested using topic modeling and by comparing the word counts of differently connoted synonyms for the word war.

2 Previous Research

2.1 Topic Modeling and Social Science Studies

One of the most critical goals of data analytics is determining the characteristics that data points share (Vayansky and Kumar 2020), which is where topic modeling comes into play. Topic modeling has thus become a popular method in fields such as education, sociology, and linguistics. It helps researchers understand topics in large collections of unstructured texts. The main data source for topic modeling in social science is social media, including Twitter, news articles, etc. Based on social media, researchers are able to investigate international relations (Mishler et al. 2015), health (Sidana et al. 2018), and sentiment (Biktimirov et al. 2021). Meanwhile, to account for the particularities of different texts, researchers have proposed many revised models, such as the Topic Flow Model (Churchill et al. 2018), which can track temporal change of topics; sentiment topic models (Rao et al. 2014), which are capable of digging into the emotion of texts; the visual topic model (Zhou, Liang, and Du 2012), which maps visual data to text topics; and the Microblog Sentiment Topic Model (Ahuja, Wei, and Carley 2016), which is especially suitable for analyzing short texts like tweets.

2.2 Studies in the Russia–Ukraine War

The bulk of articles focus on the huge changes that the war brought. Orhan (2022) comments that this war has given rise to a severe humanitarian catastrophe and endangered the stability of international geopolitical relations because it has left a large number of people besieged and displaced. Given that Russia and Ukraine are major producers of necessities such as food, minerals, and energy, the disruption of supply chains and exports raised costs for the global economy (Liadze et al. 2023), which resulted in a global rise in prices, including oil and gas prices (Appiah-Otoo 2023; Mbah and Wasum 2022). The disruption of supply chains as well as resource scarcity led to higher costs for obtaining these resources, which were passed on to consumers in the form of higher prices and fed inflation, influencing stock prices (Deng et al. 2022). Additionally, some research addresses food security: Behnassi and El Haiba (2022) highlight that armed conflicts incur prohibitively high food costs, with adverse impacts for nations reliant on food imports and those with low-income populations.

Some scholars have shown interest in the role of social media in this war. Geissler et al. (2022) found robust support for a Russian propaganda campaign that manipulated beliefs and behaviors according to the propagandists’ interests. Aside from the power of social media, fake news and false information have been an important focus for researchers, as shown by the proposal of a Russia–Ukraine conflict hate speech dataset (Thapa et al. 2022), an automatic fake information detection system (La Gatta et al. 2023), and so forth. Although the proliferation of discussion about the effects of this war on fake news attests to its significance, previous research ignores the potential influences on the online atmosphere. The gap our study intends to bridge is the discussion of differences and potential guidance in social media and the recognition of biases existing within discourses.

3 Data

Since the corpora do not share the same language and are thus more challenging to compare, measures were taken to increase their comparability. The primary measure taken during data collection was a shared time span across the three corpora: from 24th February 2022, the date when Russia launched the full-scale attack on Ukraine, to 20th March 2022.

3.1 American

The American data were compiled manually using random sampling. The articles in the American corpus come from three news publishers: Politico, the Washington Post, and CNN. These three publishers were chosen because they are large companies whose articles are read by a large share of the US population and thus better reflect the media response of the nation. Politico was selected because it has a neutral reporting style and publishes longer articles on political topics. The Washington Post was selected for its similarly objective reporting style, which helps to obtain less biased data and therefore a more representative result. The final news outlet chosen is CNN because, according to YouGov (YouGov), it is among the top 10 most popular news websites in America. Furthermore, the selection of CNN was influenced by accessibility, since its articles are published for free online and are thus accessible to anybody who is able to use the internet, whereas another popular news outlet such as the NY Times requires a paid subscription to access full articles.

The random sampling was done by filtering for a specific date within the predefined date range and searching for the key words “Ukraine”, “war”, “invasion” and “Russia”. For Politico and the Washington Post, all articles were added to the corpus if there were few results; if there were many results, two to three articles were chosen randomly and added to the corpus. In contrast, the CNN data for the dates between 16th and 20th March were compiled by including all available articles that were found, which may result in the overrepresentation of certain words and topics that were primarily discussed at that time, thus potentially skewing the output. The CNN part of the corpus is limited to this short time span because of the amount of manual labor involved and time constraints. Nevertheless, since the Politico and Washington Post data are more balanced, the effect might not be substantial. Moreover, since the CNN articles are from the end of the predefined range, topics discussed at the beginning of the war in Ukraine might be covered as well. To ensure that the CNN data did not skew the topic modeling output, two versions of the corpus were created, one including the CNN data and one without, and topic modeling was performed on both. The corpus containing the CNN data was 277,777 words long, and the corpus without it contained 180,042 words.
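The per-date sampling rule described above can be sketched as follows. This is only an illustration: the compilation in the paper was done manually, and the function name `sample_articles`, the threshold for "few results" (`keep_all_threshold`), and the fixed seed are assumptions that do not come from the study.

```python
import random

def sample_articles(articles_by_date, keep_all_threshold=3, max_per_day=3, seed=0):
    """Keep every hit on sparse days; draw two to three hits at random on busy days.

    articles_by_date maps a date string to the list of search hits for that date.
    keep_all_threshold is a hypothetical cut-off for "few results".
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    corpus = []
    for date in sorted(articles_by_date):
        hits = articles_by_date[date]
        if len(hits) <= keep_all_threshold:
            corpus.extend(hits)                # few results: keep all of them
        else:
            k = rng.randint(2, max_per_day)    # many results: two to three articles
            corpus.extend(rng.sample(hits, k))
    return corpus
```

A fixed seed makes the draw repeatable, which manual sampling cannot guarantee; whether that matters depends on how the corpus is meant to be reproduced.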

3.2 Chinese

The Chinese data were extracted from Renmin (People’s Daily), which is the “microphone” of the Chinese government and the Communist Party of China as well as a pivotal “window” for international communication. Since Renmin spreads news through two channels, namely the Renmin website and the Renmin WeChat subscription account, we collected the Chinese data manually from these two platforms and removed duplicates. The resulting Chinese dataset comprises 351,389 characters from 24th February to 20th March 2022. To make the data easy to read in Google Colab, the Chinese data, unlike the American and Swiss data, were entered into an Excel file in which each row is one news item. The Excel file was then converted into a CSV file.

3.3 Swiss

This corpus contains articles from various news outlets across Switzerland, comprising both newspaper articles and website articles. We used the Swissdox database by LiRI,^[1] which contains 260 sources easily accessible to researchers. The content of the articles occasionally overlapped, either because different newspapers publish the same articles for different cantons or because articles were both printed and released online; such duplicates were removed from the corpus together with the corresponding metadata. After the removal of duplicates, 77 news publication sites and 7516 articles remained in the final version (Figure 1), amounting to 3,813,007 words in total. The eight sources with the most articles are listed in Table 1, with the corresponding codes and full names derived from Swissdox.^[2]

Figure 1:

News sources in Swiss German corpus.

Table 1:

Newspaper sources in top ranks.

Source	Code	Number	Source	Code	Number
cash.ch	CASO	355	tagesanzeiger.ch	NNTA	193
blick.ch	BLIO	273	landbote.ch	LBO	188
srf.ch	SRF	270	derbund.ch	NNBU	183
Neue Zürcher Zeitung	NZZ	253	zuonline.ch	ZHUO	182

To facilitate the analysis of this corpus by both LDA and Mallet, the encoding was converted to UTF-8, which is capable of representing the German characters. Moreover, since these programs experienced issues while reading in the German umlauts, the umlauts were converted into their corresponding two-letter versions; for instance, ö was changed to oe. Furthermore, all links and formatting tags were removed because these two elements occurred frequently and distorted the results. Finally, the unneeded metadata was removed and the Excel file was converted into a text file because this was easier for LDA and Mallet to read in.
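A minimal sketch of the cleaning steps just described (removing formatting tags and links, then transliterating umlauts) might look like the following. The regular expressions and the function name `clean_article` are illustrative assumptions; the paper does not state which tools or patterns were used.

```python
import re

# Two-letter replacements for the German umlauts, as described in the text.
UMLAUT_MAP = {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"}

TAG_RE = re.compile(r"<[^>]+>")                 # assumed shape of formatting tags
URL_RE = re.compile(r"https?://\S+|www\.\S+")   # assumed shape of links

def clean_article(text):
    """Strip tags and links, transliterate umlauts, and collapse whitespace."""
    text = TAG_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    for umlaut, replacement in UMLAUT_MAP.items():
        text = text.replace(umlaut, replacement)
    return re.sub(r"\s+", " ", text).strip()
```

For example, `clean_article("<p>Die Flüchtlinge in Zürich</p> https://example.ch/a")` yields `"Die Fluechtlinge in Zuerich"`.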

We acknowledge that the corpus sizes for the media of Switzerland, America, and China are not balanced, with the Swiss corpus significantly exceeding the others. Because Chinese news sources are relatively uniform and homogeneous, achieving the same level of diversity as Swiss and American media content is challenging. However, because of this single source, we boldly hypothesize that, once the quantity of news reaches a certain level, the amount of news text does not significantly affect the diversity of topics, and that distinctions in topics between different countries can be extracted from unbalanced data. In detail, both the Chinese and English datasets exceed 270,000 words, which gives us grounds to believe that the topics we uncover are meaningful. Whether the size of the corpus affects the differences in news topics is also a direction for our future exploration.

3.4 Stop Word Lists

Stop words do not contain any clear meaning, which makes labeling a topic more challenging. The removal of stop words is essential because without this step the topics will not be clearly defined since these words occur frequently and thus likely will belong to several topics.
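As a rough illustration of this step, the sketch below merges several stop word lists (for example a base list, an extended list, and topic-specific additions) and filters a tokenized text against the result. The function names and the toy lists are hypothetical; the actual lists used for each language are described in the subsections that follow.

```python
def merge_stopwords(*word_lists):
    """Combine several stop word lists into one lower-cased set."""
    merged = set()
    for words in word_lists:
        merged.update(w.strip().lower() for w in words if w.strip())
    return merged

def remove_stopwords(tokens, stopwords):
    """Drop every token whose lower-cased form is in the stop word set."""
    return [t for t in tokens if t.lower() not in stopwords]

# Toy lists for illustration only, not the lists used in the study.
base_list = ["the", "and", "in"]
topic_specific = ["Russia", "Russian"]
stopwords = merge_stopwords(base_list, topic_specific)
filtered = remove_stopwords("The Russian forces and the city".split(), stopwords)
```

Here `filtered` is `["forces", "city"]`: the frequent function words and the topic-specific term no longer compete for a place among the top terms of every topic.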

3.4.1 American

For the English stop word list, the Mallet stop word list was expanded using more extensive lists that had previously been compiled and made available on GitHub.^[3],^[4] Furthermore, stop words specific to the topic, such as “Russia” and “Russian”, were added to the list.

3.4.2 Chinese

The base list of Chinese stopwords combines the stopword lists of the Harbin Institute of Technology, the Sichuan University Machine Learning Intelligence Laboratory, and Baidu, for a total of 1598 words. Then, to account for the particularities of our texts, we added 54 words, including news sources, such as 通讯社 (news service); temporal words, such as 上午 (morning); and reporters, such as 刘亚南 (Liu Yanan). In addition, the names of countries, such as 俄罗斯 (Russia), and the names of the Russian and Ukrainian presidents, 普京 (Putin) and 泽连斯基 (Zelensky), are treated as stopwords. Finally, tokenization mistakes produced some incomplete words, for example “连斯基”, which is part of “泽连斯基 (Zelensky)”. Because these fragments affect the results, we added them to the stopword list as well. The extended stopword list is shown in Table 2.

Table 2:

Extended Chinese stopwords.

Type	Stopwords
News agency	通讯社, 塔斯社, 新华社, 新闻网, 日电
Time	周一, 周二, 周三, 周四, 周五, 周六, 周日, 一月, 二月, 三月, 四月, 五月, 六月, 七月, 八月, 九月, 十月, 十一月, 十二月, 上午, 下午, 中午, 早上, 当天, 上次, 下次, 今日, 晚, 曾于
Function words	已有, 因为, 总共
Names	普京, 俄罗斯, 乌克兰, 泽连斯基, 刘亚南, 熊思浩
Country names	中方, 美方, 俄方, 乌方, 美国, 中国
Others	乌军, 俄军, 快讯, 报道

3.4.3 Swiss

Similarly to the American list, the German stop word list was compiled by using the Mallet list as the base and expanding it with words from GitHub.^[5] Additionally, words specific to the research topic were added when they were overrepresented.

4 Methods

4.1 LDA

LDA, namely Latent Dirichlet Allocation, was proposed by Blei et al. (2003) and became a standard technique in topic modeling because it automatically discovers topics hidden in documents (Rohani, Shayaa, and Babanejaddehaki 2016).

The mechanism of LDA is as follows. Alpha and beta parameterize Dirichlet distributions, and theta and phi are multinomial distributions. From these multinomial distributions, we draw a set of topics and a large number of words (Figure 2).

Figure 2:

Mechanism of LDA model.

$$P(W, Z, \theta, \varphi; \alpha, \beta) = \prod_{i=1}^{K} P(\varphi_i; \beta) \prod_{j=1}^{M} P(\theta_j; \alpha) \prod_{t=1}^{N} P(Z_{j,t} \mid \theta_j)\, P(W_{j,t} \mid \varphi_{Z_{j,t}})$$

Figure 3:

Comparison of terms: war, invasion, conflict and crisis. These three pie charts are from Swiss data, American data, and Chinese data from the left to the right respectively.

On the left-hand side of the equation, we have the probability that a given document will be generated, since this generative “machine” could potentially produce any document, each with a tiny probability. On the right-hand side, we have four factors: the first two are Dirichlet distributions and the last two are multinomial distributions, and each one comes with a probability. Multiplying these probabilities gives the probability of the generated document. When the generated document is similar to the real document, we obtain the topics and words we need.
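To make the generative reading of the equation concrete, the sketch below simulates the process with the Python standard library: topic-word distributions are drawn from a Dirichlet with parameter beta, each document's topic mixture is drawn from a Dirichlet with parameter alpha, and every word is produced by first drawing a topic and then a word from that topic. The function names, the toy vocabulary, and the default values `alpha=0.5` and `beta=0.5` are assumptions for illustration and are not the settings used in this study. The sketch only generates documents; it does not perform the inference that recovers topics from real text.

```python
import random

def sample_dirichlet(params, rng):
    """Draw one sample from a Dirichlet distribution via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(draws)
    return [d / total for d in draws]

def sample_index(probs, rng):
    """Draw one index from a categorical (single-trial multinomial) distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_corpus(vocab, num_topics, num_docs, doc_len, alpha=0.5, beta=0.5, seed=0):
    """Generate toy documents following the LDA generative story."""
    rng = random.Random(seed)
    # phi_i ~ Dirichlet(beta): one word distribution per topic (K topics)
    phi = [sample_dirichlet([beta] * len(vocab), rng) for _ in range(num_topics)]
    docs, thetas = [], []
    for _ in range(num_docs):                      # M documents
        # theta_j ~ Dirichlet(alpha): the topic mixture of document j
        theta = sample_dirichlet([alpha] * num_topics, rng)
        words = []
        for _ in range(doc_len):                   # N word positions
            z = sample_index(theta, rng)           # Z_{j,t} given theta_j
            w = sample_index(phi[z], rng)          # W_{j,t} given phi of topic z
            words.append(vocab[w])
        docs.append(words)
        thetas.append(theta)
    return docs, thetas, phi
```

Calling `generate_corpus(["war", "oil", "gas", "refugee", "nato", "talks"], num_topics=2, num_docs=3, doc_len=8)` returns three eight-word documents together with the sampled mixtures, which can be inspected to see how a low alpha concentrates each document on few topics.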

4.2 Mallet

MALLET, the Machine Learning for Language Toolkit, was developed at the University of Massachusetts Amherst (Barde and Bainwad 2017). Mallet largely overlaps with LDA; the two differ mainly in speed and precision. Specifically, LDA uses variational Bayes inference, which is fast but somewhat less precise, whereas Mallet relies on Gibbs sampling. Additionally, Mallet supports only the English alphabet, so the Chinese data were processed with LDA.

For this paper, Mallet was set to 5000 iterations for the American data and 6000 iterations for the Swiss data because of the larger corpus. Furthermore, 10 topics were set for the American data to allow a better comparison of the LDA and Mallet results, whereas 15 topics were set for the Swiss data to minimize the size of the mega topic that was created during several attempts at running the topic model. The optimization interval was set to 1000 and the optimization burn-in to 100. These settings produced the best results.
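The settings above map onto options of Mallet's `train-topics` command. The sketch below only assembles the command line as a list of arguments and does not run it. The file names, the output prefix, and the path `bin/mallet` are hypothetical, and the paper does not report the exact command that was used.

```python
def mallet_train_command(input_file, num_topics, num_iterations,
                         optimize_interval=1000, optimize_burn_in=100,
                         mallet_bin="bin/mallet", out_prefix="topics"):
    """Build (but do not run) a Mallet train-topics call with the reported settings."""
    return [
        mallet_bin, "train-topics",
        "--input", input_file,                       # corpus imported into Mallet format
        "--num-topics", str(num_topics),
        "--num-iterations", str(num_iterations),
        "--optimize-interval", str(optimize_interval),
        "--optimize-burn-in", str(optimize_burn_in),
        "--output-topic-keys", out_prefix + "_keys.txt",
        "--output-doc-topics", out_prefix + "_composition.txt",
    ]

# Hypothetical file names; the topic and iteration counts are those stated in the text.
american_cmd = mallet_train_command("american.mallet", num_topics=10, num_iterations=5000)
swiss_cmd = mallet_train_command("swiss.mallet", num_topics=15, num_iterations=6000)
```

The resulting list could be passed to `subprocess.run` on a machine where Mallet is installed and the corpus has already been imported.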

4.3 Comparison of Words Concerning War

For the final method of data analysis, a comparison of key words concerning war was chosen. More precisely, the word “war” (“战争” in Chinese and “Krieg” in German) and the three related words “invasion” (“入侵” in Chinese and “Invasion” in German), “conflict” (“冲突” in Chinese and “Konflikt” in German) and “crisis” (“危机” in Chinese and “Krise” in German) were evaluated by investigating their proportions relative to each other. These words were selected because they have different connotations. To elaborate, “war” and “invasion” are more negatively connoted and imply the existence of an aggressor, whereas “conflict” and “crisis” are used in a more neutral fashion and, contrary to “war” and “invasion”, do not imply an aggressor.

To generate a diagram of the proportions of these terms relative to each other, the count of each word was calculated for each nation’s corpus using grep in the terminal. These counts were then entered into an Excel table, and a pie chart was generated for each country. Comparing pie charts and the proportions of the words was determined to be the most suitable method since the corpora were not the same length and this method allows each corpus to be processed separately.
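The same counts and proportions can also be computed programmatically. The following sketch is an illustrative Python equivalent, not the grep and Excel workflow used in the study; it counts every occurrence of each term, which is an assumption because the exact grep options are not reported. The pattern dictionaries and the function name `term_proportions` are hypothetical.

```python
import re

# Hypothetical search patterns; word boundaries are used for English only.
EN_PATTERNS = {"war": r"\bwar\b", "invasion": r"\binvasion\b",
               "conflict": r"\bconflict\b", "crisis": r"\bcrisis\b"}
ZH_PATTERNS = {"war": "战争", "invasion": "入侵", "conflict": "冲突", "crisis": "危机"}

def term_proportions(text, patterns):
    """Count each term and return (counts, shares of the four-term total)."""
    counts = {label: len(re.findall(pattern, text, flags=re.IGNORECASE))
              for label, pattern in patterns.items()}
    total = sum(counts.values())
    shares = {label: (count / total if total else 0.0)
              for label, count in counts.items()}
    return counts, shares

en_counts, en_shares = term_proportions(
    "The war continues. The invasion began; war and crisis.", EN_PATTERNS)
```

Because the shares are computed within each corpus, corpora of different lengths can be compared directly, which is the property that motivated the pie chart comparison.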

5 Results

5.1 American

These two tables (see Appendices A and B) show that the topics with and without the CNN data are similar, and some topics, such as (Specific) Attacks on Cities and Refugees, even contain some of the same words. One major difference is the disappearance of the topic Russia versus Ukraine, which appears to be split into two topics (Russia, Ukraine) in the topic model that contains the CNN data. Moreover, China and Space form one topic in the topic model including the CNN data, whereas the other topic model created a separate topic for China and combined Space with Nuclear Power.

The generated topics fall into categories that can be labeled as events in Ukraine, refugees, military support, technology, China, misinformation, and internal affairs. The topic of technology can be explained by the fear of the use of nuclear weapons, since both the US and Russia possess them. Moreover, Russia launched an attack near a nuclear power plant, and the topic might therefore concern this specific event. China was discussed by the US media because it is considered an ally of Russia and other nations were curious about how it would respond to the attack. The topic of misinformation arose because the Russian media claimed that the attack was justified for various reasons, which many other nations deemed invalid. Furthermore, social media was used to spread Russian propaganda, and thus many social media companies occur in this topic. The topic of internal affairs occurs because the two houses of Congress and the president had to discuss their response to the war in Ukraine, and thus US politics is mentioned in several articles. Many of these categories contain only one topic. This is nevertheless notable because it shows that the American media covered many different facets of the war, focusing on internal affairs and the US response to the war as well as on the events in Ukraine and other countries’ responses.

5.2 Chinese

After fine-tuning, we found that a topic number of 8 yielded meaningful results that presented the main topics behind the unstructured Chinese data. Topic 0 was “attitude of the Chinese Foreign Ministry spokesperson”, in which the attitude of the Chinese government is clear. For instance, “正告” (warn) is a formal term in international diplomacy, and the targets of the spokesperson’s warning were America and Japan. Topic 1 was “oil and gas”, which makes sense because the price of oil and gas increased as the war went on. Topic 2 was “negotiation”, with terms such as “戈梅利” (Gomel), the place where Russia negotiated with Ukraine. Topic 3 was a story in which a man from Taiwan was happy because he was able to board a private plane arranged by the Chinese government and return to China. Topic 4 was “economy”, with terms such as the Tokyo Stock Exchange. Topic 5 was “safety of Chinese people”: the Chinese government reminded Chinese people of the danger in Ukraine. Topic 6 was “international involvement”: many countries and organizations hoped to promote negotiations between Ukraine and Russia. Topic 7 was “evacuation of Chinese citizens”.

Even though the results were meaningful, some problems were revealed. First of all, the tokenization made some mistakes: for example, “米科拉·波沃罗兹尼克 (Mykola Povoroznyk)” is an official in Kyiv, but “尼克” was recognized as a word. Another example is “架接” in Topic 7, which is not a word in Chinese. Besides, some content words carry no meaning on their own, such as “之情 (the feeling of)”.

5.3 Swiss

The results generated by the model can be seen in Appendix D. On close inspection, we found that the German topics are not very significant; only four topics are meaningful. Topic 2 was labeled “Swiss response”, Topic 3 “weapon”, Topic 5 “refugee and aid” and Topic 8 “military”. Generally, the results are not as clear as the ones from the American data set. This might be due to the larger data set, which also contained articles that mention topics other than the war in Ukraine.

5.4 Comparison of Words Concerning War

The pie charts (Figure 3) show a difference between the nouns China uses to discuss the war and the words Switzerland and the USA use to talk about the situation. There is a tendency for China to use more neutral words: the frequency of the terms conflict and crisis is higher than the frequency of the terms war and invasion. These results are in opposition to the results of the American and Swiss data, where the more negatively connoted words war and invasion are favored over more neutral words to talk about the events happening in Ukraine. The Swiss and American pie charts are similar, with only minor differences, such as the proportion of invasion being slightly higher in the American articles than in the Swiss articles.

6 Discussion

For the American data, LDA and Mallet produced similar topics, although the Mallet topics were more conclusive. All American topic models showed that the USA focused on various aspects of the war: apart from the events taking place in Ukraine, the American articles in the Mallet topic models also covered the US response to the war and other nations’ responses. The LDA topic model showed that the US seemed to place a heavier focus on the events in Ukraine. Nevertheless, the input data for the American articles would have to be expanded, both by adding more articles from the news sites used in this paper and by adding articles from other publications with different political affiliations, to obtain a more representative corpus of the American media’s response to the war in Ukraine.

The Chinese data could not be processed by Mallet, but the topic model created by LDA demonstrated that the topics discussed by the Chinese media focused more on internal affairs. Further work would be needed to process Chinese articles in Mallet, which could provide new insights into the data as it did for the American articles. When the Chinese news articles reported on the war or the countries involved in it, the topics tended to concern primarily negotiations and the evacuation of people. This contrasts with the American media, which mentioned the military in multiple topics and thus appear more aggressive and more actively involved than China in their response to the war.

As for the Swiss data, we observed that the topics are not very significant, with only four topics holding significance, namely Swiss response, refugee and aid, weapon, and military (see Appendix D). This failure to provide clear topics can be attributed to the fact that some of the articles contained information that did not concern the topic researched in this project. A way to filter out irrelevant content would therefore be needed if this research were to be repeated and made more representative of the Swiss media’s response to the war.

In general, one aspect that could be improved upon is the comparability of the corpora. This issue could be mended by having the sizes of the corpora match. Nonetheless, this solution would not entirely resolve the issue since it does not account for the content of the articles and the differences in language.

Despite these issues, the results still provided interesting insights into the manner in which media from different countries report on the same event. For example, the comparison of the American and Chinese topic models demonstrates that China focused more on its internal affairs than the other two countries did, whereas the American articles also covered internal affairs but appeared to emphasize the events happening in Ukraine and more active international responses to the conflict. Moreover, the American topics that mentioned internal affairs often mentioned the military, which also concerns other countries and demonstrates that the US was more aggressive in its response to the war than China, which had negotiations as one of its topics.

Comparing the word counts of the selected words was an effective method for this type of multilingual research because the selected words could be clearly translated into the three languages used in this study. However, this method could be problematic if the translation is ambiguous or impossible.

The results from this method showed that China used different words for reporting on the war than Switzerland and America. Moreover, China displayed a preference for more neutral words that did not imply a clear aggressor. This may be due to the unique political situation of China, which wants to fall out of favor with neither Russia nor the USA; by using less accusatory words it can remain in a more neutral position that will not directly offend either of the countries with which it is attempting to maintain a friendly relationship (Wagner, Bornmann, and Leydesdorff 2015; Wishnick 2001).

The slightly increased use of the word invasion by the US might be due to the fact that, in contrast to Switzerland, which is considered a neutral country, the USA has a more strained relationship with Russia owing to events such as the Cold War and its prior involvement in the relationship between Ukraine and Russia (Ditrych 2014). Therefore, it may be less afraid to offend Russia by labeling it as the aggressor.

7 Conclusions

In conclusion, the research supported the initial hypothesis that there is a difference between the manner in which China, America and Switzerland report on the war in Ukraine. The topic modeling data showed that, for the topics that could be labeled, America and Switzerland (although the results for Switzerland were less definitive) reported more about their respective responses to the invasion and the countries involved in the war, whereas China tended to focus more on its own country, negotiations, and the effect on its citizens. Furthermore, the results from the pie charts comparing the words that can be used for war demonstrated that both America and Switzerland preferred words that imply an aggressor and a victim, such as war and invasion. In contrast, China favored more “neutral” terms that do not insinuate that a guilty party exists, such as conflict and crisis. Our results support the notion that international relations between countries affect the way that the media of the respective countries write about each other.

This paper demonstrated how topic modeling and the comparison of select key words can aid in discerning international relations and in interpreting a large number of news articles. Moreover, it shows how the chosen methods of topic modeling and key word comparison serve as useful tools for multilingual research.

The amount of data for the manually composed corpora was not as large as that of the Swiss corpus, which offers the option of expanding upon said data and redoing the analysis for a future project. Moreover, although the methods that can be used for a multilingual comparison of digital corpora are limited, perhaps other researchers are able to find other suitable methods of comparing the data, which would offer new insights into the topic that could be interesting for both the field of computational linguistics as well as the study of international relations. Therefore, our research project provides several possibilities for expansion and future research.

Corresponding author: Qinghao Guan, Department of Computational Linguistics, University of Zurich, Andreasstrasse 15, 8050, Zurich, Switzerland, E-mail: qinghao.guan@uzh.ch.

Acknowledgement

We extend our heartfelt gratitude to Titulary Professor Gerold Schneider, Leader of LiRI, University of Zurich, for his invaluable contribution to this work. His provision of the Swiss dataset and insightful suggestions have been instrumental in the fruition of this project, undertaken as part of his course “Text Analytics in the Digital Humanities”. His guidance was pivotal to the revisions of this paper, and for that, we are deeply thankful.

Appendix A: Topics for American data without CNN

Topic number	Weight	Topic terms
0	0.53005	city forces mariupol kyiv people civilians killed military ukraine’s shelling building attack officials defense cities hit attacks children capital ministry
		→(Specific) Attacks on Cities
1	0.65744	putin war president country world people it’s putin’s zelensky russia’s government vladimir west ukraine’s russians western sanctions time europe invasion
		→Russia versus Ukraine
2	0.12912	aircraft military petraeus it’s cold statement told great flight made American wrote hudson route airspace norway make marine putin’s rescue
		→Military air support
3	0.34458	china biden president chinese war call united moscow military white invasion beijing house support security states u.s foreign china’s conflict
		→China
4	0.10837	biden house trump republicans congress administration lawmakers president bill senate sen republican americans democrats biden’s joe bipartisan american young white
		→US Internal Politics
5	0.09554	nuclear plant space power international station chernobyl asylum european site nasa seekers reactors agency rogozin yellow plants cosmonauts make university
		→bold: Nuclear Power underlined: Space combination: scientific progress
6	0.39637	nato u.s weapons defense military official troops officials european forces allies zelenskyy poland air countries biden zone told support aid
		→Military Support
7	0.15243	oil sanctions u.s energy gas prices global invasion economy economic russia’s financial supply ban percent food imports trade companies bank
		→Economy
8	0.21947	people refugees ukrainians country war million children fled refugee family health border poland invasion home fleeing united told crisis days
		→Refugees
9	0.11774	media social companies invasion twitter facebook news information company access disinformation government russia’s platforms state service online propaganda tech kremlin
		→(Misinformation on) Social Media and the News

Appendix B: Topics for American data with CNN

Topic number	Weight	Topic terms
Topic 0	0.28709	putin putin’s trump world sovereignty west political people territorial state intelligence soviet regime international goal history years research separatist current
		→unclear potentially: Russia
Topic 1	0.34067	weapons u.s poland jets military defense planes air zelenskyy aid nato fighter zone no-fly european pentagon aircraft missiles sending systems
		→Military Air Support
Topic 2	0.22556	oil energy gas u.s sanctions prices biden ban house lawmakers percent congress financial bill economy imports global billion food sen
		→Economy
Topic 3	0.32026	people zelensky war kyiv time ukrainians days woman it’s family years times life country ago year-old zelenskyy home wanted video
		→unclear potentially: Ukraine
Topic 4	0.11574	china chinese china’s beijing moscow space economic international relationship export station years rogozin wang strategic nasa controls invasion spokesperson partnership
		→bold: China underlined: Space
Topic 5	0.16065	media companies social twitter information facebook news company government access kremlin invasion platforms disinformation tech internet service online global group
		→(Misinformation on) Social Media and the News
Topic 6	0.12816	refugees european people poland asylum e.u countries border refugee million seekers points ukrainians rejected fled war crisis reported polish fleeing
		→Refugees
Topic 7	0.21429	nuclear plant power military forces chernobyl fire site experts reactors control director fighting plants statement alert hudson energy general international
		→Nuclear Power
Topic 8	1.19708	war president u.s united putin nato biden invasion states foreign military security country week europe it’s sanctions conflict officials government
		→US Response to the War
Topic 9	0.46273	city forces kyiv people mariupol civilians cities official killed fighting capital defense military war country troops shelling children soldiers days
		→(Specific) Attacks on Cities

Appendix C: Topics for Chinese data

Topic number	Weight	Topic terms
Topic 0	0.39368	济南 (Jinan) 生产 (production) 国家 (nation) 兴趣 (interest) 公民 (citizens) 超越自我 (go beyond myself) 议会 (parliament) 正告 (warn) 最早 (earliest) 退出 (exit) 国际 (international) 尼克 (Povoroznyk) 平民 (civilians) 关税 (custom duty) 展望未来
		→look forward to the future
Topic 1	0.38707	代表团 (delegation) 基辅 (Kyiv) 公民 (citizens) 时间 (time) 谈判 (negotiation) 联合国 (UN) 制裁 (sanction) 原油 (crude) 大使馆 (embassy) 部门 (department) (古)特雷斯 (Guterres) 国家 (nation) 近十年 (recent 10 years) 天然气 (natural gas) 中外记者
		→domestic and overseas reporters
Topic 2	0.38763	欧盟 (EU) 总统 (President) 戈梅利 (Gomel) 制裁 (sanction) 进行谈判 (negotiating) 军事行动 (military action) 绿色 (green) 禁止 (prohibition) 著名 (famous) 统计数据 (statistics) 白俄罗斯 (Republic of Belarus) 侵犯 (violate) 米哈伊尔 (Delyagin Mikhail) 反华 (anti-Chinese) 武器 (weapon)
		→ negotiation
Topic 3	0.38616	小伙 (fellow) 激动 (excitement) 难掩 (conceal ones’ emotion) 台湾同胞 (Taiwan compatriots) 宝岛 (Taiwan) 之情 (the feeling of) 形容 (describe) 不听话 (naughty) 感想 (sentiment) 感动 (moved) 母亲 (mother) 这位 (this one) 随即 (immediately) 感叹 (exclamation) 台胞证 (Mainland Travel Permit for Taiwan Residents)
		→ stories
Topic 4	0.41198	板块 (board) 日经 (Nikkei) 东京证券交易所 (Tokyo Stock Exchange) 午盘 (afternoon session in the stock market) 消息面 (information) 东证 (Tokyo Stock Exchange) 橡胶制品 (rubber) 陶瓷 (porcelain) 下挫 (decline) 主板 (main board) 四连跌 (go down four times consecutively) 矿业 (mining industry) 靠前 (at the top of the list) 多点 (points) 三大 (Three major indexes)
		→ economy
Topic 5	0.3903	布雷 (lay mines) 触摸 (touch) 危险物品 (dangerous things) 疑似 (seems like) 贵重物品 (valuable things) 可靠消息 (reliable message) 街区 (block) 街上 (on the streets) 可疑 (suspicious) 敖德萨 (Odessa) 总领馆 (Consulate-General) 警察局 (police station) 手机 (phone) 远离 (stay away from) 现场 (on-site)
		→ safety of Chinese people
Topic 6	0.87255	谈判 (negotiation) 总统 (President) 代表团 (delegation) 国家 (nation) 局势 (situation) 白俄罗斯 (Republic of Belarus) 生物 (biological) 国际 (international) 欧盟 (EU) 公民 (citizens) 制裁 (sanction) 北约 (NATO) 基辅 (Kyiv) 联合国 (UN) 危机 (crisis)
		→ international involvement
Topic 7	0.49669	架接 (-) 返自 (return back) 航班 (flight) 临时 (temporary) 撤离 (evacuate) 公民 (citizens) 抵达 (arrive at) 回国 (return back to the motherland) 此前 (before) 大连 (Dalian) 杭州 (Hangzhou) 第十一 (11th) 北京 (Beijing) 济南 (Jinan) 第十二 (12th)
		→ evacuation of Chinese citizens

Appendix D: Topics for Swiss data

Topic number	Weight	Topic terms
Topic 0	0.226	“dollar” “setzt” “deutschland” “unternehman” “anfrage” “ungarn” “legte” “wichtigsten” “ruecken” “arbeiten”
Topic 1	0.268	“praesident” “invasion” “mitglie” “verletzt” “zuletzt” “gelten” “unterschied” “angaben” “eingesetzt” “appell”
Topic 2	0.327	“schweiz” “ebenfall” “bundesrat” “gas” “cassis” “invasion” “experten” “sitzt” “einreiseverbote” “fall”
		→ Swiss response
Topic 3	0.214	“team” “erstmal” “unterweg” “gebaeude” “gemaess” “sprach” “waffen” “raketen” “schweizer” “entschie”
		→ weapon
Topic 4	0.256	“lage” “kriege” “fall” “frage” “problem” “material” “volke” “definitive” “wirtschaft” “gefluechtete”
Topic 5	0.311	“grenze” “prozent” “fluechtlinge” “bleibt” “haelt” “gebiete” “nacht” “gedanken” “hilfsgueter” “produktion”
		→ refugee and aid
Topic 6	0.232	“krieg” “twitter” “praesidenten” “aggression” “fahren” “hause” “system” “heimat” “schw” “leben”
Topic 7	0.425	“stadt” “frauen” “region” “video” “osten” “gefluechteten” “pro” “stand” “zahl” “personen”
Topic 8	0.134	“land” “westen” “europa” “truppen” “bild” “botschaft” “bleiben” “staaten” “soldaten” “lande”
		→ military
Topic 9	0.353	“situation” “welt” “fall” “westlichen” “fast” “laender” “sieht” “moment” “finden” “private”

References

Ahuja, A., W. Wei, and K. M. Carley. 2016. “Microblog Sentiment Topic Model.” In 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW), 1031–8. Barcelona: IEEE. https://doi.org/10.1109/ICDMW.2016.0149.

Appiah-Otoo, I. 2023. “Russia–Ukraine War and US Oil Prices.” Energy Research Letters 4 (1): 1–5. https://doi.org/10.46557/001c.37691.

Barde, B. V., and A. M. Bainwad. 2017. “An Overview of Topic Modeling Methods and Tools.” In 2017 International Conference on Intelligent Computing and Control Systems (ICICCS), 745–50. Madurai: IEEE. https://doi.org/10.1109/ICCONS.2017.8250563.

Behnassi, M., and M. El Haiba. 2022. “Implications of the Russia–Ukraine War for Global Food Security.” Nature Human Behaviour 6 (6): 754–5. https://doi.org/10.1038/s41562-022-01391-x.

Biktimirov, E. N., T. Sokolyk, and A. Ayanso. 2021. “Sentiment and Hype of Business Media Topics and Stock Market Returns during the COVID-19 Pandemic.” Journal of Behavioral and Experimental Finance 31: 100542. https://doi.org/10.1016/j.jbef.2021.100542.

Blei, D. M., A. Y. Ng, and M. I. Jordan. 2003. “Latent Dirichlet Allocation.” Journal of Machine Learning Research 3: 993–1022.

Coban, F. 2016. “The Role of the Media in International Relations: From the CNN Effect to the Al–Jazeere Effect.” Journal of International Relations and Foreign Policy 4 (2): 45–61. https://doi.org/10.15640/jirfp.v4n2a3.

Cornia, A., A. Sehl, and R. Kleis Nielsen. 2019. “Comparing Legacy Media Responses to the Changing Business of News: Cross-National Similarities and Differences across Media Types.” International Communication Gazette 81 (6–8): 686–706. https://doi.org/10.1177/1748048518808641.

Churchill, R., L. Singh, and C. Kirov. 2018. “A Temporal Topic Model for Noisy Mediums.” In Pacific-Asia Conference on Knowledge Discovery and Data Mining, 42–53. Cham: Springer. https://doi.org/10.1007/978-3-319-93037-4_4.

Deng, M., M. Leippold, A. F. Wagner, and Q. Wang. 2022. “Stock Prices and the Russia–Ukraine War: Sanctions, Energy and ESG.” CEPR Discussion Paper: DP17207. https://doi.org/10.2139/ssrn.4080181.

Ditrych, O. 2014. “Bracing for Cold Peace. US–Russia Relations after Ukraine.” The International Spectator 49 (4): 76–96. https://doi.org/10.1080/03932729.2014.963958.

Geissler, D., D. Bär, N. Pröllochs, and S. Feuerriegel. 2022. Russian Propaganda on Social Media During the 2022 Invasion of Ukraine. arXiv preprint arXiv:2211.04154. https://doi.org/10.1140/epjds/s13688-023-00414-5.

La Gatta, V., C. Wei, L. Luceri, F. Pierri, and E. Ferrara. 2023. Retrieving False Claims on Twitter During the Russia–Ukraine Conflict. arXiv preprint arXiv:2303.10121. https://doi.org/10.1145/3543873.3587571.

Liadze, I., C. Macchiarelli, P. Mortimer‐Lee, and P. Sanchez Juanino. 2023. “Economic Costs of the Russia–Ukraine War.” The World Economy 46 (4): 874–86. https://doi.org/10.1111/twec.13336.

Mbah, R. E., and D. F. Wasum. 2022. “Russian–Ukraine 2022 War: A Review of the Economic Impact of Russian–Ukraine Crisis on the USA, UK, Canada, and Europe.” Advances in Social Sciences Research Journal 9 (3): 144–53. https://doi.org/10.14738/assrj.93.12005.

Mishler, A., E. S. Crabb, S. Paletz, B. Hefright, and E. Golonka. 2015. “Using Structural Topic Modeling to Detect Events and Cluster Twitter Users in the Ukrainian Crisis.” In International Conference on Human-Computer Interaction, 639–44. Cham: Springer. https://doi.org/10.1007/978-3-319-21380-4_108.

Orhan, E. 2022. “The Effects of the Russia–Ukraine War on Global Trade.” Journal of International Trade, Logistics and Law 8 (1): 141–6.

Rao, Y., Q. Li, X. Mao, and L. Wenyin. 2014. “Sentiment Topic Models for Social Emotion Mining.” Information Sciences 266: 90–100. https://doi.org/10.1016/j.ins.2013.12.059.

Rohani, V. A., S. Shayaa, and G. Babanejaddehaki. 2016. “Topic Modeling for Social Media Content: A Practical Approach.” In 2016 3rd International Conference on Computer and Information Sciences (ICCOINS), 397–402. Kuala Lumpur: IEEE. https://doi.org/10.1109/ICCOINS.2016.7783248.

Savrum, M. Y., and L. Miller. 2015. “The Role of the Media in Conflict, Peace Building and International Relations.” International Journal on World Peace 32 (4): 13–34.

Sidana, S., S. Amer-Yahia, M. Clausel, M. Rebai, S. T. Mai, and M.-R. Amini. 2018. “Health Monitoring on Social Media over Time.” IEEE Transactions on Knowledge and Data Engineering 30 (8): 1467–80. https://doi.org/10.1109/TKDE.2018.2795606.

Thapa, S., A. Shah, F. A. Jafri, U. Naseem, and I. Razzak. 2022. “A Multi-Modal Dataset for Hate Speech Detection on Social Media: Case-Study of Russia–Ukraine Conflict.” In CASE 2022-5th Workshop on Challenges and Applications of Automated Extraction of Socio-Political Events from Text, Proceedings of the Workshop. Abu Dhabi: Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.case-1.1.

Vayansky, I., and S. A. Kumar. 2020. “A Review of Topic Modeling Methods.” Information Systems 94: 101582. https://doi.org/10.1016/j.is.2020.101582.

Wagner, C. S., L. Bornmann, and L. Leydesdorff. 2015. “Recent Developments in China–U.S. Cooperation in Science.” Minerva 53: 199–214. https://doi.org/10.1007/s11024-015-9273-6.

Wishnick, E. 2001. “Russia and China.” Asian Survey 41 (5): 797–821. https://doi.org/10.1525/as.2001.41.5.797.

YouGov. n.d. The Most Popular News Websites (Q1 2022). https://today.yougov.com/ratings/media/popularity/news-websites/all.

Zhou, Y., M. Liang, and J. Du. 2012. “Study of Cross-Media Topic Analysis Based on Visual Topic Model.” In 2012 24th Chinese Control and Decision Conference (CCDC), 3467–70. Taiyuan: IEEE. https://doi.org/10.1109/CCDC.2012.6244553.

Received: 2023-07-06

Accepted: 2023-12-04

Published Online: 2023-12-21

This work is licensed under the Creative Commons Attribution 4.0 International License.