
Reasoning and Appraisal in Multimodal Argumentation

Analyzing Building a community of shared future for humankind
  • Ting Wu (b. 1978) is Associate Professor at the School of Foreign Languages, Southeast University, Nanjing, China. Her research is in second language acquisition and multimodal discourse analysis. Her publications include: “A probe into two micro-lecture videos: A systemic-functional approach to intersemiosis analysis in multimodal discourse” (2017), “On correlation between teachers’ discipline strategies and college students’ willingness to communicate in English” (2016), and “On the communication of Chinese discourse acts from the moral perspective” (2015).

Published/Copyright: August 19, 2020

Abstract

The development of new media enlarges the repertoire of semantic resources available for creating a discourse. Apart from language, visual and sound symbols can all become semantic sources, and a synergy of different modalities and symbols can be used to complete argumentative reasoning and evaluation. Within the framework of multimodal argumentation and appraisal theory, this study conducted a quantitative and multimodal discourse analysis of a new media discourse, Building a community of shared future for humankind, and found that visual symbols can independently fulfill both reasoning and evaluation in the argumentative discourse. An interplay of multiple modalities constructs a multi-layered semantic source, with verbal subtitles as a frame and a sound system designed to reinforce the theme and mood. In addition, the visual modality is implicit in constructing the stance and evaluation of the discourse, with the verbal mode playing the role of “anchoring,” i.e. providing explicit explanation. A synergy of visual, acoustic, and verbal modalities could effectively transmit conceptual, interpersonal, and discursive meanings, but the persuasive effect on audiences from different cultural backgrounds might be mixed.

1 Introduction

Argumentation is an action in itself, in which evidence, premises, or cases are presented to influence the audience and help secure the acceptance of an idea that has not yet been fully accepted by a larger audience or has even been doubted or questioned to a certain extent (see Rocci 2017). When argumentation takes place in social situations where people need to make decisions or establish a common understanding of the world, such as in the political, judiciary, educational and scientific, commercial, and general public domains, as well as in the media, the semantic resources employed by the speaker and the rhetorical and pragmatic mechanisms of the argumentation are essential to achieving effective persuasion.

In September 2015, Xi Jinping, China’s president, proposed building a “community of shared future for humankind” at the United Nations. This concept of building a community for all humankind was presented again at the 19th National Congress of the Communist Party of China and written into the Communist Party’s charter and later the amended constitution, becoming the common will of the state and the Communist Party of China and attracting broad attention at home and abroad. The “community of shared future” (CSF), as one of the ideologies in China's global governance system, is supposed to be an effective discursive response to anti-globalization and trade protectionism. While this concept addresses the most essential proposition in dealing with world affairs, its rationale has never been fully developed and discussed. Recently, it has materialized in different forms: articles, reports, photographs, and video clips. In 2017, a series of short videos about the CSF were released on the internet, explaining the idea of the CSF in multimodal discourse and arguing that in order to build a better world, people of all countries should set aside their differences and work together. It is therefore necessary to examine the reasoning process of these multimodal arguments and the development of the composer’s values and attitudes in the process of constructing such an argument. This study aims to explore the rationale and persuasiveness of one of the short videos, titled “构建人类命运共同体, 我们在一起 [To build a community of shared future for humankind, we should be together],” under the theoretical framework of multimodal argumentation and appraisal theory.

2 Theoretical framework

2.1 Multimodal argumentation

Multimodal argumentation was first proposed by Gilbert (1994), referring to the construction of premise and conclusion of an argument with a synergy of language symbols, images, sound, animation, and other kinds of non-verbal symbols and modalities. He then developed the traditional concept of argument and proposed that arguments can be categorized into four identifiable and distinct modes: “1. the logical […] 2. the emotional, which relates to the realm of feelings 3. the visceral, which stems from the arena of the physical, and 4. the kisceral (from the Japanese term ki meaning energy), which covers the intuitive and non-sensory areas” (Gilbert 1997).

In the past two decades, multimodal argumentation has received increasing attention and become a new area of interdisciplinary research. International journals such as Argumentation and Advocacy, Argumentation, and the Journal of Argumentation in Context emerged and published relevant research papers discussing the modal interaction and symbolic relationship between visual and verbal discourse. In 1996, Argumentation and Advocacy launched a special issue on the theme of visual argument, and since then, scholars have been debating whether visuals can constitute argumentation (Fleming 1996; Blair 1996; Johnson 2003; Patterson 2010). The research focus then shifted to how visual symbols construct argumentation when the journal Argumentation released a special issue in 2015, with a series of papers proposing symbol semantics in rhetorical stylistics and transforming the traditional rhetorical structure through analysis of multimodal discourse in various contexts of argument. In 2018, Semiotica further introduced advanced methods and scenarios of multimodal argumentation research in a series of papers analyzing in depth the role and interaction of multimodal symbols in real-world arguments (Rocci and Pollaroli 2018; Tseronis 2018).

2.2 Systemic-functional grammar and meta-functions

Multimodal argument analysis is possible because of advances not only in sociosemiotics but also in linguistic theory and practice. Kress and Van Leeuwen (1996) developed a systemic-functional visual grammar by revising the framework of functional semiotic grammar in linguistics (Halliday 1978, 1985). In their view, other modes and symbols, like language, can also be used as resources for meaning representation. The structural features and functions of symbols such as images are studied in the same way as the verbal mode; image design, fonts, and animated illustrations can be used as non-verbal means to support different viewpoints (Kress and Van Leeuwen 1996; Kress 2010).

Like linguistic structures, visual structures point to particular interpretations of experience and forms of social interaction. To some degree these can also be expressed linguistically. Meanings belong to culture, rather than to specific semiotic modes. And the way meanings are mapped across different semiotic modes, the way some things can, for instance, be ‘said’ either visually or verbally, others only visually, again others only verbally, is also culturally and historically specific. (Kress and Van Leeuwen 1996)

In order to further analyze how different modes and symbols realize meaning potentials, they then adopted the notion of “meta-functions” (Halliday 1978), namely the ideational, the interpersonal, and the textual functions, and showed how these meta-functions can also work in visual and multimodal discourses. Of the three meta-functions, the interpersonal function was not sufficiently developed and was later expanded by Martin (2000, 2003) with a theoretical framework named the “appraisal system,” which is now applied widely in the analysis of various genres of discourse. It is especially helpful in understanding the author’s attitude and stance in a particular culture. As pointed out by Kress and Van Leeuwen (1996), “meanings belong to culture,” and the way different semiotic modes are chosen to map out the intended meanings could be of interest to researchers. Apart from the representation of meaning, the choice of different modalities could also demonstrate the attitude and stance (whether explicit or hidden) of the author.

2.3 KC table

Groarke (2015) designed a tool named a “Key Component” table (KC table) to analyze the elements and acts in the arguing process in argumentative discourses. He also added a general account of the various modes applied to the acts of argument. In a traditional syllogism, there are mostly three elements: premises (minor and major), a stance, and a conclusion. The argumentation process is completed by arguing from the premises to the conclusion so that the targeted audience accepts one’s stance. An important step in the analysis of the argumentation process is the identification and examination of the premises, the conclusions, and the stance (which is sometimes implicit). Both inductive and deductive methods are adopted in argumentative reasoning, although deductive methods predominate.

In a KC table, the elements in traditional argumentative discourses are categorized into different columns (act of arguing, argument, and argument mode) to better present the arguments and their modal features. Examples are shown in Table 1.

Table 1

KC table example one

| Act of arguing | Argument | Argument mode |
| --- | --- | --- |
| Claim: Socrates is a person. | Minor premise: Socrates is a human. | Verbal |
| | Major premise: Humans are mortal. | Enthymeme |
| Claim: Socrates is mortal. | Conclusion: Socrates is mortal. | Verbal |

In multimodal argumentative discourse, the premise and conclusion can be stated in a non-verbal mode. Groarke then suggested that the act of arguing can be verbal (a claim) or nonverbal, such as pointing, playing audio, tasting, and so on. Different argumentative acts play different argumentative roles, which are afforded by the corresponding modalities. It is nevertheless possible for the reasoning to be completed from the premises to the conclusion. This concept challenges the inherent “speech act” theory, giving the non-verbal modality the attention and credibility it deserves and pointing out that non-verbal acts of arguing can be even more effective and salient in certain situations. For example, two people saw a group of swans on a river. One said this was a group of mute swans, while the other said that they were trumpet swans. To settle the argument, they decided to rent a boat, go near the swans, and record their calls, and then they compared the recordings with the swan songs collected in an audio library to draw a conclusion. The key component table analysis of this argument is shown in Table 2.

Table 2

KC table example two

| Act of arguing | Argument | Argument mode |
| --- | --- | --- |
| Listen to the swan songs on the river. | Minor premise: The swans on the river were singing a certain song. | Sound |
| Listen to the collection of swan songs in the library | Major premise: The swans singing that song are trumpet swans. | Sound |
| Assert that “the swan is a trumpet swan.” | Conclusion: The swan in question is a trumpet swan. | Verbal |
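
A KC table can also be read as a small data structure, which may make the roles of its three columns easier to see. The following Python sketch is only an illustration under stated assumptions: the class name KCRow, its field names, and the two helper functions are hypothetical and do not come from Groarke (2015); the rows simply transcribe the swan example in Table 2.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class KCRow:
    """One row of a Key Component table (hypothetical representation)."""
    act: Optional[str]  # act of arguing; None when the premise is left unstated
    role: str           # "minor premise", "major premise", or "conclusion"
    proposition: str    # the argument component that the act contributes
    mode: str           # argument mode, e.g. "verbal", "sound", "enthymeme"


# The swan example of Table 2, transcribed row by row.
swan_table: List[KCRow] = [
    KCRow("Listen to the swan songs on the river", "minor premise",
          "The swans on the river were singing a certain song.", "sound"),
    KCRow("Listen to the collection of swan songs in the library", "major premise",
          "The swans singing that song are trumpet swans.", "sound"),
    KCRow('Assert that "the swan is a trumpet swan"', "conclusion",
          "The swan in question is a trumpet swan.", "verbal"),
]


def modes_by_role(table: List[KCRow]) -> Dict[str, List[str]]:
    """Map each argument role to the list of modes that realize it."""
    result: Dict[str, List[str]] = {}
    for row in table:
        result.setdefault(row.role, []).append(row.mode)
    return result


def nonverbal_share(table: List[KCRow]) -> float:
    """Fraction of rows whose argument mode is anything other than verbal."""
    nonverbal = [row for row in table if row.mode != "verbal"]
    return len(nonverbal) / len(table)


print(modes_by_role(swan_table))
print(round(nonverbal_share(swan_table), 2))
```

Keeping the act, the argument role, and the mode in separate fields mirrors the three columns of the table, so the same structure could hold a row whose act is missing, as with the enthymeme in Table 1.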

Zhang (2017) used the KC table to analyze a poster titled Drunk driving/Life-threatening. A careful analysis of this static multimodal discourse shows the roles that the various modalities play and the synergy of these modes in the process of argumentation and reasoning. He then proposed a framework which summarized the principles of effective multimodal discourse design.

Groarke’s KC table provides a structural framework for the analysis of multimodal discourse, but fails to illustrate why a particular mode is chosen as a premise or a conclusion by the composer/author. It is necessary to examine the composer/author’s stance and appraisal in the argumentative process, which could be essential in persuasion. Therefore, a systematic evaluation of the multimodal discourse, with an investigation into the interplay of the author’s argument modes and symbolic choices, is warranted. Such an evaluation not only examines the logic construction and reasoning processes of the multimodal discourse, but also provides a more in-depth explanation of them, ultimately exploring the mixed persuasive results of the choices of different semiotic resources and symbols.

With a KC table examining the ideational aspect of argumentation, appraisal theory is employed to investigate the stance or attitude of the author in composing the multimodal discourse. Originally a linguistic theory, appraisal theory was mostly used on verbal text to evaluate the types and intensity of the author’s attitudes. With appraisal theory (Martin 2003) as a framework, several scholars went further and explored how nonverbal symbols represent feelings and attitude in discourse (Martin 2002, 2008; Macken-Horarik 2004; Unsworth 2015). They mainly focus on the ways ethical judgements are communicated by images and other nonverbal symbols in multimodal discourse. Yet the aesthetic function of these images and nonverbal symbols has been underexplored. Little is known about the effect these images or other nonverbal symbols have on audiences of different cultures when applied in a multimodal argumentative discourse. By adding a dimension of evaluation of the author’s stance and attitude to the KC table, we can better discern the ideational, textual, and interpersonal choices the author made in composing the multimodal discourse. This study will, therefore, explore how attitude and stance are represented by various modalities in multimodal discourse under the framework of the appraisal system, which is usually employed to evaluate the author’s attitude in composing a verbal discourse. With the system applied to symbols other than language in multimodal discourse, a complete description of the synergy of attitude representation across modalities becomes possible. Using the KC table and appraisal theory, we shall analyze both the ideational construction in the argumentative reasoning process and the representation of stance/attitude in a multimodal discourse: a micro-video titled “构建人类命运共同体, 我们在一起 [To build a community of shared future for humankind, we should be together].”

Figure 1

An overview of appraisal resources (adapted from Martin and White [2005])

3 Method

3.1 Research questions

China Network has released a series of micro-videos with the theme of Building a community of shared future for humankind. Among those, the most viewed one is chosen for research. The video’s full length is 5 minutes and 45 seconds, and it was released in January 2018 to commemorate the first anniversary of the Community of Human Destiny speech by Chairman Xi Jinping at the Geneva Conference. The micro-video was released on major online video media, and the number of hits has reached tens of millions. The video’s URL is http://my.tv.sohu.com/us/264683445/97793069.shtml

This study aims to answer the following four research questions:

- How does the streaming video construct the proposition of “We should be together to build a community of shared future for humankind” with different modalities and symbols?

- What are the particular roles that various modalities and symbols play in the argumentation process? Is the meaning potential resource expanded? Can the efficiency of arguments be improved?

- How are rhetorical propositions and evaluation integrated in this micro video?

- To what extent do visual or other modalities show stance or evaluation in the construction process of argumentative meaning potentials?

3.2 Materials and procedures

A combination of quantitative and qualitative analysis was employed in the study. The research tool is ELAN 5.0 (EUDICO Linguistic Annotator), a professional tool for annotating and transcribing audio or video recordings (https://tla.mpi.nl/tools/tla-tools/elan/download/). Available with a Chinese interface, its greatest advantage is a customizable, hierarchical, layered annotation system and a semi-automatic statistical analysis system that can set the correlation between the different signs and symbols in a modality (such as the visual mode). In addition to the modal layer, another layer is defined as the argument system, with the different elements of argumentation rhetoric (such as major premise, minor premise, etc.). A sample page analysis is shown in Figure 2.

Figure 2

An interface of multimodal analysis
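
The relation between the modal layer and the argument layer can be thought of as time overlap between annotations on different tiers. The Python sketch below illustrates that idea only; it is not ELAN's own interface or export format. It assumes that annotations are available as (tier, label, start, end) records, and the Annotation type, the function names, and the sample intervals are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Annotation(NamedTuple):
    tier: str     # layer name, e.g. "visual", "verbal", "argument"
    label: str    # annotation value, e.g. "image" or "minor premise"
    start: float  # start time in seconds
    end: float    # end time in seconds


def overlap(a: Annotation, b: Annotation) -> float:
    """Length in seconds of the time span shared by two annotations."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def co_occurrence(modal: List[Annotation],
                  argument: List[Annotation]) -> Dict[Tuple[str, str], float]:
    """Total seconds during which each argument element co-occurs with each modal tier."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for arg in argument:
        for m in modal:
            shared = overlap(arg, m)
            if shared > 0:
                totals[(arg.label, m.tier)] += shared
    return dict(totals)


# Hypothetical annotations, not taken from the analyzed video.
modal_layer = [
    Annotation("visual", "image", 0.0, 12.0),
    Annotation("verbal", "text", 10.0, 15.0),
]
argument_layer = [
    Annotation("argument", "minor premise", 0.0, 9.0),
    Annotation("argument", "conclusion", 9.0, 15.0),
]

print(co_occurrence(modal_layer, argument_layer))
```

In this invented example the minor premise overlaps only with the visual tier, while the conclusion overlaps with both the visual and the verbal tier, which is the kind of pattern the two-layer annotation is meant to expose.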

4 Result and discussion

The research data is a short video clip that lasts 5 minutes and 45 seconds, with a title in Chinese: 构建人类命运共同体,我们在一起 ‘To build a community of shared future for humankind, we should be together’. It is chronologically divided into seven sections, which could be regarded as the sub-themes of arguments:

  1. 守卫和平家园/我们在一起 ‘Guard home peace/We are together’

  2. 文化/有不同/但没有壁垒 我们在一起 ‘Different culture but no barriers/We are together’

  3. 同舟共济/共同繁荣/我们在一起 ‘In the same boat/We are together’

  4. 没有另一个地球/我们在一起 ‘There is no other planet earth/We are together’

  5. 共享科技带来的改变/我们在一起 ‘Share the windfall of technology/ We are together’

  6. 没有什么避罪天堂/我们在一起 ‘There is no crime-free paradise/We are together’

  7. 生命平等而可贵/我们在一起 ‘Every life is equal and precious/We are together’

In the latter part of the video, some counter-arguments are refuted with text in Chinese characters appearing on a blank screen: 告别零和博弈、霸权逻辑、冷战思维、丛林法则、历史终结论 ‘Farewell to zero-sum game, hegemonic logic, Cold War mentality, jungle law, the end-of-history theory’. At the end of the video, the proposition is restated in text form: 构建人类命运共同体,我们在一起 ‘To build a community of shared future for humankind, we should be together’, which also serves as a conclusion.

The whole video is a collection of images (videos, photo collections), texts (presented in a video format in a positive font), scene sounds, soundtracks, subtitles, and so on. Among them, images, colors, and animations are visual modalities; characters, subtitles, and character utterances are verbal modes; and scene sounds and soundtracks are sound modes. The duration and ratio of each part in the video are shown in Table 3.

Table 3

Modal symbol duration and proportion in micro video

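| Modal | Level | No. of labels (times) | Average time (seconds) | Total labeling time (seconds) | Labeling time (%) |
| --- | --- | --- | --- | --- | --- |
| Visual | image | 30 | 10.36 | 310.81 | 90.71 |
| Visual | color | 7 | 18.75 | 131.22 | 38.30 |
| Visual | animation | 5 | 6.38 | 31.86 | 9.30 |
| Verbal | text | 35 | 6.55 | 229.28 | 66.91 |
| Verbal | subtitle | 5 | 8.39 | 41.97 | 12.25 |
| Verbal | character discourse | 5 | 8.39 | 41.97 | 12.25 |
| Sound | scene sound | 15 | 10.86 | 162.87 | 47.53 |
| Sound | background music | 5 | 54.93 | 274.67 | 29.83 |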
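
The columns of Table 3 are related by simple arithmetic: the total labeling time is the sum of the annotated intervals on a tier, the average time is that total divided by the number of labels, and the percentage expresses the total against a reference duration. The Python sketch below shows this relation with hypothetical intervals. The function name tier_statistics and the reference duration of 60 seconds are assumptions made for illustration, and the reference duration actually used for Table 3 is not stated in the text.

```python
from typing import Dict, List, Tuple


def tier_statistics(intervals: List[Tuple[float, float]],
                    reference_duration: float) -> Dict[str, float]:
    """Summarize one annotation tier.

    intervals: (start, end) times in seconds for each label on the tier.
    reference_duration: duration in seconds against which the share is computed.
    """
    count = len(intervals)
    total = sum(end - start for start, end in intervals)
    return {
        "labels": count,
        "average": total / count if count else 0.0,
        "total": total,
        "percent": 100.0 * total / reference_duration,
    }


# Hypothetical intervals for a single tier, not measured from the video.
example_intervals = [(0.0, 10.0), (20.0, 35.0), (40.0, 45.0)]
stats = tier_statistics(example_intervals, reference_duration=60.0)
print(stats)
```

Because tiers run in parallel, the percentages of different tiers are computed independently and need not sum to 100, as is the case for the image and text rows of Table 3.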

It can be seen from the proportional distribution of the modal symbols in the table that the micro-video is dominated by visual modality, supplemented by verbal modality and auditory modality. The image symbol representation in the visual modality is dominant, while the verbal text, though scarce in proportion, plays the role of anchoring, explanation, and interpretation at the end of each section of the video. A few subtitles match the specific scene in the image. In the auditory mode, there is no commentary voiceover explaining or arguing, only the sound from the scenes and the background music. Overall, this is a new genre of the new medium, different from the traditional argumentative discourse, where verbal modes take the leading role in the reasoning process. In addition, it appears that it does not intend to impact the targeted audience with subjective verbal narration or persuasion. The greater use of moving images and other visual modalities seems to construct the proposition with more influence and persuasiveness by allowing viewers to arrive at a conclusion on their own.

The color of the font in the visual modality plays the role of segmenting themes. The fonts of the six sub-themes are presented by different color schemes, and the different meaning potentials are constructed together with the linguistic symbols. The “guarding peace” theme is presented in the colors of blue and white, while the “culture” theme is presented in orange fonts and the “environmental protection” theme is presented in green fonts. The “global criminal arrest” theme, in contrast, serves to warn and showcase the seriousness of combating international crimes, and therefore is presented in the color of red. This font color scheme segments and distinguishes different themes, constructs unique conceptual meanings, and seems more intimate and intuitive in instantiating meaning potentials in cross-cultural communication. It also promotes a smoother and more natural progression in terms of contextual layout.

Another unique advantage of this type of discourse is reflected in cross-language communication. The images, as the major representation of meaning potential, bypass the barriers of language symbols, which cannot be universally understood. In addition, by anchoring and explaining the images, large subtitles as a verbal textual modality present key words so as to achieve a wider, cross-border, interracial dissemination of ideas and messages. A synergy of different modalities in a multimodal discourse is thereby accomplished.

4.1 Argumentative reasoning

The argumentative reasoning of the multimodal discourse presents a modular progression:

The process of argumentation and reasoning in the opening part of the video is shown in Table 4 and Table 5. Both consist of a minor premise, a major premise (sometimes omitted), and a conclusion. This two-step process of reasoning develops the arguments through various arguing acts, most of which use visual modalities. The final conclusion, that the war-plagued countries of the world need to work together and that Chinese people are helping, is arrived at through a synergy of argumentation in different modalities. The first step of the reasoning process is shown in Table 4. Its minor premise, “the homes in certain parts of the world are destroyed by war,” is materialized by displaying gunfire and tanks moving on the screen, which represent the general notion of war. The major premise that “people suffer when homes are destroyed” is represented visually by moving images showing refugees crying beside the body of a loved one and damaged homes. It leads to the conclusion that “war makes people suffer and feel hopeless.” The text “What is the future of these people on earth if we don’t work together?” is then shown on the screen, explicitly reminding the viewer that the world is facing a series of global challenges, such as the danger of war, frequent terrorist attacks, the widening gap between rich and poor, exacerbated financial crises, long-term economic downturn, environmental degradation, and other problems which plague mankind. Thus, it raises fundamental questions for mankind: Where will we go in the future in the face of great challenges? Why are we closely connected? Why can't we go our separate ways?

Table 4

KC table of the first section of Building a community of shared future together

| Act of arguing | Argument | Argument mode |
| --- | --- | --- |
| Play documentaries showing scenes of war | Minor premise: Wars cause destruction. | Visual |
| | Major premise: Destruction leads to people suffering. | Visual |
| Claim “The future is turbulent” | Conclusion: Wars cause people to suffer. | Verbal |

Table 5

KC table of the second section of Building a community of shared future together

| Act of arguing | Argument | Argument mode |
| --- | --- | --- |
| Play documentary images showing Chinese peacekeeping soldiers teaching African locals the concept of peace | Minor premise: Chinese peacekeeping forces are in action. | Visual |
| State “China has sent a total of 36,000 peacekeepers” | | Verbal |
| | Major premise: Peace could be achieved by actions of peacekeeping forces and tolerance. | Enthymeme |
| Play videos showing the people in the conflict area chorus “家和万事兴” | Conclusion: Peace could be achieved by the Chinese efforts and values. | Visual |
| Claim “Guarding the peace at home/We are together” | | Verbal (text) |

Interestingly, these notions of war, when represented in visual forms, are not in close-up shots. Visual symbols are specific and concrete, with a dimension of a certain time and location. This may provide the argument with tangibility and realness, but only when it is generalized into a common notion can it be used as a premise for the argument. Therefore, the author intentionally distances the audience from the scene of wars in order to achieve the argument construction.

The second step of the reasoning (Table 5) builds on the above proposition. The minor premise, “Chinese peacekeeping troops keep peace and teach locals the concept of peace,” is in visual mode, with documentary images showing these occasions. There is no verbal modality involved. With “Peace can be achieved not only by force but also by mutual understanding and love” as an omitted major premise, a conclusion is reached: peace can be achieved by Chinese efforts and values. Documentary images showing the local people chorusing in Chinese “家和万事兴” ‘A peaceful family will prosper’ showcase the importance of the verbal modality in explicitly highlighting and constructing the conceptual meaning. In the end a further deduction is made in text: “To keep peace, we need to work together.” This verbal form addresses the audience directly, prompting them to think of their own experience and evoking the universal value schema “peace does not come easily.” “We need all parties to discard their differences and do their best to unite” is the essence of CSF.

We observe that in the latter phase of reasoning, the visual and verbal modalities overlap in presenting the minor premise and the conclusion. This overlap forms a surplus effect, with the textual content interpreting the image and the image strengthening the text. Through this “surplus” effect, the author ensures that the audience gets enough input to facilitate the understanding of the discourse (Zhang 2017).

The reasoning mode in the last section of the video is shown in Table 6. As in the previous segments, a typical reasoning process is achieved from the premises to the conclusion. The difference is that the conclusion is no longer implied or shown silently in text, but presented through a rather strong verbal modality, that is, the voice of President Xi from his public speech. This is a unique feature in the entire micro-video, where the most commonly used modalities are visual and textual verbal. The use of the vocal verbal modality gives the presentation of the conclusion a stronger sense of presence and intentness. The confidence and personality in vocalization enable the audience to resonate with the speaker's point of view, highlighting and strengthening the argument. The vocal characteristics of a strong personality tend to affect the audience: a slow but steady speaking voice has more convincing power and is more likely to persuade the audience into taking positive action (Baker 1990).

Table 6

The KC table for the last section of Building a community of shared future together

| Act of arguing | Argument | Argument mode |
| --- | --- | --- |
| Show moving images presenting the international summit | Minor premise: Countries discuss global affairs together | Visual |
| | Major premise: When people discuss and share their opinions openly and frankly, a better understanding and thus a brighter future can be attained. | Enthymeme |
| Play Xi Jinping’s speech recordings | Conclusion: The destiny of the world should be mastered by all countries. International rules should be jointly implemented by all countries. Global affairs should be governed by all countries. Development gains should be shared by all countries. It is necessary to jointly build a community for humankind. | Verbal (speech) |
| Claim “To build a community of shared future, we should be together” | | Verbal (text) |

Interestingly, President Xi’s speech, which is in the verbal (vocal) mode of argument, is followed by another important argument tool, the counter-argument, this time in text form. In traditional argumentation, it is essential to consider all aspects of a proposition and to include different or opposing views in the debate, so that the audience gets the whole picture and can arrive at a well-informed decision. In this clip, the views opposed to the proposition of building a community of shared future, such as “zero-sum game,” “hegemonic logic,” “Cold War thinking,” and “jungle law,” are presented verbally and explicitly: they appear one by one in the center of the screen in bold fonts, only to be erased one by one by an invisible hand in the final section of the video.

This “wiping out” animation effect is in fact the negation and resolution of the above negative views, thus completing the discussion of the entire proposition. By contrasting different concepts through juxtaposed embodied images, the author gives the audience a choice: to interpret the argument and make a decision on their own. This is consistent with Tseronis’s (2018) work, in which two contrasting images, representing opposite views on the same topic, could be perceived in one advertisement poster when viewed from different perspectives. Pan and Zheng (2017) also discuss multimodal perception through different perspectives of analysis. They point out that a critical analysis of multimodal texts could be conducted in the context of different cultural perspectives and characteristics. The visual modality tends to be more flexible and fluid in concept representation than language, and more in line with human observation and perception of the world, thus having a better interpersonal function and greater persuasiveness.

Serious doubt has been cast upon whether visual images can be analyzed as rhetorical discourse, because the symbols of visual or other modes are not inherently propositional (Fleming 1996; Johnson 2003; Patterson 2010). On this view, visual symbols need to be anchored by verbal ones to ensure that a specific proposition is represented in the sense of an argumentative discourse. In this case, however, we observed a pattern of argumentation across multiple modalities. In the above short video, visual symbols are often adopted as the major modality in presenting the minor premise in the act of arguing, as well as in constructing the context of the meaning potential, while the verbal modes are utilized to put forward the ultimate proposition, completing the argumentation. This pattern is repeated throughout this short video clip and other relevant materials. The visual modality prevails in minor premises not only because it provides realness in context construction, but also because it serves as a concept lead-in. With that lead-in, a major premise, if omitted, can be more easily deduced, because visual modes often appeal to our subconscious cognition, ethics, and emotions.

4.2 The multimodal representation of stance and attitude

In rhetorical discourse, the presentation and the acceptance of the author’s stance are supposed to be the communicative result. This end could be achieved by adapting the discourse to the cultural and situational context and intentionally choosing various forms of meaning potential. To further explore the author’s choice of different modes of discourse in demonstrating stance and attitude, both explicitly and implicitly throughout the video, we evaluate the process of the argumentation under the theoretical framework of appraisal theory, seeking to discern patterns in multimodal stance construction.

Appraisal theory (Martin 2000) includes attitude, engagement, and graduation as the three sub-systems, with attitude as the core and the main part of the interpersonal tenor evaluation system. There are three kinds of attitude: affect, judgement, and appreciation. These three ways are used to construct the author's discourse attitude. Wang (2017) found that from the manifestation of explicit attitudes at the level of grammar, semantics, and symbols, the implicit embodiment of attitude can be inferred, and an explicit–implicit continuum of attitude is formed. In the multimodal argumentative composition, it is especially important to discuss this explicit–implicit continuum in order to understand the affordances of the various modalities in the construction of the rhetorical stance and how these affordances are chosen by the author. Three dimensions under attitude of appraisal theory are examined: Affect, Judgement, and Appreciation.

1) Affect: This sub-system belongs to the category of attitude in appraisal theory and is the emotional expression of or reaction to representations in the argumentative discourse, including behaviors, texts or processes, and phenomena. Feng and O’Halloran (2012) analyzed the emotive meaning in visual images and concluded that emotive behaviors could be abstracted to represent iconic behaviors which could arouse emotion through images. However, in a single image or picture without sufficient context, those semiotic choices are often made supposedly by the character depicted in the image, not by the author. Therefore, the emotions and attitude, though vividly conveyed through the images, hardly indicate the author’s stance or evaluation. In the short video, nevertheless, things are different. Not only the behaviors and interactions of all characters, but also the context and the scenery were included for the analysis, and therefore, it is the author’s attitude that could be represented through his or her careful composition of the symbols and modalities. A synergy of verbal, visual, and acoustic modes is thus employed to ensure effective meaning construction as well as an attitude representation in the argumentations.

For example, extreme experiences such as natural disasters, wars, fighting, and the deaths of refugees can be more convincing when represented in visual and acoustic modalities than in verbal modes. The audience, supposedly mostly civilians, can relate to these experiences and empathize with the characters in the video. Kress (2010: 68) discussed meaning potential principles in rhetorical discourse. The signifier of symbols could extend the experience in the meaning construction process, transforming our understanding of the outside world in a metaphorical manner. A picture paints a thousand words. The visual modality, especially the image of a specific and real scene with a context, is self-sufficient and self-evident, so that the author's attitude is conveyed to the audience in the process of argumentation. In the first section of this micro-video, “Guarding the peace at home/We are together,” footage of gunshots, fire, ruins, and crying people is displayed without verbal explanation, and through these visual and acoustic modalities the discourse conveys despair, heartache, darkness, and other potential meanings which the verbal mode could hardly materialize by itself. The integration of visual and auditory modalities is, therefore, more effective. The extreme experience of the audience is triggered and extended, invoking the attitude and prompting deep thought.

In the fourth section, titled “There is no other planet earth/ We are together,” the author presents natural disasters in images or footage of tsunamis, hurricanes, floods, and collapsed houses. These visual symbols are likely to arouse emotions of shock, fear, and sadness in the audience, and they therefore lay a solid argumentative premise for the subsequent conclusions and actions, which helps the author to construct a discursive position more effectively and serves the purpose of persuasion.

2) Judgement: The second subsystem belongs to the ethical category and is the judgement of the personality and behavior of the agents in the discourse. It is mainly constructed by spoken language in the verbal modality and the close-up shots of the agents in the visual modality. In this video, the agents are: Chinese peacekeeping forces; Chinese doctors who provide humanitarian medical service in Ebola-stricken areas; and Chinese journalists who investigate the African ivory trade at the risk of their lives. These agents are presented not only in close-up visual images but also through a verbal explanation of the motives for their heroic deeds. By construing both the social symbols (reflected by their appearances and behaviors) and the cognitive symbols (reflected by their inner thoughts and voice), more comprehensive judgements of their character are achieved: bravery, fearlessness, warm-heartedness, willingness to sacrifice, etc. The close-up visual images of these characters give the audience more intuitive stimuli, while the verbal mode (mostly their voices) renders these personalities more credible, making these arguments more personal and real, and thus more persuasive.

3) Appreciation: This subsystem belongs to a category that is used to express the author's evaluation and appreciation of things. For this type, the fifth section of the video, “Share the windfall of technology/ We are together,” is a case in point. The focus is on technological development with the spread of knowledge and collaboration of nations: the China Railway Express, the space rocket, the astronauts working in the space capsule. It is mainly represented in the visual mode combined with the auditory mode, with no verbal modes involved. The magnificence of the creations of mankind is quite evident in the images and background sound. It seems that the language mode is no longer necessary in this type of attitude construction.

Through appraisal analysis, we found that a stance or the author’s attitude could be achieved by the employment of different symbols and modalities in the argumentative discourse. In the process, visual modality is especially salient and indispensable in invoking the emotion of the audience (Affect), and complementary in passing the ethical opinions on the characters in the video (Judgement). It is as effective as the verbal modes in showing the magnificence of artifacts and wonders by simply juxtaposing the images (Appreciation). It demonstrates the author’s belief that the representation of different concepts, ideas, and stance should be relevant to the choice from the reservoir of various modalities in the communication process.

The visual modality effect of Affect is the strongest among all the factors in attitude representation of the appraisal system. As mentioned above, in the given context, visual modality can construct the meaning potential independently and arouse the emotional reaction of the audience. While the Affect module is distinctively visual, the second factor in the Attitude matrix, Judgement, is the most language-dependent, because judgement of personality traits or achievements needs to be explicit in elaborate verbal representations, with a purpose of ascertaining its quality positioning meaning. However, visual presentations like personal portraits still prevail in this section, which might suggest that images in visual modality can assist the language modality, supporting the verbal claim with concrete evidence from real life. Thus an implicit indication of attitude is achieved through a combination of visual and verbal representations. The synergy in the third factor, Appreciation, is somewhere between the previous two kinds. Visual contrast affords sentiments like pride by juxtaposing the slow motion of revolving satellites with the fast movement of transatlantic trains. Visual modality can not only effectively construct the meaning potential intended by the author, but can also express the author's positive attitude to the discourse in an implicit manner, leaving the audience to marvel and appreciate without further influence. Without description or explanation, the verbal modality simply appears in the form of data and terms. It is the multimodal synergy that makes the meaning construction process more fluid and real and achieves a more convincing expression of rhetorical position.

An interplay of multiple modalities constructs a multilayered semantic source, with verbal subtitles as a frame and a sound system designed to reinforce the theme and mood. In addition, visual modality is implicit in constructing the stance and evaluation of the discourse, with verbal mode playing the role of “anchoring,” i.e. explicit explanation. A synergy of visual, acoustic, and verbal modalities can effectively transmit conceptual, interpersonal, and discursive meanings, but the persuasive result with the audience from different cultural backgrounds may be mixed.

5 Concluding remarks

With the increasing frequency of cross-border exchanges and the advancement of information technology, various social symbols other than linguistic symbols can effectively present an argument and consolidate a stance. For example, visual modality in the form of images can construct a logical argument in the process of reasoning, with functions other than the aesthetic in different acts of arguing. Multimodal argumentation, therefore, is a new trend of interdisciplinary research in linguistics combined with communication and social semiotics. To explore the field and initially construct the framework of multimodal argumentation reasoning and social semantics, this study analyzed the modes and effects of argumentation and rhetorical position in the micro-video To build a community of shared future for humankind, we should be together, and concluded that the micro-video, as a unique type of argument presentation, can orchestrate a synergy of modes in arguing more effectively and efficiently than any single-modality media production. An investigation through the lens of Appraisal Theory seems to suggest that there is a pattern in semantic choices when the author utilizes different modalities to compose an argumentative discourse.

Although only a preliminary study, our inquiry may serve as a precursor to the study of multimodal communication in the real and virtual worlds. Future research may shed light on how this multimodal communication transforms the perception of viewers, using cognition-measuring apparatus such as eye trackers or ERP (event-related potential) recording. A comparative analysis of cross-cultural differences would also be necessary, with a larger database in which more responses from different countries and regions could be compared.



About the author

Ting Wu

Ting Wu (b. 1978) is Associate Professor at the School of Foreign Languages, Southeast University, Nanjing, China. Her research is in second language acquisition and multimodal discourse analysis. Her publications include: “A probe into two micro-lecture videos: A systemic-functional approach to intersemiosis analysis in multimodal discourse” (2017), “On correlation between teachers’ discipline strategies and college students’ willingness to communicate in English” (2016), and “On the communication of Chinese discourse acts from the moral perspective” (2015).

Acknowledgement

The research for this paper was financially supported by the Jiangsu Social Science Fund Project, grant no. 18YYB001 and the “13th Five-Year Plan” Jiangsu Education Science Planning Key Project, grant no. C-a/2016/01/28.

References

Blair, Anthony. 1996. The possibility and actuality of visual arguments. Argumentation and Advocacy 33(1). 23–39.

Fleming, David. 1996. Can pictures be arguments? Argumentation and Advocacy 33. 11–22.

Gilbert, Michael. 1994. Multi-modal argumentation. Philosophy of the Social Sciences 24. 159–177. https://doi.org/10.1177/004839319402400202

Groarke, Leo. 2015. Going Multimodal: What is a mode of arguing and why does it matter? Argumentation 29(2). 133–155. https://doi.org/10.1007/s10503-014-9336-0

Halliday, Michael. 1978. Language as social semiotic: The social interpretation of language and meaning. London: Edward Arnold.

Johnson, Ralph. 2003. Why “visual arguments” aren’t arguments. In Hans Hansen, Christopher Tindale, Anthony Blair & Ralph Johnson (eds.), Informal Logic at 25: Proceedings of the Windsor Conference CD-ROM, 1–13. OSSA: Windsor, ON.

Kjeldsen, Jens. 2015. The study of visual and multimodal argumentation. Argumentation 29(2). 115–132. https://doi.org/10.1007/s10503-015-9348-4

Kress, Gunther. 2010. Multimodality: A social semiotic approach to contemporary communication. New York: Routledge.

Kress, Gunther & Theo Van Leeuwen. 1996. Reading images: The grammar of visual design. London: Routledge.

Macken-Horarik, Mary. 2004. Interacting with the multimodal text: Reflections on Image and verbiage in Art Express. Visual Communication 3(1). 5–26. https://doi.org/10.1177/1470357204039596

Martin, James. 2000. Beyond exchange: Appraisal system in English. In Susan Hunston & Geoff Thompson (eds.), Evaluation in text: Authorial stance and the construction of discourse. Oxford: Oxford University Press. https://doi.org/10.1093/oso/9780198238546.003.0008

Martin, James & David Rose. 2003. Working with discourse: Meaning beyond the clause. New York: Continuum International Publishing Group.

Martin, James & Peter White. 2005. The language of evaluation: Appraisal in English. London & New York: Palgrave/Macmillan.

Pan, Yanyan & Zhiheng Zheng. 2017. 国防话语的多模态认知批评视角——以中美征兵宣传片的对比分析为例 [A multimodal cognitive approach to the discourse of defense: A comparative study of Chinese and American conscription promo]. Foreign Languages Research 6. 11–18.

Roque, Georges. 2015. Should visual arguments be propositional in order to be arguments? Argumentation 29(2). 177–195. https://doi.org/10.1007/s10503-014-9341-3

Rocci, Andrea & Chiara Pollaroli. 2018. Introduction: Multimodality in argumentation. Semiotica (220). 1–17. https://doi.org/10.1515/sem-2017-0150

Tseronis, Assimakis. 2018. Multimodal argumentation: Beyond the verbal/visual divide. Semiotica (220). 41–67. https://doi.org/10.1515/sem-2015-0144

Unsworth, Leonard. 2015. Persuasive narratives: Evaluative images in picture books and animated movies. Visual Communication 14(1). 73–96. https://doi.org/10.1177/1470357214541762

Van den Hoven, Paul. 2015. Cognitive semiotics in argumentation: A theoretical exploration. Argumentation 29(2). 157–176. https://doi.org/10.1007/s10503-014-9330-6

Wang, Zhenhua. 2017. 语类、评价: 理论及其适用性 [Genre, appraisal theory and its applicability]. Journal of University of Science and Technology Beijing (Social Sciences Edition) 1. 1–2.

Zhang, Delu. 2017. 多模态论辩修辞框架探索 [A working framework for multimodal argumentation rhetoric]. Contemporary Rhetoric 1. 1–8.

Published Online: 2020-08-19
Published in Print: 2020-08-26

© 2020 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 14.4.2026 from https://www.degruyterbrill.com/document/doi/10.1515/css-2020-0023/html