Multimodal analysis of conjoined comparatives

Gaëlle Ferré

doi:10.1515/cog-2024-0057

Article Open Access

Published/Copyright: September 17, 2025

From the journal Cognitive Linguistics

Abstract

Repeated comparatives joined by the conjunction and (e.g. more and more, bigger and bigger) are used to express a gradually increasing/decreasing degree of quality or quantity. The repetition in these phrases conveys an even higher/lower degree of the quality or quantity than a single comparative form does. The aim of this study is to determine whether this gradation is associated with specific gesture sequences, whether the progressive aspect inherent in the construction’s semantics is also expressed in the verb of the clause or in gesture, and finally whether the semantic prominence expressed in the phrase correlates with prosodic emphasis or with emphasis marked by bodily behaviour such as head and eyebrow movements. From the perspective of Construction Grammar (CxG), the study explores whether these forms are entrenched in usage as multimodal constructions and how speakers express gradation in space. Results show that although gestures and prosodic emphasis are not frequent enough for [COMP 1 and COMP 1] to be considered a multimodal construction, some gesture patterns are certainly more frequent than others, especially the use of iconic gestures in sequences that match the verbal repetition of the comparative.

Keywords: conjoined comparatives; multimodality; prosody; gesture

1 Introduction

The English construction [COMP 1 and COMP 1], commonly called reduplicated comparatives (Jackendoff 2000) or conjoined comparatives (Quirk et al. 1972),^[1] is part of a larger construction family (Miller 2014: 220–236) that involves two elements linked by the conjunction and, as illustrated in Figure 1 below. The construction can be symmetrical, in which case the same element is repeated, and it may include either an adjective in the comparative form (e.g. bigger and bigger), a verb (e.g. laugh and laugh) or a preposition/adverb (e.g. over and over). The construction can also be asymmetrical in the sense that the second element belongs to the same morphological category as the first, but is a different item (e.g. bigger and better), so that the construction involves two different comparatives. While conjoined comparatives occur within a single clause, they share some properties with comparative correlatives (e.g. the harder you work, the more you succeed), which also involve scalar progression but are distributed across two coordinated clauses (Hoffmann 2020; Hoffmann et al. 2020). In conjoined comparatives, while the asymmetrical branch of the construction involves more or less set phrases, the symmetrical branch is used to express a repetition of some event or a gradually increasing/decreasing degree of the quality or quantity denoted by the adjective (or adverb) as far as the comparative form is concerned. The repetition of verbs extends the normal temporal expression of the process, whereas conjoined comparatives are typically used for emphasis and express a continuing increase in degree over time (Jackendoff 2000; Quirk et al. 1972), a durational aspect also found in “reduplicative adverbial constructions” (Kuryu 2025). This emphasis is often realised not only through lexical repetition but also through prosodic features, such as increased pitch, duration, or stress (Jackson 2016), which reinforce the speaker’s evaluative stance.
As such, conjoined comparatives are a useful site for exploring how emphasis and scalar progression are marked across different modalities. Beyond their emphatic function, conjoined comparatives also convey a sense of aspectual unfolding, particularly a durative interpretation. In this respect, they can be considered as carrying a form of aspect similar to that expressed by conjoined prepositions (Rice and Newman 2004), which is not only limited to processes but may also apply to noun or adjective phrases (Haas 2011). Cienki and Iriskhanova (2018: 39) observe that verbal repetition in expressions such as laugh and laugh serves to convey an imperfective aspect. This insight can be extended to conjoined comparatives, which similarly express a sense of durativity.

Figure 1: Constructions involving two adjectives, verbs or prepositions linked by the conjunction and.

Understanding how language encodes durative or punctual, bounded or unbounded events is crucial to the study of meaning-making across communicative modes. As noted by Cienki and Iriskhanova (2018: 7), “the capacity to segment our past, present, or future experience into events and construe them as durative or punctual, complete or incomplete entities, is a fundamental cognitive ability that manifests itself both in language and other modes of communication.” While such distinctions are typically associated with verbal aspect, they also extend to other linguistic constructions, such as the conjoined comparative. Comparative forms are considered core grammatical concepts cross-linguistically in the expression of degree (Diessel 2019; Hilpert 2014), and conjoined comparatives, in particular, represent complex linguistic patterns that speakers internalize and use productively. They thus qualify as constructions in the sense of Hilpert (2014: 8). These constructions, understood as socially established form-meaning pairings functioning as intersubjective cues (Leclercq and Morin 2025: 6), are increasingly being analyzed through a multimodal lens, a shift only recently emphasized in constructional research (Leclercq and Morin 2025: 40).

The way we perceive the world is the way we understand it, and we express ourselves accordingly. Yet this does not mean that we perceive the world in just one way (Paradis 2001: 48). Since there is a “direct correspondence between linguistic expression and conceptual structure”, examining the multimodal realization of conjoined comparatives may provide deeper insight into how speakers mentally represent the scalar and aspectual meanings encoded in this construction. In line with usage-based theory, which emphasizes general cognitive processes such as cross-modal form-meaning associations (Cienki and Iriskhanova 2018: 10), investigating these constructions from a multimodal perspective thus contributes to a broader understanding of how grammar, cognition, and communication interact.

Building on these considerations, the general aim of this study is to carry out a multimodal analysis of constructions involving conjoined comparatives in order to gain a deeper understanding of how speakers conceive gradation in comparison. The study is based on comparatives that involve a gradation in quantity, dimension (size, height) or distance. These concepts have been chosen over others because they are very likely to be accompanied by gestures since they are deeply rooted in the physical world. Beattie and Shovelton (2006), for instance, showed that size is very likely to be represented through iconic gestures in narratives, using McNeill’s classification (McNeill 1992, 2005). When using comparative forms, speakers have a choice of several gesture types: they may perform iconic gestures to represent the physical aspects evoked in the comparative, but they may also use metaphoric gestures (for instance, a cyclic movement of the hand to represent a gradual increase) or beat gestures (rhythmic up/down movements of the hands performed to express emphasis in speech). Other gesture types are possible as well.

Speakers appear to have some flexibility in how they represent gradual increase or decrease through gesture. This can be conveyed by a single gesture in which the hand(s) move across a portion of the space in front of the speaker, but it may also involve two or more gestures located either in the same or in different parts of the gesture space. When two gestures occur in the same location, they may be interpreted as independent and possibly repeated for emphasis. By contrast, when gestures are produced in distinct locations, they can sometimes be perceived as forming a sequence of two dependent gestures, in which the second (and subsequent) gesture(s) derive part of their meaning in relation to the first, an interpretation that has been discussed in previous work on gesture sequencing and spatial contrast (Ferré 2023; Hinnell 2019; Hinnell 2020; Laparle et al. 2024). While this type of spatial differentiation is often found in the expression of gradation, more work is needed to determine how robust and systematic this pattern is across contexts and speakers.

The present study therefore addresses three interrelated questions concerning the multimodal expression of conjoined comparatives. First, from a semantic and grammatical perspective, conjoined comparatives convey gradation and quantity, but they also encode an aspectual meaning, specifically a durative interpretation. Second, these constructions often serve an emphatic function, prompting the question of whether this emphasis is signaled solely in verbal form, or whether it is also systematically marked prosodically and gesturally. Third, since gestures have been shown to convey both semantic content (e.g., dimension, quantity, distance) and grammatical functions (e.g., aspectual or temporal structure), this study explores whether the gestures that accompany conjoined comparatives reflect primarily one type of meaning or whether they integrate both. In doing so, the study aims to determine how meaning is distributed across modalities, and whether the observed forms of gestural co-expression can be considered part of speakers’ entrenched multimodal constructions. To explore these issues, the following research questions will guide the analysis:

Since gestures can convey both semantic and grammatical information, do the gestures that accompany conjoined comparatives primarily reflect their semantic content (such as dimension, quantity, or distance), their aspectual meaning (such as durativity or temporal unfolding), or do they tend to express both simultaneously?
Given the emphatic nature of conjoined comparatives, are they systematically accompanied by prosodic emphasis or by emphatic gestures such as beats, pointing, or head and eyebrow movements?
Do conjoined comparatives constitute entrenched constructions in speakers’ grammars? If so, are they regularly paired with gestural or prosodic features to the extent that they may be analyzed as multimodal constructions?

The first part of the paper contextualizes the study, presenting the different types of salient features that may be found in the multimodal expression of speech. After a description of the construction itself, the section briefly presents Construction Grammar and the different degrees of entrenchment that can be observed in usage-based approaches to multimodal discourse. The section ends with a summary of the gesture and prosodic (clusters of) features that are to be expected in relation to lexical, aspectual, and prosodic features. The second section presents the data, both in terms of the corpus on which the study is based and the methodology adopted for its annotation. The results section is then divided into four parts that directly answer the research questions: the first and second parts address how gestures convey semantic and/or aspectual information in the construction, the third examines the presence and nature of prosodic emphasis, and the fourth explores the degree of entrenchment and the potential multimodal status of the construction. The paper ends with a discussion of the results and a conclusion.

2 Context of the study

2.1 The concept of salience

Salience in spoken discourse can emerge through multiple channels, including the content and structure of utterances, prosodic highlighting, and accompanying gestures. These different types of prominence contribute to guiding the listener’s attention and structuring the flow of information.

2.1.1 Verbal salience

The information in the linguistic message is often presented in a hierarchical manner, i.e. speakers elaborate their discourse by presenting certain information as more salient than others in the spoken chain. Yet, salience in verbal structures can be approached from multiple theoretical angles.

This concept has been defined by Landragin (2011: 68) as “referring above all to the emergence of a figure on a ground, i.e. the highlighting of an element in a message. In linguistics, this emergence is due to prosodic, lexical, syntactic or semantic mechanisms, and its main consequence is the highlighting of an entity (or part of the message), which is thus favoured over the ground (the rest of the message and its context) during the comprehension process.” The figure can be considered as the most salient entity, whereas the ground bears a secondary prominence as compared to the background which is not salient (Talmy 1972). Schmid (2010: 119) also notes that “irrespective of how a cognitive unit has been activated, it is said to be salient if it has been loaded, as it were, into current working memory and has thus become part of a person’s centre of attention.”

As discussed above, salience can be understood in terms of Figure/Ground alignment within cognitive linguistic frameworks such as Langacker’s, where subjects – typically in clause-initial position – are construed as more figure-like and thus more prominent. This view ties salience closely to syntactic position and conceptual prominence.

However, syntactic initiality does not always coincide with informational salience. From a discourse-functional perspective – particularly in frameworks such as the Prague School or systemic-functional linguistics (e.g., Halliday and Matthiessen 2004) – a distinction is drawn between sentence-initial and sentence-final elements in terms of information structure. In these models, the sentence-initial theme typically conveys given or presupposed content, while the sentence-final rheme introduces new, and therefore more informationally salient, content. Thus, informational salience often aligns with clause-final position and prosodic prominence (Schmid 2010; Schmid and Günther 2016).

This distinction can be illustrated with the following examples (from Schmid 2010):

(1) The red jar contains sugar.

(2) The sugar is in the red jar.

In (1), the phrase the red jar occupies the theme position and frames the subsequent information (contains sugar). In (2), the sugar becomes the starting point, and the location (in the red jar) carries the new, rhematic information. From an information-structural perspective, then, the final constituent in each case is where the informational prominence lies. This view contrasts with but complements cognitive notions of subject salience.

Salience in verbal expression also often emerges at the syntax–semantics interface, particularly in the use of scalar modifiers, adverbs, and gradable adjectives. Constructions such as bigger and bigger do not merely indicate a change in degree; they also imply that the referent already possesses a significant amount of the property in question, an implicit assertion that reinforces the speaker’s evaluative stance. This results in a kind of double salience that combines scalar progression with a heightened focus on the current state (e.g., size or intensity). However, the precise nature of this emphasis, that is to say whether it aligns more closely with thematic (backgrounding) or rhematic (foregrounding) structures, remains somewhat ambiguous. This semantic and pragmatic salience also raises the question of whether it is marked prosodically, a point explored in the following section.

2.1.2 Prosodic salience

In prosody, most of our linguistic messages are uttered in what is understood as broad focus. In broad focus, the whole utterance is considered as relevant in the activation state of listeners. Broad focus is generally marked in statements by a regularly decreasing pitch, a final falling tone, and, usually, a lengthened final syllable (Féry 2001; Hirst 1998; Katz and Selkirk 2011; Leemann et al. 2016; Shattuck-Hufnagel and Turk 1998). Within an utterance in broad focus, the last lexical item is more likely than other items to carry the nuclear stress, which is the most important stress in the utterance (Wells 2006). In contrast, in narrow focus only part of the linguistic message is considered as relevant in the activation state of listeners, and there is some emphasis on this particular part of speech. Some utterances may even show a distinctive beat prosody, which has been defined by Simon and Grobet (2005) as a higher density of strongly stressed syllables in utterances.

A gradation can therefore be established in prosody in which a nuclear stress is perceived as more prominent than any other stressed syllable in the Intonation Phrase. Emphatic stresses are in turn perceived as more prominent than nuclear stresses in broad focus first because of more pronounced acoustic features (in terms of intensity, pitch height and variation and syllable lengthening) but also because they come in contrast with the expected non-emphatic final focus. Finally, stresses in beat prosody are also perceived as more salient than a single contrastive stress because of the presence of multiple emphatic stresses which are typically uttered in separate Intonation Phrases and thus detached from the rest of the utterance.

2.1.3 Gesture salience

As far as gestures are concerned, their mere presence supposes that the speaker has selected and highlighted some elements in speech (Alibali and Kita 2010), and gestures tend to align with prosodic salience (Rohrer et al. 2023). These physical features selected by the speaker through coverbal gestures acquire a certain prominence because of the schematic nature of the gestures (Mittelberg 2018). Beyond their alignment with prosodic units, gestures contribute actively to communication: they enhance the fluency and informativeness of speech and engage the listener’s attention in a way similar to what prosody does (Hostetter 2011). Moreover, gestures, especially those that accompany spatial information, significantly improve comprehension and have a mnemonic effect on listeners, and they are thus powerful communicative tools.

These processes are linked to inherent salience, but when speakers intend to emphasize a part of speech, they have other options. They may raise their brows as a way to indicate special emphasis on a word (Krahmer et al. 2002; Krahmer and Swerts 2007; Swerts and Krahmer 2008). They may also move their heads and use specific hand gestures. This is the case of beats, gestures that regularly accompany prosodic emphasis, although it has been shown that they may express emphasis themselves, without any prosodic salience in speech. Pointing gestures have been shown by semioticians Edeline and Klinkenberg (2021) to be linked to the expression of emphasis as well. But whereas beats serve to highlight some part of speech, points highlight some part of space or some concrete referent in that space. Larger gestures are also generally perceived as more salient than smaller versions of the same gestures and may even influence the way speech is perceived (Ferré 2018). Similarly, Müller (2024: 18) notes that not only gesture size, but also gesture height, and the use of gesture clusters all contribute to a gesture’s salience, a property that Cienki (2022: 6) considers to be gradient rather than a binary opposition. Finally, repetition in gesture is also possibly linked to the expression of emphasis, among other functions (Bressem 2014).

It is important to note, however, that gestures are not all of the same type. A crucial distinction must be made between representational gestures, which are tightly linked to the semantic content of speech, and non-representational gestures, which tend to serve pragmatic functions. As discussed by Hostetter (2011), representational gestures (such as iconic gestures) depict aspects of meaning – for instance, the shape of an object or the trajectory of a movement – and directly reflect the speaker’s mental imagery. In contrast, non-representational gestures – often termed pragmatic gestures in the tradition of Kendon (2004) – do not map directly onto the semantic content of the utterance. Instead, they support discourse organization (e.g., turn-taking, emphasis, or topic shifts), play grammatical roles (such as marking contrast or conditionals), or contribute to the interpersonal dimension of communication.

Although both representational and pragmatic gestures can highlight salient elements in discourse, they do not emphasize the same type of information: what becomes prominent in gesture may thus differ significantly in nature depending on the gesture’s function. For instance, a representational gesture might trace the arc of a ball being thrown, visually reinforcing a concrete spatial trajectory mentioned in speech. By contrast, pragmatic gestures such as the recurrent Palm-Up-Open-Hand (PUOH) gesture (Cooperrider et al. 2018; Müller 2004), as well as holding away or sweeping gestures, among others (Bressem and Müller 2014; Harrison 2018; Hinnell and Parrill 2020), do not depict content but instead play grammatical or interactional roles (for example, expressing contrast, open-endedness, or discourse structuring), thereby making salient aspects of speaker stance, argumentation, or conversational flow rather than referential meaning.

One particular gesture of interest in the present study is the cyclic gesture, which has been described in detail by Ladewig (2014, 2020, 2024), Hinnell (2018), and Müller (2024). This recurrent hand movement, which typically involves circular or repetitive motion, is frequently used to express processes that are ongoing, habitual, or iterative in nature. The cyclic gesture is also described in Calbris (2011: 255) as possibly expressing “development, evolution, change, repetition, linked succession” depending on context. The form of the gesture, especially its circularity and temporal extension, iconically maps onto the temporal unfolding of the event described in speech. It thus contributes to the construal of the action as continuous or without a clear endpoint. Cyclic gestures are therefore closely linked to the conceptualization of aspect, especially the imperfective.

Beyond the form of specific gestures, Cienki and Iriskhanova (2018) have proposed a broader approach, as they take into account the movement types of gestures that accompany verbs with different aspectual forms in English, Russian, and French. Their findings show that bounded movements, such as a sharp stop or a clearly delimited trajectory, can align with the perfective aspect by highlighting the completion or delimitability of an event. Conversely, unbounded or repetitive gesture forms, like sweeping, iterative, or circular motions, tend to be associated with imperfective aspect, reinforcing the idea of duration, incompleteness, or processuality. These gestural features thus contribute to the multimodal expression of aspectual meaning.

2.2 Description of the [COMP 1 and COMP 1] construction

To understand the construction, it is first necessary to understand simple comparatives. Paradis (2001) classifies adjectives into two distinct categories: “bounded” adjectives are not gradable or scalar and do not accept comparative or superlative forms (for instance, dead is a bounded adjective). “Unbounded” adjectives like big or high are gradable and are called “scalar” adjectives. They accept comparative and superlative forms and can be combined with scalar modifiers (e.g. very, quite, fairly). They denote a range on a scale which is open-ended. Yet, the author explains that boundedness is not a fixed concept and may change depending on how adjectives are used in constructions. She notably found that “comparatives and superlatives are both bounded in the schematic mode of differentiality, while their cognate base forms are unbounded in their schematic domain of gradability” (Paradis 2001: 60). She does not mention conjoined comparatives, but the inherent repetition and progressive development involved in the construction evoke a reverse move towards unboundedness.

Conjoined comparatives involve a form of repetition that, as noted by Sapir (1921: 79) (quoted in Brdar 2013: 492), reflects a symbolic process commonly used to express concepts such as distribution, plurality, repetition, habitual activity, increase in size, added intensity, and continuance. Several of these functions relate to aspectual notions, as discussed by Jila et al. (2004: 309–310), who link reduplication not only to the perfective aspect but also to continuous, progressive, and habitual readings. Interestingly, as Moravcsik (1978: 317) observes, reduplication can convey seemingly opposite meanings – such as augmentation and diminution. While it is frequently used cross-linguistically to signal increased quantity, this quantity may refer either to the repeated linguistic element or to the emphasis conveyed by referents.

This brings us to the relevance of aspectual categories for understanding the semantics of conjoined comparatives. Following Ogihara (1990), Hinnell (2018) and Cienki and Iriskhanova (2018), a fundamental distinction can be made between the perfective and imperfective aspects. The perfective aspect presents events as temporally bounded wholes, focusing on the event in its entirety, while the imperfective aspect allows access to the internal temporal structure of the event, construing it as ongoing, habitual, or incomplete. Within the imperfective, the progressive form (e.g., is rising) presents the event as dynamically unfolding, without reference to its beginning or end. Conjoined comparatives such as more and more or higher and higher clearly align with the imperfective aspect, as they depict gradual, incomplete developments extending over time.

In contrast, the perfective aspect, as analyzed by Ogihara (1990: 17), “has a result state interpretation – it asserts some after-effect.” That is, it typically expresses either the result of a past event that is still observable at the moment of speaking (He has arrived, meaning he is here now), or the continuation of an event up to the present (He has lived here for ten years). This dual potential of the perfect – resultative and durative – is central to its temporal semantics. Moreover, when an event sentence is followed by a state sentence, the state is often interpreted as temporally overlapping with or resulting from the event, further reinforcing the role of aspect in construing temporal relations between clauses (Ogihara 1990: 27).

While the perfective aspect constructs temporal relationships between events and their resulting states, iconic structures such as reduplication also contribute to the temporal construal of events, not through grammatical inflection, but through formal repetition that maps onto durative or scalar meanings. Givón (1985: 198) argues that reduplication (here exemplified by conjoined comparatives) manifests an “iconic-isomorphic relation between code and coded,” in which the repetition of a form correlates with an increase in meaning, such as size, distance, or quantity. This “iconism” is thought to be grounded in a transparent cognitive principle (Givón 1985: 214). Expressing decrease through repetition is less straightforward, however: even when the semantic value implies reduction, the formal expression involves an increase in verbal material. Yet in both cases, whether increase or decrease, the durational or progressive aspect conveyed through repetition remains constant.

Jackson (2016), in her review of the literature, highlights the link between repetition and emphasis. She discusses the “intensifying effects” of repeated scalar modifiers, noting, for example, that “very very good” conveys a more intense evaluation than “very good.” This expressive or attitudinal function of repetition, observed in scalar adverbs, applies equally to conjoined comparatives. While a sentence like “there are more people in the room” is relatively factual, “there are more and more people in the room” implies a subjective stance (positive or negative) toward the growing number of people, depending on the context. Because conjoined comparatives simultaneously convey scalar meaning, temporal progression, and often an evaluative perspective, they provide rich ground for multimodal elaboration. While emphasis can be expressed in prosody, gestures can reinforce the intensity or direction of the change, mark the speaker’s stance (Debras 2017), or visually map the progression expressed in speech. This layered meaning makes conjoined comparatives particularly suited to being accompanied by specific prosodic features, hand gestures, or other bodily signals.

2.3 Construction Grammar (CxG)

According to Leclercq and Morin (2025: 14), “meaning drives grammar. It occupies a central, generative role in the language system and can by no means be separated from linguistic form”. In other words, Construction Grammar views grammar as inseparable from meaning: rather than being abstract or autonomous, grammatical structures are shaped by and encode meaningful patterns (Fillmore 1988; Goldberg 1995).

Furthermore, Leclercq and Morin (2025) argue that “meaning is usage-based,” and identify three facets of this view: “(i) meaning is emergent because it relies on repeated usage; (ii) meaning is experiential because it is a record of usage contexts; and (iii) meaning is conventional because language use is a social practice” (Leclercq and Morin 2025: 14). This highlights the dynamic, context-sensitive nature of meaning in CxG. It emerges from patterns of use, is shaped by speakers’ experiences in context, and becomes shared through social interaction.

This usage-based perspective also underlies the very definition of a construction in Construction Grammar (Fillmore 1988). Goldberg (1995: 4) states that a construction is “posited in the grammar if it can be shown that its meaning and/or its form is not compositionally derived from other constructions existing in the language.” In this view, constructions are stored as form-meaning pairings, and their psychological reality is grounded in usage and experience. Their degree of entrenchment depends on their frequency and familiarity in speakers’ linguistic experience. Schmid (2020: 2), extending Langacker’s (2017) definition, defines entrenchment as “the continual reorganization of linguistic knowledge in the minds of speakers, which is driven by repeated usage activities in usage events and subject to the exigencies of the conventionalization processes taking place in speech communities”. Entrenchment plays an essential role in language development, as emphasized by Diessel (2019: 1).

Two points of view have been adopted concerning the existence of “multimodal constructions”, defined by Ziem (2017: 5) as “a conventionalized pairing of a complex form that consists, at least, of a verbal element combined with a kinetic element.” The first viewpoint relies on the strict pairing of gesture/speech forms in which “verbal and gestural components each have to play an essential, not only accidental role” (Ningelgen and Auer 2017: 1). Although gestures are frequent in face-to-face interactions, they are never an obligatory component of the linguistic message (even when they accompany deictic terms), and the existence of multimodal constructions can then be doubted. The second viewpoint, which is the one adopted in this paper, relies on a usage-based model of language, a perspective also adopted by Parrill et al. (2010: ix) for whom “both cognitive and functional (or usage-based) approaches share the assumption that language happens within a social and conceptual context, and that grammar is motivated by use.” In this perspective, entrenchment is conceived as a continuum (Schmid 2020; Zima 2017; Zima and Bergs 2017) from less entrenched constructions to more frequent, and therefore more entrenched ones which can be described as being composed of more or less central and peripheral features (Cienki 2017), so that constructions’ “status as a unit is a matter of degree because each occurrence contributes to its further entrenchment and subsequent ease of activation” (Langacker 2017: 41).

This usage-based model also implies that constructions reflect specific perspectivizations of events, even when they describe the same underlying situation. A well-known illustration is provided by Fillmore et al. (2003), who analyze verbs such as send and receive within the same Transfer frame.^[2] In this frame, the theme refers to the object being transferred (e.g., a letter or package), while the path refers to the trajectory or direction along which the transfer takes place (e.g., from the sender to the recipient). Although both verbs evoke the same general conceptual structure, they highlight different participants as figure, depending on the speaker’s focus. This difference in perspective is encoded constructionally and reflects how language users construe events from distinct vantage points, as shown in Table 1:

Table 1:

Construal differences between the verbs send and receive within the Transfer frame, illustrating how distinct constructions foreground different participants as figure in Construction Grammar (from Fillmore et al. 2003).

Expression	Evoked frame	Figure	Ground
Send	Transfer	Agent (sender)	Recipient, theme, path
Receive	Transfer	Recipient	Agent, theme, path

3 Expected features of the [COMP 1 and COMP 1] multimodal construction

The present paper aims at examining whether all the [COMP 1 and COMP 1] constructions under study show the same degree of entrenchment and if a set of features can be described as recurrent for at least some of these constructions. As stated above, constructions like more and more or higher and higher both involve comparative incrementals, in which repetition signals a gradual, scalar change along a given dimension, be it quantity, intensity, or spatial orientation. These expressions evoke frames that involve change over time or space, where each repeated element marks a step in an ongoing progression.

Take the construction more and more, for example. It evokes a frame of scalar increase, typically in quantity, degree, or intensity. The figure in this frame is the increase: it is what is changing and being foregrounded. The ground, by contrast, is the previous degree or state, which serves as the implicit reference point against which the new increase is evaluated.

Similarly, higher and higher invokes a vertical spatial scale, where each instance of repetition tracks upward movement. The figure is whatever is rising – an object, a person, or an abstract quantity such as value or intensity – while the ground consists of the earlier stages or levels of elevation, which are presupposed and backgrounded in the comparison.

In both cases, the repetition foregrounds a dynamic and continuous process of increase. The most recent, intensified state is construed as the figure, while the prior, less intense states serve as the ground as they provide a cognitive baseline for comparison but remain less prominent in the mental representation.

This construal can be summarized in Table 2, which outlines the frames evoked by each construction and the respective figure/ground structure they encode.

Table 2:

Figure/ground construal in conjoined comparatives (based on the scalar analysis in Fillmore et al. 2003).

Phrase	Evoked frame	Figure	Ground
More and more	Scalar increase (intensity/quantity)	Current higher degree	Previous lower degrees
Higher and higher	Vertical spatial scale	Rising element	Previous lower positions

In light of the theoretical framework outlined above, a set of specific features can be hypothesized as recurrent in the multimodal realization of the conjoined comparative construction. The aim of the present study is to identify which of these features consistently appear in actual usage. The next three subsections examine the lexical, grammatical, and prosodic domains, outlining the predicted features in each and illustrating them with examples drawn from the corpus introduced in the data and methodology section.

3.1 Lexical domain

Since representational gestures are grounded in our embodied knowledge of the world and reflect our everyday interactions with objects and actions (Streeck 2009), we can expect a degree of resemblance between gesture form and the conceptual content it expresses. This is particularly relevant for lexical items whose semantics are closely tied to concrete, embodied experiences, like size or distance. This representation may operate at two levels:

Gesture type: Iconic gestures, which visually represent the shape, motion, or position of referents, are more likely to be used to convey lexical information than metaphoric gestures (e.g., cyclic or PUOH gestures) or beats. For instance, iconic gestures might depict size, height, or motion associated with the compared items in a conjoined comparative.
Gesture sequencing: Given that the construction semantically encodes a comparison between two degrees, typically a prior and a current state, it may trigger the production of dependent gesture sequences. In line with the findings of Laparle et al. (2024) and Hinnell (2019), these could include two similar gestures performed in different spatial locations to highlight contrast or scalar progression.

These two qualities are illustrated in Example 3. The speaker is describing the small size of the lettering on a marquee while uttering smaller and smaller. She performs a sequence of two iconic gestures: drawing her index finger and thumb close together to depict the small size of the letters. The first gesture begins relatively high in the gesture space (Figure 2a) and ends slightly lower (Figure 2b). After a brief pause, she produces a second gesture with the same handshape, this time gliding slightly lower still (Figure 2c) before retracting her hands. The two gestures form a dependent sequence in which the second gesture is interpreted in relation to the first, reinforcing the notion of incremental reduction.

Figure 2:

Two iconic gestures expressing size reduction in Example 3.

(3)

They made the letters get [smaller]_G1 [and smaller]_G2, so by the time you get to New York City it’s really small.

2016-06-02_1600_US_KABC_Live_With_Kelly^[3]

^[3] All the examples in this paper are from the NewsScape Library of Digital TV News described in Section 4. The reference includes the name of the program and the date of its airing, as well as the reference of the video clip in the data search.

A similar type of gradation is expressed in Example 4, where the speaker performs two iconic gestures referring to height in the context of mountain climbing, illustrated in Figure 3. As in Example 3, the gestures form a sequence, with the second gesture representing a greater height than the first. We can identify these as two distinct gestures rather than a single gradual gesture based on their kinematic structure. Although there is no full pause between the two movements, the speaker momentarily relaxes the hand trajectory before initiating the second stroke. Each gesture begins with a similar handshape and upward motion, but the slight transition between them suggests a re-articulation rather than a continuous gesture. This temporal and spatial segmentation indicates a gesture sequence composed of two related but separate strokes, rather than a single gesture expressing continuous upward gradation.

Figure 3:

Two iconic gestures expressing height in Example 4.

(4)

You plant yourself in one place, your body adjusts to that lack of oxygen and so on. Then you go [higher]_G1 [and higher]_G2

2016-04-15_0635_US_KABC_Jimmy_Kimmel_Live

3.2 Grammatical domain

Because the conjoined comparative expresses gradual change, it is likely to co-occur with progressive aspectual markers (e.g., be + -ing) rather than resultative forms (e.g., have + -en) or non-marked aspectual forms. Example 5 below gives an illustration of verbal aspect -ing in the clause that contains a conjoined comparative.

(5)

I could see the silhouette on the horizon getting larger and larger and my heartbeat was getting faster.

2016-11-13_1430_US_KCBS_CBS_Sunday_Morning_with_Jane_Pauley

This grammatical aspect may also be reflected gesturally. One might expect the presence of one or two gestures expressing gradation, either a metaphoric gesture of the cyclic type or gestures characterized by a smooth, continuous movement, which would evoke gradual transformation. Whereas punctual, bounded gestures would likely be less compatible with the aspectual semantics of the construction, a single two-handed iconic gesture gradually expanding in shape, for instance, would be more effective to illustrate incremental progression.

This is the case in Example 6 below illustrated in Figure 4, in which the speaker on the right of the screen makes a continuous cyclic gesture with his right hand, initiated as he begins the sentence that contains the construction and continued until the middle of the next clause.

Figure 4:

Single metaphoric (cyclic) gesture in Example 6.

(6)

This is his choice. [You’re seeing more and more athletes, NFL players, who have left the sport early in their careers]_G because of concussions.

2016-02-11_1500_US_KABC_Good_Morning_America_1081-1091

Figure 5 provides an example of a single iconic gesture performed by the speaker, who lifts and opens his hands in a smooth, continuous motion to illustrate expansion and large size. The gradual quality of the gesture (its fluid unfolding over time) mirrors the durative aspect of the event being described. This aspectual meaning is also reinforced linguistically through the use of the verb get combined with the progressive marker be + -ing, indicating an ongoing, incremental process.

Figure 5:

Single iconic gesture in Example 7.

(7)

Of course the pokemon craze just seems to [be getting bigger and bigger]_G

2016-09-22_1500_US_KCBS_This_Morning_2400-2409

Figure 6 below illustrates the same gradual quality of movement, this time in a deictic gesture. As the speaker describes the traffic jam following a car accident, she points to a yellow line on the screen and gradually traces its contour to evoke the long queue of cars stretching away from the accident scene.

Figure 6:

Tracing deictic gesture in Example 8.

(8)

You can see a problem here with two accidents and a very close area in our slowest spot in the morning. It will pack it up, [and this yellow is kind of backing up further and further past the zoo], well past the metro curve this morning.

2016-08-02_1100_US_WUAB_Cleveland_19_News_on_43

3.3 Prosodic domain

At the prosodic level, we may expect markers of salience to align with the construction’s evaluative and contrastive functions. Since conjoined comparatives typically establish a contrast between a current state and a prior reference point, prosodic emphasis is likely to play a role in highlighting this difference.

The construction often conveys both a semantic gradation and a pragmatic evaluative stance, which may be supported by emphatic prosodic cues. These include increased acoustic stress and bodily signals such as hand and head beats or eyebrow raises, all of which contribute to the perception of salience in speech.

Beats, brief, rhythmic up-down movements of the hand or head, are described by Calbris (2011: 255) as having a semiotic “cutting” function. They segment discourse in a way that aligns with bounded events or closed comparisons. As such, both beats and deictic gestures could serve to focus attention on the outcome of the comparison, rather than on the ongoing process of gradation.

In Example 9, the speaker is discussing the growth of blood cells. His delivery is markedly emphatic, as it features strong prosodic contrasts between rapid stretches of speech with reduced syllables and lengthened syllables such as grow and the two instances of the word bigger. Figure 7a shows the speaker standing in front of a background screen, with his arm extended as he performs two small beat gestures with his left hand while saying bigger and bigger. Figure 7b, a screenshot of the example analysed using the acoustic analysis software Praat (Boersma and Weenink 2022), displays the acoustic realisation of the utterance. The elongated segments are clearly visible, and the pitch curve remains relatively high across the two Intonation Phrases.^[4]

Figure 7:

Prosodic and gestural emphasis in Example 9.

(9)

They grow, they get bigger and bigger.

2016-12-15_2100_US_KTTV-FOX_The_Dr_Oz_Show

In Example 10 below, the speaker, former US Secretary of State John Kerry, raises his eyebrows and performs two successive head beats while uttering the conjoined comparative. The two beats, the second of which is shown in Figure 8, serve to underscore his speech. In this instance, emphasis is conveyed solely through bodily behaviour, as there is no acoustic prominence in the comparative.

Figure 8:

Head beat and eyebrow raise in Example 10.

(10)

We need to persuade the reluctant performers within industry. And there are [fewer and fewer] of them, I may say, Mark.

2016-04-05_1100_US_MSNBC_Morning_Joe

All of the features described in this section are summarised in Table 3. In theory, for the construction to fully express its core meaning in a multimodal way, it would ideally be accompanied by two continuous iconic gestures (reflecting both aspectual and lexical meaning) which should be dependent to highlight contrast. It should also co-occur with a verb in the progressive form to mark gradual change and be delivered with emphatic prosody to reinforce its expressive function. However, this ideal configuration represents a theoretical endpoint, and is unlikely to be consistently observed in spontaneous data. The following section presents the data and methodology used to examine which of these features actually occur in context. It outlines the corpus, annotation procedures, and analytical framework adopted in the study.

Table 3:

Summary of expected verbal, gestural and oral correlates of the conjoined comparative construction in the lexical, grammatical, and prosodic domains.

Domain	Features	Description	Function in conjoined comparatives
Lexical	Gesture type	Iconic gestures (vs. metaphoric or beats); visually represent shape, size, motion, etc.	Iconic gestures likely represent embodied lexical content (e.g., size, height) associated with the comparison
	Gesture sequencing	Sequences of similar gestures performed in different spatial locations	Highlight contrast or progression between two degrees (previous vs. current state)
Grammatical	Verbal aspect	Progressive aspect (e.g., be + -ing) preferred over resultative (have + -en) or no aspect	Reflects the semantics of gradual change inherent to the construction
	Gesture dynamics	Smooth, continuous gestures (e.g., metaphoric cyclic gestures, expanding two-handed shapes) vs. punctual, bounded movements	Iconic or metaphoric gestures may signal gradation or continuous increase, in line with the aspectual meaning
Prosodic	Prosodic emphasis	Acoustic stress in speech	Highlights the difference between states and emphasizes the construction’s pragmatic stance
	Multimodal salience	Beats (rhythmic head/hand gestures), eyebrow raises, deictics	Contribute to salience perception; beats and deictics focus attention on the result rather than the process of gradation

4 Data and methodology

The study presented here was conducted using the online archive and facilities of the Distributed Little Red Hen Lab™, co-directed by Francis Steen and Mark Turner.^[5] The NewsScape corpus is part of UCLA’s NewsScape Library of Digital TV News which comprises a variety of genres ranging from TV broadcast news to weather reports and talk shows. It counted 234,432,755 word tokens at the time of the study, although the rest of the archive is constantly enriched with new documents. All of the videos of the corpus were previously automatically transcribed by the Red Hen Lab and tagged with morphological categories.

The corpus was queried using CQPweb (Hardie 2012), a concordancer that includes 3-second video clips of the sequences under study as shown in Figure 9. All of the instances of quantity/dimension/distance were retained except in the case of more and more for which the query returned 3,817 matches and was therefore thinned to 550 hits in order to avoid too great an imbalance while still keeping a sufficient number of occurrences for observation. Queries that returned fewer than 10 hits were discarded even if the constructions were related to the semantic fields under study.

Figure 9:

The CQPweb concordancer (Hardie 2012).

Query results were then cleaned to remove false hits and duplicates in video clips (which happens when the same video footage is used in different programs, but also if the speaker pronounces a sequence of three comparatives in a row as in bigger and bigger and bigger). The resulting query was then exported to the rapid annotator presented in Uhrig (2018). The annotator is particularly useful as users enter their coding scheme and then proceed with the annotation by clicking on values, as shown in Figure 10 below.

Figure 10:

The rapid annotator developed by Uhrig (2018) for the Red Hen Lab.

The coding scheme that was used for this study involved the parameters and values listed in Table 4. One of the features of this type of corpus is that speakers may not necessarily be visible on screen as there is quite frequently a voice-over with video footage for some events in broadcast news. Speakers’ hands may also not be visible on screen in close-up shots. This means that all the gesture and bodily behaviour annotations must include a “not visible” value. All the other parameters include a “Non applicable” value (N/A), used when no other value in the scheme is appropriate or when the parameter cannot be observed on screen.^[6] As shown in the coding scheme, four sets of parameters were of interest in this study: verbal content, hand gestures, head and eyebrow movements, and prosody. Any gesture performed at the beginning of the clip but not related to the construction or not performed in overlap with it was not coded here. Triplets (bigger and bigger and bigger) were considered as one mention of the construction and only the first two items counted for the annotation.

Table 4:

Coding scheme used in the annotation process and Kappa results of the intercoder agreement test following the interpretation proposed by Landis and Koch (1977).

Parameters	Values	Kappa
Verbal domain
Semantic field	Dimension, Distance, Quantity
Gradation	Increase, Decrease
Verbal aspect in the clause^a	N/A, BE + ING, HAVE + EN, HAVE + EN + BE + ING	0.75 substantial agr.
Phrase introduced	N/A, NP, VP, AdjP, AdvP, PP	0.82 almost perfect agr.
Gestural domain

(a) Hands

Hand gesture presence	Yes, No, Not visible	0.87 almost perfect agr.
Hand gesture type	N/A, Beat, Deictic, Iconic, Metaphoric, Other	0.66 substantial agr.
Hand gesture relationships	N/A, 1 gesture, 2 independent gestures, 2 dependent gestures	0.69 substantial agr.
Hand gesture dynamics	N/A, gradual, punctual, both	0.88 almost perfect agr.

(b) Face and head

Head movement presence	Yes, No, Not visible	0.58 moderate agr.
Head movement type	N/A, Shake, Nod, Tilt, Turn, Forward, Backward	0.35 fair agr.
Eyebrow movement presence	Yes, No, Not visible	0.83 almost perfect agr.
Eyebrow movement type	N/A, Raise, Frown	0.63 substantial agr.
Prosodic domain
Prosodic emphasis	Presence of nuclear stress on the construction based on perception: Yes, No	0.58 moderate agr.

^aSome constructions belong to two different clauses, as in I could see more and more people becoming frustrated with the situation, where more and more people is both object of the verb see and subject of become. In such a case, BE + ING would be coded as present, although the first clause has no grammatical aspect. Besides, when the construction was inserted in a clause that contained a periphrastic construction like be going to, this also counted as a form of BE + ING aspect.

Annotations were carried out across the verbal, gestural, and prosodic domains. In the verbal domain, as outlined in a previous section, the semantic fields selected for analysis were dimension, distance, and quantity, as utterances in these domains are particularly likely to involve multimodal expression. The meanings conveyed within these three semantic fields may be interpreted either literally or metaphorically, depending on the context of the utterance. In all cases, the gradation expressed in the conjoined comparative construction may be either increasing or decreasing.

Dimension includes references to size, height, length, width, or other physical properties (e.g., bigger, smaller, taller, wider).
Distance concerns spatial relations or movement across space (e.g., further, closer).
Quantity relates to number, frequency, or countability (e.g., more and more, fewer, increasingly numerous).

The verbal domain also included the annotation of verb aspect, whether perfective (marked by have + -en), durative (marked by be + -ing), or both, as well as the syntactic type of the phrase containing the construction.

Several types of annotation were carried out in the gestural domain.^[7] Hand gestures were coded using McNeill’s classification (McNeill 1992, 2005), which includes iconics (gestures that represent objects or actions), beats (rhythmic gestures), deictics (pointing gestures), and metaphorics (gestures that represent abstract concepts). This last category includes the grammatical gestures mentioned in the theoretical background section (e.g. cyclic gestures, PUOH gestures, sweeping away gestures), which can be interpreted as conveying metaphorical meaning (Cienki 2021; Cienki and Müller 2008; Mittelberg 2018). The coding scheme also included the category other to account for rare gesture types found in this corpus.

Gesture movements were further classified as either punctual or gradual. Punctual movements are brief and refer to a single, precise point in space, such as a quick point or tap-like motion that highlights a specific location or moment in the discourse. In contrast, gradual movements may unfold over a longer duration and involve a continuous spatial trajectory, such as slowly tracing a path or progressively extending the arm.^[8] Sequences of two gestures were annotated as a single punctual or gradual gesture sequence, since gestures produced in close succession generally share the same dynamic quality (Cienki and Iriskhanova 2018: 138). However, some gestures were classified as involving both types of movement dynamics. This occurred either when a gesture displayed features of both continuity and a clear spatial landmark, or when a sequence of two unrelated gestures in a row displayed contrasting dynamics, for instance, a punctual gesture immediately followed by a more gradual one, or vice versa.

In addition to hand gestures, the annotation also included head and eyebrow movements, which are frequent markers of emphasis, contrast, or speaker stance. Head movements were classified according to type (e.g. nods, shakes, tilts, jerks), and eyebrow movements as raises or frowns.

Finally, prosodic emphasis was annotated based on auditory perception, focusing on the presence or absence of nuclear stress, defined in Section 2.1.2, on the conjoined comparative construction. Annotators listened to each excerpt and judged whether the construction carried prosodic prominence, typically marked by greater pitch movement, increased duration, or higher intensity on the stressed element. The parameter was binary, with possible values of “Yes” (nuclear stress present) or “No” (no perceivable nuclear stress).

Intercoder reliability was measured on 10 % of the data using the “irr” package (Gamer et al. 2012) in R version 4.1.2 (R Core Team 2024) based on the annotations of two students at the University of Poitiers. The first coder was an undergraduate student with no prior training in multimodal analysis, while the second was a graduate student who had completed a course on multimodality. Both were naive to the purposes of the study. Kappa results were quite good overall, although they were a bit lower for the detection of head movements (speakers often perform minute head movements which can be interpreted either as beats/nods or as unrelated to linguistic content) and prosody, so that the results obtained in the present study will have to be considered with caution and will need to be confirmed in further work. Gradation and semantic field were coded separately and were not included in the intercoder reliability test since the values were quite straightforward for all the forms apart from a few instances.
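As an illustration of how agreement values such as those in Table 4 can be computed and labelled, here is a minimal Python sketch of unweighted Cohen's kappa for two coders, together with the bands proposed by Landis and Koch (1977). The study itself used the irr package in R. The function names and the ten toy annotations below are hypothetical and are not taken from the corpus.

```python
from collections import Counter


def cohen_kappa(coder_a, coder_b):
    """Unweighted Cohen's kappa for two coders who labelled the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty list of items")
    n = len(coder_a)
    # Proportion of items on which the two coders chose the same value.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's own label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:  # both coders used one identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


def landis_koch(kappa):
    """Verbal label for a kappa value, after Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    bands = ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
             (0.80, "substantial"), (1.00, "almost perfect"))
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# Hypothetical annotations of "hand gesture presence" for ten clips (toy data).
coder_1 = ["Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No",
           "Not visible", "Not visible"]
coder_2 = ["Yes", "Yes", "Yes", "No", "No", "No", "No", "Yes",
           "Not visible", "Not visible"]

kappa = cohen_kappa(coder_1, coder_2)
print(f"kappa = {kappa:.4f} ({landis_koch(kappa)} agreement)")
```

With these toy labels the coders agree on 8 of 10 clips, chance agreement is 0.36, and kappa is therefore (0.80 - 0.36) / (1 - 0.36) = 0.6875.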

5 Results

Before analysing the multimodal features of conjoined comparatives, it is useful to examine the distribution of occurrences across the three semantic fields selected for the present study. A total of 1,879 occurrences were included in the study. The first thing to notice in Table 5 below is a strong bias towards the expression of increasing gradation, which is almost 1.5 times more frequent than decreasing gradation in this type of corpus (1,114 vs. 765 occurrences). Public media has a strong tendency to account for increasing degrees of a quality/quantity rather than decreasing degrees and will be more likely to speak of more and more people voting for a party rather than fewer and fewer people voting for another party. News discourse often emphasizes intensification and positive framing to attract attention and convey urgency or growth (Van Dijk 1991). The second point of interest is that some constructions are much more frequent than others. Among decreasing gradations, closer and closer, which expresses a decreasing distance, is the most frequent construction, followed by less and less and fewer and fewer for quantity, whereas dimension is expressed mainly in terms of size with smaller and smaller and height with lower and lower. Because dimension includes several notions (width, height, length), its occurrences are distributed across several constructions.

Table 5:

Number of occurrences of the construction in each semantic field.

Construction	Dimension	Distance	Quantity	Total
Decrease	111	464	190	765

Closer and closer		464		464
Fewer and fewer			75	75
Less and less			106	106
Lower and lower	34			34
More and more			2	2
Narrower and narrower	9			9
Shorter and shorter	8		1	9
Slimmer and slimmer	7			7
Smaller and smaller	53		6	59

Increase	388	169	557	1,114

Bigger and bigger	200	1	8	209
Further and further		163		163
Greater and greater			17	17
Higher and higher	107		8	115
Larger and larger	34			34
Longer and longer	29	1	36	66
More and more	3	4	488	495
Wider and wider	15			15

Total	499	633	747	1,879

With increasing gradation, quantity has more occurrences than the other two semantic fields (557 occurrences) and would even have had more if we had not been constrained to reduce the number of occurrences for more and more (a process we did not have to apply to the corresponding negative gradation expressed by less and less). Dimension comes in second position with 388 occurrences, and is mainly expressed in terms of size (bigger and bigger) and height (higher and higher), whereas longer and longer is used both to measure a dimension and a quantity of time. Distance comes last with 169 occurrences largely expressed by further and further.

Now considering the table as a whole, one sees that with the exception of the two decreasing occurrences of more and more, there are as many types of constructions used to express decreasing and increasing gradation, and overall, quantity is more often mentioned than distance and dimension respectively in this type of public media corpus.
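The marginal totals discussed above can be rechecked directly from the cells of Table 5. The following Python sketch copies the counts from the table and recomputes the totals per semantic field and per direction of gradation, as well as the ratio of increasing to decreasing gradation. The variable and function names are illustrative only.

```python
# Counts copied from Table 5 as (dimension, distance, quantity) per construction.
DECREASE = {
    "closer and closer": (0, 464, 0),
    "fewer and fewer": (0, 0, 75),
    "less and less": (0, 0, 106),
    "lower and lower": (34, 0, 0),
    "more and more": (0, 0, 2),
    "narrower and narrower": (9, 0, 0),
    "shorter and shorter": (8, 0, 1),
    "slimmer and slimmer": (7, 0, 0),
    "smaller and smaller": (53, 0, 6),
}
INCREASE = {
    "bigger and bigger": (200, 1, 8),
    "further and further": (0, 163, 0),
    "greater and greater": (0, 0, 17),
    "higher and higher": (107, 0, 8),
    "larger and larger": (34, 0, 0),
    "longer and longer": (29, 1, 36),
    "more and more": (3, 4, 488),
    "wider and wider": (15, 0, 0),
}


def field_totals(rows):
    """Column sums (dimension, distance, quantity) over a block of rows."""
    return tuple(sum(column) for column in zip(*rows.values()))


decrease = field_totals(DECREASE)
increase = field_totals(INCREASE)
overall = tuple(d + i for d, i in zip(decrease, increase))
ratio = sum(increase) / sum(decrease)

print("decrease:", decrease, "total", sum(decrease))
print("increase:", increase, "total", sum(increase))
print("overall: ", overall, "total", sum(overall))
print(f"increase/decrease ratio: {ratio:.2f}")
```

The recomputed margins agree with the Decrease, Increase and Total rows of the table, and the ratio of 1,114 to 765 is about 1.46.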

5.1 Lexical domain

5.1.1 Hand gesture type

In order to examine how the lexical domain is expressed in the construction, we analyzed the distribution of gesture types and sequencing within the corpus. Of the 1,879 constructions, only 523 (27.8 %) were accompanied by a hand gesture. In the 685 video clips in which the hands were visible, 162 (8.6 % of all occurrences) showed the construction being uttered without any hand gesture. This suggests that, when the speaker’s hands are visible, the construction tends to elicit hand gestures in 76.3 % of occurrences (523 out of 685).
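Because the three percentages in this paragraph rest on two different denominators (all 1,879 occurrences versus the 685 clips with visible hands), a short Python sketch may help to keep them apart. The counts are those given in the text. The constant and function names are illustrative only.

```python
TOTAL_OCCURRENCES = 1879     # all conjoined comparatives in the study
WITH_HAND_GESTURE = 523      # hands visible and a gesture produced
WITHOUT_HAND_GESTURE = 162   # hands visible but no gesture produced

# Clips in which the hands can be observed at all.
hands_visible = WITH_HAND_GESTURE + WITHOUT_HAND_GESTURE


def percent(part, whole):
    """Share of `part` in `whole`, expressed as a percentage."""
    return 100 * part / whole


gesture_of_all = percent(WITH_HAND_GESTURE, TOTAL_OCCURRENCES)
no_gesture_of_all = percent(WITHOUT_HAND_GESTURE, TOTAL_OCCURRENCES)
gesture_of_visible = percent(WITH_HAND_GESTURE, hands_visible)

print(f"hands visible: {hands_visible}")
print(f"gesture, share of all occurrences:    {gesture_of_all:.2f} %")
print(f"no gesture, share of all occurrences: {no_gesture_of_all:.2f} %")
print(f"gesture, share of visible-hand clips: {gesture_of_visible:.2f} %")
```

The first two shares are computed over the whole corpus, whereas the last one is restricted to the clips in which gesturing could be observed.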

A chi-squared test and a Fisher exact test were conducted in R (R Core Team 2024) to determine which gesture types were most commonly associated with the construction. The test revealed a significant difference in the proportions of gesture types (X-squared = 131.63, df = 3, p-value < 0.001^[9]). Analysis of the residuals confirmed that iconic gestures were more frequently associated with the conjoined comparative construction, whereas deictic gestures occurred less often.

The test also revealed a significant difference among the three semantic fields (X-squared = 34.71, df = 6, p-value < 0.001). The residuals indicate that iconic gestures are particularly associated with the expression of dimension, while distance is more frequently accompanied by deictic gestures than the other two semantic fields. Quantity is more often expressed through metaphoric and beat gestures.

5.1.2 Hand gesture sequences

As far as gesture sequences are concerned, the test also revealed statistically significant differences in proportions (X-squared = 34.55, df = 2, p-value < 0.001). There is a strong tendency for the construction to be accompanied by a sequence of two dependent gestures. There is however no significant difference among the three semantic fields (X-squared = 3.08, df = 4, p-value > 0.05).

5.2 Grammatical domain

In order to explore the grammatical potential of the construction, its aspectual dimension was examined to determine whether gradual increment is expressed through accompanying verbal aspect or through the dynamics of the hand gestures performed. The Fisher test revealed a highly significant difference in verbal aspect proportions for the construction (n = 1,879, X-squared = 1,606.4, df = 3, p-value < 0.001). The residuals show that the construction is equally likely to be accompanied by no aspect at all or by a progressive aspect (be + -ing). The perfective aspect (have + -en) is very unlikely, and a mix of the two aspects even less so. There is a slight difference between the three semantic fields in this respect (X-squared = 20.929, df = 6, p-value = 0.001) with distance being more often expressed with be + -ing while the highly complex have + -en + be + -ing rather marks quantity, but this result should be considered with caution because of the small number of occurrences of the complex aspectual form.

In terms of hand gesture dynamics, the test reveals a significantly higher proportion of punctual gestures (n = 523, X-squared = 98.161, df = 2, p-value < 0.001) compared with gradual gestures. The residuals also show that the production of mixed-type gestures is extremely unlikely. There is a slight difference between the three semantic fields in this respect (X-squared = 12.524, df = 4, p-value = 0.01), with a greater tendency for dimension to be expressed with a gradual rather than a punctual gesture, but again, residuals are quite low. We saw earlier that gesture sequences (one gesture, two (in)dependent gestures) did show a significantly higher proportion of two dependent gestures. Now, when this feature is measured in association with gesture dynamics, the test is highly significant (X-squared = 232.33, df = 8, p-value < 0.001) and residuals reveal that the most frequent form used with conjoined comparatives is a sequence of two dependent punctual gestures. The percentage of each form is given in Table 6 below. There is a slight difference between the three semantic fields (distance, quantity, dimension) in this respect (X-squared = 29.492, df = 16, p-value < 0.05), but the highest residual is negative and indicates that dimension is typically not expressed with a single punctual gesture, a pattern which is found more often in the field of quantity.

Table 6:

Number and percentage of gesture patterns and verbal aspects in the visible speaker condition (n = 685; 1 = 1 gesture, 2 = 2 gestures, GRAD = gradual, PUNCT = punctual, DEP = dependent, INDEP = independent).

Hands	Nb	%	Aspect	Nb	%
1 BOTH	14	2.04 %	BE + ING	252	49.41 %
1 GRAD	96	14.01 %	HAVE + EN	12	1.69 %
1 PUNCT	41	5.99 %	HAVE + EN + BE + ING	7	1.18 %
2 DEP BOTH	57	8.32 %	N/A	325	47.45 %
2 DEP GRAD	47	6.86 %
2 DEP PUNCT	132	19.27 %
2 INDEP BOTH	11	1.61 %
2 INDEP GRAD	31	4.53 %
2 INDEP PUNCT	94	13.72 %
No gesture	162	23.65 %

We can therefore conclude that the aspectual dimension is either conveyed in the progressive form of the verb or not at all. It is in no way conveyed by the dynamics of accompanying hand gestures.
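As a worked check, the two gesture statistics reported above (X-squared = 98.161 with df = 2, and X-squared = 232.33 with df = 8) can be recovered from the counts in Table 6. The sketch below assumes a Pearson goodness-of-fit computation against equal expected frequencies; that this is the computation behind the reported values is an inference from the matching numbers, not something the text states, and the function names are illustrative only.

```python
import math

# Gesture pattern counts from Table 6 (occurrences with at least one gesture, n = 523).
patterns = {
    "1 BOTH": 14, "1 GRAD": 96, "1 PUNCT": 41,
    "2 DEP BOTH": 57, "2 DEP GRAD": 47, "2 DEP PUNCT": 132,
    "2 INDEP BOTH": 11, "2 INDEP GRAD": 31, "2 INDEP PUNCT": 94,
}

def goodness_of_fit(counts):
    """Pearson chi-square against equal expected frequencies (an assumption)."""
    n = sum(counts.values())
    expected = n / len(counts)
    # Pearson residuals: positive means more frequent than expected.
    residuals = {k: (o - expected) / math.sqrt(expected) for k, o in counts.items()}
    stat = sum(r * r for r in residuals.values())
    return stat, len(counts) - 1, residuals

def chi2_upper_tail(stat, df):
    """Upper-tail chi-square probability; this closed form holds for even df only."""
    assert df % 2 == 0
    half = stat / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

# Collapse the nine patterns into the three dynamics (BOTH, GRAD, PUNCT).
dynamics = {}
for label, n in patterns.items():
    key = label.split()[-1]
    dynamics[key] = dynamics.get(key, 0) + n

dyn_stat, dyn_df, dyn_res = goodness_of_fit(dynamics)
pat_stat, pat_df, pat_res = goodness_of_fit(patterns)

print(dynamics, round(dyn_stat, 3), dyn_df, chi2_upper_tail(dyn_stat, dyn_df))
print(round(pat_stat, 2), pat_df, max(pat_res, key=pat_res.get))
```

Under this assumption, the largest positive residual falls on the sequence of two dependent punctual gestures, and the residual for mixed-type (BOTH) gestures is strongly negative, in line with the description above.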

5.3 Prosodic domain

Prosodic weight of the constructions was calculated with the following scores: a positive score of 1 was assigned to (a) any head beat, (b) any eyebrow raise and (c) presence of nuclear stress. All other values were assigned a score of 0. The total prosodic score for each occurrence thus ranged between 0 (no prosodic emphasis in speech or gesture) and 3 (highest prosodic emphasis). Scores of 0–1 were then considered as unstressed, whereas scores of 2–3 were considered as stressed. A Fisher exact test was then conducted only on the non-final forms in which head and eyebrows were visible (n = 777), since final forms are almost always stressed (94 %, X-squared = 90.291, df = 1, p-value < 0.001) due to the Last Lexical Item rule mentioned in the state of the art, which assigns nuclear stress to the last lexical item in broad focus by default. The question was whether the forms would be stressed when not final (and would therefore show a stronger degree of salience than expected). The test showed a small effect of semantic field (X-squared = 9.6554, df = 2, p-value < 0.05), with constructions expressing quantity being less often stressed than the other two, whereas forms expressing distance are more often stressed.
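The scoring scheme described above can be sketched as follows. This is a minimal illustration only: the function and parameter names are hypothetical and do not come from the author's R code.

```python
def prosodic_score(head_beat: bool, eyebrow_raise: bool, nuclear_stress: bool) -> int:
    """One point per cue that is present, giving a total between 0 and 3."""
    return int(head_beat) + int(eyebrow_raise) + int(nuclear_stress)

def is_stressed(score: int) -> bool:
    """Scores of 0-1 count as unstressed, scores of 2-3 as stressed."""
    return score >= 2

# Example: an occurrence with a head beat and nuclear stress but no eyebrow raise.
score = prosodic_score(head_beat=True, eyebrow_raise=False, nuclear_stress=True)
print(score, is_stressed(score))
```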

5.4 Conjoined comparatives in multimodal constructions

In order to test the predictions made for multimodal constructions, the distribution of the data was calculated in a four-dimensional way, as shown in Table 7, based on videos in which head, eyebrow, and hand gestures were visible (n = 448).

Table 7:

Number of gestures or gesture sequences accompanying a verbal aspect in the clause and mean prosodic score (range = 0–3) for each association of features.

Gestures/aspect	BE + ING	HAVE + EN	HAVE + EN + BE + ING	N/A	Total/Mean
1 BOTH	8	N/A	N/A	6	14
	1.13	N/A	N/A	1.33	1.21
1 GRAD	40	2	N/A	36	78
	0.88	0	N/A	0.86	0.85
1 PUNCT	17	1	N/A	16	34
	0.82	1	N/A	0.63	0.74
2 DEP BOTH	26	2	N/A	23	51
	1	1.5	N/A	1	1.02
2 DEP GRAD	13	N/A	1	20	34
	1	N/A	2	0.9	0.97
2 DEP PUNCT	59	3	2	53	117
	0.93	1	2	0.77	0.88
2 INDEP BOTH	4	1	N/A	4	9
	1	1	N/A	1	1
2 INDEP GRAD	13	N/A	1	16	30
	0.69	N/A	2	0.88	0.83
2 INDEP PUNCT	37	1	1	42	81
	1	0	0	1.02	0.99
Total	217	10	5	216	448
Mean	0.93	0.8	1.6	0.89	0.92

In addition, a mean prosodic score was calculated for each [gesture sequence x movement dynamics x aspect] combination, based on the scores attributed to each construction as described in Section 5.3, and entered in Table 7 alongside the corresponding counts. Hand beats were not included in the calculation of the prosodic score so as to avoid entering the same variable twice in the model. The three variables were then plotted using the ggplot2 package in R (Wickham 2016); the plot is reproduced in Figure 11. It shows the relationship between gesture configurations (x-axis) and verbal aspect (y-axis). Each dot’s size indicates the number of occurrences for a given pairing of features, while the colour gradient represents the mean prosodic score, with lighter blues reflecting lower prosodic emphasis and darker blues indicating higher levels of prosodic prominence.

Figure 11:

Multimodal feature clusters of the [COMP 1 and COMP 1] construction.
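The column totals and column means of Table 7 can be checked by treating each marginal mean as a count-weighted average of the cell means. The sketch below transcribes the table cells (N/A cells are simply omitted); the data structure and function name are illustrative. Because the cell means in the table are already rounded to two decimals, row means and the grand mean recomputed this way may differ from the printed values in the last digit, so only the column margins are checked here.

```python
# (count, mean prosodic score) per cell, transcribed from Table 7.
TABLE7 = {
    "1 BOTH": {"BE+ING": (8, 1.13), "N/A": (6, 1.33)},
    "1 GRAD": {"BE+ING": (40, 0.88), "HAVE+EN": (2, 0.0), "N/A": (36, 0.86)},
    "1 PUNCT": {"BE+ING": (17, 0.82), "HAVE+EN": (1, 1.0), "N/A": (16, 0.63)},
    "2 DEP BOTH": {"BE+ING": (26, 1.0), "HAVE+EN": (2, 1.5), "N/A": (23, 1.0)},
    "2 DEP GRAD": {"BE+ING": (13, 1.0), "HAVE+EN+BE+ING": (1, 2.0), "N/A": (20, 0.9)},
    "2 DEP PUNCT": {"BE+ING": (59, 0.93), "HAVE+EN": (3, 1.0),
                    "HAVE+EN+BE+ING": (2, 2.0), "N/A": (53, 0.77)},
    "2 INDEP BOTH": {"BE+ING": (4, 1.0), "HAVE+EN": (1, 1.0), "N/A": (4, 1.0)},
    "2 INDEP GRAD": {"BE+ING": (13, 0.69), "HAVE+EN+BE+ING": (1, 2.0), "N/A": (16, 0.88)},
    "2 INDEP PUNCT": {"BE+ING": (37, 1.0), "HAVE+EN": (1, 0.0),
                      "HAVE+EN+BE+ING": (1, 0.0), "N/A": (42, 1.02)},
}

def column_summary(table, aspect):
    """Return (total count, count-weighted mean score) for one aspect column."""
    cells = [row[aspect] for row in table.values() if aspect in row]
    total = sum(n for n, _ in cells)
    weighted = sum(n * mean for n, mean in cells) / total
    return total, round(weighted, 2)

for aspect in ("BE+ING", "HAVE+EN", "HAVE+EN+BE+ING", "N/A"):
    print(aspect, column_summary(TABLE7, aspect))

# Total number of occurrences across all cells.
grand_total = sum(n for row in TABLE7.values() for n, _ in row.values())
print(grand_total)
```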

The most notable pattern is the frequent use of sequences of two dependent gestures, which occur more often than any other gesture configuration. This is followed by configurations involving a single gesture or two independent gestures, e.g. repetitions of the same movement.

On the verbal aspect axis, the most common patterns are BE + ING and cases where aspect is not expressed verbally (labelled as N/A), and both are evenly distributed across the various gesture types. In contrast, HAVE + EN and especially the complex form HAVE + EN + BE + ING are rare.

Prosodically, the mean emphasis tends to be low across most feature combinations, with slightly higher values observed in constructions that include two dependent gestures. The highest mean prosodic score (1.2) is found in the case of a single gesture combining both punctual and gradual dynamics, although this remains relatively low compared to the highest possible score of 3.

In order to test the interactions between factors, a generalized linear mixed model (GLMM, binomial family) was applied to the data using the lme4 package in R (Bates et al. 2015). Results are summarized in Table 8. The fixed factors were Hand gestures, Aspect^[10] and Prosody, and the random factor was Form of the construction, as this might explain some of the variation observed in the data. The intercept is significant in all of the gesture patterns, except in single punctual gestures and sequences of two gradual independent gestures. What is immediately obvious from the table is that, although the form is not very emphatic, there is still an effect of prosody on the predictability of gesture patterns, with a higher effect for punctual and gradual dependent gestures. Verbal aspect does not play any role in the predictability of gesture patterns, which is in line with the fact that the same aspects are used whatever the assemblage of gesture features, namely BE + ING or no aspect at all, as illustrated in Figure 11.

Table 8:

Fixed effects table for the generalized linear mixed model (GLMM) fitted to the number of gradual and punctual hand gestures detected in visible condition.

	Value	Std. error	DF	t-Value	p-Value
One gradual gesture

(Intercept)	4.917002	1.6976850	429	2.896298	0.0040	***
Aspect	0.054320	0.1670904	429	0.325096	0.7453	ns
ProsoMean	−7.239484	1.9603515	429	−3.692952	0.0003	***

One punctual gesture

(Intercept)	−0.1602097	1.5415150	429	−0.1039300	0.9173	ns
Aspect	0.0166956	0.1630075	429	0.1024224	0.9185	ns
ProsoMean	−2.4778487	1.7309902	429	−1.4314632	0.1530	ns

2 Grad. Dep. gest.

(Intercept)	−3.388989	1.1028096	429	−3.0730498	0.0023	***
Aspect	−0.035575	0.1197842	429	−0.2969898	0.7666	ns
ProsoMean	2.301170	1.1994611	429	1.9185033	0.0557	*

2 Punct. Dep. gest.

(Intercept)	−4.348216	0.9984944	429	−4.354772	0.0000	***
Aspect	0.100762	0.1014305	429	0.993411	0.3211	ns
ProsoMean	4.386782	1.0798753	429	4.062304	0.0001	***

2 Grad. Indep. gest.

(Intercept)	0.1304996	1.5927146	429	0.0819353	0.9347	ns
Aspect	−0.0573465	0.1659187	429	−0.3456299	0.7298	ns
ProsoMean	−2.7861841	1.8028866	429	−1.5454018	0.1230	ns

2 Punct. Indep. gest.

(Intercept)	−16.27716	2.3662275	429	−6.878949	0.0000	***
Aspect	−0.11906	0.1495385	429	−0.796182	0.4264	ns
ProsoMean	16.23145	2.5199580	429	6.441157	0.0000	***

Significance levels are indicated as *p < 0.05, **p < 0.01, and ***p < 0.001.
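For readers who want to relate the columns of Table 8 to one another, each t-value is the ratio of the estimate (Value) to its standard error. The sketch below recomputes this ratio for three ProsoMean rows transcribed from the table. The two-sided p-value uses a standard normal approximation, which is an assumption made here for illustration: with 429 degrees of freedom it lies close to, but does not exactly match, the p-values printed in the table.

```python
import math

# (pattern, estimate, standard error, reported t, reported p) for ProsoMean, from Table 8.
ROWS = [
    ("One gradual gesture", -7.239484, 1.9603515, -3.692952, 0.0003),
    ("2 Grad. Dep. gest.", 2.301170, 1.1994611, 1.9185033, 0.0557),
    ("2 Punct. Dep. gest.", 4.386782, 1.0798753, 4.062304, 0.0001),
]

def t_ratio(estimate, std_error):
    """Test statistic as estimate divided by its standard error."""
    return estimate / std_error

def two_sided_p_normal(t):
    """Two-sided p-value under a standard normal approximation (assumption)."""
    return math.erfc(abs(t) / math.sqrt(2))

for pattern, est, se, t_reported, p_reported in ROWS:
    t = t_ratio(est, se)
    print(pattern, round(t, 4), t_reported, round(two_sided_p_normal(t), 4), p_reported)
```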

Now that the most frequent forms of multimodal constructions have been determined for the [COMP 1 and COMP 1] construction, a last question remains: can we say that the multimodal construction is entrenched in usage for speakers? In order to answer this question, it is useful to look again at Table 6 above, in which the percentage of gesture patterns and verbal aspects has been reported independently of any interaction between the two. As far as gesture patterns are concerned, it is interesting to note that the conjoined comparative is accompanied by at least one gesture in 75 % of the occurrences, which is in itself quite a high percentage. However, the gestures involved in the constructions can be single gestures or sequences of two or more gestures. Sequences of two or more dependent gestures, in which the meaning of the second gesture is to be understood in relation to the meaning of the first one, are the most frequent pattern and represent a third of the total occurrences, but this percentage is too low to speak of a high degree of entrenchment. The same is true of verbal aspect. The preferred verbal aspect in the clauses that contain a conjoined comparative accounts for close to 50 % of the occurrences, which is not a very high percentage when compared with the proportion of clauses in which the construction appears with no verbal aspect at all (47 % of the occurrences).
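The proportions discussed in this paragraph follow directly from the counts in Table 6 (n = 685). The short sketch below recomputes them; the variable names are illustrative.

```python
# Counts from Table 6 (visible speaker condition).
GESTURES = {
    "1 BOTH": 14, "1 GRAD": 96, "1 PUNCT": 41,
    "2 DEP BOTH": 57, "2 DEP GRAD": 47, "2 DEP PUNCT": 132,
    "2 INDEP BOTH": 11, "2 INDEP GRAD": 31, "2 INDEP PUNCT": 94,
    "No gesture": 162,
}
NO_ASPECT_COUNT = 325  # N/A row of the Aspect columns in Table 6

n = sum(GESTURES.values())

def share(count, total):
    """Percentage rounded to two decimals."""
    return round(100 * count / total, 2)

any_gesture = share(n - GESTURES["No gesture"], n)
two_dependent = share(sum(v for k, v in GESTURES.items() if k.startswith("2 DEP")), n)
no_aspect = share(NO_ASPECT_COUNT, n)

print(n, any_gesture, two_dependent, no_aspect)
```

With these counts, at least one gesture accompanies the construction in roughly three quarters of the occurrences, sequences of two dependent gestures account for about a third, and clauses without verbal aspect for just under half.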

6 Discussion and conclusion

This study examined the multimodal realization of the conjoined comparative construction in English, using a usage-based approach grounded in Construction Grammar. Focusing on constructions that express distance, dimension, or quantity, the study analysed 1,879 instances from the Red Hen corpus, with the aim of investigating how semantic and aspectual meaning, as well as emphasis, are distributed across verbal, gestural, and prosodic modalities.

One of the central questions was whether the durative meaning inherent in conjoined comparatives is supported by the progressive aspect in the clause or mirrored in the dynamics of the gestures. The results show that the construction is equally likely to appear with no verbal aspect or with be + -ing, while the perfective aspect is rare, and have + -en + be + -ing even more so. However, contrary to expectations, gesture dynamics do not correlate with verbal aspect: punctual gestures were more frequent than gradual ones, and mixed dynamics were extremely rare. This suggests that although the construction encodes a durative meaning linguistically, this is not systematically reinforced gesturally. There is no compensation between the verbal and gestural domains as far as aspect is concerned, and the lexical dimension is clearly favoured over the grammatical one.

While hand gestures were visible in only about a quarter of all cases due to video framing, when visible, they accompanied the construction in over 75 % of those instances. Among these, iconic gestures were strongly associated with expressions of dimension, deictic gestures were more common for distance, and quantity tended to elicit metaphoric or beat gestures. As far as gesture sequences and dynamics are concerned, the data revealed a preference for sequences of two dependent punctual gestures in which the second gesture is interpreted in relation to the first. This contrasts with the higher proportion of cyclic gestures and beats observed by Kuryu (2025) in reduplicative adverbial constructions (e.g., over and over, again and again).

The expectation that the construction’s emphatic function would be reinforced prosodically and/or by bodily behaviour was only partially confirmed. Most constructions occur at the end of the clause, a position shown by Schmid (2010) and Schmid and Günther (2016) to be more prominent in terms of information structure, as it typically corresponds to the rheme and carries the default nuclear stress. In the subset of examples where the construction was not final in the Intonation Phrase (and therefore not automatically prominent), prosodic salience remained relatively low overall. However, a small effect of semantic field was found: constructions expressing quantity were even less frequently emphasized prosodically than those expressing distance or dimension. The highest prosodic scores were associated with constructions involving a single gesture of mixed dynamics, though even these scores remained modest compared to the maximum possible value.

The presence or absence of prosodic and gestural marking in conjoined comparatives raises important questions about how speakers conceptualise and mentally represent scalar change. If certain instances of the construction are accompanied by prosody and gesture, while others are not, this suggests variability not just in surface expression, but potentially in the degree of conceptual salience or embodied activation of the meaning being conveyed. From a cognitive perspective, the multimodal realisation of comparatives may reflect how vividly or dynamically speakers are engaging with the idea of gradation. The direct implication is that some comparisons are mentally processed more as situated, embodied experiences, while others remain more abstract or schematic. This opens the door to further research on the cognitive load, processing depth, and communicative intent associated with multimodal versus unimodal instantiations of grammatical constructions.

The final aim of the study was to assess whether conjoined comparatives could be considered entrenched multimodal constructions. While they are frequently accompanied by hand gestures when visible, and while certain gesture configurations (such as two dependent gestures) are more frequent, the overall frequency of these pairings remains too low to support claims of strong multimodal entrenchment. Likewise, prosodic and gestural emphasis – although occasionally present – do not appear consistently enough to qualify as conventionalized features of the construction.

Nevertheless, the recurrent use of representational gestures and specific bodily behaviours suggests that some instances of the construction may be moving along the entrenchment continuum, particularly when used in semantically embodied contexts. These findings support a nuanced view of multimodal constructions as gradient rather than categorical, in line with the view of entrenchment as a matter of degree held by Zima (2017), Schmid (2020), and Langacker (2017).

However, it would be interesting in the future to compare the conjoined comparative construction with single comparatives and superlatives, as they may reveal differences in assemblages of verbal, prosodic and gestural features. It should also be kept in mind that the present study is based on a limited number of (potentially) multimodal occurrences of the construction and that further research should be conducted on this topic before definitive conclusions can be drawn.

Yet, beyond its specific focus, this study contributes more broadly to cognitive linguistics by demonstrating how the expression of meaning is distributed across multiple modalities and how different semiotic resources interact, or fail to interact, in the encoding of grammatical and semantic features. It reinforces the idea that constructions are not purely linguistic but can involve embodied and prosodic components, thereby supporting a more integrated, usage-based account of language as multimodal and grounded in general cognitive processes.

Corresponding author: Gaëlle Ferré, FOReLLIS (ER 15076) & University of Poitiers, Poitiers, France, E-mail: gaelle.ferre@univ-poitiers.fr

Acknowledgments

This study received no external funding, but I am deeply grateful to Mark Turner and Francis Steen for granting me access to the NewsScape corpus of the Distributed Red Hen Lab, as well as to the two students who participated in the intercoder reliability experiment. I would also like to thank Peter Uhrig for providing access to the Rapid Annotator he developed (Uhrig 2018). Finally, I am especially indebted to the two anonymous reviewers and to the journal’s editor, Hans-Jörg Schmid, for their insightful comments on an earlier version of this paper.

Conflict of interest: The author declares none.
Data availability: The datasets and R codes used in this study are available at https://osf.io/6v2uh/ (Open Science Framework). Raw data that support the findings of this study are available at the UCLA NewsScape Library of International Television News.

References

Alibali, M. W. & S. Kita. 2010. Gesture highlights perceptually present information for speakers. Gesture 10(1). 3–28. https://doi.org/10.1075/gest.10.1.02ali.

Bates, D., M. Mächler, B. Bolker & S. Walker. 2015. Fitting linear mixed-effects models using lme4. Journal of Statistical Software 67(1). 1–48. https://doi.org/10.18637/jss.v067.i01.

Beattie, G. & H. Shovelton. 2006. When size really matters: How a single semantic feature is represented in the speech and gesture modalities. Gesture 6(1). 63–84. https://doi.org/10.1075/gest.6.1.04bea.

Boersma, P. & D. Weenink. 2022. Praat: Doing phonetics by computer (Version 6.2.22) [Computer program].

Brdar, M. 2013. Adjective reduplication and diagrammatic iconicity. In M. Liovic (ed.), Sanjari i znanstvenici. Zbornik u čast 70-godišnjice rođenja Branke Brlenić-Vujić, 489–514. Osijek: Sveuciliste Josipa Jurja Strossmayera, Filozofski fakultet.

Bressem, J. 2014. Repetitions in gesture. In C. Müller, A. Cienki, E. Fricke, S. Ladewig, D. McNeill & J. Bressem (eds.), Body – language – communication, vol. 2, 1641–1649. Berlin, München & Boston: De Gruyter Mouton.

Bressem, J. & C. Müller. 2014. A repertoire of German recurrent gestures with pragmatic functions. In C. Müller, A. Cienki, E. Fricke, S. Ladewig, D. McNeill & J. Bressem (eds.), Body – language – communication, vol. 2, 1575–1591. Berlin, München & Boston: De Gruyter Mouton. https://doi.org/10.1515/9783110302028.1575.

Calbris, G. 2011. Elements of meaning in gesture. Amsterdam & Philadelphia: John Benjamins. https://doi.org/10.1075/gs.5.

Cienki, A. 2017. Utterance Construction Grammar (UCxG) and the variable multimodality of constructions. Linguistics Vanguard 3(s1). paper 20160048. https://doi.org/10.1515/lingvan-2016-0048.

Cienki, A. 2021. From the finger lift to the palm-up open hand when presenting a point: A methodological exploration of forms and functions. Languages and Modalities 1. 1–14. https://doi.org/10.3897/lamo.1.68914.

Cienki, A. 2022. The study of gesture in cognitive linguistics: How it could inform and inspire other research in cognitive science. WIREs Cognitive Science 13(6). e1623. https://doi.org/10.1002/wcs.1623.

Cienki, A. & O. K. Iriskhanova. 2018. Aspectuality across languages. Amsterdam & Philadelphia: John Benjamins. https://doi.org/10.1075/hcp.62.

Cienki, A. & C. Müller. 2008. Metaphor and gesture. Amsterdam & Philadelphia: John Benjamins Publishing Company. https://doi.org/10.1075/gs.3.

Cooperrider, K., N. Abner & S. Goldin-Meadow. 2018. The Palm-Up Puzzle: Meanings and origins of a widespread form in gesture and sign. Frontiers in Communication 3(23). 1–16. https://doi.org/10.3389/fcomm.2018.00023.

Debras, C. 2017. The shrug: Forms and meanings of a compound enactment. Gesture 16(1). 1–34. https://doi.org/10.1075/gest.16.1.01deb.

Diessel, H. 2019. The grammar network: How linguistic structure is shaped by language use. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781108671040.

Edeline, F. & J.-M. Klinkenberg. 2021. L’index. Un dispositif sémiotique puissant et méconnu. In D. Bertrand & I. Darrault-Harris (eds.), À même le sens. Hommage à Jacques Fontanille, 253–263. Limoges: Lambert Lucas.

Ferré, G. 2018. Gesture/speech integration in the perception of prosodic emphasis. In Proceedings of Speech Prosody, 35–39. Poznan, Poland. https://doi.org/10.21437/SpeechProsody.2018-7.

Ferré, G. 2023. Pragmatic gestures and prosody. In Proceedings of the GESPIN Conference, paper 6893. Nijmegen.

Féry, C. 2001. Focus and phrasing in French. In C. Féry & W. Sternefeld (eds.), Audiatur vox sapientiae. A festschrift for Arnim von Stechow, 153–181. Berlin: Akademie-Verlag. https://doi.org/10.1515/9783050080116.153.

Fillmore, C. J. 1988. The mechanisms of Construction Grammar. In Proceedings of the Fourteenth Annual Meeting of the Berkeley Linguistics Society, 35–55. Berkeley: Berkeley Linguistics Society. https://doi.org/10.3765/bls.v14i0.1794.

Fillmore, C. J., C. R. Johnson & M. R. Petruck. 2003. Background to FrameNet. International Journal of Lexicography 16(3). 235–250. https://doi.org/10.1093/ijl/16.3.235.

Gamer, M., J. Lemon & P. Singh. 2012. irr: Various coefficients of interrater reliability and agreement. R package version 0.84.

Givón, T. 1985. Iconicity, isomorphism, and non-arbitrary coding in syntax. In J. Haiman (ed.), Iconicity in syntax, 187–219. Amsterdam & Philadelphia: John Benjamins. https://doi.org/10.1075/tsl.6.10giv.

Goldberg, A. E. 1995. Constructions. A Construction Grammar approach to argument structure. Chicago & London: The University of Chicago Press.

Haas, P. 2011. L’expression de l’aspect grammatical dans le domaine nominal: le cas de en plein Naction. Travaux de Linguistique 63(2). 85–107. https://doi.org/10.3917/tl.063.0085.

Halliday, M. & C. Matthiessen. 2004. An introduction to functional grammar, 3rd edn. London: Hodder Arnold.

Hardie, A. 2012. CQPweb – combining power, flexibility and usability in a corpus analysis tool. International Journal of Corpus Linguistics 17(3). 380–409. https://doi.org/10.1075/ijcl.17.3.04har.

Harrison, S. 2018. The impulse to gesture: Where language, minds, and bodies intersect. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781108265065.

Hilpert, M. 2014. Construction Grammar and its application to English. Edinburgh: Edinburgh University Press.

Hinnell, J. 2018. The multimodal marking of aspect: The case of five periphrastic auxiliary constructions in North American English. Cognitive Linguistics 29(4). 773–806. https://doi.org/10.1515/cog-2017-0009.

Hinnell, J. 2019. The verbal-kinesic enactment of contrast in North American English. American Journal of Semiotics 35(1-2). 55–92. https://doi.org/10.5840/ajs20198754.

Hinnell, J. 2020. Language in the body: Multimodality in grammar and discourse (PhD Thesis). Alberta: University of Alberta.

Hinnell, J. & F. Parrill. 2020. Gesture influences resolution of ambiguous statements of neutral and moral preferences. Frontiers in Psychology 11. paper 587129. https://doi.org/10.3389/fpsyg.2020.587129.

Hirst, D. 1998. Intonation in British English. In D. Hirst & A. Di Cristo (eds.), Intonation systems. A survey of twenty languages, 56–77. Cambridge: Cambridge University Press.

Hoffmann, T. 2020. Multimodal Construction Grammar: From multimodal constructs to multimodal constructions. In X. Wen & J. R. Taylor (eds.), The Routledge handbook of cognitive linguistics, 78–92. New York & London: Routledge. https://doi.org/10.4324/9781351034708-6.

Hoffmann, T., T. Brunner & J. Horsch. 2020. English comparative correlative constructions: A usage-based account. Open Linguistics 6(1). 196–215. https://doi.org/10.1515/opli-2020-0012.

Hostetter, A. B. 2011. When do gestures communicate? A meta-analysis. Psychological Bulletin 137(2). 297–315. https://doi.org/10.1037/a0022128.

Jackendoff, R. 2000. Curiouser and curiouser. Snippets 1. 8.

Jackson, R. C. 2016. The pragmatics of repetition, emphasis and intensification (PhD Thesis). Salford: University of Salford.

Jila, G., R. Jackendoff, R. Nicole & K. Russell. 2004. Contrastive focus reduplication in English (The salad-salad paper). Natural Language & Linguistic Theory 22(2). 307–357. https://doi.org/10.1023/B:NALA.0000015789.98638.f9.

Katz, J. & E. Selkirk. 2011. Contrastive focus vs. discourse-new: Evidence from prosodic prominence in English. Language 87(4). 771–816. https://doi.org/10.1353/LAN.2011.0076.

Kendon, A. 2004. Gesture. Visible action as utterance. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511807572.

Krahmer, E., Z. Ruttkay, M. Swerts & W. Wesselink. 2002. Pitch, eyebrows and the perception of focus. In B. Bel & I. Marlien (eds.), Proceedings of Speech Prosody 2002, 443–446. Aix en Provence: Laboratoire Parole et Langage. https://doi.org/10.21437/SpeechProsody.2002-96.

Krahmer, E. & M. Swerts. 2007. The effects of visual beats on prosodic prominence: Acoustic analyses, auditory perception and visual perception. Journal of Memory and Language 57. 396–414. https://doi.org/10.1016/j.jml.2007.06.005.

Kuryu, D. 2025. Crossmodal collostructional analysis of English [ADV and ADV] constructions: Multimodal constructions or crossmodal collostructions? Language and Cognition 17. e39. https://doi.org/10.1017/langcog.2025.8.

Ladewig, S. H. 2014. Recurrent gestures. In C. Müller, A. Cienki, E. Fricke, S. H. Ladewig, D. McNeill & S. Teßendorf (eds.), Body – language – communication, 1558–1574. Berlin & Boston: Mouton de Gruyter.

Ladewig, S. H. 2020. Integrating gestures. Berlin & Boston: De Gruyter Mouton. https://doi.org/10.1515/9783110668568.

Ladewig, S. H. 2024. Recurrent gestures: Cultural, individual, and linguistic dimensions of meaning-making. In A. Cienki (ed.), The Cambridge handbook of gesture studies, 32–55. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781108638869.003.

Landis, J. R. & G. G. Koch. 1977. The measurement of observer agreement for categorical data. Biometrics 33(1). 159–174. https://doi.org/10.2307/2529310.

Landragin, F. 2011. De la saillance visuelle à la saillance linguistique. In O. Inkova (ed.), Saillance. Aspects linguistiques et communicatifs de la mise en évidence dans un texte, 67–84. Besançon: Presses Universitaires de Franche-Comté.

Langacker, R. W. 2017. Entrenchment in cognitive grammar. In H.-J. Schmid (ed.), Entrenchment and the psychology of language learning, 39–56. Berlin & Boston: De Gruyter Mouton. https://doi.org/10.1037/15969-003.

Laparle, S., G. Ferré & M. Scholman. 2024. More than one gesture but less than two? Inter-stroke dependencies in form and meaning. In Proceedings of the 26th international conference on human-computer interaction, 245–264. Washington DC, USA. https://doi.org/10.1007/978-3-031-61066-0_15.

Leclercq, B. & C. Morin. 2025. The meaning of constructions. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781009499620.

Leemann, A., M.-J. Kolly, Y. Li, R. Chan, G. Kwek & A. Jespersen. 2016. Towards a typology of prominence perception: The role of duration. In Proceedings of speech prosody, 445–449. Boston, USA. https://doi.org/10.21437/SpeechProsody.2016-91.

McNeill, D. 1992. Hand and mind: What gestures reveal about thought. Chicago & London: The University of Chicago Press.

McNeill, D. 2005. Gesture and thought. Chicago & London: University of Chicago Press.

Miller, G. 2014. English Lexicogenesis. Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199689880.001.0001.

Mittelberg, I. 2018. Gestures as image schemas and force gestalts: A dynamic systems approach augmented with motion-capture data analyses. Cognitive Semiotics 11. 1–21. https://doi.org/10.1515/cogsem-2018-0002.

Moravcsik, E. A. 1978. Reduplicative constructions. In H. J. Greenberg, C. A. Ferguson & E. A. Moravcsik (eds.), Universals of human language. Volume 3: Word structure, 297–334. Stanford: Stanford University Press.

Müller, C. 2004. Forms and uses of the palm Up open hand. In C. Müller & R. Posner (eds.), The semantics and pragmatics of everyday gestures: Proceedings of the Berlin conference April 1998, 233–256. Berlin: Weidler.

Müller, C. 2024. A toolbox of methods for gesture analysis. In A. Cienki (ed.), The Cambridge handbook of gesture studies, 182–216. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781108638869.009.

Ningelgen, J. & P. Auer. 2017. Is there a multimodal construction based on non-deictic so in German? Linguistics Vanguard 3(s1). 20160051. https://doi.org/10.1515/lingvan-2016-0051.

Ogihara, T. 1990. The semantics of the progressive and the perfect in English. In H. Kamp (ed.), DYANA deliverable R2.3.13, ESPRIT Basic Research Action BR3175, 2–40. Edinburgh: University of Edinburgh, Centre for Cognitive Science.

Paradis, C. 2001. Adjectives and boundedness. Cognitive Linguistics 12. 47–64. https://doi.org/10.1515/cogl.12.1.47.

Parrill, F., M. Turner & V. Tobin (eds.). 2010. Meaning, form, and body. Stanford: CSLI Publications.

Quirk, R., S. Greenbaum, G. Leech & J. Svartvik. 1972. A grammar of contemporary English. London: Longman.

R Core Team. 2024. R: A language and environment for statistical computing [Computer software manual]. Vienna, Austria. Available at: https://www.R-project.org/.

Rice, S. & J. Newman. 2004. Aspect in the making: A corpus analysis of English aspect-marking prepositions. In M. Achard & S. Kemmer (eds.), Language, culture, and mind, 313–327. Chicago: CSLI Publications.

Rohrer, P. L., E. Delais-Roussarie & P. Prieto. 2023. Visualizing prosodic structure: Manual gestures as highlighters of prosodic heads and edges in English academic discourses. Lingua 293. 103583. https://doi.org/10.1016/j.lingua.2023.103583.

Sapir, E. 1921. Language. An introduction to the study of speech. New York: Harcourt, Brace.

Schmid, H.-J. 2010. Entrenchment, salience, and basic levels. In D. Geeraerts & H. Cuyckens (eds.), The Oxford handbook of cognitive linguistics, 117–138. Oxford: Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199738632.013.0005.

Schmid, H.-J. 2020. The dynamics of the linguistic system. Usage, conventionalization, and entrenchment. Oxford: Oxford University Press. https://doi.org/10.1093/oso/9780198814771.001.0001.

Schmid, H.-J. & F. Günther. 2016. Toward a unified socio-cognitive framework for salience in language. Frontiers in Psychology 7. 1–4. https://doi.org/10.3389/fpsyg.2016.01110.

Shattuck-Hufnagel, S. & A. Turk. 1998. The domain of phrase-final lengthening in English. The Journal of the Acoustical Society of America 103(5). 2889. https://doi.org/10.1121/1.421798.

Simon, A.-C. & A. Grobet. 2005. Interprétation des scansions rythmiques en français. In Actes du colloque interface discours prosodie, 1–19. Aix en Provence.

Streeck, J. 2009. Gesturecraft: The manufacture of meaning. Amsterdam & Philadelphia: John Benjamins Publishing Company. https://doi.org/10.1075/gs.2.

Swerts, M. & E. Krahmer. 2008. Facial expression and prosodic prominence: Effects of modality and facial area. Journal of Phonetics 36. 219–238. https://doi.org/10.1016/j.wocn.2007.05.001.

Talmy, L. A. 1972. Semantic structures in English and Atsugewi (Unpublished doctoral dissertation). Berkeley, CA: University of California at Berkeley.

Uhrig, P. 2018. NewsScape and the Distributed Little Red Hen Lab – A digital infrastructure for the large-scale analysis of TV broadcasts. In A.-J. Zwierlein, J. Petzold, K. Böhm & M. Decker (eds.), Anglistentag 2017 in Regensburg. Proceedings of the Conference of the German Association of University Teachers of English, 99–114. Trier: Wissenschaftlicher Verlag Trier.

Van Dijk, T. A. 1991. News as discourse. New York & London: Routledge.

Wells, J. C. 2006. English intonation. An introduction. Cambridge: Cambridge University Press.

Wickham, H. 2016. ggplot2: Elegant graphics for data analysis. New York: Springer-Verlag. https://doi.org/10.1007/978-3-319-24277-4_9.

Ziem, A. 2017. Do we really need a multimodal Construction Grammar? Linguistics Vanguard 3(s1). paper 20160095. https://doi.org/10.1515/lingvan-2016-0095.

Zima, E. 2017. On the multimodality of [all the way from X PREP Y]. Linguistics Vanguard 3(s1). paper 20160055. https://doi.org/10.1515/lingvan-2016-0055.

Zima, E. & A. Bergs. 2017. Multimodality and Construction Grammar. Linguistics Vanguard 3(s1). paper 20161006. https://doi.org/10.1515/lingvan-2016-1006.

Received: 2024-06-02

Accepted: 2025-08-30

Published Online: 2025-09-17

This work is licensed under the Creative Commons Attribution 4.0 International License.

https://doi.org/10.1515/cog-2024-0057
