Semantic and contextual constraints on the causative alternation in English: a multifactorial analysis

Jiyoun Kim; Hanjung Lee; Ye-eun Cho

doi:10.1515/cllt-2024-0047

Article Open Access

Published/Copyright: April 22, 2025

From the journal Corpus Linguistics and Linguistic Theory

Abstract

Many verbs in English show causative and noncausative uses. The goal of this paper is to identify the factor most strongly associated with the realizations of the causative alternation. We report a corpus study that tested the effects of three semantic and contextual factors – intentionality, contextual identifiability, and external causality – against 3,864 instances of causative and noncausative uses of 135 alternating verbs extracted from the automatically parsed British National Corpus. The results of a series of multifactorial analyses of the corpus data indicate that intentionality and contextual identifiability are significantly associated with the realizations, with contextual identifiability being the most predictive factor: causative situations with a clearly identifiable agent are realized predominantly as a causative, whereas those with a less clear, non-agentive cause are generally expressed noncausatively. Building on Rappaport Hovav’s (2014, 2020) and Lee’s (2023) accounts of contextual constraints on the causative alternation, we propose that the observed pattern of form-meaning associations in the data can be interpreted as a consequence of general principles of communicative efficiency.

Keywords: causative alternation; cause identifiability; change-of-state verbs; communicative efficiency; external causality; intentionality

1 Introduction

Many verbs in English show the two argument realization options in (1a) and (1b), which together constitute the causative alternation (also known as the “anticausative” or “causative/inchoative” alternation). This alternation is characterized by verbs that have both causative and noncausative uses, where the causative use roughly means ‘cause to V-intransitive’. The causative variant expresses causation wherein the causer brings about a change in location or state of the causee or patient as in (1a), whereas the noncausative variant expresses just the caused event as in (1b). Concomitantly, the causative variant includes a causer argument absent in the noncausative variant. This paper investigates semantic and contextual factors which determine the (non)expression of the cause argument.^[1]

(1)

a.	Perry broke the fence.	(causative variant)
b.	The fence broke.	(noncausative variant)

There is growing consensus in the literature that the constraints on the alternation involve both lexical and nonlexical factors (e.g., Rappaport Hovav 2014, 2020; Rappaport Hovav and Levin 2012; Schäfer 2008), contra earlier approaches that tried to attribute the alternation to the lexically specified nature of the event and the cause argument (e.g., Alexiadou et al. 2006; Levin and Rappaport Hovav 1995; Reinhart 2002). Recent quantitative studies incorporate various semantic and contextual factors such as the nature of causation and the identifiability of the ultimate cause of the event in specific discourse contexts, to successfully account for the corpus frequency distribution of the alternation variants (e.g., Haspelmath et al. 2014; Lee 2023; Samardžić and Merlo 2018). However, these factors have been considered in isolation by different scholars, and few studies have simultaneously investigated them in relation to the causative alternation. This gap led us to ask whether the nature of causation and cause identifiability in context influence the choice of the alternation variant when tested simultaneously. The goal of this paper is to identify the factor most strongly associated with the variant (that is, causative or noncausative) on the basis of a careful analysis of 3,864 instances of causative and noncausative uses of 135 alternating verbs extracted from the British National Corpus (BNC).

This paper is organized as follows. Section 2 provides a review of major theoretical and empirical approaches to the lexical and other constraints on the causative alternation in English. Section 3 describes the data source and the data extraction process and discusses the variables tested in this study. Section 4 introduces the statistical methods and presents the results of the quantitative analyses. The results of a series of multifactorial analyses indicate that both intentionality and contextual identifiability are significantly associated with the realizations, with contextual identifiability emerging as the most predictive factor. In Section 5, we discuss a possible explanation of these findings and the cross-linguistic implications of this study. Section 6 concludes the paper by discussing further theoretical and empirical implications of this study and challenges for future work.

2 Theoretical background

This section examines major theoretical approaches to the causative alternation and quantitative studies on the frequencies of the causative and noncausative uses of alternating verbs.

2.1 Theoretical approaches to constraints on the causative alternation

Previous approaches to the causative alternation fall into two major classes. The first class assumes that the main constraint on the alternation derives from the lexical characterization of the cause argument. This class of approaches has been most fully developed in Levin and Rappaport Hovav (1995) and Reinhart (2002, 2016). The second class assumes that the constraints on the alternation follow from the lexical classification of the verb together with nonlexical factors, stressing the interplay of lexical and nonlexical factors in explaining the limited productivity of the alternation. This class of approaches is represented by Alexiadou (2010), Alexiadou and Doron (2012), Alexiadou et al. (2006), Rappaport Hovav (2014, 2020), and Rappaport Hovav and Levin (2012), despite significant differences among their analyses.

A key assumption shared by many accounts of the causative alternation that represent these two classes of approaches is that the alternation is enabled by the underlying lexical structure of the alternating verbs (e.g., Alexiadou et al. 2006; Levin and Rappaport Hovav 1995; Reinhart 2002, 2016; Samardžić and Merlo 2018, among others). In their seminal work, Levin and Rappaport Hovav (1995) first proposed that the crucial lexical-semantic distinction for characterizing the class of alternating verbs is between verbs denoting internally caused eventualities and those denoting externally caused eventualities. They assume that verbs denoting internally caused eventualities, such as bloom and blossom, are lexically monadic, selecting only the argument denoting the entity undergoing the change specified by the verb. Conversely, verbs denoting externally caused eventualities, such as destroy and open, are lexically dyadic, additionally selecting a cause argument. On their account, a subset of verbs denoting externally caused events, such as break and open, undergo a process of lexical binding of the external argument, which prevents the external argument from being expressed syntactically, thus resulting in the noncausative variant of the alternation. Reinhart (2002, 2016) provides a similar characterization of the class of alternating verbs. Like Levin and Rappaport Hovav’s (1995) analysis, her analysis of the causative alternation takes the causative variant to be basic, deriving the noncausative variant from it via a lexical operation of decausativization. This operation applies to verbs that simply specify that their subject is a cause, subsuming agents, natural forces, and instruments (Reinhart 2016: 25).

Thus, both Levin and Rappaport Hovav’s (1995) and Reinhart’s (2002, 2016) accounts characterize the class of alternating verbs over the causative variant of the alternation as verbs with no lexical specification for the causing event. This characterization predicts that alternating verbs can appear with a wide range of semantic types of NPs, as shown in (2a) and (2b). In contrast, the verb murder specifies something about the causing event (it must involve intention). Hence, this verb cannot undergo lexical binding or decausativization, and does not participate in the alternation, as shown in (3).

(2)

a.	The vandals/the rocks/the storm broke the windows.
b.	The butler/the key/the wind opened the door.

(3)

a.	The hit men/the bullets/the plan murdered the gangster.
b.	*The gangster murdered.

A challenge encountered by such lexically-oriented accounts is that some verbs which fit the characterization of the class of alternating verbs do not alternate, as illustrated for destroy and kill in examples (4) and (5). As pointed out by Rappaport Hovav (2014: 15), English verbs of destruction and killing are often classified as externally caused and do not lexically specify any particular cause of the state change, as indicated by the fact that these verbs allow a range of NPs as cause, as in (4a) and (5a). Nevertheless, these verbs do not alternate, as shown in (4b) and (5b).

(4)

a.	The vandals/the storm/the intense heat destroyed the crops.
b.	*The crops destroyed.	(Rappaport Hovav 2014: 15, (31))

(5)

a.	The marauders/the poison/the cold killed the chickens.
b.	*The chickens killed.	(Rappaport Hovav 2014: 15, (32))

Another challenge for Levin and Rappaport Hovav’s (1995) and Reinhart’s (2002, 2016) lexically-oriented accounts arises because the causative alternation is not entirely constrained by the lexical specification of the verb; contextual factors also play a critical role. If the lexical specification of the verb were solely responsible for constraining the alternation, then all externally caused verbs that are compatible with subject NPs bearing a range of semantic roles would be expected to appear in both variants, regardless of the properties of the NP chosen as the theme. However, this is not the case. Change-of-state verbs that may select agents, natural forces, and instruments as subjects may still not exhibit noncausative uses for some choices of theme in the causative variant. As Levin and Rappaport Hovav (1995) note, some causative sentences with change-of-state verbs, the prototypical causative alternation verbs, lack a noncausative counterpart, as demonstrated in examples (6) and (7).

(6)

a.	He broke his promise/the contract/the world record.
b.	His promise/The contract/*The world record broke.
		(Levin and Rappaport Hovav 1995: 85, (9))

(7)

a.	The waiter cleared the table.
b.	*The table cleared.	(Levin and Rappaport Hovav 1995: 104, (55))

Rappaport Hovav and Levin (2012) demonstrate that similar patterns are found with other change of state verbs such as empty, lengthen and shorten. These verbs conform to the lexical specification of the class of alternating verbs, and do indeed alternate, as exemplified by clear in (8). However, in certain cases, a noncausative variant is ill-formed in isolation, as in (7b) above.

(8)

a.	The wind cleared the sky.
b.	The sky cleared.	(Levin and Rappaport Hovav 1995: 104, (55))

Levin and Rappaport Hovav (1995: 102) propose that the rule of lexical binding is limited to cases where the verb denotes an “eventuality [that] can come about spontaneously without the intervention of an agent” (cf. Haspelmath 1993; Smith 1978). This accounts for the contrast between (8b), which describes an event that can take place without an agent, and (6b) and (7b), which describe events that cannot.

However, the spontaneity condition on application of lexical binding is problematic on both theoretical and empirical grounds. First, it is not the verb’s meaning that determines spontaneity or agent involvement; instead, world knowledge and properties of events tell us that certain events, such as clearing of the sky and lengthening of the days, do not need the intervention of an agent, while others, like clearing of the counter/the table and lengthening of the skirt, necessitate such intervention. Thus, the constraint on the availability of variants appears to be non-lexical in nature (Rappaport Hovav and Levin 2012; Schäfer 2008). Rather, as Rappaport Hovav (2014) accurately observes, it seems to be a constraint on the suitability of the noncausative variant in describing events of a certain type. The non-lexical nature of this constraint on the alternation thereby challenges the basic assumptions of Levin and Rappaport Hovav’s (1995) and Reinhart’s (2002, 2016) lexically-oriented accounts, which assume that all alternating verbs are inherently dyadic and the noncausative form is derived through a lexical rule that removes the cause argument.

A further empirical problem of the spontaneity condition on the application of lexical binding is the prevalence of noncausative uses of alternating verbs that describe events involving an unspecified external agent. Consider the following examples of noncausative sentences of open from the BNC in (9), discussed by Lee (2023: 360).^[2] These examples describe events of opening that require the participation of an agent: opening shops and opening a village; these contrast with opening a door which could be done by an agent, an instrument, or even a natural force (see (2b)).

(9)

a.	It is hoped that the builders will be on site by July and <the first shops will open by Christmas 1994>. (BNC W:misc, K98-126)
b.	<A second phase of the village is due to open> in July 1991 featuring a traditional Irish thatched cottage, entertainment, a street market at weekends and more craft workshops. (BNC W:misc, B29-636)

In the examples in (9), the addressee may not be able to determine the specific identity of the agent, but can infer the type of agent, such as the owner of a shop or the local government. In such cases, the unexpressed agent is type-inferable, though its exact identity remains unidentifiable, irrelevant and unimportant. This kind of inferable agent is generally disfavored in causative uses, if not entirely excluded. However, contrary to this observation, Levin and Rappaport Hovav’s (1995) account predicts that agentive uses of verbs lack a noncausative variant, as on their account the process of lexical binding is restricted to verbs denoting an event that can come about spontaneously.

Rappaport Hovav and Levin (2012) and Rappaport Hovav (2014) propose a non-derivational approach to the alternation which diverges from their own earlier analysis and those of Reinhart (2002, 2016), as well as non-derivational analyses embodied in Alexiadou et al. (2006) and others. In the realm of change-of-state verbs, Rappaport Hovav (2014) makes a three-way classification, summarized in Table 1.

Table 1:

Rappaport Hovav’s (2014) three-way classification of change-of-state verbs.

Verbs that lexically select an external cause argument (non-alternating in English)	Verbs that specify something about the nature of the involvement of an external cause, such as murder and assassinate
	Verbs that specify nothing about the nature of the causing event, for example, kill and destroy
Verbs lexically associated with an internal argument only (alternating in English)	Verbs that specify the nature of the change of state but nothing specific about the cause of the change of state, such as break, open and clear

According to Rappaport Hovav (2014), verbs such as murder and assassinate, which lexicalize both a volitional action and a change of state, are lexically associated with an external cause argument, and this argument cannot be omitted. Verbs of destruction and killing share this pattern, as their external cause argument cannot be omitted. Consequently, the verbs of the first two classes are indistinguishable by lexical adicity in English. In contrast, verbs that allow alternation in English are lexically associated solely with their patient. Rappaport Hovav and Levin (2012) and Rappaport Hovav (2014) provide systematic evidence supporting this view. The addition of the cause is non-lexical, driven by the well-known constraint that the cause must be construable as a direct cause – a cause with immediate control over the eventuality. Building on Rappaport Hovav and Levin (2012), Rappaport Hovav (2014) suggests that a causer can be realized with a change-of-state verb just when the event being described involves direct causation in the sense defined by Wolff (2003: 5). This constraint contributes to the determination of the semantic type of NPs allowed as the external argument in given contexts. It does not, however, explain why alternating verbs differ with respect to how often they are used as a causative and as a noncausative and which variant is appropriate in a given discourse context. For this, we need a theory which accounts for the (non-)appearance of the cause argument.

Rappaport Hovav (2014) provides a pragmatic account of the (non-)appearance of the cause argument. Drawing on Grice’s (1975) maxims of conversation, she argues that if the expression of the cause is deemed relevant, it is typically preferred since the causative variant is more informative. The degree of informativeness of the causative variant is influenced by factors such as contextual identifiability and predictability of the cause. For instance, if the cause is recoverable in some way from context, then the sentence with the cause expressed is no longer more informative than the corresponding sentence that expresses just the change of state. In that case, mentioning the cause seems superfluous and the noncausative may be preferred on grounds of economy, in line with Grice’s (1975) Maxim of Manner, which dictates avoiding prolixity.

Sometimes the speaker will leave the agent unmentioned though the causative variant would be more informative. This happens, according to Rappaport Hovav (2014), when the speaker does not know the cause; consequently, the noncausative is required because the speaker cannot truthfully specify the cause. Rappaport Hovav (2014, 2020) shows that this account covers constraints on the alternation which have been taken to be lexical and also those which are more clearly non-lexical in earlier accounts.

The inclusion of contextual constraints in the analysis of the causative alternation has recently gained ground in empirical work by Lee (2023). In the following subsection, we discuss this work and other quantitative studies on the causative alternation.

2.2 Quantitative studies on the causative alternation

Quantitative studies on the causative alternation have focused on the morphosyntactic form of alternating verbs, which has emerged as a major locus of cross-linguistic variation. In his typological work, Haspelmath (1993) distinguishes five types of encoding of the causative alternation: (i) the marked causative type, where the causative variant is formally marked as opposed to the noncausative variant (illustrated by Georgian in Table 2); (ii) the marked anticausative type, where the noncausative variant is formally marked as opposed to the causative variant (as in the Polish example); (iii) the labile type, where no formal change in the verb occurs (as seen in English); (iv) the equipollent type, where both the causative and the noncausative variant bear special morphology that is attached to a shared stem (as in the Japanese example); and (v) the suppletive type, where the two variants are expressed by verbs which are formally unrelated (as in the Russian example). The different marking strategies illustrated in Table 2 represent only the most prevalent markings. Languages may adopt different strategies for different verbs. For instance, in Korean, the intransitive version of the verb meaning ‘open’ is morphologically marked, whereas the transitive version is not marked (i.e., yel-li- ‘open (intransitive)’ versus yel- ‘open (transitive)’). On the other hand, the transitive version of the verb meaning ‘freeze’ is morphologically marked, while the intransitive version remains unmarked (i.e., el- ‘freeze (intransitive)’ versus el-li- ‘freeze (transitive)’).

Table 2:

Encoding types of the causative alternation (Haspelmath 1993, adapted).

Type	Language	Example
(i) Marked causative	Georgian	duγ-s ‘cook (intransitive)’ – a-duγ-ebs ‘cook (transitive)’
(ii) Marked anticausative	Polish	złamać się ‘break (intransitive)’ – złamać ‘break (transitive)’
(iii) Labile	English	break ‘break (intransitive)’ – break ‘break (transitive)’
(iv) Equipollent	Japanese	atum-aru ‘gather (intransitive)’ – atum-eru ‘gather (transitive)’
(v) Suppletive	Russian	goret’ ‘burn (intransitive)’ – žeč ‘burn (transitive)’
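
The five encoding types can be read as a small decision procedure over three properties of a verb pair. The following Python sketch is only an illustration of that reading, not part of Haspelmath’s (1993) typology itself: the function name classify_pair and the three boolean flags are hypothetical, and the flags for each pair have to be hand-coded by the analyst.

```python
from enum import Enum


class EncodingType(Enum):
    MARKED_CAUSATIVE = "marked causative"
    MARKED_ANTICAUSATIVE = "marked anticausative"
    LABILE = "labile"
    EQUIPOLLENT = "equipollent"
    SUPPLETIVE = "suppletive"


def classify_pair(shared_stem: bool,
                  causative_marked: bool,
                  noncausative_marked: bool) -> EncodingType:
    """Map hand-coded properties of a verb pair to an encoding type.

    The three flags are illustrative assumptions: whether the two variants
    share a stem, and whether each variant carries overt marking.
    """
    if not shared_stem:
        # Formally unrelated verbs, e.g. the Russian pair in Table 2.
        return EncodingType.SUPPLETIVE
    if causative_marked and noncausative_marked:
        # Both variants add morphology to a shared stem (Japanese pair).
        return EncodingType.EQUIPOLLENT
    if causative_marked:
        return EncodingType.MARKED_CAUSATIVE
    if noncausative_marked:
        return EncodingType.MARKED_ANTICAUSATIVE
    # Same form in both uses, e.g. English break.
    return EncodingType.LABILE


# Hand-coded flags for the Korean pairs mentioned in the text.
korean_open = classify_pair(True, causative_marked=False, noncausative_marked=True)
korean_freeze = classify_pair(True, causative_marked=True, noncausative_marked=False)
print(korean_open.value, "/", korean_freeze.value)
```

Under this coding, the two Korean pairs fall into different types, which illustrates the point that a single language may use different strategies for different verbs.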

Numerous empirical studies have shown that the morphological marking of the causative alternation correlates with spontaneity (Croft 1990; Haspelmath 1993, 2008, 2016; Haspelmath et al. 2014; Nedjalkov 1969). Haspelmath (1993: 103) contends that the more spontaneous an event is, the more likely it is to be expressed with a marked causative:

[A] factor favoring the anticausative expression type [=marked anticausative] is the probability of an outside force bringing about the event. Conversely, the causative expression type [=marked causative] is favored if the event is quite likely to happen even if no outside force is present. (Haspelmath 1993: 103; modified)

This study has been extended, and its results have been reinterpreted with a novel explanation in Haspelmath (2008, 2016) and Haspelmath et al. (2014). In Haspelmath’s view, the recurrent pattern in the encoding of the alternation reflects a principle of efficient communication: unmarked forms remain unmarked due to their high frequency of use (as measured by corpus frequency) irrespective of the lexical properties of verbs.

The relation between the lexical properties of verbs and formal encodings of the causative alternation has been examined in a recent corpus study by Samardžić and Merlo (2018). They revisit the concept of external causation, defining it as the likelihood of an external agent’s involvement in an event described by a verb. They refer to this property as Sp for spontaneity (following Haspelmath [1993]), and consider the degree of Sp as a scalar component integrated into the lexical representation of alternating verbs, assigned as a common property across the causative and noncausative realizations (Samardžić and Merlo 2018: 905). Statistical analyses of data extracted from the English component of the parallel corpus Europarl reveal that the corpus-based measure of Sp correlates with the rank of verbs based on the C/A ratio calculated from the typology of the morphological marking of the verbs across languages.^[3] Samardžić and Merlo (2018) interpret this correlation as evidence that the probability of external causation is a grammatically relevant lexical property, influencing both the corpus frequency distribution of structural realizations of alternating verbs and the typological distribution of their morphological markings: both are seen as effects of the distribution of Sp-value across verb types. This interpretation posits that the ratio between causative and noncausative uses of a verb serves as a quantifiable indicator of a verb’s lexical property.
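
To make the idea of a corpus-based ratio concrete, the following Python sketch computes a per-verb log ratio of noncausative to causative uses. It is a simplified stand-in under stated assumptions, not Samardžić and Merlo’s (2018) actual Sp measure, whose exact formula is not reproduced here; the function names, the smoothing constant, and the toy instances are all hypothetical.

```python
import math
from collections import Counter


def variant_counts(instances):
    """Count causative and noncausative uses per verb.

    `instances` is an iterable of (verb, variant) pairs, where variant is
    "causative" or "noncausative".
    """
    counts = {}
    for verb, variant in instances:
        counts.setdefault(verb, Counter())[variant] += 1
    return counts


def log_ratio(counter, smoothing=0.5):
    """Log of (noncausative / causative) counts with additive smoothing.

    Positive values mean the verb leans noncausative, negative values mean
    it leans causative, and 0 means balanced use. The smoothing constant is
    an assumption that avoids division by zero for unattested variants.
    """
    noncausative = counter["noncausative"] + smoothing
    causative = counter["causative"] + smoothing
    return math.log(noncausative / causative)


# Invented toy instances, for illustration only (not corpus data).
toy = [
    ("break", "causative"), ("break", "causative"), ("break", "noncausative"),
    ("melt", "noncausative"), ("melt", "noncausative"), ("melt", "causative"),
]
for verb, counter in variant_counts(toy).items():
    print(verb, round(log_ratio(counter), 3))
```

A scalar of this kind treats the balance between the two variants as a property of the verb as a whole, which is the interpretation at issue in the discussion that follows.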

Another notable result of this work is that the Sp-value exhibits a normal distribution across a large set of verbs listed by Levin (1993). Given that a normal distribution is symmetric and concentrated around the mean, this finding suggests that most alternating verbs can be expected to describe events with approximately a 50 percent likelihood of involving an external causer. Samardžić and Merlo’s (2018: 914) analysis further shows that while most verbs are neutral regarding external causality, only some tend to be assigned extreme Sp-values. If the majority of alternating verbs indeed describe events whose external causer can be present or absent with similar probability, this raises several questions unaddressed in Samardžić and Merlo (2018): What determines the appearance of the cause argument in a given context? Why is one variant chosen rather than the other when, in principle, both might be possible? Is the choice a matter of lexical, semantic, discourse or contextual considerations?

These questions are taken up in Lee’s (2023) work, which tests several predictions on the relationship between the frequencies of causative and noncausative uses of alternating verbs and the ease with which the ultimate cause of an event can be identified in specific discourse contexts. Figure 1 illustrates the classification of cause types based on ease of identification (Lee 2023: 362).

Figure 1:

Classification of cause types according to the ease of identification.

Agents are regarded as the prototypical causes (Croft 1991; Lakoff 1990; Talmy 1976, 2000) and are typically isolatable as the ultimate cause because they often have intentions to bring about specific changes. In contrast, when an event is nonagentive, determining its ultimate cause can prove more challenging. Rappaport Hovav (2014) contends that sentences such as The window opened and The vase broke are easier to accept than sentences such as The table cleared and The skirt lengthened because it is easy to imagine a scenario in which the speaker sees the change of state but not the act which brings it about: (i) the change of state is brought about by some nonagentive cause lacking intentions (e.g., by the wind), and then the cause may not be easily identifiable; (ii) the change of state is brought about by the action of an agent from a distance. When there are a number of causes normally present simultaneously, it becomes even more difficult to identify the ultimate cause of an event. This happens in various circumstances when a cause is recoverable by default or unidentifiable from context. Many natural events of change have default causes. As noted by Rappaport Hovav (2014: 24), there are a number of causes normally present simultaneously for changes which come about in the normal course of events (e.g., hair growing longer, trees growing, skies clearing) or those that occur gradually over the course of time. Thus, it is generally difficult to identify a single, isolatable cause for such changes.

One of the main results of Lee’s (2023) corpus study of 12 change-of-state verbs is that there is a strong positive correlation between the ease of identifying the cause and the frequency of verb uses: verbs predominantly used as a causative tend to have a higher percentage of agent causers, a cause type characterized by the greatest ease of identifiability, whereas verbs that are more frequently used as a noncausative are associated with cause arguments that are less easily identifiable. A further notable result is the increased proportion of nonagentive causers in the causative uses of verbs that are used more frequently as a noncausative. Nonagentive causers are less frequent with verbs that are predominantly used as a causative. These findings are consistent with the results of Heidinger and Huyghe’s (2024) recent corpus study on French change-of-state verbs. Furthermore, they offer corroborative evidence of a relationship between the noncausative use of a verb and the semantic role of its transitive subject: the more frequently the transitive subject of a verb is a nonagentive causer (as opposed to an agent or instrument), the more likely the verb is to be used as a noncausative (as opposed to a transitive causative).

In addition to presenting these novel empirical findings, Lee (2023) proposes an account for the observed correlation between cause types and verb uses based on communicative efficiency. Developing the line of argumentation in Levshina (2022), she argues that the correlation between agents and the causative variant, as well as between nonagentive causers and the noncausative variant, stems from the principle of efficient language use, which posits a positive correlation between benefits and costs in communication. According to this principle, high costs correlate with high benefits, and low costs correlate with low benefits. Therefore, language users are incentivized to invest more effort and time in conveying information that yields greater benefits, and to expend less effort and time on information of less utility (Levshina 2022: 22).^[4] From the efficiency perspective, the causative variant, which specifies the cause of a change of state, is more costly but is simultaneously more informative than the noncausative variant, which expresses just the change of state. As Rappaport Hovav (2014) has noted, the choice of the causative variant is preferable unless the extra information, namely, the cause of the change of state, is redundant, unimportant, or unidentifiable. Consequently, the high costs associated with producing a more complex structure are justified by the substantial relevance and significance of the identifiable agent in describing a change of state and the enhanced informativeness of the causative variant. These constitute the communicative benefits in a broadly encompassing sense and ultimately motivate the efficient division of labor between the two causative alternation variants.

This idea is compatible with Rappaport Hovav’s (2014, 2020) pragmatic account of the causative alternation based on Gricean maxims of conversation. However, previous corpus-based studies, including Lee’s (2023) recent work, have utilized monofactorial analyses to test the impacts of individual semantic or contextual factors. There remains a lack of research employing the multifactorial analysis necessary to determine whether these individual factors independently affect the corpus frequency distribution of the alternation variants when tested concurrently. The current study aims to fill this gap by utilizing three multifactorial methods to identify a range of factors influencing variant choice in the causative alternation and to assess the impact of each factor while controlling for others.

In summary, accounts of the causative alternation must explain its limited productivity. Accounts that incorporate both semantic and contextual constraints are most successful at meeting this challenge. Precisely what factors play a more important role and how they interact in constraining the alternation is a matter for continued exploration. The following sections discuss a corpus study that tested the effects of the semantic and contextual factors known from the literature – externally versus internally caused change of state, intentionality, and the contextual identifiability of the causer – against a large set of alternating verbs.
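
As a rough illustration of what testing several factors against the same set of instances involves, the following Python sketch represents each corpus instance as a record annotated for the three factors and cross-tabulates one factor against the variant. It is a minimal sketch under stated assumptions, not the coding scheme or the statistical procedure of this study: the field names, the binary coding of each factor, and the toy instances are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    verb: str
    variant: str             # "causative" or "noncausative"
    intentional: bool        # assumed binary coding of intentionality
    identifiable: bool       # assumed binary coding of contextual identifiability
    externally_caused: bool  # assumed binary coding of external causality


def cross_tab(instances, factor):
    """Count instances for each (factor level, variant) combination."""
    table = Counter()
    for inst in instances:
        table[(getattr(inst, factor), inst.variant)] += 1
    return table


def causative_share(table, level):
    """Proportion of causative uses at one factor level, or None if unattested."""
    causative = table[(level, "causative")]
    total = causative + table[(level, "noncausative")]
    return causative / total if total else None


# Invented toy instances, for illustration only (not corpus data).
TOY = [
    Instance("break", "causative", True, True, True),
    Instance("open", "causative", True, True, True),
    Instance("break", "noncausative", False, False, True),
    Instance("open", "noncausative", True, False, True),
    Instance("clear", "noncausative", False, False, False),
]

table = cross_tab(TOY, "identifiable")
print(causative_share(table, True), causative_share(table, False))
```

A one-factor table like this is what a monofactorial analysis inspects; a multifactorial analysis instead enters all three annotations as predictors of the variant at once, so that the contribution of each factor can be assessed while the others are held constant.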

3 Data and methodology

This section first describes the data source and methods for data collection and analysis (Section 3.1) and then discusses the variables examined in this study (Section 3.2).

3.1 Procedures

3.1.1 Selection of verbs

The focus of this study is a major class of verbs that exhibit the causative alternation, known as change-of-state (COS) verbs. The selection of these verbs is motivated by the fact that they instantiate the ‘become’ semantics that favors participation in the causative alternation, but also show variability in the frequency of causative versus noncausative uses.

We compiled a sample of COS verbs from Levin (1993), a lexical resource that describes the semantic properties and syntactic behaviors of over 3,000 English verbs. From the 9 subclasses of COS verbs identified as alternating verbs by Levin (1993), we selected the following 3: i) break verbs (e.g., break, shatter, split), ii) verbs that are zero-related to adjective (e.g., clear, open, warm), and iii) other alternating verbs of change of state (e.g., change, freeze, melt). Break verbs refer to actions that bring about a change in the material integrity of some entity. Verbs that are zero-related to adjective denote a change in the physical state described by the base and have a ‘(make) become + base’ semantic structure. Other alternating verbs of change of state include various verbs that involve changes in physical state. Unlike other COS verbs ending in -en, -ify, -ize and -ate, which carry the deadjectival affixes, this class includes members that have both nominal (e.g., change, heat, and light) and adjectival bases (e.g., close, mature, and short). This diverse selection averts formal bias and ensures morphological variety in the corpus data.

Although a full account of the causative alternation requires examining a broad range of verb classes participating in the alternation, concentrating on these three major subclasses of prototypical alternating verbs reduces interference from idiosyncratic verb meanings and results in a more refined dataset.

3.1.2 Data extraction

The data source for this study is the BNC, a 100-million-word collection of samples of written and spoken language from a wide range of sources, originally compiled by Oxford University Press in the 1980s and early 1990s. The BNC was selected as a first step toward a multi-corpus analysis aimed at comparing the use of identical verbs across additional English corpora representing more recent language use (e.g., the BNC2014, the ANC, and the COCA).

To conduct the multifactorial analysis required for this study, it was essential to extract the relevant corpus data from the raw BNC texts in a format amenable to further statistical analysis. Consequently, we established a pipeline for systematic data collection and processing; this ensures the reproducibility and replicability of our results. This section outlines the data processing pipeline depicted in Figure 2.

Figure 2:

Procedures for data collection and analysis.

To identify instances of the two patterns involved in the causative alternation, it was necessary to add syntactic information to the corpus data. For this purpose, the raw BNC text comprising unannotated sentences was parsed automatically using a dependency parser. In this study, dependency parsing was performed using Stanza, an open-source Python natural language processing toolkit that supports 66 human languages and was developed by the Stanford NLP Group (Qi et al. 2020). Dependency parsing builds a tree structure of words from the input sentence, representing the syntactic dependency relations between words. The resulting tree representations, which follow the Universal Dependencies formalism (https://universaldependencies.org), are useful in many downstream applications. More specifically, this initial parsing step produces a corpus in a format known as the CoNLL-U format. Such a file consists of split sentences annotated with various types of information about each word in the sentence, including the word’s form, its position in the sentence, its lemma, its part-of-speech (pos) tag, and its dependency relation (deprel). This information facilitates the identification of the corpus data of interest in this study, enabling the automatic retrieval of causative and noncausative uses of verbs. Figures 3 and 4 provide the graphical dependency representations for the example sentences in (8a) and (8b), respectively.^[5]

Figure 3:

Graphical dependency representation for the sentence: The wind cleared the sky.

Figure 4:

Graphical dependency representation for the sentence: The sky cleared.

In Figures 3 and 4, all words in a sentence are connected through dependency relations, defined as binary relations that hold between a governor (also known as a head) and a dependent. For instance, the subject and object depend on the main verb; determiners depend on the nouns they modify; and so on. The labeled arc is directed from the governor to the dependent.^[6]
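To make the format concrete, the following minimal Python sketch reads a CoNLL-U fragment for The sky cleared and lists each arc from governor to dependent. The fragment is hand-written for illustration and is an assumption, not actual parser output; the helper names parse_conllu and arcs are hypothetical. The ten tab-separated columns follow the CoNLL-U specification, and the sketch does not handle multiword-token or empty-node lines.

```python
# Hand-written CoNLL-U fragment (illustrative assumption; not actual parser output).
# Columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
SAMPLE = "\n".join([
    "# text = The sky cleared.",
    "1\tThe\tthe\tDET\t_\t_\t2\tdet\t_\t_",
    "2\tsky\tsky\tNOUN\t_\t_\t3\tnsubj\t_\t_",
    "3\tcleared\tclear\tVERB\t_\t_\t0\troot\t_\t_",
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_",
])


def parse_conllu(block):
    """Return one dict per token line, skipping blank and comment lines."""
    tokens = []
    for line in block.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        tokens.append({
            "id": int(cols[0]),      # position in the sentence
            "form": cols[1],
            "lemma": cols[2],
            "upos": cols[3],
            "head": int(cols[6]),    # id of the governor; 0 marks the root
            "deprel": cols[7],
        })
    return tokens


def arcs(tokens):
    """List (governor form, relation, dependent form) triples."""
    form_by_id = {t["id"]: t["form"] for t in tokens}
    return [(form_by_id.get(t["head"], "ROOT"), t["deprel"], t["form"])
            for t in tokens]


for governor, relation, dependent in arcs(parse_conllu(SAMPLE)):
    print(f"{governor} --{relation}--> {dependent}")
```

Each printed line corresponds to one labeled arc of the kind drawn in the dependency figures, with the artificial ROOT node governing the main verb.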

The second step involved extracting causative and noncausative instances of verbs classified as alternating in Levin (1993) from the parsed corpus. In this study, we considered only instances of active transitive realizations and intransitive realizations. The passive instances were extracted but excluded from the analysis, as including passives complicates the interpretation of the regression model.^[7] Furthermore, we focused only on one-word uses of the verbs, excluding complex verbs composed of [verb + particle], known as verb particle constructions (VPCs), such as break down and melt away. The exclusion was based on the observation that not all multi-word uses of the verbs under scrutiny exhibit the causative alternation. For example, the verb break is frequently used in VPCs, often in a sense that lacks a change-of-state interpretation, as in the war broke out. The VPC break out has a coming-into-existence sense and lacks a causative counterpart that can be paraphrased as ‘cause the war to happen’. As it is difficult to exclude only such VPCs, all VPCs were excluded from the analysis.

To retrieve only the relevant data, we wrote a Python script that identified the one-word causative and noncausative instances of 354 alternating verbs. The script identified causative instances by selecting sentences that contain dependency relations tagged as “nsubj” (nominal subject) and “dobj” (direct object) and governed by “root”. To extract noncausative instances, it selected sentences lacking a “dobj” dependency relation governed by “root”. Additionally, the script excluded passive sentences and VPCs by detecting the relations “aux:pass” (passive auxiliary) and “compound:prt” (phrasal verb particle), respectively.
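The selection logic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the script used in the study: the function classify, the helper tok, the token representation, and the toy sentences are all hypothetical, and only the relation labels are taken from the description above. Treating a sentence with an object but no subject as skipped is likewise an assumption of the sketch.

```python
# Relation labels as named in the text; a given parser's label set may differ.
SUBJECT, OBJECT = "nsubj", "dobj"
EXCLUDE = {"aux:pass", "compound:prt"}  # passives and verb-particle constructions


def classify(tokens, verbs):
    """Label a parsed sentence 'causative', 'noncausative', or None (skipped).

    tokens: list of dicts with keys 'id', 'lemma', 'head', 'deprel'.
    verbs:  set of target verb lemmas.
    """
    root = next((t for t in tokens if t["deprel"] == "root"), None)
    if root is None or root["lemma"] not in verbs:
        return None
    # Relations borne by the direct dependents of the root verb.
    rels = {t["deprel"] for t in tokens if t["head"] == root["id"]}
    if rels & EXCLUDE:
        return None
    if OBJECT not in rels:
        return "noncausative"
    if SUBJECT in rels:
        return "causative"
    return None  # object without subject (e.g. an imperative): skipped here


def tok(idx, lemma, head, deprel):
    """Build one toy token (hypothetical helper)."""
    return {"id": idx, "lemma": lemma, "head": head, "deprel": deprel}


VERBS = {"clear", "open", "break"}

# The wind cleared the sky.
transitive = [tok(1, "the", 2, "det"), tok(2, "wind", 3, "nsubj"),
              tok(3, "clear", 0, "root"), tok(4, "the", 5, "det"),
              tok(5, "sky", 3, "dobj")]
# The sky cleared.
intransitive = [tok(1, "the", 2, "det"), tok(2, "sky", 3, "nsubj"),
                tok(3, "clear", 0, "root")]
# The door was opened. (passive)
passive = [tok(1, "the", 2, "det"), tok(2, "door", 4, "nsubj:pass"),
           tok(3, "be", 4, "aux:pass"), tok(4, "open", 0, "root")]
# The war broke out. (verb-particle construction)
particle = [tok(1, "the", 2, "det"), tok(2, "war", 3, "nsubj"),
            tok(3, "break", 0, "root"), tok(4, "out", 3, "compound:prt")]

for name, sent in [("transitive", transitive), ("intransitive", intransitive),
                   ("passive", passive), ("particle", particle)]:
    print(name, classify(sent, VERBS))
```

Because the rules rely entirely on the parser's labels, a passive that is parsed without an "aux:pass" relation would be returned as a noncausative by a filter of this kind.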

The data extracted during the initial two steps of the pipeline necessarily contain processing-related errors because they are collected automatically from an automatically parsed corpus. Consequently, the third step involved filtering the data to remove such errors. A manual evaluation of the dataset was conducted to assess the accuracy of the analyses. The validation revealed a bias toward identifying extracted instances as noncausatives when they are not, and this bias was found in instances of a majority of the verbs included in the analysis. The noncausative bias is primarily attributed to the incorrect analysis of English passive constructions headed by the verb be, such as The door was opened, as noncausatives rather than passives. Other cases of incorrect analyses resulted from parsing errors that led to the assignment of an incorrect form. For instance, an intransitive form was assigned where a transitive form with an elided object should have been, or the actual forms found were not verbs but adjectives or nouns (e.g., open, close, clear, dry). As the current extraction method cannot yet handle these cases automatically, such errors were manually removed from the dataset.

Furthermore, we excluded the following constructions from the samples of relevant tokens eventually analyzed, because controlling for the direct causation interpretation and the change-of-state interpretation was paramount:

Transitive uses that are not causative: the subject is not the cause as in She broke her leg;
Uses that do not have a change-of-state interpretation as in the book opens (=begin) with an introduction to the theory of syllables;
Noncausatives modified by an adjunct phrase that does not express a direct cause as in The service in the bank improved as a result of her complaint: in this example, her complaint is not construed as a direct cause of the change. Hence, this sentence does not have the paraphrase Her complaint improved the service. A better paraphrase would be: The bank improved their service because of/as a result of her complaint.

Finally, we excluded idiomatic uses of the verbs to control for the influence of the non-compositional meaning on their ability to alternate and the interpretation of the sentences.

The filtering step produced a total of 15,763 extracted instances of alternating verbs, encompassing 4,566 causative instances and 8,761 noncausative instances.^[8] For practical reasons, we chose to analyze a subset of the data, specifically 3,864 causative and noncausative instances of break verbs, deadjectival verbs, and other alternating verbs of change of state. Some verbs within these classes do not occur in the causative or noncausative variants in the dataset and were thus excluded. The final dataset includes 13 break verbs, 34 deadjectival verbs, and 88 other alternating verbs of change of state. We analyzed corpus occurrences of causative and noncausative uses of these 135 verbs, following the methodology outlined in the subsequent subsection.

In order to conduct a statistical analysis, the data destined for the multifactorial analysis were recorded in an Excel spreadsheet and manually annotated with five variables (verb, realization, intentionality of causation, contextual recoverability, and external causality of change of state) to compose the final dataset for this study. Figure 5 displays the first line of the file containing the final dataset for the verb abate:

Figure 5:

Screenshot of the file containing the final dataset (the first line for abate).

3.2 Coding predictor variables for multifactorial analysis

This section provides an overview of the three semantic and contextual factors included as predictor variables in the multifactorial analysis, as listed in Table 3. The reliability of the coding, as indicated by high inter-rater agreement scores (Light’s kappa) exceeding 0.80, establishes the credibility of the coding schema.
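For readers unfamiliar with the statistic, Light’s kappa for a set of raters is the arithmetic mean of Cohen’s kappa computed over all pairs of raters. The sketch below implements that definition in plain Python. The rater labels are invented toy data, not the annotations of this study, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of category labels.

    Undefined (division by zero) when chance agreement equals 1,
    i.e. when both raters use a single identical label throughout.
    """
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement: product of the raters' marginal proportions per label.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


def light_kappa(ratings):
    """Mean of Cohen's kappa over all pairs of raters."""
    pairs = list(combinations(ratings, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Invented toy annotations for six tokens by three raters (not the study's data).
rater1 = ["Intent", "Intent", "NIntent", "NIntent", "Intent", "NIntent"]
rater2 = ["Intent", "Intent", "NIntent", "NIntent", "NIntent", "NIntent"]
rater3 = ["Intent", "Intent", "NIntent", "NIntent", "Intent", "NIntent"]

print(round(light_kappa([rater1, rater2, rater3]), 3))
```

In the toy data, raters 1 and 3 agree on every token while rater 2 differs on one, so the mean pairwise kappa falls below 1.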

Table 3:

Semantic/contextual factors operationalized as predictor variables.

Variables	Values	Expectations
Intentionality	Intentional (Intent)	Causative
Intentionality	Nonintentional (NIntent)	Noncausative
Contextual identifiability	Recoverable cause (RC)	Noncausative
	Nonrecoverable cause:
	Unknown cause (NRC_UC)	Noncausative
	Identified cause (NRC_IC)	Causative
External causality	Externally caused COS (ExtCOS)	Causative
External causality	Internally caused COS (IntCOS)	Noncausative

3.2.1 Intentionality

This variable distinguishes between agentive and nonagentive causes. As discussed in Section 2.2, agents are typically easily isolatable as an ultimate cause since they often have intentions to bring about specific changes. When an agent is involved in causing an event, that agent must usually be expressed because it is more informative to mention the ultimate cause of the change of state who intends to bring about that change, if the ultimate cause is known and relevant (Rappaport Hovav 2014: 26). Consequently, the noncausative variant is generally disfavored as a description of an agentive situation unless the agent was established previously or hinted at in the discourse (see Section 3.2.2).

When an event is nonagentive, however, it may be more difficult to ascertain its ultimate cause, and using a noncausative variant becomes appropriate. Sentences such as The sky cleared and The vase broke are easier to accept than sentences such as The table cleared and The record broke, because they are more likely to describe a change of state brought about by a nonagentive cause that lacks intentions (e.g., by some natural force).

For the purposes of this study, the distinction between agentive and nonagentive causes was reformulated as a property of the causing action with two values: intentional causation and nonintentional causation.

Intentional causation. A causing action was coded as intentional when it denotes actions that can only be carried out intentionally (e.g., opening shops and closing business). Cases which denote a change of state not necessarily brought about by an intentional agent (e.g., breaking the window and opening the door) were classified as intentional when the context of the sentence or the immediate discourse environment indicates that the causer acted intentionally, as illustrated in (10).

(10)

a.	The protesters broke the window.
b.	I pushed and pushed on the door, and <it finally opened>.
	(McCawley 1978, cited in Rappaport Hovav 2014: 22, (62))

Nonintentional causation. All other cases were coded as nonintentional.^[9] Examples of nonintentional causation are given in (11). In (11a) and (11b), the causer affects the patient accidentally or unintentionally; (11c) exemplifies inanimate causes lacking intention. (11d) describes a change of state (alteration in economic circumstances) that is unlikely to be initiated by intentionally acting causers.

(11)

a.	I leaned against the door and <it accidentally opened>.
	(Rappaport Hovav 2014: 17, adapted from example (41))
b.	<John broke the window> when he was playing football.
	(Levshina 2022: 168)
c.	The rocks/the storm broke the window.
d.	Or should it adjust its policy as <circumstances change> and attempt to fine-tune the economy? (BNC W:commerce, J15-1447)

In line with Rappaport Hovav (2014, 2020), one might expect intentional causation to increase the probability of causatives, while nonintentional causation is likely to increase the likelihood of noncausatives.

3.2.2 Contextual identifiability

This variable captures the degree to which the cause of a change of state is contextually identifiable, differentiating between recoverable cause (RC) and nonrecoverable cause (Non-RC). A recoverable cause is defined as one that can be contextually identified, whereas all other cases were considered a nonrecoverable cause and classified into unknown cause (UC) or identified cause (IC).

An identified cause is defined as a cause of a change of state explicitly singled out as the primary cause in the clause where the alternating verb is used. Identified causes and the change they bring about are specified in the same clause, enabling their clear identification as the ultimate cause of the change without having to rely on the surrounding context. The following two cases were considered identified causes: (i) causes realized as the subject or the external argument in the causative variant, and (ii) causes realized in other positions of the clause where the investigated verb is used. Research on lexical causatives has established that the subject of the causative is necessarily characterized as a cause that is conceptualized as both proximate and ultimate (Levshina 2022, Rappaport Hovav 2014, Wolff 2003).^[10] The data coding procedure adopted this widely accepted perspective by counting causes realized as the subject in the causative variant as instances of an identified cause. Further cases of an identified cause from the corpus are given in (12). In (12a) the cause (Bannister) fulfills two roles: it serves directly as the object of the matrix verb permit, and indirectly, through indexing, as the external argument in the complement clause containing the verb break (Radford 1988, 1997); in (12b) and (12c), the prepositional phrase containing the cause modifies the verb phrase headed by the verb (melt and broke).

(12)

a.	The record was broken by scientific training and pacing in which two first-class athletes sacrificed themselves to permit Bannister <to break the record>. (BNC W:non-ac:soc_science, A6Y-24)
b.	… these crystals melt in hot water. (BNC W:fict:prose, H8A-2435)
c.	The window broke from the pressure/from the explosion/from Will’s banging. (Heidinger and Huyghe 2024: 191, (23a))

The restriction to causes realized in the same clause where the change is specified excludes causes specified or hinted at in the surrounding context from cases of identified cause. Instead, these causes were coded as recoverable causes, that is, causes that can be contextually identified. The following types of cause were regarded as recoverable causes: (i) previously mentioned cause, (ii) hinted cause, (iii) default cause, and (iv) type-inferable cause. The first two types are contextually identifiable based on information available from the immediate discourse context, whereas the latter two types are considered recoverable because they are part of what discourse participants know about the way the world works.

Previously mentioned cause. A cause can be recoverable when it has been established in the surrounding context, that is, in the clauses or sentences preceding or following it. These are cases such as those illustrated in (10b) and (11a) above, wherein the cause of the change mentioned in the preceding clause licenses the noncausative variant of the verb. In (11a) the causing action was performed unintentionally, while in (10b) it was performed intentionally.

Hinted cause. A cause can also be recoverable when the cause might be subtly suggested in the surrounding context. An example is (13) uttered in the context of an event with many tables and food served by waiters. In such contexts, a noncausative variant may be used to describe the event.

(13)

As the night wore on, <the tables slowly cleared> and there was nothing left for the late comers to eat. (Rappaport Hovav 2014: 26, (81))

Default cause. Another type of recoverability, as noted by Rappaport Hovav (2014), is the default cause, defined as causes that are part of discourse participants’ common knowledge about the way the world works: they are recoverable by default. Examples of a default cause of the event are causes of changes which occur in the normal course of events such as the lengthening of hair, the growth of a tree, and the clearing of the sky. According to Rappaport Hovav (2014: 24), one attribute that distinguishes default causes from identified causes is the presence of multiple simultaneous causes for changes that occur as part of the normal course of events. Consequently, it is typically difficult to pinpoint a single, isolatable cause for such changes. When the cause is recoverable by default, the use of the causative variant would be odd.

Type-inferable cause. Another type of cause that is recoverable from world knowledge about the properties of eventualities is termed a type-inferable cause. An example with a type-inferable cause is given in (14). The when-clause in this example describes the closure of a hospital effected through the involvement of an agent who can be inferred to be typically responsible for such an event, namely, the hospital’s owner (see also (9) above). The precise identity of the agent is unimportant and need not be specified. This seems to be the reason for the general avoidance of the type-inferable cause in causative uses, although it is found in agentless passives (refer to Thompson [1987: 499] for further discussion).

(14)

Careful plans have been made for these people so that when <the hospital eventually closes> they will not find themselves on the streets.

(BNC W:non-ac:polit_law_edu, HH3-9982)

The noncausative variant may also be appropriate when neither the speaker nor the hearer is aware of the ultimate cause of the event. The following types of cause were considered unknown causes (UCs): (i) cause from a distance, (ii) cause for a gradual change, and (iii) unidentifiable cause.

Cause from a distance. As noted by McCawley (1978), the choice of the verbs come and go determines the orientation of the speaker: in (15) the speaker is probably positioned in the lunchroom. Consequently, this example allows the hearer to infer that the speaker does not know how the door was opened.

(15)

<The door of Henry’s lunchroom opened> and two men came in.
(McCawley 1978, cited in Rappaport Hovav 2014: 22, (59a))

Cause for a gradual change. Another type of unknown cause is the cause for a gradual change discussed by Rappaport Hovav (2014). Consider (16):

(16)

With the 1929 stock market crash, <skirts lengthened> but kept their narrow silhouette, with longer waistlines. (Rappaport Hovav 2014: 27, (83))

This sentence does not describe a change in the length of an individual skirt, but rather a change in the kind of skirt. It reports a situation in which, over time, there are a variety of instantiations of the kind, and when comparing one to another over this period, the length is seen to increase. It is likely that multiple causes (and reasons) contribute to the gradual change in the kind, which are generally diffuse and not uniquely identifiable.

Unidentifiable cause. In numerous instances, an identifiable cause could not be discerned and the available contextual information did not indicate the cause of a state change. These instances, characterized by unclear and unidentified causes due to insufficient information, were categorized as cases of unknown cause. An example with an unidentifiable cause is given in (17).

(17)

People can enjoy reading about you even if they don’t like your music and if enough of them do, it improves your chances of more coverage the next time you need it. … Although the total press coverage of pop music increased dramatically over the last ten years, <the quality hasn’t improved> and the power of music papers to make and break an artist has greatly dissipated.

(BNC W:misc, A6A-449)

From the immediate discourse context, it is apparent that the influence of music papers augmented the probability of obtaining more press coverage for artists and their music. However, the reason for the persistently unimproved quality of pop music, despite a significant increase in total press coverage, remains unclear.

Determining whether the cause for a change is a recoverable cause or an unknown cause required a careful analysis of the context. We used the CQPweb interface of the BNC to access preceding and subsequent context. CQPweb (Hardie 2012) is an online corpus analysis tool that serves as an interface to the Corpus Workbench (CWB) software and its Corpus Query Processor (CQP) search utility (https://cqpweb.lancs.ac.uk). The most convenient method to view context is by searching for query expressions and concordancing the results. As shown in Figure 6, by default, the CQPweb system displays only the immediate text unit containing the search element (in this case, But the age structure within this total has changed).

Figure 6:

Viewing a concordance with surrounding sentences in the CQPweb.

To examine more context, one can simply click on the search element (underlined and center-aligned in the concordance line). This action opens a new window displaying the entire text from which the search element originates (Figure 7). This configuration enables reading the context immediately before and after the sentences (highlighted in the window).

Figure 7:

Displaying more context in the CQPweb.

3.2.3 External causality

This variable adapts the distinction between internal and external causation discussed in Levin and Rappaport Hovav (1995), capturing whether a change of state comes about in a natural circumstance or not. Levin and Rappaport Hovav (1995) consider this distinction relevant for the lexical representation of verbs, which encodes the conceptualization of events, introducing categories for verbs they label externally and internally caused. The major classes of internally caused verbs in their analysis include (i) a variety of activity verbs, (ii) verbs of emission, and (iii) a subset of change of state (COS) verbs such as bloom, blossom, corrode, decay, erode, rust, and wither.^[11] Levin and Rappaport Hovav (1995: 97) cite this class of COS verbs as nonalternating; nonetheless, corpus studies by McKoon and Macfarland (2000) and Wright (2001, 2002) reveal that these verbs are sometimes found in causative variants.

Levin and Rappaport Hovav (1995) provide two partially overlapping yet essentially distinct characterizations of internal causation: (i) causation residing in the single argument, namely, the entity undergoing the change, and (ii) causation inherent to the natural course of development of the changing entity. However, it is not accurate to characterize internally caused COS verbs as describing events devoid of external causes, occurring solely due to the inherent properties of their arguments. For instance, when a flower blooms, factors external to the flower such as temperature, sunlight, and moisture are surely influential. On the other hand, flowers bloom in the natural course of events. Rappaport Hovav (2020: 228) argues that it is more accurate to characterize internally caused changes of state as those which come about in the changing entities as part of their natural development.

This characterization of internally caused changes of state can be straightforwardly applied to other alternating COS verbs. In this study, an internally caused COS is defined as a COS that comes about in the natural course of events as a result of the internal properties of the argument undergoing the state change; an externally caused COS is defined as a COS induced by an external cause, which is not inherent to the entity’s natural development. Unlike internally caused changes, externally caused changes of state occur with the intervention of an external causer, including agents, nonintentional acts, instruments, and abstract processes, or with a significant input of external force or energy. These various types of external causers correspond to different types of external causation and collectively represent a continuum along which different languages establish their limits on possible external causers (Wolff et al. 2009).

The meaning of the investigated verbs is compatible with descriptions of either internally or externally caused changes of state though they may vary in the likelihood of describing certain types of changes. As noted by Rappaport Hovav (2020) and others, the exact nature of the change is largely determined by the chosen theme of the change of state. In this study, the external causality of a COS was assessed by examining two aspects of the syntactic context: (i) the choice of internal arguments found with the verbs, and (ii) modifiers of the arguments and the verb phrase. For instance, naturally occurring changes of state, such as the clearing of the sky and the opening of clouds, have been considered internally caused changes of state. Such changes generally occur independent of human control, although certain environmental conditions and natural forces can regulate the onset and rate of change by acting physically on the theme of the change of state.

Changes in concrete or abstract entities are more subject to human control. As such, when verbs take concrete entities or abstract entities, the nature of the change was determined by analyzing modifiers of the arguments and the verb phrase, as well as the choice of both internal and external arguments. For example, nonagentive melting events like the melting of ice or ice cream at room temperature have been counted as instances of internally caused COS. On the other hand, the melting of substances or objects such as wax and metal by heating has been counted as instances of externally caused COS. Melting of someone’s heart has also been considered a description of an externally caused COS, as this change does not occur as part of the natural progression of the theme of the change of state.

To illustrate further, the natural drying of hair, sweat, or a towel has been regarded as a description of an internally caused drying event. However, drying them by a machine, in the sun, or in front of a fire requires external heat and thus has been regarded as a description of externally caused drying events. Further examples of causative and noncausative uses of the verbs studied, describing internally and externally caused changes of state, are given in (18) and (19), respectively.

(18)

a.	This railing is just outside my door. On a very cold winter morning before <the temperature melted the ice>, I was able to capture this image.
	(Rappaport Hovav 2020: 242, (61))
b.	<Sweat began to dry> and strength seeped back into my limbs.
	(BNC W:misc, AT-2138)
c.	When <the ice melted>, some shallow lakes remained where boulder clay blocked old river courses. (BNC W:non-ac:soc_science, B1H-2052)
d.	<The sugars dissolve into the water> and the sweet liquid, called wort, is pumped to a copper. (BNC W:misc, A13-63)

(19)

a.	Their warm Spanish eyes in luxuriant black eyelashes melted my heart.
	(BNC W:biography, AC6-205)
b.	The Central Bank has frozen all remittances of profits, dividends and interest. (BNC W:pop_lore, ABF-1212)
c.	Sarah let Roy carry Benny into the house, and as she looked at Janine comforting her only daughter, <the ice in her heart melted>. Going to the girl, she put her arm around her shoulders. (BNC W:fict:prose, CR6-982)
d.	I picked up a young bird which one of my cats had caught; I held it in my hands in the hope that it might survive. The shock had been too great, though, and its heart went into convulsions. And then came a moment when I was aware that <its personal life was dissolving>, and I could somehow perceive that a greater life, a kind of group ‘bird soul’ was taking over … (BNC W:non_ac:humanities_arts, CCN-1379)

If it could not be determined contextually whether the change of state was internally or externally caused, the instance was assigned to a third category, ‘unclear causality’. Instances of unclear causality were excluded from the samples of relevant tokens that were subsequently analyzed.

In line with previous studies, one can expect that internally caused changes of state are more associated with noncausatives, while externally caused changes of state are more typically associated with causatives.

The influence of these factors discussed in this section will be examined through the following hypotheses:

(20)

Hypothesis 1: Intentionality (nature of causing action)

a.	The causative is more likely if the causing action is intentional.
b.	The noncausative is more likely if the causing action is nonintentional.

(21)

Hypothesis 2: Contextual identifiability (type of cause)

a.	The causative is more likely if the cause is clearly singled out and identified.
b.	The noncausative is more likely if the cause is recoverable or unknown.

(22)

Hypothesis 3: External causality (kinds of change of state)

a.	The causative is more likely if the COS described by the sentence is externally caused.
b.	The noncausative is more likely if the COS described by the sentence is internally caused.

Since these variables overlap and are interrelated, it would be impossible to determine from individual examples whether the choice of variant can be attributed to one variable or another. To test the simultaneous effects of these predictors, we used three popular multivariate methods in corpus linguistics: conditional inference trees, conditional random forests (Levshina 2015, 2016, 2020, 2022; Tagliamonte and Baayen 2012) and mixed-effects logistic regression (Baayen 2008; Baayen et al. 2008; Jaeger 2008, 2011). These methods are described in the following section.

4 Results

This section introduces the statistical methods and reports the results of our descriptive statistical analysis (Section 4.1) and multifactorial analyses (Sections 4.2 and 4.3).

4.1 Descriptive statistics

We analyzed the distribution of the causative and noncausative variants in relation to the three factors described in Section 3.2.^[12] Figure 8 shows the proportional distribution of the two variants and the absolute frequencies according to the levels of the factors.

Figure 8:

Proportional distribution of alternation variants by intentionality, cause identifiability and spontaneity.

Intentionality. The proportional distributions in Figure 8 confirm our hypotheses regarding the influence of intentionality of causation as posited in (20). We can see that the causative variant is proportionally preferred over the noncausative variant when the causing action is intentional, aligning with (20a). Conversely, when the causing action is nonintentional, the noncausative is preferred, as outlined in (20b).

Recall that the category of nonintentional causes encompasses inanimate causes and nonintentional animate causers. While animates may act intentionally or unintentionally in precipitating a particular event, inanimates lack the capacity for intentional action. This difference between inanimate causes and animate causers suggests that inanimate causes are more closely linked to nonintentional causation. To investigate whether this is the case in our data, we have further analyzed nonintentional causes that are explicitly expressed in the clause of the COS verb or identifiable contextually by their ontological class. Figure 9 shows the frequency distribution of nonintentional causes among the two variants based on their animacy.

Figure 9:

Frequency distribution of animate and inanimate nonintentional causes.

We can see that there is a marked asymmetry between inanimate causes and animate causers: nonintentional causes are predominantly inanimates in both variants. There were merely 43 instances of a nonintentional animate causer realized as a causative subject, constituting only 6.56 % of the 655 causatives with a nonintentional causer subject in the dataset; the number of noncausative instances with a nonintentional animate causer totals 148, accounting for 7.86 % of all 1,884 noncausative uses with a nonintentional causer. All these cases are recoverable causes either overtly expressed in the local linguistic context or hinted at in the surrounding context, as exemplified in (11a) and (13) above, respectively. While inanimate causes can be expressed as adjuncts in noncausatives as illustrated in (12c) above, animate causers cannot (Alexiadou and Schäfer 2006: 41):

(23)

*The window broke from John.

This difference between inanimate causes and animate causers is consistent with the proposal that inanimate causes are more closely associated with the noncausative variant than are animate causers (Alexiadou and Schäfer 2006; Heidinger and Huyghe 2024). Further evidence supporting this is provided by analyzing the realization patterns of prototypical examples of inanimate causes – natural forces and conditions. We analyzed a total of 215 instances of natural forces and conditions identified or identifiable as causes, examining the frequency of their realizations and the cause types. The results of this analysis of natural causes are summarized in Figure 10.

Figure 10:

Frequency distribution of subtypes of natural causes.

Figure 10 shows that natural causes predominantly appear in the noncausative variant. Concerning natural causes in noncausatives, the discrepancy between RCs and ICs (adjunct PPs in noncausatives) is striking: RCs significantly outnumber ICs. When assessing the relative frequency of subtypes of RCs, it was found that 102 instances were default causes (DCs), and 53 instances were RCs mentioned or suggested in the local linguistic context. It is noteworthy that noncausatives with these RCs take up 80.31 % of all instances of noncausatives with a natural cause. Examples of noncausatives with recoverable and identified natural causes from the BNC are given in (24) and (25), respectively. We return to these findings in Section 5 and propose an efficiency-based explanation for them.

(24)

a.	<Early fog had cleared> and the airport manager and I were standing on the tarmac lining up the motorcade when my car phone rang.
	(BNC W:commerce, ADK-1948)
b.	<The store burned> because, some minutes after 11 am on 25 April, a 6.9 earthquake heaved the town, our houses and the store briefly off the ground and, in the store, an electric coffee pot overturned.
	(BNC W:non_ac:polit_law_edu, CAK-814)

(25)

a.	In the midst of summer when you came out, <your eyes burned with sunlight> and your small meaty smooth arms and legs were stung by the pavement’s heat. (BNC W:misc, A6C-46)
b.	Ambitions soon melted under an unseasonally hot sun.
	(BNC W:pop_lore, A15-394)

Contextual identifiability. The proportional distributions in Figure 8 also confirm our hypotheses regarding the influence of contextual identifiability as discussed in (21). We observe that the causative variant predominates when the cause is clearly singled out and identified; specifically, when the cause and the change are specified in the same clause (NRC-IC). Conversely, in cases where the cause is either recoverable or unknown, the pattern shifts: the noncausative prevails if the cause is recoverable (RC), and it is exclusively selected when the cause is unknown (NRC-UC). There are 49 instances of causative uses with a recoverable cause, constituting merely 3.02 % of the total 1,620 causative instances in the dataset. Predominantly, such cases of recoverable cause subjects in the causative uses of the alternating verb in our dataset are omitted subjects of finite verbs, which refer back to the subject in the immediately preceding sentence or to a salient entity introduced earlier in the discourse, as illustrated in (26).^[13]

(26)

a.	Crushed a snail? (BNC S:conv, KDE-4073)
b.	Should have defrosted him earlier. (BNC W:fict:prose, F9X-1688)

External causality. The proportional distribution of the realization of internally caused changes of state is in line with our hypothesis in (22b). Figure 8 shows that the noncausative is the categorical choice when describing an internally caused COS. There are no attested instances of causative uses describing an internally caused COS within our dataset. As already mentioned, we excluded instances of transitive uses of the analyzed verbs that are not causative. Examples of the transitive uses removed from the final dataset are given in (27):^[14]

(27)

a.	The seeds you’ve planted are sprouting the tenderest of young green shoots.
	(BNC W:fict:prose, HGN-2707)
b.	Already the wounds sprout new growth. (BNC W:fict:prose, CM4-1549)

However, the proportional distribution of the realization of externally caused changes of state is not consistent with our hypothesis regarding the influence of externally caused changes of state as outlined in (22a). As shown in the figure, the noncausative variant is marginally favored over the causative variant even when the sentence describes an externally caused COS.
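The shares quoted in this subsection follow directly from the counts reported in the running text. The minimal Python sketch below recomputes them. The counts are those stated above; the helper name `share` and the dictionary layout are illustrative assumptions, and Python is used here purely for exposition.

```python
# Counts as reported in the running text of Section 4.1: (part, whole).
reported = {
    "animate nonintentional causers among causatives": (43, 655),
    "animate nonintentional causers among noncausatives": (148, 1884),
    "recoverable-cause subjects among causatives": (49, 1620),
}

def share(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)

# Recompute each percentage from its raw counts.
shares = {label: share(part, whole) for label, (part, whole) in reported.items()}

for label, pct in shares.items():
    print(f"{label}: {pct} %")
```

The recomputed values agree with the percentages given in the text (6.56 %, 7.86 % and 3.02 %).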

In the subsequent subsection, we will use conditional inference trees and conditional random forests in order to identify the set of independent variables that significantly differentiate between the two variants.

4.2 Tree and forest analysis

Conditional inference trees (CITs) and conditional random forests (CRFs) were introduced to linguistic analysis in a paper by Tagliamonte and Baayen (2012) and have been fruitfully used in models of linguistic variation (Hundt et al. 2021; Levshina 2015, 2016, 2020, 2022; Szmrecsanyi et al. 2016). These methods are non-parametric permutation approaches that do not assume a specific distribution of the data, but rather build a model through resampling from the input.

CITs are methods for regression and classification based on recursive partitioning. They split the data recursively into smaller subsets based on predictor variables that co-vary most strongly with the outcome. For each binary split, the data is evaluated for the predictor that best preserves the homogeneity of each split (e.g., all causatives and all noncausatives) at a certain significance level. This splitting process continues until no further splits can significantly increase the subset’s homogeneity with respect to the outcome variable. Despite their utility, single trees have the disadvantage that they very much hinge on the dataset at hand and are hence subject to a high degree of variability. Conditional random forests (CRFs) mitigate this by resampling across a predefined number of trees through a conditional permutation scheme and are thus particularly robust to, for instance, predictor collinearity (see Levshina [2020] for details).
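To make the idea of choosing a split concrete, the following Python sketch picks the predictor most strongly associated with the outcome in a small invented dataset. It is a deliberate simplification: conditional inference trees select predictors through permutation-based significance tests, whereas this sketch simply compares Pearson chi-square statistics, which is only comparable across predictors here because both toy predictors are binary. The counts, the level labels and all function names are assumptions made for illustration and are not the study's data or code.

```python
from collections import Counter

# Invented toy counts (NOT the paper's data):
# (identifiability, intentionality, variant) -> frequency
TOY_COUNTS = {
    ("IC", "intentional", "causative"): 30,
    ("IC", "intentional", "noncausative"): 2,
    ("IC", "nonintentional", "causative"): 20,
    ("IC", "nonintentional", "noncausative"): 6,
    ("RC", "intentional", "causative"): 2,
    ("RC", "intentional", "noncausative"): 18,
    ("RC", "nonintentional", "causative"): 1,
    ("RC", "nonintentional", "noncausative"): 40,
}

def expand(counts):
    """Turn the frequency table into one dict per observation."""
    rows = []
    for (ident, intent, variant), freq in counts.items():
        row = {"identifiability": ident, "intentionality": intent, "variant": variant}
        rows.extend(dict(row) for _ in range(freq))
    return rows

def chi_square(rows, predictor, outcome="variant"):
    """Pearson chi-square statistic for the predictor-by-outcome table."""
    n = len(rows)
    cell = Counter((r[predictor], r[outcome]) for r in rows)
    row_tot = Counter(r[predictor] for r in rows)
    col_tot = Counter(r[outcome] for r in rows)
    stat = 0.0
    for level in row_tot:
        for value in col_tot:
            expected = row_tot[level] * col_tot[value] / n
            stat += (cell[(level, value)] - expected) ** 2 / expected
    return stat

def best_split(rows, predictors):
    """Pick the predictor with the strongest association (largest statistic)."""
    scores = {p: chi_square(rows, p) for p in predictors}
    return max(scores, key=scores.get), scores

rows = expand(TOY_COUNTS)
best, scores = best_split(rows, ["identifiability", "intentionality"])
print(best, scores)
```

In a full tree, the same selection step would be repeated within each resulting subset until no predictor reaches the chosen significance level.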

CITs and CRFs can be particularly useful in situations where the use of parametric methods (in particular, regression models) may not be suitable, such as in situations characterized by ‘small n, large p’ (where n is the number of observations and p is the number of predictors), complex interactions, non-linearity, and correlated predictors (Tagliamonte and Baayen 2012). They are also noted for their robustness in the presence of outliers. Furthermore, CITs facilitate the intuitive interpretation of complex interactions involving more than two predictors.

Figure 11 shows an individual CIT fitted to the data using the function ctree() in the party package in R. The ovals highlight the names of the variables selected for the best split, along with the corresponding p-values. The levels of the variables are specified on the ‘branches’. The bar plots at the bottom (‘leaves’) display the proportions of the two variants in each end node (‘bin’), which contain all observations with a given combination of features. The number of observations in each bin is noted in parentheses above the boxes.

Figure 11:

Conditional inference tree of the variant choice.

The plot is presented with all significant splits at the 0.05 level and should be interpreted from the top down.^[15] The interpretation proceeds as follows. First, the algorithm looks for the variable most strongly associated with the variant (causative or noncausative). This variable is contextual identifiability (see Node 1). It then divides the data along this variable into two subsets, separating observations with NRC_ICs (cases on the left side of the tree) from those with RCs and NRC_UCs (cases on the right side of the tree). We observe that causative situations involving NRC_ICs are expressed predominantly with a causative (Nodes 3 and 4), whereas those involving RCs and NRC_UCs are realized predominantly as a noncausative (Nodes 7, 8 and 9). Subsequently, the algorithm seeks another variable significantly associated with the variant. On both sides of the tree, the subsequent split occurs with the variable of intentionality (Nodes 2 and 5), indicating that in cases where the cause type is similarly identifiable, the intentionality of the causer becomes relevant. Descending the left-hand branch from Node 2, where the cause is an NRC_IC type, the proportion of causative outcomes is higher in situations of intentional causation (Node 3: 96.7 %) compared to nonintentional causation (Node 4: 80.9 %). Conversely, on the right-hand branch from Node 5, where the cause is recoverable or unknown, the proportions in the bar plots in the node leaves show that noncausative is uniformly preferred in instances of intentional causation with NRC_UCs (Node 8: 100 %) and that noncausative is more prevalent in instances of nonintentional causation (Node 9: 99.2 %), relative to intentional causation with RCs (Node 7: 92.2 %). Note that the variable external causality is not used in the data splitting because it does not significantly correlate with the outcomes (with p < 0.05) in the model. 
These results support the conclusion that contextual identifiability is the most predictive factor, followed by intentionality, while external causality appears to have a minimal impact in differentiating between the two variants.

The results of the CIT analysis were corroborated by the CRF analysis, where a CRF consists of an ensemble of multiple CITs. In addition to measures of classification accuracy, CRFs provide conditional variable importance scores, which show how important each variable is in predicting the outcome (in our study, the use of the causative and noncausative variants), taking into account all other variables and their interactions. To compute this measure for a predictor, the algorithm averages results across numerous trees and measures the reduction in predictive capability when the predictor is randomly permuted. As explained in Levshina (2016, 2020), a distinctive feature of CRFs is their ability to compute conditional variable importance measures, which offer certain benefits over non-conditional approaches (Strobl et al. 2007). Specifically, these measures are unbiased towards correlated predictors and with regard to the number of categories in a categorical variable. Furthermore, CRFs are superior to other popular random forest methods as they make predictions by retaining information about individual observations in each tree without pruning the trees. Consequently, terminal nodes with a high count of observations exert more influence because they provide more data points for calculating predicted values.
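The permutation logic behind variable importance can be sketched in a few lines of Python. The example below is a simplification under stated assumptions: it uses plain (non-conditional) permutation importance, a single lookup-table classifier instead of an ensemble of trees, and invented toy counts rather than the study's data. All names are hypothetical.

```python
import random
from collections import Counter

# Invented toy counts (NOT the paper's data):
# (identifiability, intentionality, variant) -> frequency
TOY_COUNTS = {
    ("IC", "intentional", "causative"): 30,
    ("IC", "intentional", "noncausative"): 2,
    ("IC", "nonintentional", "causative"): 20,
    ("IC", "nonintentional", "noncausative"): 6,
    ("RC", "intentional", "causative"): 2,
    ("RC", "intentional", "noncausative"): 18,
    ("RC", "nonintentional", "causative"): 1,
    ("RC", "nonintentional", "noncausative"): 40,
}
PREDICTORS = ["identifiability", "intentionality"]

def expand(counts):
    """Turn the frequency table into one dict per observation."""
    rows = []
    for (ident, intent, variant), freq in counts.items():
        row = {"identifiability": ident, "intentionality": intent, "variant": variant}
        rows.extend(dict(row) for _ in range(freq))
    return rows

def fit_majority_table(rows):
    """Stand-in 'model': the majority variant for each predictor combination."""
    tallies = {}
    for r in rows:
        key = tuple(r[p] for p in PREDICTORS)
        tallies.setdefault(key, Counter())[r["variant"]] += 1
    return {key: c.most_common(1)[0][0] for key, c in tallies.items()}

def accuracy(model, rows):
    """Proportion of observations whose variant the model predicts correctly."""
    hits = sum(model[tuple(r[p] for p in PREDICTORS)] == r["variant"] for r in rows)
    return hits / len(rows)

def permutation_importance(model, rows, target, repeats=20, seed=1):
    """Mean drop in accuracy after shuffling one predictor's column."""
    rng = random.Random(seed)
    base = accuracy(model, rows)
    drops = []
    for _ in range(repeats):
        shuffled = [r[target] for r in rows]
        rng.shuffle(shuffled)
        permuted = [dict(r, **{target: v}) for r, v in zip(rows, shuffled)]
        drops.append(base - accuracy(model, permuted))
    return sum(drops) / repeats

rows = expand(TOY_COUNTS)
model = fit_majority_table(rows)
importance = {p: permutation_importance(model, rows, p) for p in PREDICTORS}
print(importance)
```

In this toy setting the predictions depend only on identifiability, so shuffling intentionality leaves accuracy unchanged and its importance is zero, whereas shuffling identifiability lowers accuracy markedly.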

We created a random forest comprising 500 trees, and subsequently computed the variable importance scores.^[16] The variable importance scores are presented in Figure 12 in order of decreasing importance. The horizontal axis represents the conditional variable importance (CVI) for each predictor. The dashed line separates the important scores on the right from the unimportant ones on the left. If a variable is irrelevant, its importance values vary around zero.

Figure 12:

Conditional variable importance (CVI) scores of the variables based on CRFs.

Figure 12 shows that contextual identifiability is the most important predictor for the choice between the two alternation variants (CVI score: 0.337). This variable also predominantly facilitated the splits in the CIT. Consistent with the results of the CIT analysis, intentionality emerges as the second most important variable (CVI score: 0.0036). Notably, external causality does not seem to have any discriminatory power in the CRF.

We compared the variable importance scores obtained from the CRF analysis with those derived from non-conditional random forests. The mean decrease accuracy plot in Figure 13 demonstrates that all three variables play a role in random forests,^[17] with contextual identifiability being the most important predictor, followed by intentionality and external causality.

Figure 13:

Gini scores of the variables based on RFs.

To summarize, the results of the CIT, CRF and RF analyses suggest that contextual identifiability is the most predictive factor, whereas the other two variables are less useful for distinguishing between the two variants. In the next subsection, we will use mixed-effects logistic regression modeling to gain a more robust insight into the size of the predictors’ effect, their interactions, and random effects (verb specific effects on variation).

4.3 Mixed-effects logistic regression model

Mixed-effects logistic regression enables us to model binary outcome variables with two types of factors that are mixed in the statistical analysis (Baayen 2008; Baayen et al. 2008; Jaeger 2008, 2011): fixed factors and random factors. Random factors are employed to systematically exclude variation that can be deemed as ‘random’ or unpredictable, thus only indirectly affecting the response variable. For example, factors that could arguably be considered random are the number of speakers in a given corpus or the number of verbs that are found in a given construction. The separation of the effects of random factors permits a more reliable assessment of the effects of the remaining fixed factors.

Our model included realization as a binomial response/outcome variable (i.e., causative vs. noncausative), intentionality (intentional vs. nonintentional), contextual identifiability (RC vs. Non-RC), and external causality (internally vs. externally caused COS) as independent variables, and verb as a random factor. All independent variables were coded using sum contrasts, in which the proportion of responses for each level is compared against the grand mean across all levels, as shown in Table 4.

Table 4:

Specification of the predictor variables.

Variables	Levels	Sum coding
Intentionality	Intentional (Intent)	−1
Intentionality	Nonintentional (NIntent)	1
Contextual identifiability (C.Identifiability)	Nonrecoverable cause (NRC_IC & NRC_UC)	−1
Contextual identifiability (C.Identifiability)	Recoverable cause (RC)	1
External causality (ExtCausality)	Internally caused COS (IntCOS)	−1
External causality (ExtCausality)	Externally caused COS (ExtCOS)	1
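The coding scheme in Table 4 can be written out as a small lookup, shown here as a Python sketch. The codes are those listed in Table 4; the short level label `NRC` (covering NRC_IC and NRC_UC), the function name `encode` and the example observation are illustrative assumptions.

```python
# Sum-contrast codes as specified in Table 4.
SUM_CODES = {
    "Intentionality": {"Intent": -1, "NIntent": 1},
    "C.Identifiability": {"NRC": -1, "RC": 1},  # "NRC" = NRC_IC & NRC_UC (shorthand)
    "ExtCausality": {"IntCOS": -1, "ExtCOS": 1},
}

def encode(observation):
    """Map each categorical level of an observation to its sum-contrast code."""
    return {var: SUM_CODES[var][level] for var, level in observation.items()}

# Hypothetical observation: nonintentional, recoverable cause, externally caused COS.
example = {"Intentionality": "NIntent", "C.Identifiability": "RC", "ExtCausality": "ExtCOS"}
coded = encode(example)

# With -1/+1 coding, the codes of each predictor sum to zero across its levels,
# which is why the intercept is read against the grand mean rather than a baseline level.
balanced = all(sum(codes.values()) == 0 for codes in SUM_CODES.values())
print(coded, balanced)
```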

We built a mixed-effects logistic regression model using the lme4 package in R version 4.3.2 (R Core Team 2023). All factors listed in Table 4 were included as fixed factors. We also included a random intercept in the model to account for variability between specific verbs.

We started out with a model that included all the fixed factors specified in Table 4 and their possible interactions, successively removing predictors that did not significantly contribute to the overall fit of the model. The final model includes two significant main effects and no significant interaction effect. A summary of the model is given in Table 5.^[18]

Table 5:

Mixed-effects logistic regression of the causative alternation.

Random effects
Groups	Name	Variance	Std. Dev.
Verb	(Intercept)	2.25	1.49

Fixed effects

	Estimate (β coefficient)	Odds ratio	p-value
(Intercept)	–4.05	0.087	< 2e-16
Intentionality_NIntent	2.11	7.76	< 2e-16
C.Identifiability_RC	6.62	125.80	< 2e-16
ExtCausality_IntCOS	0.54	1.12	0.993

In the fixed effects section, we report the β coefficients and the odds ratios associated with each predictor variable along with the significance levels (p-values). Under the Estimate column, the β coefficient indicates the slope (effect size) of the effects of each predictor variable. A positive coefficient indicates that the predictor variable increases the probability of noncausatives (the outcome having value 1), whereas a negative coefficient means that the variable increases the probability of causatives (the outcome having value 0). The odds ratios show similar information, but have a lower bound at zero, and a value of one indicates an equal chance of either variant. If the odds ratio is greater than one, noncausatives are more likely than causatives; if it falls between zero and one, they are less likely. Odds ratios can be interpreted as factors with which the odds of a noncausative variant are increased or decreased given certain predictor changes. For instance, intentionality changes the odds of the noncausative realization by a factor of 7.76, thus making noncausatives 7.76 times more probable in situations of nonintentional causation (having value 1 of the predictor variable ‘intentionality’) compared to the overall average.
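The relation between a β coefficient, an odds ratio and a predicted probability can be illustrated with a short Python sketch. The intercept and coefficient below are hypothetical values chosen for illustration and are not taken from Table 5; the function names are likewise assumptions. The sketch only relies on the standard definitions that an odds ratio is the exponentiated coefficient and that probabilities are obtained from log-odds via the inverse logit.

```python
import math

def inv_logit(eta):
    """Convert log-odds to a probability."""
    return 1 / (1 + math.exp(-eta))

def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)

# Hypothetical values for illustration only (NOT the estimates in Table 5).
intercept = -1.0  # under sum coding: mean of the two levels' log-odds
beta = 0.7        # coefficient of a predictor coded -1 / +1

p_mean = inv_logit(intercept)          # probability at the grand mean
p_plus = inv_logit(intercept + beta)   # level coded +1
p_minus = inv_logit(intercept - beta)  # level coded -1

# The odds at the +1 level relative to the grand mean equal exp(beta).
ratio_vs_mean = odds(p_plus) / odds(p_mean)
print(round(ratio_vs_mean, 4), round(math.exp(beta), 4))
```

A positive coefficient thus raises the probability of the outcome coded 1 for the level coded +1 and lowers it symmetrically (on the log-odds scale) for the level coded −1.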

Table 5 shows that, consistent with the results of the CIT and CRF analysis, intentionality and contextual identifiability were significant predictors of the realization of the causative alternation, with contextual identifiability emerging as the most predictive factor. Specifically, the odds that a causative situation is realized as a noncausative are 125.8 times higher when the cause is contextually recoverable than when it is not. External causality, however, turns out not to be significant, indicating that it does not play a significant role in distinguishing between the two variants, as the noncausative variant is favored in situations of both internal and external causation (refer to Figure 8).

Figure 14 visualizes the fixed effects of the final model, with predictors situated to the right of the dotted line indicating a high probability of noncausatives. The horizontal bars extending from the points are 95 % confidence intervals. We can interpret the figure as follows. The odds ratios for both significant fixed factors are greater than 1, and their 95 % confidence intervals exclude 1, indicating that they significantly increase the probability of noncausatives. Contextual identifiability is particularly notable for its preference for noncausatives.

Figure 14:

Predictors for the final model, with 95 % confidence intervals for the odds ratios.

In the random effects section of Table 5, variance and Std. Dev. (standard deviation) indicate the variability from the predicted values due to the random effects added to the model. Consequently, these values reflect the fact that every observation has some unexpected factors affecting the realization of the causative alternation in addition to the fixed effects. The random effect variable, verb, has a standard deviation of 1.5, which translates into an average difference between verbs in their noncausative use of approximately 37.5 %. In other words, while genuine differences exist between verbs regarding their causative or noncausative realizations, the model takes this into account, such that the fixed effects in Table 5 represent the effects of our predictors beyond the verb specific effects. Figure 15 below shows the random effects for 9 sample verbs, with 95 % confidence intervals. Grow and explode are notable for their preference for noncausative realization, whereas open and fill are linked to a higher probability of causative realization. Increase and diminish exhibit no significant deviation from the overall average.

Figure 15:

Random effects for the final model, with 95 % confidence intervals.
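The step from a standard deviation of 1.5 on the log-odds scale to a difference of approximately 37.5 % is consistent with the familiar divide-by-4 approximation for logistic models (1.5 / 4 = 0.375). That this rule underlies the figure is an assumption on our reading of the text. The Python sketch below shows the arithmetic and checks numerically that the logistic curve has a maximum slope of one quarter; the function names are illustrative.

```python
import math

def inv_logit(eta):
    """Convert log-odds to a probability."""
    return 1 / (1 + math.exp(-eta))

def max_probability_shift(log_odds_change):
    """Divide-by-4 rule: upper bound on the change in probability that
    corresponds to a given change in log-odds (reached near p = 0.5)."""
    return log_odds_change / 4

verb_sd = 1.5  # standard deviation of the verb random intercept, as quoted in the text
approx_shift = max_probability_shift(verb_sd)

# Numerical slope of the logistic curve at log-odds 0 (i.e., p = 0.5).
h = 1e-6
slope_at_half = (inv_logit(h) - inv_logit(-h)) / (2 * h)

print(approx_shift, round(slope_at_half, 6))
```

Because the slope of the logistic curve is steepest at a probability of 0.5, the divide-by-4 value is an upper bound; the actual difference is smaller for verbs whose predicted probabilities lie closer to 0 or 1.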

To summarize the discussion in this section, the results of the statistical analyses show that the contextual identifiability and intentionality of the causer influence the causative alternation as predicted by the hypotheses in (20) and (21) above. Multifactorial analyses further reveal that these two factors are significantly associated with the realizations in both the CIT/CRF and mixed-effects logistic regression models, with contextual identifiability emerging as the most important predictor: causative situations with a clear identifiable agent are predominantly realized as a causative, whereas those with a less clear, nonagentive cause are predominantly expressed with a noncausative. These findings underscore the pivotal role of contextual and semantic properties of the causer in distinguishing between the two variants over the involvement of an external causer.

5 General discussion

The results presented in Section 4 raise the question of why the contextual identifiability and intentionality of the causer are significantly associated with the causative alternation. In this section, we propose that speakers’ use of the alternation variants is driven by the efficiency principle, which provides a uniform motivation for the association of contextual identifiability and intentionality with the causative alternation.

As discussed in Section 2, contextual identifiability and intentionality are closely related to the ease of cause identification and represent different degrees of cause identifiability. Agent causers are the most readily identifiable cause type since their intentionality and animacy render them salient, easily isolatable as the ultimate cause, and less likely to be abstracted. We have proposed that the observed correlation between intentionality and the causative variant in our corpus data stems from Levshina’s (2022) efficiency principle, which posits a positive correlation between benefits and costs. From this efficiency perspective, the causative variant, which specifies the cause of a change of state, is more costly and simultaneously more informative than the noncausative variant, which expresses just the change of state. As Rappaport Hovav (2014) noted, the choice of the causative variant is preferable unless the extra information – the cause of the change of state – is redundant, unimportant, or unidentifiable. Therefore, the high costs required to produce a more complex structure are justified by the relevance and importance of the identified agent in the description of a change of state and by the high informativeness of the causative variant.

When an event is brought about by a nonintentional causer, identifying its ultimate cause may be more challenging, and the noncausative variant becomes more prevalent. This also follows from the positive correlation between benefits and costs. According to this principle, a message should be sufficiently useful to justify the articulation costs. If a cause of a change of state is not sufficiently salient and clear to yield communicative benefits, speakers are likely to omit expression of such a cause to conserve effort and time on less useful information.

The positive correlation between benefits and costs can further explain the dominance of the noncausative in causative situations with a contextually recoverable cause. As discussed in Section 2, if the cause is recoverable from the context, then a sentence with the cause expressed is no longer more informative than a sentence that expresses just the change of state. In that case, mentioning the cause appears superfluous, and the noncausative may be preferred based on economy considerations.

The frequency distribution and realization pattern of natural causes observed in our corpus data have a similar efficiency-based explanation. Recall from Section 4.1 that the majority of natural causes occur in the noncausative variant and are predominantly unexpressed in the clause of the COS verb. This naturally follows from the positive correlation between benefits and costs, since the contextual identifiability and predictability of the cause affect the costs associated with producing a more complex structure, namely, the causative variant and its degree of informativeness. As noted in Section 2, many natural events of change have expected causes that are recoverable by default. For example, the sky clears, ice melts and a tree grows due to a variety of causal factors which are typically co-occurring, common, and predictable. If the cause is predictable and known, then the sentence expressing the cause is no longer more informative than the sentence that expresses just the change of state. Furthermore, as Rappaport Hovav (2020: 244–245) notes, these causal factors do not qualify as actual causes (the particular cause deemed to be “the” cause of the change of state). In such instances, mentioning the cause is highly costly because the speaker cannot easily identify “the” cause of the change of state that is suitable as the subject in the causative variant. For this reason, it is more likely that the cause will remain unmentioned to conserve effort and time on less crucial information that is not clearly identifiable as the actual cause. This, we suggest, helps explain why the majority of natural causes appear in the noncausative variant and are predominantly unexpressed in the clause of the COS verb.

Despite this tendency, natural causes can describe some unexpected circumstances, as we have previously observed. Consider, for example, the melting of ambitions under a hot sun as described in (25b) above. This sentence does not describe a change of state that comes about in the natural course of events. Instead, it describes a change that can be characterized as externally caused. Note that the external cause adjunct here includes a modifier unseasonally hot, which suggests that the cause (hot sun) for the change is an unusual cause. Further examples of natural causes that include a modifier are given in (28):

(28)

a.	<The sunlight that filtered through the windows mellowed the old wood> and brought out the richness of its colour, and even the stone floor glowed softly and did n’t give any impression of coldness.
	(BNC W:fict:prose, H8F-875)
b.	When he woke up, stiff and uncomfortable, <the early summer dawn was lighting the room>. (BNC W:fict:prose, H85-3648)

Rappaport Hovav (2020: 245) observes that modification of the natural cause helps pick out one of the causal factors as being unusual or one aspect of the causal factor as being unpredictable or less expected. If this holds true, then we can expect that the natural causes with modifiers are more likely to be overtly expressed than those without. Our analysis of sentences with a natural cause as the subject or adjunct indicates that overtly expressed natural causes tend to include a modifier. Among 25 natural cause subjects, 19 include a modifier; among 38 natural cause adjuncts, 22 include a modifier.^[19] This provides new empirical evidence for the proposal that specifying the cause of the change of state is favored when such an expressed cause renders the sentence more informative than its counterpart only expressing the change of state (Rappaport Hovav 2014, 2020).

What remains to be explained is why the contextual and semantic properties of the causer play a more significant role in distinguishing between the two variants than external causality or the nature of the change of state. We do not have a full explanation for the nonsignificance of external causality in the statistical analyses reported in Section 4.2, but offer some speculative remarks on the crucial role of cause identifiability in our dataset. It is likely attributable to the absence of morphological marking in the English causative alternation. Given that alternating verbs in English can be used as a causative and a noncausative without a formal change, grammatical or discourse context must be utilized to determine how the verbs are used. Consequently, straightforward cause identification in context assumes greater importance for distinguishing between verb uses in English than in languages with the morphologically marked causative alternation. As previously noted, it is the alternation in such languages that has been shown to be driven by spontaneity or external causality in previous crosslinguistic studies. The relative importance of different factors in distinguishing between the two variants, potentially linked to the formal encoding of the causative alternation, will be explored in future research.

6 Conclusions

This paper has investigated semantic and contextual factors influencing the causative alternation in English. The main goal was to identify the factor most strongly associated with the causative and noncausative uses of alternating verbs. For this purpose, we examined the effects of three semantic and contextual factors – intentionality, contextual identifiability, and external causality – on 3,864 instances of causative and noncausative uses of 135 alternating verbs extracted from the automatically parsed BNC. Our results of a series of multifactorial analyses of the corpus data show that contextual identifiability is the most predictive factor, followed by intentionality, whereas external causality is least useful for discriminating between the two alternation variants: causative situations with a clear identifiable cause are realized predominantly as a causative, whereas those with a less clear cause are expressed predominantly with a noncausative. These findings lend new empirical support to the view that contextual properties of a cause argument play an important role in the choice between the two variants, as originally proposed by Rappaport Hovav (2014).

The observed correlation between intentional causers and the causative variant is also in line with psycholinguistic studies on the verbalization of an event (e.g., Fausey et al. 2010; Fausey and Boroditsky 2011; Martin et al. 2023; Wolff 2003) and Heidinger and Huyghe’s (2024) recent corpus analysis of the causative alternation in French. These studies reveal that intentional causers and nonintentional causers exhibit different propensities for appearing in the subject position of a transitive causative sentence, with a lower likelihood for nonintentional causers. We have proposed that this pattern of association between cause type and verb uses is ultimately grounded in the communicative demands of language users, and arises due to the tendency of languages to utilize their structures efficiently.

This paper is the first attempt to demonstrate that communicative efficiency consistently motivates the influence of contextual identifiability and intentionality on the causative alternation in English. However, the efficiency-based approach to the causative alternation remains incomplete in several respects, and a number of issues are left for future investigation. The current study was limited to active transitive and intransitive realizations of alternating change-of-state verbs. A more comprehensive analysis is called for that closely examines the division of labor between intransitive realizations and passives.

The realization of the causative alternation may be influenced by additional factors not investigated in this study; in particular, future research should examine in greater detail the distinctions between transitive sentences with agentive and nonagentive subjects. In a recent proposal, Martin (2020) argues that causative variants with agentive subjects differ from those with nonagentive subjects in event structure. The event denoted by the VP combined with an agent subject is composed of an action carried out by the agent and the theme’s change of state, whereas in instances with a nonagentive causer subject, it is restricted to the theme’s change of state. The empirical evidence for differentiating these two types of transitive causatives comes from their aspectual properties: interpretations of in-adverbials, begin-sentences, and progressive sentences. The intentionality associated with agentivity enables a specific profiling of process duration, which makes it easier to access preparatory phases in agentive descriptions of events. Crucially, the absence of any acting or causing entity in the VP of transitive causatives with nonagentive causer subjects is a trait they share with noncausatives. Martin’s analysis suggests that nonagentive transitive causatives are semantically closer to noncausatives than are agentive transitive causatives. This raises two significant empirical questions: (i) what drives the choice between a nonagentive causative (e.g., The heat melted the plastic) and a noncausative with the cause expressed as an adjunct (e.g., The plastic melted from/in the heat), and (ii) how does the universal force of communicative efficiency interact in this choice with speakers’ intentions and the conceptualization of events, as well as with constraints imposed by language-particular grammar? Both questions call for thorough empirical investigation in future work.

Another important challenge for the efficiency-based approach is the existence of verb-specific preferences for one variant over the other. Lee’s (2023) research on 12 change-of-state verbs has shown that the identifiability of causes in context characterizes both verb-specific preferences for one variant over the other and the corpus frequency distribution of alternation variants more accurately than other lexical semantic factors discussed in the literature. This finding requires empirical validation through psycholinguistic experimentation and quantitative analysis of a larger set of verbs from corpora in different languages. The present study can serve as a foundation for such a comprehensive analysis. Evidence from broad crosslinguistic analysis will contribute to uncovering native speakers’ knowledge of the complex probabilistic constraints that mediate the linguistic realizations of causal events and event participants.

Corresponding author: Hanjung Lee, English Language and Literature, Sungkyunkwan Univ, 25-2 Sungkyunkwan-ro, Jongro-ku, Seoul, Korea, E-mail: hanjung@skku.edu

Funding source: National Research Foundation of Korea

Award Identifier / Grant number: NRF2023S1A5A2A01074277

Acknowledgments

This research was supported by a grant from the Ministry of Education and the National Research Foundation of Korea (NRF2023S1A5A2A01074277). We extend our gratitude to two anonymous reviewers for their insightful feedback and to the co-editors-in-chief, Stefan Gries and Stefanie Wulff, for their continued support. We are also grateful to Minhaeng Lee for his valuable comments on the initial version of this paper. Needless to say, all shortcomings and errors are solely our responsibility.

Data availability: The annotated data, Python scripts, and statistical scripts underlying this article are available online at https://github.com/joyennn/causative-alternation.

Appendix A: Alternating verbs included in the analysis

Break verbs

break, chip, crack, crash, crush, fracture, rip, shatter, smash, snap, splinter, split, tear

Deadjectival verbs (zero-related to adjectives)

blunt, clean, clear, cool, crisp, dim, dirty, dry, dull, empty, even, firm, level, loose, mellow, muddy, narrow, open, pale, quiet, round, short, shut, slack, slim, slow, smooth, sober, sour, steady, tame, tense, thin, warm

Other alternating verbs of change of state

abate, advance, age, air, alter, atrophy, awake, balance, blast, burn, burst, capsize, change, char, chill, clog, close, collect, compress, condense, contract, corrode, crumble, decompose, decrease, deflate, defrost, degrade, diminish, dissolve, distend, divide, double, drain, ease, enlarge, expand, explode, fade, fill, flood, fray, freeze, frost, fuse, grow, halt, heal, heat, hush, ignite, improve, increase, inflate, kindle, light, loop, mature, melt, multiply, overturn, pop, quadruple, rekindle, reopen, reproduce, rupture, scorch, sear, shrink, shrivel, sink, soak, splay, sprout, steep, stretch, submerge, subside, taper, thaw, tilt, tire, topple, triple, unfold, vary, warp

Appendix B: The number of causative and noncausative occurrences of the 135 verbs

Verb	Caus	NCaus	Total	Verb	Caus	NCaus	Total
Abate	3	35	38	Advance	21	30	51
Age	0	2	2	Air	1	0	1
Alter	33	12	45	Atrophy	0	5	5
Awake	0	11	11	Balance	10	2	12
Blast	10	9	19	Blunt	13	1	14
Break	26	10	36	Burn	6	9	15
Burst	3	30	33	Capsize	3	9	12
Change	46	77	123	Char	1	3	4
Chill	23	7	30	Chip	7	6	13
Clean	16	7	23	Clear	29	9	38
Clog	12	3	15	Close	35	22	57
Collect	36	3	39	Compress	19	4	23
Condense	10	18	28	Contract	7	17	24
Cool	5	17	22	Corrode	9	13	22
Crack	13	24	37	Crash	7	38	45
Crisp	2	1	3	Crumble	2	44	46
Crush	31	4	35	Decompose	5	13	18
Decrease	5	40	45	Deflate	19	10	29
Defrost	3	2	5	Degrade	11	3	14
Dim	15	28	43	Diminish	23	25	48
Dirty	8	0	8	Dissolve	17	11	28
Distend	2	3	5	Divide	10	4	14
Double	28	61	89	Drain	18	10	28
Dry	7	11	18	Dull	17	6	23
Ease	25	12	37	Empty	20	10	30
Enlarge	10	9	19	Even	2	0	2
Expand	16	36	52	Explode	3	63	66
Fade	0	50	50	Fill	44	9	53
Firm	29	7	36	Flood	15	11	26
Fracture	11	1	12	Fray	2	14	16
Freeze	6	10	16	Frost	1	0	1
Fuse	9	13	22	Grow	7	90	97
Halt	16	30	46	Heal	10	21	31
Heat	4	2	6	Hush	10	11	21
Ignite	14	8	22	Improve	23	27	50
Increase	46	73	119	Inflate	14	7	21
Kindle	9	10	19	Level	12	4	16
Light	31	3	34	Loop	12	13	25
Loose	9	2	11	Mature	0	21	21
Mellow	8	17	25	Melt	5	26	31
Muddy	2	0	2	Multiply	17	15	32
Narrow	28	44	72	Open	86	33	119
Overturn	21	8	29	Pale	0	45	45
Pop	10	21	31	Quadruple	4	0	4
Quiet	5	2	7	Rekindle	23	0	23
Reopen	23	26	49	Reproduce	13	5	18
Rip	19	9	28	Round	13	1	14
Rupture	11	6	17	Scorch	12	10	22
Sear	12	6	18	Shatter	4	4	8
Short	0	1	1	Shrink	6	75	81
Shrivel	4	23	27	Shut	20	3	23
Sink	6	33	39	Slack	3	3	6
Slim	5	3	8	Slow	9	22	31
Smash	7	11	18	Smooth	14	1	15
Snap	11	17	28	Soak	12	6	18
Sober	7	20	27	Sour	15	18	33
Splay	2	10	12	Splinter	5	14	19
Split	9	13	22	Sprout	9	12	21
Steady	2	1	3	Steep	0	2	2
Stretch	13	34	47	Submerge	8	3	11
Subside	0	72	72	Tame	11	1	12
Taper	0	3	3	Tear	6	0	6
Tense	4	31	35	Thaw	3	16	19
Thin	3	19	22	Tilt	33	16	49
Tire	4	7	11	Topple	12	21	33
Triple	3	4	7	Unfold	24	25	49
Vary	1	108	109	Warm	15	14	29
Warp	3	6	9
Total	1,620	2,244	3,864
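
The verb-specific preferences visible in this table can be summarized as the share of causative uses per verb. The following is a minimal illustrative sketch, not taken from the authors' repository: the names `COUNTS`, `causative_share`, and `rank_by_causative_share` are hypothetical, and only a handful of rows plus the grand totals are copied from the table above.

```python
# Counts copied from a few rows of Appendix B: verb -> (causative, noncausative).
# The selection of verbs is arbitrary and serves only as an illustration.
COUNTS = {
    "open": (86, 33),
    "change": (46, 77),
    "vary": (1, 108),
    "subside": (0, 72),
    "rekindle": (23, 0),
}


def causative_share(caus: int, ncaus: int) -> float:
    """Return the proportion of causative uses among all uses of a verb."""
    total = caus + ncaus
    if total == 0:
        raise ValueError("verb has no attested uses")
    return caus / total


def rank_by_causative_share(counts: dict) -> list:
    """Sort verbs from the most to the least causative-biased."""
    return sorted(
        counts,
        key=lambda verb: causative_share(*counts[verb]),
        reverse=True,
    )


if __name__ == "__main__":
    for verb in rank_by_causative_share(COUNTS):
        caus, ncaus = COUNTS[verb]
        print(f"{verb:10s} {causative_share(caus, ncaus):.2f}")
    # Grand totals from the last row of the table.
    print(f"{'overall':10s} {causative_share(1620, 2244):.2f}")
```

On these rows, the ranking runs from rekindle (causative only) to subside (noncausative only), which illustrates how strongly individual verbs can deviate from the overall share of causative uses.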

References

Alexiadou, Artemis. 2010. On the morpho-syntax of (anti-)causative verbs. In Malka Rappaport Hovav, Edit Doron & Ivy Sichel (eds.), Syntax, lexical semantics and event structure, 177–203. Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199544325.003.0009.

Alexiadou, Artemis & Edit Doron. 2012. The syntactic construction of two non-active voices: Passive and middle. Journal of Linguistics 48(1). 1–34. https://doi.org/10.1017/s0022226711000338.

Alexiadou, Artemis, Elena Anagnostopoulou & Florian Schäfer. 2006. The properties of anticausatives crosslinguistically. In Mara Frascarelli (ed.), Phases of interpretation, 175–199. Berlin & New York: Mouton de Gruyter. https://doi.org/10.1515/9783110197723.4.187.

Alexiadou, Artemis & Florian Schäfer. 2006. Instrument subjects are agents or causers. In Proceedings of the 25th west coast conference on formal linguistics, 40–48. Somerville: Cascadilla Proceedings Project.

Baayen, R. Harald. 2008. Analyzing linguistic data: A practical introduction to statistics using R. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511801686.

Baayen, R. Harald, Doug J. Davidson & Douglas M. Bates. 2008. Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language 59(4). 390–412. https://doi.org/10.1016/j.jml.2007.12.005.

Croft, William. 1990. Possible verbs and the structure of events. In Savas L. Tsohatzidis (ed.), Meanings and prototypes: Studies in linguistic categorization, 48–73. London: Routledge.

Croft, William. 1991. Syntactic categories and grammatical relations: The cognitive organization of information. Chicago: University of Chicago Press.

Fausey, Caitlin M., Bria L. Long, Aya Inamori & Lera Boroditsky. 2010. Constructing agency: The role of language. Frontiers in Psychology 1. 162. https://doi.org/10.3389/fpsyg.2010.00162.

Fausey, Caitlin M. & Lera Boroditsky. 2011. Who dunnit? Cross-Linguistic differences in eye-witness memory. Psychonomic Bulletin & Review 18(1). 150–157. https://doi.org/10.3758/s13423-010-0021-5.

Grice, H. Paul. 1975. Logic and conversation. In Peter Cole & Jerry M. Morgan (eds.), Syntax and semantics, Vol. 3: Speech acts, 41–58. New York: Academic Press. https://doi.org/10.1163/9789004368811_003.

Hardie, Andrew. 2012. CQPweb – combining power, flexibility and usability in a corpus analysis tool. International Journal of Corpus Linguistics 17(3). 380–409. https://doi.org/10.1075/ijcl.17.3.04har.

Haspelmath, Martin. 1993. More on typology of inchoative/causative verb alternations. In Bernard Comrie & Maria Polinsky (eds.), Causatives and transitivity, Vol. 23, 87–121. Amsterdam & Philadelphia: John Benjamins. https://doi.org/10.1075/slcs.23.05has.

Haspelmath, Martin. 2008. Frequency vs. iconicity in explaining grammatical asymmetries. Cognitive Linguistics 19(1). 1–33. https://doi.org/10.1515/cog.2008.001.

Haspelmath, Martin. 2016. Universals of causative and anticausative verb formation and the spontaneity scale. Lingua Posnaniensis 58(2). 33–63. https://doi.org/10.1515/linpo-2016-0009.

Haspelmath, Martin, Andreea Calude, Michael Spagnol, Heiko Narrog & Elif Bamyaci. 2014. Coding causal-noncausal verb alternations: A form-frequency correspondence explanation. Journal of Linguistics 50(3). 587–625. https://doi.org/10.1017/s0022226714000255.

Heidinger, Steffen & Richard Huyghe. 2024. Semantic roles and the causative-anticausative alternation: Evidence from French change-of-state verbs. Linguistics 62(1). 159–202. https://doi.org/10.1515/ling-2021-0207.

Horn, Laurence R. 1984. Towards a new taxonomy for pragmatic inference: Q-based and R-based implicature. In Deborah Schiffrin (ed.), Georgetown University Round Table on languages and linguistics, 11–42. Washington: Georgetown University Press.

Hundt, Marianne, Melanie Röthlisberger & Elena Seoane. 2021. Predicting voice alternation across academic Englishes. Corpus Linguistics and Linguistic Theory 17(1). 189–222. https://doi.org/10.1515/cllt-2017-0050.

Jaeger, T. Florian. 2008. Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models. Journal of Memory and Language 59(4). 434–446. https://doi.org/10.1016/j.jml.2007.11.007.

Jaeger, T. Florian. 2011. Corpus-based research on language production: Information density and reducible subject relatives. In Emily M. Bender & Jennifer E. Arnold (eds.), Language from a cognitive perspective: Grammar, usage and processing, 161–197. Stanford: CSLI Publications.

Lakoff, George. 1990. Women, fire and dangerous things. Chicago: University of Chicago Press.

Lee, Hanjung. 2023. Cause identifiability and the causative alternation in English: A corpus-based analysis. Linguistic Research 40(3). 353–385.

Lee, Minhaeng. 2021. Tokile censan uyconmwunpep yenkwu – khophesuenehakcek cepkun [A study of the computational dependency grammar of German – a corpus linguistic approach]. Seoul: Yeklak.

Levin, Beth. 1993. English verb classes and alternations. Cambridge: The MIT Press.

Levin, Beth. 2015. Semantics and pragmatics of argument alternations. The Annual Review of Linguistics 1(1). 63–83. https://doi.org/10.1146/annurev-linguist-030514-125141.

Levin, Beth & Malka Rappaport Hovav. 1995. Unaccusativity: At the syntax-lexical semantics interface. Cambridge: The MIT Press.

Levshina, Natalia. 2015. How to do linguistics with R: Data exploration and statistical analysis. Amsterdam: John Benjamins. https://doi.org/10.1075/z.195.

Levshina, Natalia. 2016. Why we need a token-based typology: A case study of analytic and lexical causatives in fifteen European languages. Folia Linguistica 49(2). 487–520. https://doi.org/10.1515/flin-2016-0019.

Levshina, Natalia. 2020. Conditional inference trees and random forests. In Magali Paquot & Stefan Th. Gries (eds.), A practical handbook of corpus linguistics, 611–643. Cham: Springer. https://doi.org/10.1007/978-3-030-46216-1_25.

Levshina, Natalia. 2022. Communicative efficiency: Language structure and use. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781108887809.

Manning, Christopher D., Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard & David McClosky. 2014. The Stanford Core NLP natural language toolkit. In Proceedings of the 52nd annual meeting of the association for computational linguistics: System demonstrations, 55–60. Baltimore: Association for Computational Linguistics. https://doi.org/10.3115/v1/P14-5010.

de Marneffe, Marie-Catherine & Christopher D. Manning. 2008/2016. Stanford typed dependencies manual. Stanford: Stanford Natural Language Processing Group. https://doi.org/10.3115/1608858.1608859.

Martens, Scott. 2013. TüNDRA: A web application for treebank search and visualization. In Proceedings of the 12th workshop on treebanks and linguistic theories (TLT12), 133–144. Sofia: Bulgarian Academy of Sciences.

Martin, Fabienne. 2020. Aspectual differences between agentive and non-agentive uses of causative predicates. In Elitzur A. Bar-Asher Siegal & Nora Boneh (eds.), Perspectives on causations: Selected papers from the Jerusalem workshop 2017, 257–294. Cham: Springer. https://doi.org/10.1007/978-3-030-34308-8_8.

Martin, Fabienne, Yining Nie, Chiara Dal Farra, Silvia Silleresi, Maria-Teresa Guasti & Artemis Alexiadou. 2023. Agentivity in child and adult event descriptions. Paper presented at the 2023 LSA Annual Meeting, Denver, 5-8 January 2023, USA.

McKoon, Gail & Talke Macfarland. 2000. Externally and internally caused change of state verbs. Language 76(4). 833–858. https://doi.org/10.2307/417201.

McCawley, James D. 1978. Conversational implicature and the lexicon. In Peter Cole (ed.), Syntax and semantics, Vol. 9: Pragmatics, 245–259. New York: Academic Press. https://doi.org/10.1163/9789004368873_009.

Nedjalkov, Vladimir P. 1969. Nekotoryje verojatnostnyje universalii v glagol’nom slovoobrazovanii [Some probabilistic universals in verbal derivation]. In Igor’ F. Vardul’ (ed.), Jazykovyje universalii i lingvističeskaja tipologija [Language universals and linguistic typology], 106–114. Moscow: Nauka.

Qi, Peng, Yuhao Zhang, YuHui Zhang, Jason Bolton & Christopher D. Manning. 2020. Stanza: A Python natural language toolkit for many human languages. In Proceedings of the 58th annual meeting of the association for computational linguistics: System demonstrations, 101–108. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-demos.14.

Radford, Andrew. 1988. Transformational grammar: A first course. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511840425.

Radford, Andrew. 1997. Syntactic theory and the structure of English: A minimalist approach. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9781139166706.

Rappaport Hovav, Malka. 2014. Lexical content and context: The causative alternation in English revisited. Lingua 141. 8–29. https://doi.org/10.1016/j.lingua.2013.09.006.

Rappaport Hovav, Malka. 2020. Deconstructing internal causation. In Elitzur A. Bar-Asher Siegal & Nora Boneh (eds.), Perspectives on causations: Selected papers from the Jerusalem workshop 2017, 219–256. Berlin: Springer. https://doi.org/10.1007/978-3-030-34308-8_7.

Rappaport Hovav, Malka & Beth Levin. 2012. Lexicon uniformity and the causative alternation. In Martin Everaert, Marijana Marelj & Tal Siloni (eds.), The theta system: Argument structure at the interface, 150–176. Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199602513.003.0006.

R Core Team. 2023. The R project for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Reinhart, Tanya. 2002. The theta system - an overview. Theoretical Linguistics 28(3). 229–290. https://doi.org/10.1515/thli.28.3.229.

Reinhart, Tanya. 2016. Concepts, syntax and their interface. Cambridge: The MIT Press. https://doi.org/10.7551/mitpress/9780262034135.001.0001.

Samardžić, Tanja & Paola Merlo. 2018. The probability of external causation: An empirical account of crosslinguistic variation in lexical causatives. Linguistics 56(5). 895–938. https://doi.org/10.1515/ling-2018-0001.

Schäfer, Florian. 2008. The syntax of (anti-)causatives: External arguments in change-of-state contexts. Amsterdam & Philadelphia: John Benjamins. https://doi.org/10.1075/la.126.

Smith, Carlota S. 1978. Jespersen’s ‘move and change’ class and causative verbs in English. In Mohammed Ali Jazayery, Edgar C. Polomé & Werner Winter (eds.), Linguistic and literary studies in honor of Archibald A. Hill, Vol. II: Descriptive linguistics, 101–109. The Hague: Mouton.

Strobl, Carolin, Anne-Laure Boulesteix, Achim Zeileis & Torsten Hothorn. 2007. Bias in random forest variable importance measures: Illustrations, sources and a solution. BMC Bioinformatics 8. 25. https://doi.org/10.1186/1471-2105-8-25.

Szmrecsanyi, Benedikt, Jason Grafmiller, Benedikt Heller & Melanie Röthlisberger. 2016. Around the world in three alternations: Modeling syntactic variation in varieties of English. English World-Wide 37(2). 109–137. https://doi.org/10.1075/eww.37.2.01szm.

Tagliamonte, Sali & R. Harald Baayen. 2012. Models, forests and trees of York English: Was/were variation as a case study for statistical practice. Language Variation and Change 24(2). 135–178. https://doi.org/10.1017/s0954394512000129.

Talmy, Leonard. 1976. Semantic causative types. In Masayoshi Shibatani (ed.), Syntax and semantics, Vol. 6: The grammar of causative constructions, 43–116. New York: Academic Press. https://doi.org/10.1163/9789004368842_003.

Talmy, Leonard. 2000. Toward a cognitive semantics. Cambridge: The MIT Press. https://doi.org/10.7551/mitpress/6847.001.0001.

Thompson, Sandra A. 1987. The passive in English: A discourse perspective. In Robert Channon & Linda Shockey (eds.), In honor of Ilse Lehiste, 497–512. Dordrecht: Foris. https://doi.org/10.1515/9783110886078.497.

Wolff, Phillip. 2003. Direct causation in the linguistic coding and individuation of causal events. Cognition 88(1). 1–48. https://doi.org/10.1016/s0010-0277(03)00004-0.

Wolff, Phillip, Ga-hyun Jeon & Yu Li. 2009. Causers in English, Korean, and Chinese and the individuation of events. Language and Cognition 1(2). 165–194. https://doi.org/10.1515/langcog.2009.009.

Wright, Saundra K. 2001. Internally caused and externally caused change of state verbs. Evanston, IL: Northwestern University Dissertation.

Wright, Saundra K. 2002. Transitivity and change of state verbs. In Proceedings of the 28th meeting of the Berkeley Linguistics Society, 339–350. Berkeley: Berkeley Linguistics Society. https://doi.org/10.3765/bls.v28i1.3849.

Received: 2024-04-27

Accepted: 2025-02-20

Published Online: 2025-04-22

This work is licensed under the Creative Commons Attribution 4.0 International License.

https://doi.org/10.1515/cllt-2024-0047

Keywords for this article

causative alternation; cause identifiability; change-of-state verbs; communicative efficiency; external causality; intentionality
