Probabilistic reduction and constructionalization: a usage-based diachronic account of the diffusion and conventionalization of the Spanish la de <noun> que construction

Matti Marttinen Larsson

doi:10.1515/cog-2023-0112

Article Open Access

Published/Copyright: October 11, 2024

From the journal Cognitive Linguistics Volume 35 Issue 4

Abstract

This paper scrutinizes the conventionalization of the Spanish expression la de <noun> que (‘the amount of <noun> that’), a reduced variant of la cantidad de <noun> que. The study seeks to determine the diachrony of and mechanisms underlying the emergence and diffusion of the la de <noun> que expression and whether it has conventionalized to develop into an independent form-function pairing. A Bayesian mixed-effects logistic regression analysis of approximately 2000 observations of diachronic corpus data tests the influence of the conditional probability of lexemes in the noun slot and the register, which both turn out to have a meaningful effect. It is argued that the initial omission of cantidad can be accounted for by appealing to the notion of probabilistic reduction, whereby omission is feasible in contexts involving a high degree of constructional predictability. In the mapping out of change, conventionalization of the innovative la de <noun> que is most observable in contexts involving high constructional predictability and is least prominent in contexts of low constructional predictability. On the grounds that, over time, the la de <noun> que progressively has become stylistically divergent from the longer expression, the two constructions are claimed to be functionally distinct.

Keywords: constructionalization; accessibility; frequency effects; predictability; reductive change

1 Introduction

Through a real-time multivariate diachronic analysis, the present paper analyzes the diffusion and conventionalization of the Spanish la de <noun> que expression (literally ‘the of <noun> that’, meaning ‘the amount/number of <noun> that’), exemplified in (1)–(4):

(1) ¡La de planes que tenía dentro de ella!

Lit. ‘Oh, the Ø of plans she had inside of her!’

= “Oh, the number of plans she had inside of her!”

(Clarín, Su único hijo, 1891, CORDE)

(2) Mira la reina de Inglaterra, la de años que lleva.

Lit. ‘Look at the Queen of England, the Ø of years that she has been.’

= “Look at the Queen of England, the number of years that she has been [on the throne].”

(Luis Goytisolo, Chispas, 2019, CORPES XXI)

(3) Y no sabe la de vueltas que dan las cosas dentro de la cabeza.

Lit. ‘And you don’t know the Ø of turns that things take inside the head.’

“And you don’t know how many twists and turns things take inside the head.”

(Itziar Pascual, Variaciones sobre Rosa Parks, 2007, CORPES XXI)

(4) No, si malo nunca ha sido… Trabajador, cumplidor –en su trabajo–: yo no puedo decir que me haya dado una mala noche. Desgraciadamente. Que me gustaría a mí que un hombre me la diera. A la de mujeres que habrá hecho felices con una nueva dentadura.

Lit. ‘[…] To the Ø of women he must have made happy with a new set of teeth.’

“No, he’s never been bad… Hardworking, diligent – in his job – I can’t say he’s ever given me a bad night. Unfortunately. I wish a man would give me one. To the many women he must have made happy with a new set of teeth.”

(Elvira Lindo, La ley de la selva, 1995, CREA)

According to the Spanish Royal Academy’s grammar (Real Academia Española and Asociación de Academias de la Lengua Española 2009: § 42.16 m), the la de <noun> que expression is a highly productive schema in contemporary Spanish that has emerged quite recently. Yet, in spite of its high degree of productivity, empirical corpus-based studies of the la de <noun> que expression are seemingly nonexistent. There appear to be only a few descriptions of the construction; Torrego (1988: 114) describes the expression as an expression of quantity that is – in his view – plausibly an elliptical variant of the expression la cantidad de cosas. The full la cantidad de <noun> is exemplified in (5)–(6).

(5) […] la anciana sonrió imaginando la cantidad de vida que ellos creían haber vivido ya […].

“The elderly woman smiled imagining the amount of life that they thought to have lived already …”. (Blanca Álvarez, Ópalo, 2009, CORPES XXI)

(6) El primer libro que me regalaron fue un álbum de Hergé: La estrella misteriosa. Debió de ser allá por el año 62. Pese a la cantidad de vueltas que da la vida aún lo conservo.

“The first book I was given was an album by Hergé: ‘The Shooting Star.’ It must have been around the year 1962. Despite the twists and turns of life, I still have it.” (El Mundo, 20/04/1996, Día del Libro. Lo que se puede hacer con él […], CREA)

Arias (2023) agrees with Torrego in that the la de <noun> que expression has its origin in la cantidad de <noun> que. Arias (2023: 15) posits that elision of cantidad has brought about the grammaticalization of the definite article la into “a weak evaluative quantifier tantamount to mucho [‘a lot’]”. Moreover, Arias (2023: 15) asserts that “When asked what «La de…» refers to, speakers unanimously coincide that it comes from «La cantidad de…».” The semantic equivalence of the two variant expressions is seconded by Calvo (1987: 110) who views both as exclamatives denoting quantity. In other descriptions of the expression, the la de <noun> que expression is only mentioned in isolation, that is, unrelatedly to la cantidad de <noun> que. In Carbonero Cano’s (1990: 117, 125), Bosque’s (2017: 24) and the Spanish Royal Academy’s (Real Academia Española and Asociación de Academias de la Lengua Española 2010: § 42.4.e) view, the la de <noun> que expression is an exclamative expression of quantity. For Gutiérrez-Rexach and Andueza (2011: 197), it is a nominal exclamative that can be used both as an embedded and non-embedded exclamative. Arias (2023) puts forward a generative account of the expression and argues that it can be classified as a pseudopartitive construction that is used both as a primary and a partial exclamative.

The above-outlined accounts suggest that the la de <noun> que expression possesses two fundamental properties. First, the description of the expression being an elliptical expression (e.g., Torrego 1988: 114) implies that it can be regarded as a mere situationally reduced (i.e., online) variant expression of the full la cantidad de <noun> que. Second, the claim that the construction is exclamative (Arias 2023; Bosque 2017; Calvo 1987; Carbonero Cano 1990; Gutiérrez-Rexach and Andueza 2011) indicates that it only partially matches the different uses of the full la cantidad de <noun> que, which can be used both exclamatively and non-exclamatively (cf. the difference between Oh, the amount of life that I have left in me! and The amount of salt needed to prepare the dish is substantial).

While it seems rather unproblematic to accept that the la de <noun> que expression has sprung out of the longer la cantidad de <noun> que expression, the above-outlined accounts nonetheless raise several issues. First, expressions involving omission have recurrently been shown to arise due to online processes whereby predictable phonemes, words, and constituents become omitted, arguably as a means to enhance communicative efficiency (Aylett and Turk 2004; Bybee 2006, 2010; Gibson et al. 2019; Jaeger 2010; Jaeger and Buz 2017; Jurafsky et al. 2001; Kurumada and Jaeger 2015; Levshina 2022a; Roland et al. 2007 among others). In discourse, speakers can reduce what is predictable from context in order to avoid over-specification of contextually salient and accessible information (cf. Jaeger’s 2010 Uniform Information Density hypothesis). The reductive effect of predictability has been observed both at a phonological and at a morphosyntactic level (see Jaeger and Buz 2017 for an overview). In the context of the present study, this would lead to the prediction that if la de <noun> que is a reduced variant of the full la cantidad de <noun> que construction, then the reduced variant should likely be used in contexts involving a high degree of predictability (Levshina 2022a: 230). What is more, online omission can grammaticalize so that the effects become conventionalized (Bybee 2002b, 2006, 2010; Jaeger 2010; Levshina 2022a; Lorenz 2013; Rickford et al. 1995). Given that there is ample evidence that online omission can bring about reductive language change, as in the case for the expression under scrutiny in this study, this would ultimately be at odds with the view that the reduced la de <noun> que is merely an elliptical variant (Torrego 1988). Moreover, according to the Principle of No Synonymy (Goldberg 1995), two constructions that differ in form (e.g., coding length) should also differ in meaning or function. 
That is, if two expressions stem from the same source but vary in surface coding length as a consequence of synchronic probabilistic reduction (i.e., what is predictable from discourse is omitted), the grammaticalization of the reduced variant would plausibly bring about its development into an independent form-meaning or form-function pairing (Lorenz 2013). Research on communicatively efficient reductive changes indicates that reduced variants often emancipate to develop register-specific uses and that they are typically restricted to more informal registers (Levshina and Lorenz 2022; Lorenz 2013; Rickford et al. 1995). Naturally, this raises the question: if the la de <noun> que expression is a reduced variant of the la cantidad de <noun> que construction, to what extent has the reduced variant developed into an independent form-function pairing?

Second, even though the la de <noun> que expression has recurrently been described as an exclamative (Arias 2023; Bosque 2017: 24; Calvo 1987: 110; Carbonero Cano 1990: 117, 125; Real Academia Española/Asociación de Academias de la Lengua Española 2009: § 42.16 m, 2010: § 42.4.e), the reduced la de <noun> que expression manifests uses that, arguably, cannot be classified as exclamatives. Consider, for instance, example (7). According to Zanuttini and Portner (2003), embedded exclamatives can only appear under factive predicates. However, when embedded under factive verbs in the present tense and with a first-person subject, the verb cannot be negated because negation would deny the speaker’s knowledge, which, logically, contradicts the factive presupposition of the exclamative (Zanuttini and Portner 2003: 46–47). Example (7), which has been extracted from the Corpus Oral y Sonoro del Español Rural (COSER, The Audible Corpus of Spoken Rural Spanish), shows an instance of the reduced la de <noun> que used in an embedded context under the negated factive verb saber ‘to know’ conjugated in the first-person present tense:

(7) Pues entonces era, el ta, era tabaco negro, no era rubio, porque ahora ya es casi todo, todo es rubio, pero entonces era negro. Entonces lo plantaban a mano…, hay que plantarlo a mano, planta por planta, una finca que yo no sé la de hectáreas que tiene, no sé […]

Lit. ‘[…] a plantation that I do not know the Ø of hectares that it is […]’.

“So back then, it was, the to-, the dark tobacco, it was not light tobacco, because now almost everything is, everything is light, but back then it was dark. Back then they planted it by hand…, you have to plant it by hand, plant by plant, a plantation that I don’t know how many hectares it is, I don’t know […].” (COSER-1015_01, Navalmoral de la Mata, 1999)

Following Zanuttini and Portner’s (2003) above-discussed criterion, example (7) throws doubt on the veracity of the characterization of the la de <noun> que expression as an exclamative (Arias 2023; Bosque 2017: 24; Calvo 1987: 110; Carbonero Cano 1990: 117, 125; Real Academia Española/Asociación de Academias de la Lengua Española 2009: § 42.16 m, 2010: § 42.4.e). While a comprehensive pragmatico-semantic analysis of the (non-)exclamative nature of the la de <noun> que expression remains outside the scope of the present study, example (7) clearly demonstrates that its use is not limited to the exclamative domain. With this in mind, while also taking into account the fact that online omission can very well bring about grammaticalization of reduced patterns, it seems plausible to posit a cline of grammaticalization by which la de <noun> que gradually has spread from non-embedded and embedded exclamative contexts to entirely non-exclamative contexts (such as example (7)).

In a first attempt to quantitatively and empirically approach the la de <noun> que expression, the present study pursues the first issue outlined above, that is, to determine to what extent the la de <noun> que expression has developed into an independent form-function pairing. Future studies should consider addressing the hypothesized cline of change. Here, however, the central issue that the present study deals with is determining to what extent the two expressions differ – can the la de <noun> que expression be understood as a reduced and conventionalized construction? Put differently, has the reduced variant developed properties that warrant categorizing it as a distinct construction?

The focus of this study is set on the analysis of data from European Spanish, and the study uses diachronic corpora and traces the diffusion of the la de <noun> que construction in real time. It quantitatively analyzes how the two variants – the full la cantidad de <noun> que construction and the reduced la de <noun> que – compete in diachrony. The analysis is approached from a usage-based, constructionist, and cognitive perspective, and the effects of usage, entrenchment, and conventionalization are mapped out in the history of the la de <noun> que expression.

The paper is organized as follows. Section 2 outlines the central theoretical premises that the present study draws on. On the basis of theories of communicative efficiency and reductive language change, a series of predictions are formulated that the study sets out to test. Section 3 describes the materials that were compiled, along with the data cleaning, coding, and annotation process. Section 4 explains the statistical approach and presents the results obtained from a Bayesian mixed-effects logistic regression analysis. Section 5 contains a general discussion and some concluding remarks, and the results are discussed in light of the formulated hypotheses. Moreover, the explanatory power of two competing accounts of probabilistic reduction and communicative efficiency is expounded upon (speaker-centered vs. addressee-centered accounts). The results are also connected to other recent findings relating to the role played by constructional predictability in syntactic enhancement processes in Spanish. Finally, directions for future research are outlined.

2 Communicative efficiency, coding length reduction, and constructionalization

Communicative efficiency involves speakers’ capacity to strike a balance between expressing as much and, at the same time, as little information as necessary to convey the intended meaning and ensure successful information transmission. It has been widely shown that one such communicatively efficient strategy hinges on the reduction or omission of linguistic units that are contextually highly predictable (Aylett and Turk 2004; Bybee 2002b; Gibson et al. 2019; Haspelmath 2008; Haspelmath 2021; Jaeger 2010; Kurumada and Jaeger 2015; Levshina 2022a; Levshina and Lorenz 2022; Norcliffe and Jaeger 2016; Priva and Jaeger 2018; Wasow et al. 2011). Omitting highly predictable information is communicatively efficient because it complies with the goal of being understood while still favoring ease of production (Jaeger 2010; Kurumada and Jaeger 2015).

The motivations underlying the omission and reduction of predictable information have been the subject of debate. On the one hand, reduction has been hypothesized to be an addressee-centered strategy according to which the speaker derives the predictability of structures and their possible omission on the basis of contextual, cognitive, and pragmatic principles (Levshina and Lorenz 2022: 260; for an overview, see Jaeger and Buz 2017: 53–55). On the other hand, reduction has been characterized as a fundamentally speaker-centered process whereby reduction is the result of the automation of linguistic production and “highly practiced neuromotor activity” (Bybee 2002b: 268). This means that reduction largely hinges on the repetition of patterns of co-occurrence that, through repeated usage, become increasingly fluent, automated, and accessed holistically as a chunk (Bybee 2002a). Starting from the observation that such patterns are pervasive in language use (e.g., going to > gonna, kind of > kinda, etc.), Bybee (2002a: 112) formulates the Linear Fusion Hypothesis according to which “Items that are used together fuse together”. Importantly, this means that it is phonologically neighboring segments that are compressed.

These accounts are both highly relevant to the analysis of the la [cantidad/Ø] de <noun> que expression because they allow us to formulate two competing hypotheses of its workings. An addressee-centered efficiency account would posit that the motivation underlying the omission of cantidad can be ascribed to probabilistic reduction whereby “speakers tend to produce shorter linguistic forms and more reduced signals for contextually predictable parts of the message” (Jaeger and Buz 2017: 39; see also Jurafsky et al. 2001). Probabilistic reduction thus emerges online in discourse and hinges on reducing parts of the linguistic signal that are highly expected. Conversely, what is not expected cannot be successfully omitted. In a constructionist model of communicative efficiency, this has been formulated as the Hypothesis of Construction-Lexeme Accessibility and Formal Length, according to which:

The less accessible (probable) a lexeme given a construction or a construction given a lexeme, the greater the chances of the longer constructional alternative; conversely, the more accessible (probable) the lexeme given the construction, the greater the chances of the shorter variant. (Levshina 2022a: 230)

In the context of the present study, this means that noun slot fillers that are endowed with a high constructional predictability should be contextually more accessible and, therefore, more likely to be used with the reduced construction (la de <probable noun> que; see also Jaeger and Buz 2017: 42). Less probable noun slot fillers should, conversely, be more likely to occur in the full construction (la cantidad de <improbable noun> que). This hypothesis takes into account that what is accessible hinges on the cognitive, pragmatic, and contextual status of (some piece of) information (here: a lexeme). Diachronically, if the la de <noun> que expression has developed into a new form-function pairing, we would expect it to increase in type frequency as it becomes increasingly emancipated from its source construction (Traugott and Trousdale 2013: 114). During such type frequency increase, we would also expect – in accordance with the Hypothesis of Construction-Lexeme Accessibility and Formal Length (Levshina 2022a) – that the diffusion of the la de <noun> que expression would be dependent on how accessible (probable) the noun slot fillers are, with the reduced construction first recruiting constructionally highly accessible lexemes and spreading over time to increasingly improbable noun slot fillers. This type of diffusion from highly accessible nouns towards increasingly less accessible nouns can be expected because if an exemplar is reduced in the wrong context (e.g., when the full construction involving the noun slot filler is not accessible), it becomes confusable and the exemplar is prevented from being stored, and thus it cannot spread through the language (Jaeger and Buz 2017: 167; Pierrehumbert 2001: 152).

By contrast, under an automation account (Bybee 2002b, 2006, 2010), we would assume that the noun slot lexeme filler plays a more peripheral role in offline reductive change; instead, what takes center stage here is the linear fusion of linguistic structures. Under this view, it would necessarily be the case that the initial reduction of cantidad occurs online (due to probabilistic reduction, similarly to the account laid out above) and that initial uses adhere to the same accessibility principles as laid out above. Over time, repeated reduction brings about increasing debilitation of exemplars of la cantidad de (cf. Lorenz 2013: 233). In this respect, this hypothesis aligns with the above-outlined addressee-centered account. However, they differ as concerns the diachronic mapping out of change. According to the automation account, we would expect that what is fused together is the co-occurrence of la de because automation relies on the fusion of high-frequency, phonologically neighboring structures (Bybee 2002a, 2002b). Bybee’s model of reduction lends weaker support than the addressee-centered account to the view that the degree of accessibility of the individual lexemes used in the noun slot would play a pivotal role in the diachronic mapping out of grammaticalization and constructionalization because emphasis is placed on the fact that “items that are used together frequently come to be processed together as a unit” (Bybee 2006: 721). If the reduced variant is stored as an automated chunk, this would mean that, diachronically, the reduced la de <noun> que expression should diffuse evenly across lexemes because the speaker has automated the la de sequence and the accessibility of the noun slot plays less of a role. Alternatively, chunking would take place over larger segments involving the noun in the noun slot. 
For example, high-frequency use of collocational patterns (e.g., la de veces que lo he tenido que buscar… ‘the number of times that I have had to go look for him…’) could become increasingly entrenched so that they develop into prefabricated instances of the construction (e.g., la de <noun> que → la de <veces/gente/años/cosas…> que ‘the number/amount of <times/people/years/things> that’). Under an automation view of reduction, these prefabs are predicted to grammaticalize earlier or at a faster rate than the general construction (Bybee and Torres Cacoullos 2008: 409). In the context of the present study, this would mean that we could expect the conventionalization of la de <noun> que to have taken place earlier or at a faster rate in contexts involving nouns with high conditional probabilities.

The validity of these competing hypotheses remains to be evaluated through a diachronic analysis, which the present study offers. Regardless of the motivation – be it speaker-centered or addressee-centered – the obvious difference between the two expressions (la de <noun> que vs. la cantidad de <noun> que) is that they differ in coding length. According to a constructionist framework (Goldberg 1995), the fact that these expressions differ in surface manifestation should mean that they also differ in function or meaning. This principle is known as the Principle of No Synonymy (Goldberg 1995: 67), which posits: “If two constructions are syntactically distinct, they must be semantically or pragmatically distinct […]. Pragmatic aspects of constructions involve particulars of information structure, including topic and focus, and additionally stylistic aspects of the construction such as register.” Consequently, a constructionist view on la [cantidad/Ø] de <noun> que would posit that the two expressions – the reduced and the full – differ pragmatically, functionally, semantically, or in some other respect.

Lorenz (2013: 231) argues that reduced variants can become emancipated from their full source forms and that they undergo functional divergence as a consequence of this. This model of reductive emancipation in conjunction with the Principle of No Synonymy thus allows us to predict that the la de <noun> que and the la cantidad de <noun> que should be functionally distinct, which would warrant classifying the reduced construction as an independent form-function pairing. One approach to test the degree of functional divergence between the two expressions is by analyzing the role of register. Stylistic variation is a major determinant of coding length, with shorter coding lengths typically correlating with informality (Kurumada and Jaeger 2015; Levshina and Lorenz 2022; Rickford et al. 1995). We can thus expect such differences to surface in the analysis if this constructional alternation occurs. Arias (2023) describes the la de <noun> que expression as a colloquial trait, but so far this claim has not been assessed empirically.

The present study thus hypothesizes that the two expressions represent two independent constructions that are functionally distinct (Goldberg 1995: 4). The reduced variant is hypothesized to be more typical of informal registers, whereas the full variant is more likely to be used in formal registers (Kurumada and Jaeger 2015; Levshina and Lorenz 2022; Rickford et al. 1995). Moreover, the reduced variant is expected to have emerged in contexts involving a high degree of contextual predictability, that is, where cantidad is easily inferable and can be omitted without leading to unsuccessful transmission of information (Jaeger 2010; Levshina and Lorenz 2022). Over time, the reduced variant should likely undergo constructionalization as indicated by its functional emancipation from the full variant (Lorenz 2013) and by the increase in its type frequency (Traugott and Trousdale 2013). This type frequency increase is hypothesized to occur in either of two manners depending on the fundamental motivation for which constructionalization-by-reduction was instantiated to begin with; either diffusion hinges on the degree of accessibility of the noun slot filler, which would be indicative of diffusion being fundamentally addressee-oriented, or diffusion occurs evenly across contexts, which would testify in favor of a production-based automation account of reductive change (Bybee 2002a, 2002b, 2006, 2010; Bybee and Scheibman 1999).

In the section that follows, the methods and data used are described.

3 Overview of methods and data compilation, annotation, and distribution

Here, the data compilation is explained, along with the processes relating to data cleaning and annotation. The methods are also discussed and some descriptive distributions of the data are reviewed.

3.1 Data compilation

Data were compiled from three diachronic corpora: Corpus Diacrónico del Español (CORDE, -1974), Corpus de Referencia del Español Actual (CREA, 1975–2001), and Corpus del Español del Siglo XXI (CORPES XXI, 2000–2023 version 1.0). The data compilation was carried out in the fall of 2023. Data were compiled from the year 1800 onwards, with the geographical area limited to Spain. Only text data were retrieved, leaving aside transcriptions of oral speech in the more contemporary corpora.

The search strings used in the corpora were the following: la cantidad de * que, La cantidad de * que, la de * que, La de * que, la cantidad de * * que, La cantidad de * * que, la de * * que, and La de * * que. This means that the returned searches consisted of the two constructions containing one or two words between de and the relative que. All of the occurrences obtained from the corpora were extracted for further data cleaning and annotation.
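To make the shape of these queries concrete, the following is a minimal sketch of a single regular expression that covers the same eight patterns. It assumes that each corpus wildcard stands for exactly one whitespace-delimited token, which is not stated in the text; the names `PATTERN` and `find_hits` are hypothetical and are not part of any corpus interface.

```python
import re

# Hypothetical regex counterpart of the eight search strings:
# "la"/"La", an optional "cantidad", then one or two tokens between "de" and "que".
# Assumption: a corpus wildcard (*) corresponds to one whitespace-delimited token.
PATTERN = re.compile(r"\b[Ll]a (cantidad )?de (\S+(?: \S+)?) que\b")

def find_hits(text):
    """Return (variant, slot) pairs for every match found in text."""
    hits = []
    for match in PATTERN.finditer(text):
        variant = "full" if match.group(1) else "reduced"
        hits.append((variant, match.group(2)))
    return hits

# Examples (3) and (6) from the Introduction
print(find_hits("Y no sabe la de vueltas que dan las cosas dentro de la cabeza."))
print(find_hits("Pese a la cantidad de vueltas que da la vida aún lo conservo."))
```

A pattern of this kind over-generates in the same way the corpus queries do, since it also captures strings whose slot is not filled by a noun, which is why the cleaning step in Section 3.2 is needed.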

3.2 Data treatment

Data cleaning consisted of the removal of duplicates and the removal of observations that did not contain any of the constructions in question. For the la cantidad de <noun> que construction, some occurrences were found where the noun slot was filled by other structures such as pronouns. The noun slot of the reduced constructions only admits zero-marked or plural nouns. Therefore, these occurrences containing other structures than zero-marked or plural nouns were also removed, given that such uses are not possible in the reduced construction (Alija 1999: 307).

3.3 Distribution and selection of data

The final dataset consisted of 2,706 occurrences of the constructions. A diachronic distributional analysis showed that there were 478 occurrences of the constructions between the 19th century and 1975. However, during this time period of approximately 90 years, there were only 34 occurrences of the la de <noun> que construction. This means that, while the la de <noun> que construction was evidently used, it was rather infrequent, particularly in comparison to the “full” constructions (N = 444). From 1976 onwards, the la de <noun> que construction gained in productivity. In order to circumvent data scarcity issues in the statistical modelling, the quantitative diachronic analysis brought forward in the next section will focus on the most vivid phase of change, in this case between 1976 and 2022 (N = 2,228).

3.4 Predictors

3.4.1 Constructional predictability

As hypothesized in Section 2, the probability of omission of cantidad should depend on the degree of constructional predictability (Divjak 2019: 151–153; Levshina 2018, 2022a; Schmid and Küchenhoff 2013; Stefanowitsch and Gries 2003). The more probable the combination of the construction’s component parts, the higher the probability of cantidad omission. The likelihood of a certain lexeme appearing in either of the two constructions is determined by calculating the log-transformed conditional probability of the lexeme’s basic form, that is, dividing the frequency with which (the basic form of) a certain lexeme occurs in the construction (e.g. la de veces que ‘the amount of times that’, la cantidad de gente que ‘the amount of people that’, …) by the N of the construction (Levshina 2018; Levshina and Lorenz 2022; Schmid 2010).^[1] The hypothesis is that at the genesis of the la de <noun> que construction, lexemes with higher conditional probabilities (i.e. constructional predictability) should be the strongest cues to activate the mental representation of the construction in the mind of the speaker (Stefanowitsch and Gries 2003). Given this high degree of constructional predictability, these constructions are the most likely to allow omission. In other words, reduced constructions are sanctioned when the source construction and collocational pattern are frequent enough to be stored and accessed holistically.
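The measure described above can be illustrated with a short sketch. It assumes that the noun slot has already been lemmatized, that the natural logarithm is used (the base is not specified in the text), and that N is the total number of tokens of the construction in the sample passed in. The function name and the toy data are invented for illustration and are not the study’s data.

```python
import math
from collections import Counter

def log_conditional_probabilities(lemmas):
    """Map each noun lemma to log(frequency of lemma / N of the construction)."""
    counts = Counter(lemmas)
    total = sum(counts.values())  # N of the construction in this sample
    return {lemma: math.log(n / total) for lemma, n in counts.items()}

# Invented toy sample of noun slot fillers (NOT the corpus data)
toy_slot_fillers = ["vez", "vez", "vez", "gente", "gente", "año", "cosa", "hectárea"]
scores = log_conditional_probabilities(toy_slot_fillers)

# Higher (less negative) values indicate more predictable slot fillers
for lemma, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{lemma:10s} {score:.3f}")
```

Because the probabilities lie between 0 and 1, the log-transformed values are zero or negative, and values closer to zero correspond to lexemes with higher constructional predictability.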

Recall that two competing hypotheses were formulated above, referred to as the addressee-centered account and Bybee’s automation account. In statistical terms, these two competing hypotheses can be tested by computing an interaction term between real time (year) and the conditional probability of the noun slot lexeme. If this interaction is statistically non-meaningful, this would mean that the effect is stable over time, which would be in favor of the addressee-centered account: basically, diffusion would stably depend on the accessibility of the noun slot lexeme and omission would occur through probabilistic reduction (cf. Jaeger and Buz 2017). Conversely, should the interaction be statistically meaningful, this would mean that the effect of the parameter changes over time. Such an effect would thus be in line with Bybee’s automation account, and either the effect of the conditional probability of the lexeme weakens over time, which would be in favor of the view that la de has developed into an independent chunk, or the conventionalization of the la de <noun> que expression takes place at different rates across different contexts, with lexemes that have a particularly high conditional probability leading the change in terms of timing and/or in terms of rate of change (Bybee and Torres Cacoullos 2008: 409). These competing hypotheses are tested in the multivariate analysis.
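The logic of the interaction test can be sketched on the log-odds scale. The snippet below is a simplified illustration under stated assumptions: it omits the random effects of the mixed-effects model, it treats year as a centred numeric predictor, and all coefficient values are invented. The function names are hypothetical.

```python
import math

def prob_reduced(year_c, log_cp, b0, b_year, b_cp, b_int):
    """Probability of the reduced variant from a linear predictor
    with a year x conditional-probability interaction (no random effects)."""
    eta = b0 + b_year * year_c + b_cp * log_cp + b_int * year_c * log_cp
    return 1.0 / (1.0 + math.exp(-eta))

def cp_slope(year_c, b_cp, b_int):
    """Effect of log conditional probability on the log-odds at a given centred year."""
    return b_cp + b_int * year_c

# Invented coefficients: without an interaction the predictability effect is stable...
print(cp_slope(-1.0, b_cp=0.8, b_int=0.0), cp_slope(1.0, b_cp=0.8, b_int=0.0))
# ...whereas a negative interaction means the effect weakens in later years.
print(cp_slope(-1.0, b_cp=0.8, b_int=-0.3), cp_slope(1.0, b_cp=0.8, b_int=-0.3))
```

In this parameterization, an interaction coefficient whose credible interval includes zero corresponds to the stable-effect scenario, while a clearly non-zero coefficient corresponds to an effect of predictability that changes over time.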

3.4.2 Register

The corpus data contained two types of written macro-registers – fiction and non-fiction.^[2] The non-fictional register contained informational written texts such as academic prose, news articles, legal texts, and other similar works. While the different types of written non-fictional registers may differ in terms of topics and communicative purpose, they share certain situational and linguistic features, especially in comparison to fiction (Biber and Conrad 2019: 139). Importantly, fiction is particularly distinguished from other written registers in terms of communicative goals and stylistic choices (Biber and Conrad 2019: 112). Moreover, fiction is generally considered to represent a more informal type of register in comparison to non-fiction (Biber 1986; Yamada 2022). Lastly, from the perspective of communicative efficiency, more informal registers are expected to opt for shortness in coding more so than formal registers (Rohdenburg 1996: 159). This study set out to test whether the two constructions are functionally distinct in terms of register differences in written language.

In the section that follows, the results of the multivariate analysis are presented.

4 Statistical modelling and analysis

This study adopted a Bayesian approach to the quantitative analysis. Useful introductions to Bayesian statistics can be found in Kruschke (2015), Nicenboim and Vasishth (2016), Sorensen et al. (2016), Kimball et al. (2019), and Levshina (2022b). A Bayesian mixed-effects logistic regression analysis was conducted using the brm function in the brms package (Bürkner 2021) in R (R Core Team 2024). In contrast to frequentist statistics, Bayesian methods incorporate prior knowledge (called prior probabilities or priors) into the distribution of the parameters. This knowledge is updated on the basis of newly observed data. Bayesian approaches subsequently compute posterior estimates and credible intervals for these estimates. Credible intervals represent the range within which there is a 95 % probability that the parameter value lies. Intervals that do not include zero suggest that the predictor variable has a meaningful effect on the outcome variable.

Priors have the capacity to influence the posterior distribution to varying extents, depending on the specificity of the prior. Here, weakly informative priors were used. This means that no strong prior beliefs about the distribution of the data were incorporated, and instead the observed data were allowed to substantially influence the estimate. One main advantage of weakly informative priors is that they help prevent extreme estimates and contribute to a more stable model. For the intercept, a prior with normal distribution was used with a mean of 0 and a standard deviation of 5. For the regression coefficients, a normal prior with a mean of 0 and a standard deviation of 1 was used.

The model was fitted with the population-level parameters year, lexeme conditional probability (see Section 3) and register. The numerical variable year was centered. To measure how probabilistic collocational patterns evolve in real time, the lexeme conditional probability was computed as an interaction parameter with year. register was included as an independent parameter in order to analyze whether the two constructions are functionally distinct in terms of register. Group-level (random) effects included author to account for language user-specific idiosyncrasies.
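The model structure described above can be written out as follows. This is a sketch inferred from the prose description rather than the author's own notation: the symbols are assumptions, lcp stands for the log-transformed lexeme conditional probability, a[i] indexes the author of observation i, and the normal distributions are parameterized by mean and standard deviation. The outcome y is taken to be 1 for the reduced la de <noun> que and 0 for the full variant. The prior on the author-level standard deviation is not stated in the text and is therefore left unspecified.

```latex
\begin{align*}
y_i &\sim \mathrm{Bernoulli}(p_i) \\
\operatorname{logit}(p_i) &= \beta_0 + \beta_1\,\mathrm{year}_i + \beta_2\,\mathrm{lcp}_i
  + \beta_3\,\mathrm{fiction}_i + \beta_4\,(\mathrm{year}_i \times \mathrm{lcp}_i) + u_{a[i]} \\
u_a &\sim \mathcal{N}(0, \sigma_u) \\
\beta_0 &\sim \mathcal{N}(0, 5), \qquad \beta_1, \dots, \beta_4 \sim \mathcal{N}(0, 1)
\end{align*}
```

In this notation, the two competing hypotheses concern the interaction coefficient: a credible interval for it that includes zero is compatible with a stable effect of the lexeme conditional probability over time, whereas an interval that excludes zero would indicate that the effect changes over time.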

The Bayesian mixed-effects logistic regression model was run on four chains of 4,000 iterations each (2,000 warmup iterations per chain, yielding 8,000 post-warmup draws in total). The target acceptance rate was set at 0.95. A high acceptance rate (called adapt_delta) reduces the risk of divergent transitions and increases the validity of the posterior samples (Bürkner 2021: 10). Rhat values for all parameters were 1.0, which means that all the chains converged and mixed well. Posterior predictive checks aligned well, indicating that the predictions of the model were reliable.

The R² statistic is a goodness-of-fit measure that indicates the proportion of variance in the dependent variable (full la cantidad de <noun> que vs. reduced la de <noun> que) that is explained by the model parameters. The conditional R² considers both the population-level parameters and the group-level parameters, while the marginal R² only takes into account the amount of variance explained by the population-level parameters.

Conditional R²: 0.539 (95 % CI [0.483, 0.584])
Marginal R²: 0.254 (95 % CI [0.194, 0.319])

Overall, the conditional R² statistic indicated that the full model explained a rather large portion of the observed variance (approximately 54 %). K-fold cross-validation (k = 10) was performed to validate the model. An evaluation of the predictions yielded by the cross-validation showed that the Mean Absolute Error was 0.19, meaning that the misclassification rate was approximately 19 %. The full regression model is reported in Table 1.
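The relation between a Mean Absolute Error and a misclassification rate can be illustrated with a short sketch. The source does not state how the held-out predictions were scored, so the function names, the 0.5 threshold, and the toy values below are assumptions. The sketch shows that the two measures coincide when predictions are hard 0/1 labels and can differ when predictions are probabilities.

```python
def mean_absolute_error(predictions, observed):
    """Average absolute difference between predictions and 0/1 outcomes."""
    if len(predictions) != len(observed):
        raise ValueError("predictions and observed must have equal length")
    return sum(abs(p - y) for p, y in zip(predictions, observed)) / len(observed)


def to_labels(predicted_probs, threshold=0.5):
    """Turn predicted probabilities into hard 0/1 class labels."""
    return [1 if p >= threshold else 0 for p in predicted_probs]


# Invented held-out predictions and outcomes (1 = reduced la de <noun> que).
predicted_probs = [0.9, 0.2, 0.6, 0.1, 0.7]
observed = [1, 0, 0, 0, 1]

mae_on_probabilities = mean_absolute_error(predicted_probs, observed)
mae_on_labels = mean_absolute_error(to_labels(predicted_probs), observed)

print(f"MAE on probabilities: {mae_on_probabilities:.2f}")
print(f"MAE on hard labels (misclassification rate): {mae_on_labels:.2f}")
```

Under this reading, an MAE of 0.19 corresponds to a misclassification rate of 19 % if the error was computed on predicted class labels.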

Table 1:

Results of the Bayesian mixed-effects logistic regression model. Statistically meaningful effects are marked in bold.

Parameter	Posterior mean	Est. Error	lower 95 % CrI	upper 95 % CrI
Intercept	−2.71	0.37	−3.46	−1.99
Year (centered)	0.72	0.27	0.23	1.26
Lexeme conditional probability (log)	0.29	0.05	0.19	0.39
Register: Fiction (reference level: Non-fiction)	3.82	0.31	3.24	4.48
Year (centered) × Lexeme conditional probability (log)	0.06	0.04	−0.03	0.15

As Table 1 shows, all of the considered population-level parameters were statistically meaningful as main effects, as their credible intervals did not include zero. The probability of the reduced la de <noun> que expression increased significantly over time, as indicated by the positive estimate of year (0.72). High conditional probabilities of the lexeme in the noun slot favored omission of cantidad. Moreover, fictional registers significantly favored the reduced expression in comparison to non-fictional registers. As concerns the interaction term specified between year and lexeme conditional probability, this had no statistically meaningful effect on the type of expression used. I will discuss this result below.
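The reading of Table 1 given above can be reproduced with a small sketch. The estimates and interval bounds are copied from Table 1. The helper names are hypothetical, and the conversion to an odds ratio considers the population-level estimate only, ignoring the author-level effects.

```python
import math

# (posterior mean, lower 95 % CrI, upper 95 % CrI), copied from Table 1.
TABLE_1 = {
    "Intercept": (-2.71, -3.46, -1.99),
    "Year (centered)": (0.72, 0.23, 1.26),
    "Lexeme conditional probability (log)": (0.29, 0.19, 0.39),
    "Register: Fiction": (3.82, 3.24, 4.48),
    "Year x Lexeme conditional probability": (0.06, -0.03, 0.15),
}


def excludes_zero(lower, upper):
    """True when the whole credible interval lies on one side of zero."""
    return lower > 0 or upper < 0


meaningful = {
    name: excludes_zero(lower, upper)
    for name, (_, lower, upper) in TABLE_1.items()
}

# Estimates are on the log-odds scale; exponentiating gives an odds ratio.
fiction_odds_ratio = math.exp(TABLE_1["Register: Fiction"][0])

for name, flag in meaningful.items():
    print(f"{name}: {'meaningful' if flag else 'interval includes zero'}")
print(f"Odds ratio, fiction vs. non-fiction: {fiction_odds_ratio:.1f}")
```

Only the interaction term has an interval that spans zero, which is why it is treated as having no statistically meaningful effect.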

In Figure 1, the results from the Bayesian mixed-effects logistic regression model were plotted using the interactions package (Long 2019).

Figure 1:

Interaction between year and the conditional probability of the noun in the noun slot. Results are plotted by register.

As Figure 1 clearly shows, there is a close to categorical functional differentiation between the two constructions in terms of register, and the la de <noun> que construction is almost only used in fictional texts. This finding supports the hypothesis that the reduced variant is correlated with more informal language use, which is in line with previous studies on online reduction and reductive change (Rickford et al. 1995; Kurumada and Jaeger 2015; Levshina and Lorenz 2022; see also Arias 2023). As the left-most figure depicting non-fictional texts shows, a slight probabilistic increase is observed in non-fictional texts from approximately 2010 onwards. However, whether this increase is due to actual constructional change (that is, that the construction would expand its stylistic domain) or to environmental changes (for instance, that the non-fictional register is perhaps undergoing colloquialization) cannot be determined on the basis of this data. In all, though, the fact that there is a clear-cut and statistically meaningful differentiation between the two registers confirms the hypothesis that the two constructions are functionally distinct.

Focusing on the fictional register, Figure 1 shows that the la de <noun> que construction has diffused significantly over the course of approximately 50 years. At the earliest stage, the overall predicted probability of the la de <noun> que construction is 0.18 (CI: 0.09, 0.32). At the latest stage, the predicted probability is 0.54 (CI: 0.41, 0.66). In the contexts that are most favorable to the reduced expression – that is, in contexts involving lexemes with a high conditional probability – the predicted probability of la de <noun> que at the latest stage reaches 0.77 (CI: 0.63, 0.87). In contrast, nouns with low conditional probabilities exhibit a more modest predicted probability (0.34, CI: 0.21, 0.49). Importantly, as hinted at above, the fact that the credible interval for the interaction between year and lexeme conditional probability contained zero (cf. upper and lower CrIs in Table 1) means that there has been no meaningful difference in the effect of the conditional probability of the lexeme over time. Instead, the effect of the lexeme’s probability has been stable over time.

Taken together, the results depicted in Figure 1 show that the reduced la de <noun> que construction has undergone a rapid phase of conventionalization over the course of roughly 50 years, and it appears to be an expression characteristic of more informal registers. With these results in mind, the next section discusses the observed change from a constructionist perspective and from the viewpoint of communicative efficiency.

5 Discussion and concluding remarks

This paper has brought forward a usage-based constructionist account of the rise, diffusion, and conventionalization of a previously understudied construction, namely the Spanish la de <noun> que expression. In this study, the focus was placed on written European Spanish. By means of a multivariate analysis of the written corpus data, the study sought to shed light on the status of the la de <noun> que construction vis-à-vis the full la cantidad de <noun> que construction. The Bayesian mixed-effects regression analysis presented in Section 4 sheds light on the real-time diffusion of the la de <noun> que construction across the fictional register, where it became increasingly conventionalized over the analyzed timespan.

It was hypothesized that the la de <noun> que construction emerged as a means to achieve communicative efficiency by reducing the coding length of the construction in contexts where the expression could be felicitously interpreted in spite of omission (cf. Jaeger 2010; Levshina 2022a). Concretely, in line with Levshina (2022a) it was hypothesized that when the noun slot lexeme is probable given the construction, there will be a higher probability of using the reduced construction. This hypothesis was confirmed. In terms of the diachronic mapping out of change, two competing hypotheses were formulated: according to addressee-centered accounts of communicative efficiency we would expect the accessibility of the lexeme in the noun slot to influence the choice of coding length due to probabilistic reduction. When the lexeme is inaccessible or improbable, the use of a reduced construction may involve a high risk of confusability, and so speakers would more likely opt for the longer construction in these cases. This view emphasizes the probabilistic relationships in collocational patterns and how these relationships may determine coding length variability (cf. Levshina 2022a: 244). A crucial point here is that, in line with fundamentally addressee-centered accounts, these effects were hypothesized to be stably present over time and that speakers attempt to infer the degree of accessibility of the construction and the slot-filler in discourse. The competing hypothesis started from automation accounts (Bybee 2002b, 2006, 2010) which hold that reduction is fundamentally speaker-centered. 
Starting from this perspective, it was conjectured that, due to the automating effect of repetition of high-frequency patterns, two alternative outcomes were plausible: either the effect of the lexemes’ conditional probabilities levels out over time as the la de sequence becomes increasingly consolidated as a non-compositional chunk, or high-frequency collocations evolve into prefabricated chunks of the reduced construction, which should lead them to grammaticalize earlier or at a faster rate than the general construction (Bybee and Torres Cacoullos 2008: 409). Statistically speaking, should the automation hypothesis turn out to align with the changes undergone by la de <noun> que, this would mean that we should identify a meaningful effect of the interaction between year and the conditional probability of the lexeme. The results show that there was no meaningful effect of the above-discussed interaction. This finding thus renders the automation account implausible. The results presented here are chiefly in line with the addressee-centered account because throughout the analyzed timespan the full and felicitous parsing of the la de <noun> que construction appears to hinge on the predictability of the noun slot lexeme.

Previous descriptions of the la de <noun> que expression have not considered the role of the noun slot lexeme. Recall that Arias (2023: 15) proposes that the la de <noun> que construction has grammaticalized and that, through this process, the definite article la is reanalyzed into a weak evaluative marker equal to mucho ‘a lot’. Such an account would render the prediction that, following reanalysis, the la de <noun> que construction should be able to occur with any lexeme in the noun slot because the definite article would have been reanalyzed into an exclamative quantifier (Arias 2023: 19). Were the la fully exclamative, it should be able to take any lexeme in the noun slot. Considering the gradience in different collexemes’ constructional probabilities reported on in the present study, we can draw the conclusion that this is evidently not the case, as not all lexemes are equally likely to occur in the reduced la de <noun> que construction. This fact casts doubt on Arias’ grammaticalization account. The account advanced here instead places emphasis on the construction as a reflex of mechanisms related to communicative efficiency that are, nonetheless, conventionalizing (cf. Jaeger 2010). However, omission is only feasible and efficient to the extent that the linguistic signal can successfully convey the intended message (Kurumada and Jaeger 2015: 154). This places constraints on the la de <noun> que construction that are active throughout the analyzed timespan. This result lends diachronic support to Levshina’s (2022a) Hypothesis of Construction-Lexeme Accessibility and Formal Length on a non-English language, which – to the best of the author’s knowledge – has been lacking.

Another issue that has remained largely unaddressed in previous descriptions of the la de <noun> que construction is its diachrony. While there have been indications of the construction being rather innovative (Real Academia Española and Asociación de Academias de la Lengua Española 2009: § 42.16 m; Arias 2023), no previous attempt has been made so far to examine its conventionalization diachronically. The present study has shown that while the construction is attested already in the 19th century (as already pointed out by Real Academia Española and Asociación de Academias de la Lengua Española 2009: § 42.16 m), in written language it remained rather marginal until the 1970s, which appears to be the starting point of a rather rapid phase of conventionalization. Over the course of approximately 50 years, it increased substantially in overall probability of use, although limited to more informal registers. It is fundamentally this stylistic aspect of the construction that sets it apart from the la cantidad de <noun> que and that warrants classifying it as an independent form-function pairing. In light of this, we can posit that constructionalization is taking place (Traugott and Trousdale 2013). Here, emancipation and type frequency increases have been used as diagnostics of constructionalization. First, while at an initial stage the reduced construction may have been used interchangeably as a merely online reduced variant of the full construction, over time the la de <noun> que appears to have become largely emancipated from its source construction because it has undergone functional divergence (cf. Lorenz 2013: 231), rendering the two constructions functionally distinct. Second, the collexeme increases seen in Figure 1 show that the la de <noun> que construction has increased substantially in type frequency over time, which is also a tell-tale indication of ongoing constructionalization (Traugott and Trousdale 2013: 114).

This study has some limitations worth addressing. For one, only written language was analyzed, and it is highly likely that the la de <noun> que construction exhibits numerous constraints in the spoken language that fall outside of the scope of the present study. Furthermore, it is highly noteworthy that a large part of the variation was accounted for by inter-speaker variation (as shown by the R² statistics; see Section 4). This suggests that a fruitful avenue of research could lie in determining the social significance of the la de <noun> que construction, as well as analyzing other possible aspects of functional differentiation than that between the two types of written registers analyzed in this paper. Furthermore, given that differentiation also involves dialectal differences (Goldberg 2019; Leclercq and Morin 2023), future research should include other varieties of Spanish in order to detect possible dialectal differentiations with regard to the constructionalization process under study (cf. Arias 2023: 18).

In all, the diachronic analysis of real-time corpus data has shed light on a constructionalization process that has previously escaped the attention of corpus-based research. By appealing to a fundamentally probabilistic framework of reduction and reductive change, the study has argued that the innovation can be largely explained as a reflex of communicative efficiency. Over time, the reduced construction has conventionalized and developed into an independent form-function pairing. The degree of addressee comprehensibility as assessed by the speaker has been argued to be the principal motivation to account for synchronic layering and for the trajectory of diffusion of the construction across linguistic contexts and real time. Each new use of the construction pushes the limits of the construction’s lexical range. A conceivable extension is determined on the basis of its predictability from context.

This perspective on the analyzed change opens up new intriguing avenues of research. One issue pertains to the semantic properties of the noun in the la cantidad/ Ø de <noun> que noun slot. If, as argued here, the degree of accessibility (cf. Levshina 2022a) determines the coding length of the construction, it seems reasonable to hypothesize that the concreteness of the noun should influence the choice of construction because abstract nouns are cognitively costlier to process (Jessen et al. 2000). Other properties of the noun could potentially also influence coding length, such as whether the entity is a countable or noncountable noun. Therefore, future research would benefit from analyzing the semantics of the noun slot lexeme.

Moreover, there are other similar reductive phenomena in Spanish to which the approach adopted in the present study could be applied. One very similar expression is the exclamative qué de <noun> (lit. *‘what of <noun>’, meaning ‘how many <noun>’, e.g. ¡Qué de flores había ahí! ‘There were so many flowers there!’), which appears to have emerged through omission of the very same noun as the la de <noun> que construction – that is, cantidad ‘amount/number’ (cf. González Calvo 1987: 110). The qué de <noun> expression seems to have a much longer history of use than the la de <noun> que construction, and a search in the diachronic CORDE corpus returns uses of it as early as the very end of the 15th century:

¡Oh cuando saltarle vea,

qué de abrazos le dará!

Lit. * ‘Oh, when he sees you leap, what of embraces he will give you!’

(Fernando de Rojas, La Celestina, c. 1499, CORDE).

The hypothesis that the qué de <noun> expression has emerged through omission of cantidad (i.e., qué cantidad de <noun> → qué de <noun>) remains tentative, pending future research. If, however, such a relationship is established, the results presented here provide strong support for the prediction that constructionalization of qué de <noun> would play out in the same manner as it has for the construction under study here – that is, as a process that hinges on speakers’ sensitivity to probabilistic reduction.

Another issue worthy of further scrutiny pertains to the cross-varietal stability of probabilistic constraints in diachrony. Marttinen Larsson (under review) analyzed the variable and changing use of antecedent-agreeing definite articles in Spanish oblique relative clauses (e.g., la casa [en Ø/la que] nací ‘the house in which I was born’, el coche [en Ø/el que] vamos ‘the car that we are going in’). The observed change was one towards increasing use of the definite article. Marttinen Larsson argued that the definite article has emerged as an accessibility marker in contexts where the antecedent is difficult to retrieve due to accessibility and processing constraints. One such major constraint is the conditional probability of the antecedent given the relative clause, and the emerging use of the definite article in the oblique relative clauses was found to occur in contexts where the antecedents are, probabilistically speaking, highly improbable, that is, with antecedents that have low conditional probabilities. Over time, the innovative oblique relative with the definite article (e.g., en la que, en el que, en las que, and en los que) spread towards contexts involving increasingly accessible antecedents. This syntactic change is thus towards probabilistic enhancement (cf. Jaeger and Buz 2017), and coding length becomes extended (e.g., en que → en [el/la/los/las] que) to facilitate the retrieval of the antecedent. Over time, this use has become more and more conventionalized.
Importantly, Marttinen Larsson (under review) found that the real-time actualization of the change across contexts takes place in the same manner across various varieties of Spanish: in the three regional varieties of Spanish that Marttinen Larsson analyzed, the conventionalization of the longer variant was shown to spread cross-varietally almost identically across contexts, diffusing from contexts involving highly inaccessible antecedents (in nonrestrictive oblique relative clauses with antecedents of low conditional probabilities that are also far-distanced, plural, and indefinite) towards increasingly accessible ones (in restrictive oblique relative clauses with antecedents of high conditional probabilities that are adjacent, singular, and definite). Given these results testifying to the pervasiveness of the constraints underlying probabilistic enhancement, the analysis of the la de <noun> que construction should be replicated on other varieties of Spanish to ascertain the inter-varietal coherence in the influence of constructional predictability during a process of probabilistic reduction.

Data availability

The dataset, R code, and R environment used in the present study are available in the OSF repository: https://doi.org/10.17605/OSF.IO/KSRB2.

Corresponding author: Matti Marttinen Larsson, University of Gothenburg, Gothenburg, Sweden; Stockholm University, Stockholm, Sweden; and Humboldt-Universität zu Berlin, Berlin, Germany, E-mail: matti.marttinen.larsson@gu.se

Funding source: Vetenskapsrådet

Award Identifier / Grant number: 2022-00303

Acknowledgements

This work was supported by funding from the Swedish Research Council (grant number 2022-00303). I am grateful to the three anonymous reviewers, an anonymous associate editor, and the journal editor-in-chief, Dagmar Divjak, for their insightful comments and constructive feedback, which have improved the manuscript significantly. I also extend my thanks to the audience at the 6th Variation and Language Processing Conference (VALP6) for valuable input. Naturally, all remaining errors are my own.

References

Alija, Francisco Javier Grande. 1999. La gramática de la emoción: los enunciados exclamativos. Contextos 33–36. 279–308.

Arias, Juan José. 2023. «¡La de + N + que…!» The Feminine Definite Article in Spanish Exclamative Clauses. Languages 8(4). 274. https://doi.org/10.3390/languages8040274.

Aylett, Matthew & Alice Turk. 2004. The smooth signal redundancy hypothesis: A functional explanation for relationships between redundancy, prosodic prominence, and duration in spontaneous speech. Language and Speech 47(1). 31–56. https://doi.org/10.1177/00238309040470010201.

Biber, Douglas. 1986. Spoken and written textual dimensions in English: Resolving the contradictory findings. Language 62(2). 384–414. https://doi.org/10.2307/414678.

Biber, Douglas & Susan Conrad. 2019. Register, genre, and style, 2nd edn. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781108686136.

Bosque, Ignacio (ed.). 2017. Advances in the analysis of Spanish exclamatives. Columbus: The Ohio State University Press. https://doi.org/10.26530/OAPEN_625759.

Bürkner, Paul-Christian. 2021. Bayesian item response modeling in R with brms and Stan. Journal of Statistical Software 100(5). 1–54. https://doi.org/10.18637/jss.v100.i05.

Bybee, Joan. 2002a. Sequentiality as the basis of constituent structure. In Bertram F. Malle & Talmy Givón (eds.), The evolution of language out of pre-language, 109–134. Amsterdam/Philadelphia: John Benjamins. https://doi.org/10.1075/tsl.53.07byb.

Bybee, Joan. 2002b. Word frequency and context of use in the lexical diffusion of phonetically conditioned sound change. Language Variation and Change 14(3). 261–290. https://doi.org/10.1017/S0954394502143018.

Bybee, Joan. 2006. From usage to grammar: The mind’s response to repetition. Language 82(4). 711–733. https://doi.org/10.1353/lan.2006.0186.

Bybee, Joan. 2010. Language, usage and cognition. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511750526.

Bybee, Joan & Joanne Scheibman. 1999. The effect of usage on degrees of constituency: The reduction of don’t in English. Linguistics 37(4). 575–596. https://doi.org/10.1515/ling.37.4.575.

Bybee, Joan & Rena Torres Cacoullos. 2008. Phonological and grammatical variation in exemplar models. Studies in Hispanic and Lusophone Linguistics 1(2). 399–414. https://doi.org/10.1515/shll-2008-1026.

Calvo, José Manuel González. 1987. Sobre la expresión de lo “superlativo” en español (IV). Anuario de Estudios Filológicos 10. 101–132.

Carbonero Cano, Pedro. 1990. Configuración sintáctica de los enunciados exclamativos. Philologia Hispalensis 1(5). 111–138. https://doi.org/10.12795/PH.1990.v05.i01.09.

Divjak, Dagmar. 2019. Frequency in language: Memory, attention and learning. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781316084410.

Gibson, Edward, Richard Futrell, Steven P. Piantadosi, Isabelle Dautriche, Kyle Mahowald, Leon Bergen & Roger Levy. 2019. How efficiency shapes human language. Trends in Cognitive Sciences 23(5). 389–407. https://doi.org/10.1016/j.tics.2019.09.005.

Goldberg, Adele E. 1995. Constructions: A construction grammar approach to argument structure. Chicago: University of Chicago Press.

Goldberg, Adele E. 2019. Explain me this: Creativity, competition, and the partial productivity of constructions. New Jersey: Princeton University Press. https://doi.org/10.2307/j.ctvc772nn.

Gutiérrez-Rexach, Javier & Patricia Andueza. 2011. Degree restrictions in Spanish exclamatives. In Luis A. Ortiz-López (ed.), Selected proceedings of the 13th Hispanic linguistics symposium, 286–295. Somerville, MA: Cascadilla Proceedings Project.

Haspelmath, Martin. 2008. Creating economical morphosyntactic patterns in language change. In Jeff Good (ed.), Linguistic universals and language change, 185–214. Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199298495.003.0008.

Haspelmath, Martin. 2021. Explaining grammatical coding asymmetries: Form–frequency correspondences and predictability. Journal of Linguistics 57(3). 605–633. https://doi.org/10.1017/S0022226720000535.

Jaeger, T. Florian. 2010. Redundancy and reduction: Speakers manage syntactic information density. Cognitive Psychology 61(1). 23–62. https://doi.org/10.1016/j.cogpsych.2010.02.002.

Jaeger, T. Florian & Esteban Buz. 2017. Signal reduction and linguistic encoding. In Eva M. Fernández & Helen Smith Cairns (eds.), The handbook of psycholinguistics, 38–81. Hoboken: Wiley. https://doi.org/10.1002/9781118829516.ch3.

Jessen, Frank, Reinhard Heun, Michael Erb, Dirk Oliver Granath, Uwe Klose, Andreas Papassotiropoulos & Wolfgang Grodd. 2000. The concreteness effect: Evidence for dual coding and context availability. Brain and Language 74(1). 103–112. https://doi.org/10.1006/brln.2000.2340.

Jurafsky, Daniel, Alan Bell, Michelle Gregory & William D. Raymond. 2001. Probabilistic relations between words: Evidence from reduction in lexical production. In Joan L. Bybee & Paul J. Hopper (eds.), Frequency and the emergence of linguistic structure, 229–254. Amsterdam: John Benjamins Publishing Company. https://doi.org/10.1075/tsl.45.13jur.

Kimball, Amelia E., Kailen Shantz, Christopher Eager & Joseph Roy. 2019. Confronting Quasi-separation in logistic mixed effects for linguistic data: A Bayesian approach. Journal of Quantitative Linguistics 26(3). 231–255. https://doi.org/10.1080/09296174.2018.1499457.

Kruschke, John K. 2015. Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan. Boston: Academic Press. https://doi.org/10.1016/B978-0-12-405888-0.00008-8.

Kurumada, Chigusa & T. Florian Jaeger. 2015. Communicative efficiency in language production: Optional case-marking in Japanese. Journal of Memory and Language 83. 152–178. https://doi.org/10.1016/j.jml.2015.03.003.

Leclercq, Benoît & Cameron Morin. 2023. No equivalence: A new principle of no synonymy. Constructions 15(1). 1–16. https://doi.org/10.24338/CONS-535.

Levshina, Natalia. 2018. Probabilistic grammar and constructional predictability: Bayesian generalized additive models of help + (to) Infinitive in varieties of web-based English. Glossa: A journal of general linguistics 3(1). https://doi.org/10.5334/gjgl.294.

Levshina, Natalia. 2022a. Communicative efficiency: Language Structure and use. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781108887809.

Levshina, Natalia. 2022b. Comparing Bayesian and frequentist models of language variation: The case of Help + (to-)Infinitive. In Ole Schützler & Julia Schlüter (eds.), Data and Methods in corpus linguistics: Comparative approaches, 224–258. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781108589314.009.

Levshina, Natalia & David Lorenz. 2022. Communicative efficiency and the principle of No synonymy: Predictability effects and the variation of want to and wanna. Language and Cognition 14(2). 249–274. https://doi.org/10.1017/langcog.2022.7.

Long, Jacob A. 2019. Package ‘interactions’. Available at: https://interactions.jacob-long.com/.

Lorenz, David. 2013. Contractions of English semi-modals: The emancipating effect of frequency. Albert-Ludwigs-Universität Freiburg PhD Dissertation.

Marttinen Larsson, Matti. under review. Pathways of actualization across regional varieties and the real-time dynamics of syntactic change.

Nicenboim, Bruno & Shravan Vasishth. 2016. Statistical methods for linguistic research: Foundational Ideas—Part II. Language and Linguistics Compass 10(11). 591–613. https://doi.org/10.1111/lnc3.12207.

Norcliffe, Elisabeth & T. Florian Jaeger. 2016. Predicting head-marking variability in Yucatec Maya relative clause production. Language and Cognition 8(2). 167–205. https://doi.org/10.1017/langcog.2014.39.

Pierrehumbert, Janet B. 2001. Exemplar dynamics: Word frequency, lenition and contrast. In Joan L. Bybee & Paul J. Hopper (eds.), Frequency and the emergence of linguistic structure, 137–157. Amsterdam: John Benjamins Publishing Company. https://doi.org/10.1075/tsl.45.08pie.

Priva, Uriel Cohen & T. Florian Jaeger. 2018. The interdependence of frequency, predictability, and informativity in the segmental domain. Linguistics Vanguard 4(s2). 20170028. https://doi.org/10.1515/lingvan-2017-0028.

R Core Team. 2024. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.

Real Academia Española & Asociación de Academias de la Lengua Española. 2009. Nueva gramática de la lengua española, Vol. 2. Madrid: Espasa.

Real Academia Española & Asociación de Academias de la Lengua Española. 2010. Nueva gramática de la lengua española: Manual. Madrid: Espasa.

Rickford, John R., Thomas A. Wasow, Norma Mendoza-Denton & Juli Espinoza. 1995. Syntactic variation and change in progress: Loss of the verbal coda in topic-restricting as far as constructions. Language 71(1). 102. https://doi.org/10.2307/415964.

Rohdenburg, Günter. 1996. Cognitive complexity and increased grammatical explicitness in English. Cognitive Linguistics 7(2). 149–182. https://doi.org/10.1515/cogl.1996.7.2.149.

Roland, Douglas, Frederic Dick & Jeffrey L. Elman. 2007. Frequency of basic English grammatical structures: A corpus analysis. Journal of Memory and Language 57(3). 348–379. https://doi.org/10.1016/j.jml.2007.03.002.

Schmid, Hans-Jörg. 2000. English abstract nouns as conceptual shells: From corpus to cognition. Berlin: De Gruyter. https://doi.org/10.1515/9783110808704.

Schmid, Hans-Jörg. 2010. Does frequency in text instantiate entrenchment in the cognitive system? In Dylan Glynn & Kerstin Fischer (eds.), Quantitative methods in cognitive semantics: Corpus-driven approaches, 101–134. Berlin: De Gruyter Mouton. https://doi.org/10.1515/9783110226423.101.

Schmid, Hans-Jörg & Helmut Küchenhoff. 2013. Collostructional analysis and other ways of measuring lexicogrammatical attraction: Theoretical premises, practical problems and cognitive underpinnings. Cognitive Linguistics 24(3). 531–577. https://doi.org/10.1515/cog-2013-0018.

Sorensen, Tanner, Sven Hohenstein & Shravan Vasishth. 2016. Bayesian linear mixed models using Stan: A tutorial for psychologists, linguists, and cognitive scientists. The Quantitative Methods for Psychology 12(3). 175–200. https://doi.org/10.20982/tqmp.12.3.p175.

Stefanowitsch, Anatol & Stefan Th. Gries. 2003. Collostructions: Investigating the interaction of words and constructions. International Journal of Corpus Linguistics 8(2). 209–243. https://doi.org/10.1075/ijcl.8.2.03ste.

Torrego, Esther. 1988. Operadores en las exclamativas con artículo determinado de valor cuantitativo. Nueva Revista de Filología Hispánica 36(1). 109–122. https://doi.org/10.24201/nrfh.v36i1.666.

Traugott, Elizabeth Closs & Graeme Trousdale. 2013. Constructionalization and constructional changes. Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199679898.001.0001.

Wasow, Thomas, T. Florian Jaeger & David M. Orr. 2011. Lexical variation in relativizer frequency. In Horst J. Simon & Heike Wiese (eds.), Expecting the unexpected: Exceptions in grammar, 175–196. Berlin: De Gruyter. https://doi.org/10.1515/9783110219098.175.

Yamada, Aaron. 2022. Register effects and the Spanish adjectival construction sin + INF in historical corpus data. Isogloss. Open Journal of Romance Linguistics 8(1). https://doi.org/10.5565/rev/isogloss.147.

Zanuttini, Raffaella & Paul Portner. 2003. Exclamative clauses: At the syntax-semantics interface. Language 79(1). 39–81. https://doi.org/10.1353/lan.2003.0105.

Received: 2023-10-09

Accepted: 2024-09-29

Published Online: 2024-10-11

Published in Print: 2024-11-26

This work is licensed under the Creative Commons Attribution 4.0 International License.

https://doi.org/10.1515/cog-2023-0112
