Coalescence and contraction of V-to-Vinf sequences in American English – Evidence from spoken language

  • David Lorenz and David Tizón-Couto
Published/Copyright: March 30, 2017

Abstract

This paper addresses the issue of coalescence of frequent collocations and its consequences for their realization and mental representation. The items examined are ‘semi-modal’ instantiations of the type V-to-Vinf, namely have to, used to, trying to and need to, in American English. We explore and compare their realization variants in speech, considering the effects of speech-internal and extra-linguistic factors (speech rate, stress accent, phonological context, speech situation, age of the speaker), as well as possible effects of analogy with established contractions like gonna, wanna. Our findings show a high degree of coalescence in the items under study, but no clear pattern of contraction. The propensity for contraction in analogy to gonna/wanna is strongly affected by phonological properties – it is inhibited by the presence of a fricative in have/used to. Moreover, the most frequent reduced realizations are conservative in terms of transparency and still allow morphological parsing of the structure. More radical contractions are restricted to rapid and informal speech, and less entrenched as variants. This shows the limitations of reduction as a frequency effect in light of the balance between articulatory ease and explicitness in speaker–hearer interaction. Even in highly frequent and strongly coalesced items, reduction (articulatory ease) is restricted by a tendency to retain cues to morphological structure (explicitness). Finally, we propose a network of pronunciation variants that includes representation strengths as well as analogy relations across constructional types.

1 Introduction

It has long been noted that certain multi-word sequences undergo phonological reduction and contraction to a single word (e. g. want to > wanna). For cognitively oriented, usage-based theories (specifically exemplar theory), these phenomena provide important insights into how the human mind processes language and stores linguistic experience. Bybee (2002, 2006) and Ellis (2002a, 2002b) make two major arguments in this respect: that reduction is an effect of frequency and that it is gradient. The frequency argument holds that more frequent sequences are more prone to reduction due to the automization of neuro-motor routines; this implies a connection between mental representations and articulation. Regarding mental representations, frequent sequences come to be treated as single processing units (‘chunks’). Articulation, on the other hand, is seen as a mechanical procedure which will become less accurate and less detailed with repetition. The gradience argument holds that reduction in articulation does not yield any distinct, categorical variants: “[T]he degree of reduction is a continuous function of the frequency of the target word and the conditional probability of the target given the previous word and that of the target given the next word” (Ellis 2002b: 331; cf. Jurafsky et al. 2001). However, as articulation feeds back into mental representations, “a gradual and probably continuous restructuring of categories around the most frequently occurring members” (Bybee 2002: 220) takes place. Thus, frequent articulation forms come to be recognized and stored as variants. These variants in turn affect the mental representation of the item as a whole (Connine and Pinnow 2006).

A recurring question that emerges from the above reflections is what status reduced variants have in the language user’s mind. With regard to contractions and ‘chunked’ sequences, the question is also what constitutes a contraction and what status the ‘chunked’ sequence has with respect to the construction that licenses the sequence. Prominent cases of contraction in English are the semi-modals gonna, wanna and gotta. These are conventional items in spoken language; their development and their variation with the full forms are well-documented (Berglund 2000; Krug 2000; Lorenz 2013a, 2013b; see Section 2 below).

The present study moves beyond these to investigate the collocations have to, used to, trying to and need to. These are of the same constructional type (V-to-Vinf). They are also relatively frequent and may (to varying degrees) be associated with grammaticalization; yet it is less clear what the ‘contracted’ variants of these items should be or how commonly they are phonetically reduced. We therefore take an exploratory approach, first establishing what realization variants can be found in natural speech, and then analyzing their usage.

We take a usage-based approach and adopt a broad Construction Grammar perspective (cf. Goldberg 2006; Diessel 2015) in order to account for the emergence of realization variants that derive from coalescence. This approach provides a backdrop that might accommodate the gradience of variation as regards the phonological realization of frequent items. Descriptions of constructional networks can easily include pronunciation variants and the relations between them. By adopting this approach, we also try to get beyond earlier, rule-based descriptions of ‘to-contraction’ (e. g. Pullum 1997), which did not have a handle on the details of pronunciation. Only an exhaustive survey of such details in casual speech can provide answers to questions concerning cognitive representations of realization variants (see Tucker and Ernestus 2016 for an elaborate argument to this effect). In this vein, we attempt to map out the realizations of the semi-modal items and establish their pronunciation variants. We assume that pronunciation variants embody instantiations of both the smaller semi-modal meso-constructions (i. e. one for each verb) and the larger (or macro) V-to-Vinf construction. As is customary within usage-based models, the different pronunciation variants are assumed to be entrenched to different degrees and, thus, have a stronger or weaker representation in the speakers’ minds. A strongly represented variant will hold a more central status within the network that comprises all of these variants.

2 Background and scope

The present study involves phonetic reduction as well as coalescence and contraction of sequences. We provide a brief review of research on these aspects in turn, followed by an account of the V-to-Vinf construction and a discussion of the concepts ‘coalescence’, ‘reduction’ and ‘contraction’.

Phonetic research on reduction is, naturally, mostly concerned with its acoustic detail and conditions. The most relevant reduction here is lenition or flapping of /t/, which is distinguished by its short duration compared to other stop sounds (Umeda 1977; Zue and Laferriere 1979; Rimac and Smith 1984; see Tucker [2007: 36–42] for an overview of the articulatory characteristics of flaps). There is wide consensus that /t/-flapping is typical of American English in intervocalic position (e. g. water) as well as intervocalic /nt/ and /rt/ clusters (e. g. counter). The overall share of the flap in these contexts is well over 90 % (Zue and Laferriere 1979; Patterson and Connine 2001), with no variation regarding sex or dialect, except with flapped /t/s at word boundaries (Byrd 1994: 45–46; Patterson and Connine 2001). Thus, the /t/-flap is not only frequent in casual speech, but “indubitably a feature of even careful speech” (Shockey 2003: 30). However, use of the flap also depends on the item it occurs in – flaps are relatively dispreferred in low-frequency items and morphologically complex words (Patterson and Connine 2001).

There is less consensus regarding the perception and processing of /t/-flaps. Connine (2004) proposes that the flap is part of the lexical representation of words that it frequently occurs in. However, flap realizations were found to inhibit word recognition in some studies (Tucker and Warner 2007; Tucker 2011), or to be on a par with full /t/ realizations in others (Inhoff et al. 2002; Pitt et al. 2011). Less equivocally, Pitt et al. (2011) show that /t/-flapping (as well as other /t/ allophones) is strongly tied to its typical phonological context and that listeners show a high sensitivity to this in word recognition. Thus, the phonetic research also points us to the issue of mental representations, i. e. what forms are stored in the language user’s memory and how they are retrieved and connected (cf. Connine and Pinnow 2006; Ernestus and Warner 2011).

Phonetic reduction, including /t/-lenition and /t/-deletion (Raymond et al. 2006), has mostly been investigated word-internally. We draw on these studies for background information while it should be kept in mind that our object of study is not simplex items but sequences that represent different types of a single construction and that may be more or less chunked, hence more or less ‘word-like’. Also, most previous reduction studies have been experimental and carried out under laboratory conditions. The present study explores reduction in uncontrolled, ‘real-life’ corpus data (see Section 3 for details).

Regarding coalescence and contraction, there is, firstly, a long list of discussions of the syntactic conditions or rules of ‘to-contraction’ (e. g. Lakoff 1970; Bolinger 1981; Pullum 1997). These studies sought to explain to-contraction as a phenomenon on the level of syntax or lexicon and have focused largely on what contexts allow or block contractions such as gonna and wanna. [1] They are thus not immediately relevant to the present research, which, by contrast, is concerned with variation at the phonological level. Then, there is research from the perspective of variation and change, often with a cognitive orientation. Here, chunking and the use of reduced forms are strongly associated with the frequency of the sequence (Krug 1998). The idea is that frequent sequences are stored in memory as a whole and the individual component parts are backgrounded (‘chunking’); in speech production, then, the sequence is an articulatory routine that becomes simplified as the component parts need not be clearly discerned (Bybee 2006, 2010; Diessel 2007). This occurs especially in fixed phrases which take on functions beyond their literal compositional meaning, e. g. I don’t know > “dunno” (Scheibman 2000; Pichler 2009; Hildebrand-Edgar 2016), hell of a > “hella” (Trousdale 2012), or Spanish dice que > “dizque” (Dankel 2015).

Following this vein, Lorenz (2013a) describes the ‘emancipation’ of the contracted forms gonna, gotta and wanna, which themselves have become frequent and conventional. These forms are also stored as items, and their mental representation is changing as their ties to the source items (going to, have got to, want to) loosen. This is a gradual process; a schematic outline of its hypothesized stages is provided in Figure 1, along with the frequency effects that propel the change at each stage.

Figure 1: 
Stages of reduction and emancipation of contracted forms, adapted from Lorenz (2013a: 233).

The development from ‘on-line phonetic reduction’ to ‘stored pronunciation variant’ is the emergence of established pronunciation variants of an item/sequence, first through high frequency of the collocation (cf. Bybee 2002, 2006), then through the general (absolute) frequency of a particular realization. The further progress to a ‘stored lexical variant’ can ensue if a pronunciation variant gains in frequency (relative to other variants) and consequently its ties to the source item loosen. This is best exemplified by the case of gonna (Lorenz 2013b). The use of the variant gonna for going to strongly depends on the speaker’s age and social variables such as education and dialect region; however, articulatory reduction (e. g. /aɪmənə/, I’m gonna) is more strongly contingent on intralinguistic factors (speech rate, preceding element). Lorenz (2013b: 146) concludes “that variation [of going to vs gonna] is largely a matter of who speaks, reduction is a matter of how they speak. This sets the variation apart from the reduction process”. Similarly, Berglund (2000) reports an increased use of gonna in less formal situations in British English; Boas (2004) presents a constructional account of wanna that includes a ‘colloquial’ feature on the contraction.

When a reduced form ceases to be a sporadic outcome of imprecise articulation, it becomes a variant that may be associated with social or situational connotations. Hollmann and Siewierska (2011: 48) suggest this with regard to dialectal definite article reduction: “As a particular reduced variant takes on an indexical role in a speech community, speakers may start to use it more often than they would otherwise do (or conversely, they might avoid it)”. In Lorenz’s (2013b) study of gonna, articulatory reduction is subject to change across generations of speakers. It is almost exclusively a matter of rapid speech with older speakers, while in the younger generation other factors (region, preceding element) become relevant. “The trajectory […] is from reduction as a ‘speeding accident’ (tied to rapid speech) to established phonological variation” (Lorenz 2013b: 149). This suggests a difference between reduction that results from imprecise articulation and reduced forms that are part of the speaker’s repertoire of variants.

Thus, to better understand coalescence and contraction, we need to investigate how reduced forms emerge and take their place in a system of pronunciation variants without causing serious problems for communication. In doing so, we need to take into account the interplay of frequency, speech-related factors and speech situation (e. g. formality or ‘colloquialness’). The present study explores these issues in the collocations have to, used to, trying to and need to.

2.1 The V-to-Vinf construction

The items in question all instantiate a constructional pattern ‘V-to-Vinf’, i. e. a finite verb that subcategorizes a to-infinitive. [2] The inflected verb here takes on a modal-like function, modifying the status of the action or state expressed by the to-infinitive verb (cf. Palmer 2001: 1), as in (1a). The verbs that are possible in this slot are restricted by their compatibility with this function (or propensity to be coerced into it), such that the examples in (1b) seem illicit. Due to this function, some typical instances of this construction have been called ‘semi-modals’ (e. g. Biber et al. 1999) or ‘quasi-modals’ (e. g. Hopper and Traugott 1993) – therefore, we will also call this the ‘semi-modal construction’.

(1)

  a. I need/forgot/am trying/prefer to buy milk.

  b. I *dream/*see to buy milk.

There are also other, very similar structures with to-infinitive complements, such as passives (be supposed/believed/known to) and motion verbs (as in I’ll run to buy milk). The constructional relations of all these types are not our main concern here, but we note that a feature of the ‘semi-modal construction’ appears to be that to is attached to the finite verb rather than the following infinitive and thus can be ‘stranded’ (compare [2a] and [2b]).

(2)

  a. You can still buy milk if you need to.

  b. *You can still buy milk if you run to.

Some types of V-to-Vinf are idiomatic outcomes of grammaticalization processes (going to, have (got) to, used to) while in others the modality they express derives directly from the core meaning of the verb in question (e. g. try to, forget to). However, in all these instances the component parts of the construction are transparent (i. e. inflected verb, to, infinitive verb). As long as this is the case, all these types should be treated as instantiations of the semi-modal construction, despite the differences between them in frequency and idiomaticity. These differences are part of our investigation, as will be seen.

Our second assumption is that the more frequently a verb occurs in this construction, the more likely it is that ‘V to’ is not treated as a sequence of two separate items, but as a single chunk (cf. Krug 1998; Blumenthal-Dramé 2012: 104), and is then also more likely to undergo phonological reduction. This process of chunking and reduction is also the origin of the forms of to-contraction that have become conventional items (e. g. gonna, wanna; see Figure 1 above). The cases investigated in the present study – have to, used to, trying to, need to – are of interest in this respect, because they fall somewhere in the middle. They are arguably entrenched, show considerable variation in pronunciation, but it is not immediately clear that there is one typical reduced realization that is becoming an independent item. Thus, we can study these items at a stage of variation that may lead to further change. The relevant concepts of coalescence, reduction and contraction are discussed from a cognitive and variationist perspective, as these lead to the central questions of our analysis.

2.2 Coalescence, reduction, contraction

‘Coalescence’ is at times used as a cover term to include related phenomena such as chunking, contraction and cliticization (e. g. Krug 1998). We employ a more restrictive definition, seeing coalescence only as the manifestation of chunking, i. e. that a sequence is represented and produced as a single unit. Thus, coalescence is not necessarily tied to a change in form. Reduction phenomena may then follow from increasing coalescence: “In the course of this process, the elements of the string may lose their independence, boundaries are blurred, and the whole chunk is compressed and reduced” (Diessel 2007: 116).

Reduction in general can be an outcome of automization, which is a function of frequency (Bybee 2006; Diessel 2007), but is also linked to aspects of intonation and prosody, such as speech rate. Articulatory reduction, as well as phonetic assimilation, is a matter of economy, as speech gestures are carried out with less effort (‘hypo-articulation’ in the sense of Lindblom 1990). If we (artificially) separate frequency from speech rate, we may say that both affect pronunciation, but for different reasons. High frequency promotes reduction because of the item’s entrenchment and automization in the speaker’s mind and its easier retrievability on the part of the listener. Rapid speech promotes reduction because it requires articulatory economy. Yet, even in rapid speech, reduction is selective and sensitive to frequency. Greenberg and Fosler-Lussier (2000) report that rapid speech reduction affects high-frequency words more than low-frequency ones.

Regarding reduced forms, we will make a (somewhat stipulated) distinction between articulatory reduction and reduced pronunciation variants. By ‘articulatory reduction’ we mean any on-line mechanical simplification of the articulatory gesture; by ‘reduced pronunciation variant’ we mean a phonetic form stored in the speaker’s memory and available as a target for pronunciation. Presumably, such reduced pronunciation variants emerge out of repeated articulatory reduction (Bybee 2010, 2013) and may gain independent status if their relative frequency increases (Lorenz 2013a: 232). A contraction, then, is a reduced pronunciation variant of a coalesced sequence in which the elements of the sequence are no longer clearly separable (as in want to > wanna). Broadbent and Sifaki (2013) identify /t/-lenition as the defining feature of to-contraction. This works well in cases where /t/ follows a vowel (e. g. got to) or an alveolar nasal (want to, goin’ to), that is, in a position that favors reduction to a flap, once the word boundaries have been weakened through frequent occurrence of the string (cf. Gregory et al. 1999). However, in other frequent V-to-Vinf strings, such as have to and used to, a lenition (or elision) of the /t/ is less easily achieved due to the preceding fricative, which is not a typical ‘flapping environment’. Yet, coalescence and reduction are to be expected in these items due to their frequency. We address this issue from a constructional perspective, investigating four frequent types of V-to-Vinf: What realization variants do we find in speech? What factors affect the reduction/contraction of these items? Do reduced forms of different types show a common pattern of to-contraction?

2.3 Realization variants

The study presented here analyses the following V-to-Vinf strings: have/has to, used to, trying to, need to. The coalescence of a string can be evidenced by its pronunciation. In need to and trying to this evidence would be a lenition of the medial /t/ (or omission in “tryna”). [3] This /t/-lenition is to be understood as any reduction of the articulatory gesture in the voiceless alveolar stop, usually to a voiced sound or a flap. The variants tryna and needa have at times been mentioned as instances of to-contraction (Andrews 1978; Krug 2000: 211). For have/has to and used to coalescence can, in principle, have two different effects. Either speakers devoice the fricative in assimilation to the following /t/, rendering the type /hæftə/ (“hafta”), or the initial /t/ of to undergoes lenition or complete elision. We hold that /t/-lenition is the stronger indicator of coalescence, firstly because it is less likely in a separated to, whereas devoicing of a fricative tends to also occur word-finally (rendering /hæf/ for have); and secondly because it parallels the precedent cases of gonna, gotta, wanna. We will thus follow Broadbent and Sifaki (2013) in defining to-contraction by /t/-lenition. The option of having both (voiceless fricative and reduced or omitted /t/) is also conceivable, rendering forms of the type /hæfə/ (“haffa”). This form could occur as a reduced form of “hafta”, perhaps in analogy to wanna (and in some ways similar to often). Finally, it should be noted that coalescence need not be reflected in pronunciation at all. A perfect citation-form realization of have + to can nevertheless be represented as a single unit in the speaker’s mind. However, across individual instances and speakers, coalescence and contraction are gradient, and detailed quantitative data can evidence to what degree they are actualized in the language at large. Therefore, in this paper we explore what realizations occur in American English corpus data and what the conditions of their occurrence reveal about potential variant representations.

3 Data and methods

The data used in this study is from the Santa Barbara Corpus of Spoken American English (SBC; DuBois et al. 2000–2005). The SBC consists of 60 speech recordings from various situations and places across the United States. It is designed to represent a cross section of American English usage, consisting largely of face-to-face conversations, but also including public talks and telephone conversations. The corpus provides a total of 249,000 words, which may seem small, but it has decisive advantages for our purpose. All the recordings are available in digitized form and come with a time-aligned transcript. This enabled us to retrieve tokens from the text and analyze them phonetically based on the recording. The analysis encompasses all tokens of have to, used to, trying to and need to that instantiate the ‘semi-modal construction’; other tokens were excluded. [4] This yielded a total of 634 tokens (356 have to, 76 used to, 106 trying to, 96 need to) from 136 different speakers. In many cases (126 tokens), a given speaker produced only one instance of an item; the most prolific individual speaker provided 18 tokens of have to.

We coded pronunciation for the following features: voiced versus voiceless fricative in have/has/used; the quality of the /t/ sound as ‘full’, ‘reduced’ (lenition) or ‘zero’ (elision); the final vowel as /ʊ/ or schwa. Evidently, these categories are somewhat stipulated as none of the variables is per se categorial. Fricative (de-)voicing, /t/-lenition and vowel centering are all gradient phenomena. Our aim, however, is to find pronunciation variants, articulatory prototypes, which requires categorization. Moreover, the nature of the data often does not allow for a more fine-grained acoustic analysis (due to background noise, overlapping talk, etc.).

Both authors listened to all of the tokens for auditory impression. Where there was disagreement about the annotation of a token, we resolved this through acoustic analysis, where possible, using Praat (Boersma and Weenink 2014). [5] In these cases, a /t/ was coded as ‘reduced’ if it was voiced or the closure time was clearly less than 40 ms, a criterion given by Shockey (2003: 29) for identifying a /t/-flap. The final vowel was compared to a clear /ʊ/ and, where needed, a clear schwa from the same speaker on the first two formant measures (F1 and F2).
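The acoustic criterion for disputed tokens can be read as a simple decision rule. The following sketch is an illustration only: the function name, its arguments and the use of a pre-measured closure duration are assumptions, and treating a missing closure as ‘zero’ is one possible reading of the elision category. The qualification ‘clearly’ in the criterion involves judgment that a strict threshold does not capture.

```python
from typing import Optional

# Threshold that the text attributes to Shockey (2003: 29) for a /t/-flap.
FLAP_CLOSURE_MS = 40.0


def code_t_quality(closure_ms: Optional[float], voiced: bool) -> str:
    """Return 'zero', 'reduced' or 'full' for one /t/ token.

    closure_ms: measured closure duration in milliseconds, or None when no
                /t/ gesture was found (assumed here to mean elision).
    voiced:     whether the stop was realized with voicing.
    """
    if closure_ms is None:
        return "zero"
    if voiced or closure_ms < FLAP_CLOSURE_MS:
        return "reduced"
    return "full"
```

For instance, a voiceless stop with a 60 ms closure would be coded as ‘full’, while the same closure with voicing would be coded as ‘reduced’.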

There is a wide array of variation in the data, as can be expected in spoken language. It should be kept in mind that these tokens occur in natural speech situations, where listeners will draw on the context to understand or reconstruct what they (don’t) hear (cf. Marslen-Wilson and Welsh 1978; Pitt et al. 2011: 310). Table 1 provides a summary of the variants found in the data, based on the above features. We should note in addition that in some cases (14 of have to, 1 of used to) the final vowel seemed practically reduced to zero; these are here subsumed under ‘final [ə]’.

Table 1:

Realizations of have to, used to, trying to and need to in the Santa Barbara Corpus.

Fricative             /t/       Final vowel   have to   used to   trying to   need to
voiced ([v], [z])     full      [ʊ]                68         1          13        47
                      full      [ə]                55         1           4        28
                      reduced   [ʊ]                 4         2          41         4
                      reduced   [ə]                23         3          23        17
                      elided    [ʊ]                 -         -           4         -
                      elided    [ə]                17         1          21         -
devoiced ([f], [s])   full      [ʊ]                89        39         n/a       n/a
                      full      [ə]                97        18         n/a       n/a
                      reduced   [ʊ]                 -         2         n/a       n/a
                      reduced   [ə]                 2         9         n/a       n/a
                      elided    [ʊ]                 1         -         n/a       n/a
                      elided    [ə]                 -         -         n/a       n/a
TOTAL                                             356        76         106        96

Note: The variable ‘fricative’ does not apply to trying to and need to; their counts are listed in the upper block. A hyphen marks an empty cell.

While Table 1 lists the realizations of fricative, plosive and final vowel in all possible combinations, it is clear that there are different preferences for the different items, and that some combinations are more likely than others. Thus, the tendency to devoice the fricative is more pronounced in used to than in have to, which reflects a general tendency to devoice /z/ in English (cf. Shockey 2003: 30). Reduction of /t/ is clearly more frequent in trying to and need to than in have to and used to, and /t/-elision is relatively rare overall; we treat /t/-reduction and /t/-elision as different grades of lenition rather than different processes (cf. Raymond et al. 2006). With have to, /t/-lenition and elision almost only occur with a voiced fricative and final schwa (i. e. realization as “havda” or “hava”). Note that these forms are analogous to established contractions such as gonna, wanna. In have/used to, a preceding voiceless fricative inhibits /t/-lenition. The fricative, on the other hand, may be devoiced in assimilation to the voiceless plosive /t/ – when the /t/ is reduced, this assimilation cannot take place. Thus, a devoiced fricative rarely comes in combination with reduced /t/ in have to, and in used to most instances of a voiced fricative also show a reduced /t/. Based on these considerations, we derive from the data the variants listed in Table 2. We will return to these when discussing the results of our analyses below. Impressionistic written representations are given in quotation marks, which will serve as short-hand references to the variants.
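The counts in Table 1 can be checked against the TOTAL row, and the devoicing preference can be expressed as a share. The sketch below is a minimal illustration: the dictionary layout and function names are assumptions, empty cells are entered as 0, and the counts for trying to and need to are kept in the ‘voiced’ block only because the fricative variable does not apply to them.

```python
ITEMS = ("have to", "used to", "trying to", "need to")

# (fricative, /t/, final vowel) -> counts in ITEMS order; 0 marks an empty cell.
TABLE_1 = {
    ("voiced", "full", "ʊ"): (68, 1, 13, 47),
    ("voiced", "full", "ə"): (55, 1, 4, 28),
    ("voiced", "reduced", "ʊ"): (4, 2, 41, 4),
    ("voiced", "reduced", "ə"): (23, 3, 23, 17),
    ("voiced", "elided", "ʊ"): (0, 0, 4, 0),
    ("voiced", "elided", "ə"): (17, 1, 21, 0),
    ("devoiced", "full", "ʊ"): (89, 39, 0, 0),
    ("devoiced", "full", "ə"): (97, 18, 0, 0),
    ("devoiced", "reduced", "ʊ"): (0, 2, 0, 0),
    ("devoiced", "reduced", "ə"): (2, 9, 0, 0),
    ("devoiced", "elided", "ʊ"): (1, 0, 0, 0),
    ("devoiced", "elided", "ə"): (0, 0, 0, 0),
}


def column_totals(table):
    """Sum the counts of every row for each item."""
    return {
        item: sum(counts[i] for counts in table.values())
        for i, item in enumerate(ITEMS)
    }


def devoiced_share(table, item):
    """Proportion of an item's tokens that have a devoiced fricative."""
    i = ITEMS.index(item)
    devoiced = sum(
        counts[i] for (fricative, _, _), counts in table.items()
        if fricative == "devoiced"
    )
    return devoiced / column_totals(table)[item]


print(column_totals(TABLE_1))
print(round(devoiced_share(TABLE_1, "have to"), 3))
print(round(devoiced_share(TABLE_1, "used to"), 3))
```

Under these assumptions the column sums reproduce the totals of 356, 76, 106 and 96, and the devoiced share comes out at roughly 53 % for have to against roughly 89 % for used to.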

Table 2:

Set of variants of have to, used to, trying to and need to.

For have to, used to
Citation form: [v]|[z] - [t] - [ʊ] (“have to”, “used to”)
Devoiced fricative: [f]|[s] - [t] - [ʊ] (“haf to”, “uSed to”)
Final schwa: [v]|[z] - [t] - [ə] (“have ta”, “uzed ta”)
Devoiced fricative+schwa: [f]|[s] - [t] - [ə] (“hafta”, “uSeta”)
/t/-lenition+schwa: [v]|[z]/[s] - [ɾ]/∅ - [ə] (“havda”/“hava”, “uzda”/“uSeda”)
For trying to, need to
Citation form: [t] - [ʊ] (“trying to”, “need to”)
Final schwa: [t] - [ə] (“tryin ta”, “need ta”)
/t/-lenition: [ɾ] - [ʊ] (“tryindu”, “needu”)
/t/-lenition+schwa: [ɾ] - [ə] (“tryinda”, “needa”)
/t/-elision+schwa: ∅ - [ə] (“tryna”)
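The mapping from the three coded features to the variants of Table 2 can be sketched as a lookup. This is an illustrative reading of the table, not the procedure actually used: the function name and the string labels for the feature values are assumptions, and combinations that Table 2 does not list (for example /t/-lenition with final [ʊ] in have to) simply return None.

```python
from typing import Optional

# Variants with a full /t/ for have to / used to: (fricative, vowel) -> label.
FRICATIVE_VARIANTS = {
    ("voiced", "ʊ"): "Citation form",
    ("devoiced", "ʊ"): "Devoiced fricative",
    ("voiced", "ə"): "Final schwa",
    ("devoiced", "ə"): "Devoiced fricative+schwa",
}

# Variants for trying to / need to: (/t/, vowel) -> label.
FLAP_VARIANTS = {
    ("full", "ʊ"): "Citation form",
    ("full", "ə"): "Final schwa",
    ("reduced", "ʊ"): "/t/-lenition",
    ("reduced", "ə"): "/t/-lenition+schwa",
    ("elided", "ə"): "/t/-elision+schwa",
}


def table2_variant(item: str, t: str, vowel: str,
                   fricative: Optional[str] = None) -> Optional[str]:
    """Return the Table 2 label for one coded token, or None if unlisted."""
    if item in ("have to", "used to"):
        if t == "full":
            return FRICATIVE_VARIANTS.get((fricative, vowel))
        if t in ("reduced", "elided") and vowel == "ə":
            # Table 2 lists [v] for have to, but [z] or [s] for used to.
            if item == "used to" or fricative == "voiced":
                return "/t/-lenition+schwa"
        return None
    if item in ("trying to", "need to"):
        return FLAP_VARIANTS.get((t, vowel))
    raise ValueError(f"unknown item: {item!r}")
```

A token of have to coded as devoiced, full /t/ and schwa would thus be labeled ‘Devoiced fricative+schwa’ (“hafta”).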

The next thing we ask is which conditions affect the realization of these items and how. From this, we will make inferences about the cognitive status of the variant forms. We consider five factors that have been associated with reduction and that cover articulatory, prosodic and speech-external aspects. The effects of these factors on the realization of the different items will be tested by means of statistical effect estimation in Section 4.

3.1 Speech rate

Speech rate is an important factor with respect to reduced forms. We consider that a form that is produced only or predominantly in rapid speech is an outcome of articulatory reduction, and not the target of the planned production of a stored variant. Two measures of speech rate are employed: item duration and environment speech rate. The first is on the target items themselves, that is, the duration of the sequence have/used/trying/need to. This measure holds the danger of circularity, in that an item may be reduced because it is uttered rapidly, or an item may have shorter duration because it is phonologically reduced. Therefore, we will present the durational differences between the variant forms but use the second measure in the further analysis. This second measure is the speech rate on the environment, that is, the conversational turn in which the item occurs, excluding the item itself. Thus, the specific phonological form of the item is irrelevant to this measure. Our basic assumption is that rapid speech environments promote articulatory reduction, whereas the repeated occurrence of a reduced form in slow speech indicates that this form is more firmly represented in the speaker’s mind. In our analysis we employ the measure of syllables per second (syll/sec) [6] and use logarithmized values in the regression models. The mean environment speech rates are 5.98 syll/sec in the have to/used to data set, and 5.56 syll/sec in trying to/need to.
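The environment measure can be sketched as follows. This is a hedged illustration under stated assumptions: the function names and arguments are hypothetical, syllable counts and durations are taken as given, and the natural logarithm is assumed because the text does not name a base.

```python
import math


def environment_speech_rate(turn_syllables: int, turn_seconds: float,
                            item_syllables: int, item_seconds: float) -> float:
    """Syllables per second of the turn, excluding the target item."""
    env_syllables = turn_syllables - item_syllables
    env_seconds = turn_seconds - item_seconds
    if env_syllables <= 0 or env_seconds <= 0:
        raise ValueError("the turn must contain material besides the item")
    return env_syllables / env_seconds


def log_rate(rate: float) -> float:
    """Logarithmized rate for use in a regression (natural log assumed)."""
    return math.log(rate)
```

For a hypothetical turn of 20 syllables lasting 3.5 s that contains a two-syllable item of 0.25 s, the environment rate is 18 / 3.25 syll/sec; the phonological form of the item itself does not enter the calculation.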

3.2 Stress accent

Another prosodic factor of reduction is the degree of accentuation (stress) a syllable receives. Reduction typically occurs on less accented syllables (Greenberg et al. 2002; Shockey 2003: 22; Raymond et al. 2006). In V-to-Vinf constructions, to is always unaccented but may form a unit with the preceding verb (which in turn always receives at least a light accent). Stress accent on this verb is therefore a potentially relevant factor of the realization of the unit (consider We HAVE to go versus We have to GO). Stress accent is gradient, but cannot easily be quantified since it is based on a combination of acoustic cues such as pitch, duration and amplitude (Greenberg et al. 2002: 351; Wichmann 2011: 334). Largely following Greenberg et al.’s (2002) procedure, we have labeled the stress accent by two categories, ‘heavy’ and ‘light’. All tokens were annotated by both researchers separately (initial agreement rate: 75.2 %), and unclear cases then revisited. Overall, ‘heavy’ and ‘light’ accents are almost evenly distributed (56 % ‘heavy’ in have to/used to, 50 % in trying to/need to).
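An agreement rate of this kind can be computed as the share of tokens that received the same label from both annotators. The sketch below assumes simple percent agreement, since the text does not state whether the reported figure is raw or chance-corrected; the function name is hypothetical.

```python
from typing import Sequence


def percent_agreement(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Percentage of tokens on which two annotators assigned the same label."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both annotators must label the same non-empty token set")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100.0 * matches / len(labels_a)
```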

3.3 Following sound

Another speech-related factor is the following sound, that is, the first sound of the verb (or adverb) directly following to. In the V-to-Vinf construction the particle to is originally an infinitive marker linked to the following verb; to-contraction then involves the fusion of to with the preceding item and thus its detachment from the following verb. The effect of the following sound can indicate to what extent the realization of to is phonetically conditioned. As this phonetic conditioning is largely a matter of articulatory ease, stored variants in which to is fused to the preceding verb will be less affected by it. Moreover, a following vowel or pause has been associated with fuller forms (Fox Tree and Clark 1997) – specifically, the function word to has shown less reduction of /ʊ/ to schwa before vowels (Jurafsky et al. 1998), and word-internal /t/-deletion is rare before pauses (Raymond et al. 2006: 71). We categorize the sounds by place of articulation, using four categories: labial/dental (comprising bilabial, labiodental and interdental), alveolar, velar and ‘other’ (comprising /h/, vowels and pauses).
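The four-way categorization can be sketched as a lookup on the first segment after to. The sketch makes several assumptions: the function name and symbol sets are illustrative, a pause is passed as None or an empty string, and only sounds whose category follows directly from the description are listed. Sounds whose placement is not stated (for example /w/, /r/ or the postalveolars) raise an error instead of being assigned a category.

```python
from typing import Optional

LABIAL_DENTAL = {"p", "b", "m", "f", "v", "θ", "ð"}  # bilabial, labiodental, interdental
ALVEOLAR = {"t", "d", "n", "s", "z", "l"}
VELAR = {"k", "g"}
VOWEL_SYMBOLS = set("aeiouæɑɒɔəɛɪʊʌ")  # first symbol of a vowel or diphthong


def following_sound_category(sound: Optional[str]) -> str:
    """Classify the segment after 'to'; None or '' stands for a pause."""
    if not sound:
        return "other"  # pause
    if sound == "h" or sound[0] in VOWEL_SYMBOLS:
        return "other"
    if sound in LABIAL_DENTAL:
        return "labial/dental"
    if sound in ALVEOLAR:
        return "alveolar"
    if sound in VELAR:
        return "velar"
    raise ValueError(f"category not specified for {sound!r}")
```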

3.4 Speech situation

It is known that reduction is often a phenomenon of informal language, whereas formal registers favor full citation forms. Hollmann and Siewierska (2011: 49) point out “the possibility that certain reduced variants index social identities, and that this may have an impact on the facts of reduction in a given variety”. Even the conventionalized to-contractions gonna, gotta and wanna, while being the default option in spoken language, still carry a connotation of informality and colloquialness (Boas 2004; Lorenz 2013a: 184). We take up the formality of the speech situation as a factor in order to assess if such an informality feature is assigned to any specific variant of the V-to-Vinf types investigated here. Based on the file descriptions in the SBC documentation, we defined three categories of speech situation, ‘private’, ‘professional’ and ‘public’. ‘Private’ comprises conversations (usually face-to-face) among friends or family; conversations or task-related talk in a professional setting (e. g. among colleagues or business partners) were labeled as ‘professional’; the category ‘public’ contains organized public talks or discussions. Out of the 60 recordings in the corpus, 34 fall into the category ‘private’, 13 are ‘professional’ and 13 ‘public’.

3.5 Speaker’s age

In order to account for possible changes in the variation, we take up the speaker’s age as a measure of apparent time. If realization preferences change over the generations, this would indicate that a form is emerging as a stored pronunciation variant. The Santa Barbara Corpus provides the speakers’ age and the dates of the recordings. From this information we derived the year of birth for each speaker. The range is from 1903 to 1984, with a mean birth year of 1955. For 15 speakers in our data set, age information is not provided.
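
The derivation of the year of birth amounts to a subtraction, sketched below. The function name and the example values are hypothetical, and the result can be off by one year depending on whether the speaker's birthday precedes the recording date. Speakers without age information simply receive no value.

```python
from typing import Optional


def year_of_birth(recording_year: int, age: Optional[int]) -> Optional[int]:
    """Approximate year of birth; None when the corpus gives no age."""
    if age is None:
        return None
    return recording_year - age


# Hypothetical example values, not taken from the corpus documentation.
print(year_of_birth(1990, 35))    # prints 1955
print(year_of_birth(1990, None))  # prints None
```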

4 Results

We are essentially dealing with two different sets of V-to-Vinf types that have different preconditions for their realization: those that involve a fricative (have to, used to) and those that involve a ‘t-flapping environment’ (trying to, need to). We will therefore analyze these two sets separately.

We have suggested above that a large part of the observed data may fall into discrete variants, which show different degrees of reduction. Considering the durations of the tokens, it becomes clear that, as expected, the more reduced variants also tend to be shorter in duration, and this holds for /t/-lenition as well as centering of the final vowel. In contrast, the variants with a devoiced fricative take longer to produce. The mean durations are presented in Figures 2 and 3, with error bars for 95 % confidence intervals. The realization variants are those listed in Table 2 above; thus the very rare combination of /t/-lenition+final /ʊ/ in have/used to (with only nine tokens overall) is not included.
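
As an illustration of how a mean duration with a 95 % confidence interval can be obtained for one variant, consider the following sketch. It uses a normal approximation for the interval, which is an assumption: the text does not state how the error bars were computed, and for small groups a t-based interval would be wider. The function name and the duration values are hypothetical.

```python
import math
import statistics


def mean_ci95(values):
    """Return (mean, lower, upper) using a normal-approximation 95 % CI."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    z = statistics.NormalDist().inv_cdf(0.975)  # two-sided 95 % quantile
    return mean, mean - z * standard_error, mean + z * standard_error


# Hypothetical token durations in seconds for one realization variant.
durations = [0.20, 0.22, 0.24, 0.26, 0.28]
mean, lower, upper = mean_ci95(durations)
print(f"mean={mean:.3f}s, 95% CI=[{lower:.3f}, {upper:.3f}]")
```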

Figure 2: 
Duration of realization variants of have to and used to.

Figure 3: 
Duration of realization variants of trying to and need to.

Now we will take a step back from the categorical view of variants and investigate the variation at the phonological level. We independently test the variation of each of the three phonological variables that are involved here, each taken as a dichotomous dependent variable:

Fricative: voiced [v] vs devoiced [f]

Plosive: full [t] vs t-lenition/elision

Final vowel: [ʊ] vs [ə]

These are tested for the effects of the five independent variables introduced above (speech rate, stress accent, following sound, speech situation, speaker’s age). We will present a multifactorial model for have to, which provides enough data for this approach. The phonologically similar yet less frequent used to is then compared by univariate tests of the same factors. Likewise, the data for trying to and need to is then presented by way of comparison, with statistical testing for individual factors.
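
The three dichotomous dependent variables can be thought of as a binary coding of each token, with the reference levels given in the table captions (‘voiced’ for the fricative, ‘full’ for the plosive and the vowel) coded as 0. The sketch below is one possible representation; the class, field and key names are hypothetical, and the fricative is left empty for trying to and need to, where it does not apply.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Realization:
    fricative: Optional[str]  # "voiced", "devoiced", or None if not applicable
    plosive: str              # "full", "lenited" or "elided"
    vowel: str                # "ʊ" (full) or "ə" (reduced)


def code_responses(token: Realization) -> dict:
    """Binary responses; 0 is the reference level of each variable."""
    return {
        "fricative_devoiced": (
            None if token.fricative is None else int(token.fricative == "devoiced")
        ),
        "t_reduced": int(token.plosive != "full"),  # lenition or elision
        "vowel_schwa": int(token.vowel == "ə"),
    }


# A token of the type spelled "hafta": devoiced fricative, full [t], schwa.
print(code_responses(Realization("devoiced", "full", "ə")))
```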

4.1 Results for the types have/used to

We now take a closer look at the factors of variation for each of the three realization variables defined above. The following analyses of have to are based on 350 tokens, namely the ones that provide at least two syllables of context and the age of the speaker. Differences between the types (have to vs used to) are considered by including the univariate tests on used to.

In our analysis of have to, logistic regression modeling is applied for the purpose of effect estimation (cf. Harrell 2015: 98–99), that is, for testing the effects of a pre-defined set of factors (as outlined in Section 3), rather than attempting to ‘predict’ the variation as a whole. The factors were tested for interaction effects, and relevant interactions were taken up in the models. For used to, meaningful multifactorial modeling is not feasible with the available data, firstly due to the small overall token number, and secondly due to the scarcity of some variant forms (e. g. a voiced fricative occurs only 8 times); we therefore revert to univariate statistical tests for each factor in turn. [7]

Tables 3 to 5 provide an overview of the three logistic regression models for have to alongside the univariate screenings for used to. These tables deal, respectively, with fricative devoicing, /t/-lenition and final vowel centering. The logistic regression models are summarized by means of the coefficients, standard errors and p-values with significance rating. [8] We discuss the results for each variable in turn.

Table 3: Logistic regression model for fricative devoicing in have to (left; ref. level=‘voiced’) and univariate comparison of used to (right).

| Predictor | have to: Coef. | S.E. | p | used to: Test | Result | p |
|---|---|---|---|---|---|---|
| Intercept | 1.413 | 0.843 | 0.093 . | | | |
| speech rate (continuous) | –1.281 | 0.454 | 0.005 ** | Welch t-test | t=–0.702; df=9.17 | 0.500 |
| stress accent (ref. level=‘light’): heavy | 1.349 | 0.260 | <0.001 *** | Fisher’s exact test | odds ratio=0.943 | 1.000 |
| following sound (ref. level=‘alveolar’): labial/dental | 0.236 | 0.314 | 0.453 | Fisher’s exact test (on 2x4 table) | | 0.919 |
| following sound: velar | 0.176 | 0.382 | 0.645 | | | |
| following sound: other | 0.431 | 0.345 | 0.212 | | | |
| speech situation (ref. level=‘private’): professional | –0.025 | 0.298 | 0.934 | Fisher’s exact test (‘private’ vs other) | odds ratio=1.403 | 0.653 |
| speech situation: public | –0.396 | 0.533 | 0.458 | | | |
| year of birth (continuous) | 0.016 | 0.014 | 0.235 | Wilcoxon rank sum test | W=298.5 | 0.310 |
| Interaction stress accent * year of birth: heavy * year | –0.042 | 0.018 | 0.019 * | | | |

Model statistics (have to): p(χ 2) < 0.001; C=0.729; D xy=0.458

Beginning with the fricative variable (voiced /v/, /z/ vs devoiced /f/, /s/), the model for have to overall is significantly better than chance, but not very strong in predictive power (p(χ 2) < 0.001, C=0.729, D xy=0.458). The fricative has a slight tendency to be devoiced (in 53 % of cases), as the intercept reflects. Of the factors considered here, speech rate and an interaction of stress accent and speaker’s age significantly determine the variation for have to. Fricative devoicing is less likely in rapid speech, thus assimilation of the fricative predominantly occurs in slow speech. It is also more likely when a heavy stress accent is on have; this effect, however, declines with increasing year of birth, that is, it holds for older speakers but not for younger ones. It should be noted that overall, there is no change in the frequency of fricative devoicing across generations. Devoicing also does not appear to be conditioned by speech situation or the sound following have to.
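
The relation between the coefficients, standard errors and p-values in Table 3 can be retraced with a Wald z-test, sketched below. That the reported p-values are Wald tests is an assumption (the text does not name the test), and because the published coefficients and standard errors are rounded, the recomputed values agree with the table only approximately. The function name is hypothetical.

```python
import math


def wald_p(coef: float, se: float) -> float:
    """Two-sided p-value of a Wald z-test for a single coefficient."""
    z = coef / se
    # For a standard normal Z: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Coefficient and standard error as printed in Table 3 (have to model).
table_3 = {
    "intercept": (1.413, 0.843),
    "speech rate": (-1.281, 0.454),
    "heavy * year": (-0.042, 0.018),
}
for name, (coef, se) in table_3.items():
    print(f"{name}: z={coef / se:.2f}, p={wald_p(coef, se):.3f}")
```

The negative sign for speech rate corresponds to the statement that devoicing becomes less likely as speech gets faster.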

In used to, fricative devoicing is much more common (89 %). This leaves only eight tokens with a voiced /z/. None of the effects found for have to can be confirmed on this data.

The model for /t/-lenition in have to is shown in Table 4 (p(χ 2) < 0.001, C=0.757, D xy=0.514). Overall, /t/-lenition is relatively rare in have to (13 %), hence the strong negative intercept. The significant main effects in the model are for stress accent, ‘other’ following sounds and ‘professional’ speech situations. Thus, /t/-lenition is less likely when have is heavily stressed and when have to occurs before pauses or vowels. It is also dispreferred in the more formal, careful speech of professional settings (which, somewhat surprisingly, does not extend to public speech). Speech rate exerts an influence in interaction with stress accent: rapid speech favors /t/-lenition in stressed items but has no influence when stress accent is light. Lastly, there is no indication of ongoing change (factor ‘year of birth’) for have to.
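
The interaction of speech rate and stress accent can be read off the coefficients reported in Table 4: with ‘light’ as the reference level, the slope of speech rate for lightly stressed items is the main effect alone, while for heavily stressed items the interaction coefficient is added. The sketch below spells out this arithmetic. The names are hypothetical, and since the scale of the speech rate predictor in the model is not stated here, the slopes are only interpreted per unit of that predictor on the log-odds scale.

```python
import math

# Coefficients as printed in Table 4 (have to, /t/-lenition).
SPEECH_RATE = 0.126          # main effect of speech rate (light stress)
SPEECH_RATE_X_HEAVY = 2.903  # interaction speech rate * heavy


def speech_rate_slope(heavy: bool) -> float:
    """Log-odds change in /t/-lenition per unit of the speech rate predictor."""
    return SPEECH_RATE + (SPEECH_RATE_X_HEAVY if heavy else 0.0)


for heavy in (False, True):
    slope = speech_rate_slope(heavy)
    label = "heavy" if heavy else "light"
    print(f"{label}: slope={slope:.3f}, odds multiplier={math.exp(slope):.2f}")
```

The slope under light stress is close to zero, whereas under heavy stress it is about 3.03, which is the pattern described in the text.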

Table 4: Logistic regression model for /t/-lenition in have to (left; ref. level=‘full’) and univariate comparison of used to (right).

| Predictor | have to: Coef. | S.E. | p | used to: Test | Result | p |
|---|---|---|---|---|---|---|
| Intercept | –1.214 | 1.42 | 0.393 | | | |
| speech rate (continuous) | 0.126 | 0.76 | 0.868 | Welch t-test | t=–2.180; df=24.01 | 0.039 * |
| stress accent (ref. level=‘light’): heavy | –5.596 | 2.48 | 0.024 * | Chi-squared test | χ 2=6.919; df=1 | 0.009 ** |
| following sound (ref. level=‘alveolar’): labial/dental | –0.782 | 0.44 | 0.076 . | Fisher’s exact test (on 2x4 table) | | 0.628 |
| following sound: velar | 0.156 | 0.46 | 0.736 | | | |
| following sound: other | –1.768 | 0.67 | 0.009 ** | | | |
| speech situation (ref. level=‘private’): professional | –1.481 | 0.63 | 0.019 * | Fisher’s exact test (‘private’ vs other) | odds ratio=1.337 | 0.732 |
| speech situation: public | –0.123 | 0.83 | 0.882 | | | |
| year of birth (continuous) | 0.006 | 0.01 | 0.658 | Wilcoxon rank sum test | W=384 | 0.765 |
| Interaction speech rate * stress accent: speech rate * heavy | 2.903 | 1.340 | 0.030 * | | | |

Model statistics (have to): p(χ 2) < 0.001; C=0.757; D xy=0.514

Table 5: Logistic regression model for final vowel reduction in have to (left; ref. level=‘full’) and univariate comparison of used to (right).

| Predictor | have to: Coef. | S.E. | p | used to: Test | Result | p |
|---|---|---|---|---|---|---|
| Intercept | –0.409 | 0.79 | 0.606 | | | |
| speech rate (continuous) | 0.523 | 0.42 | 0.213 | Welch t-test | t=–0.640; df=58.74 | 0.525 |
| stress accent (ref. level=‘light’): heavy | 0.092 | 0.24 | 0.707 | Chi-squared test | χ 2=7.984; df=1 | 0.005 ** |
| following sound (ref. level=‘alveolar’): labial/dental | –1.044 | 0.30 | 0.001 *** | Chi-squared test | χ 2=3.694; df=3 | 0.297 |
| following sound: velar | –0.133 | 0.36 | 0.715 | | | |
| following sound: other | –0.075 | 0.33 | 0.819 | | | |
| speech situation (ref. level=‘private’): professional | 0.112 | 0.28 | 0.692 | Chi-squared test (‘private’ vs other) | χ 2=1.626; df=1 | 0.202 |
| speech situation: public | 0.241 | 0.50 | 0.627 | | | |
| year of birth (continuous) | 0.003 | 0.01 | 0.728 | Wilcoxon rank sum test | W=773.5 | 0.019 * |

Model statistics (have to): p(χ 2)=0.017; C=0.631; D xy=0.261

In the case of used to, only speech rate and stress accent have a clear effect on the form of the /t/ sound: /t/-lenition is more likely in rapid speech and when stress accent on used is light.

The statistical model for reduction to schwa of the final vowel in have to is weak (p(χ 2)=0.017, C=0.631, D xy=0.261) and the only, but very strong effect is that of following labial/dental sounds, which disfavor vowel reduction. This confirms the impression that the realization of the vowel in to is quite variable and rather gradient. The effect of following labial/dental sounds is perhaps really an assimilation effect. In strings like have to work, anticipation of the lip rounding in /w/ can render the vowel as [ʊ] rather than [ə]. Vowel realization is not conditioned by any of the other factors in the case of have to. Used to behaves differently as regards the reduction of the final vowel, showing significant effects of stress accent and year of birth. Thus, schwa reduction is avoided when used is heavily stressed, and younger speakers are more likely to produce a final schwa in used to.
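
The two discrimination indices reported for each model are directly related: Somers' D xy is a rescaling of the concordance index C, with D xy = 2(C − 0.5), so that C=0.5 (chance) corresponds to D xy=0. The short check below applies this identity to the three reported pairs; the function name is hypothetical, and the small deviation in the last case is consistent with rounding of the published values.

```python
def somers_dxy(c_index: float) -> float:
    """Somers' Dxy rank correlation derived from the concordance index C."""
    return 2 * (c_index - 0.5)


# (C, reported Dxy) for fricative devoicing, /t/-lenition, vowel reduction.
reported = [(0.729, 0.458), (0.757, 0.514), (0.631, 0.261)]
for c_index, dxy in reported:
    print(f"C={c_index}: computed Dxy={somers_dxy(c_index):.3f}, reported {dxy}")
```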

Returning to the variants listed in Table 2 above, we can see how these are aligned with respect to environment speech rate and stress accent. Figures 4 (speech rate) and 5 (stress accent) show that variants with a devoiced fricative are aligned with slow speech and a heavy stress accent; in used to, stressed items also tend to have a final [ʊ]. The forms with /t/-lenition largely occur in rapid speech and are more frequent where the stress accent is light.

Figure 4: 
Environment speech rate of realization variants of have to and used to.

While /t/-lenition usually comes with a voiced fricative, there are also the few cases of a devoiced fricative and lenited /t/ (see Table 1; predominantly the form “uSeda”). These 13 items show a mean environment speech rate of 6.96 syll/sec, a clear increase from the grand average of 5.98 syll/sec, and even from that of the ‘/t/-lenition’ group (6.44 syll/sec). The differences shown in Figure 4 suggest that any form with a voiced fricative (including the citation form [v]-[t]) is already a product of accelerated speech. It appears, then, that very high speech rates can produce /t/-lenition in spite of a devoiced fricative and in spite of heavy stress accent (recall the interaction effect on have to in Table 4 above).

4.2 The types trying to/need to in comparison

For the realizations of trying to and need to, the question of fricative devoicing obviously does not apply, but the situation regarding /t/ and the final vowel is more complex (viz. the frequent form “trying [ɾʊ]” and frequent /t/-elision in “tryna” which has no equivalent in need to, see Table 1). The token counts are clearly much lower than for have to. In order to compare the cases, we will therefore present univariate tests of those factors that have shown relevant effects on have to and used to.

The types trying to and need to present a useful case for comparison, since they provide the environment typical of /t/-flapping in American English (preceding vowel or nasal). Consequently, the rate of /t/-lenition is high, at least in trying to (60.4 %). A reduced /t/ in need to (“needa”) is rarer, as is /t/-elision in trying to (“tryna”). These forms have very similar relative frequencies (“needa” 21.9 %, “tryna” 23.6 %).

In the factor speech rate we see another effect of the ‘/t/-flapping environment’ in trying/need to as opposed to have/used to. There is no statistically significant effect of speech rate on either /t/-lenition or vowel reduction in trying to or need to. [9] There is, however, a trend for the most reduced variants (“needa”, “tryna”) to occur in faster speech (see Figure 6).

Figure 5: 
Stress accent and realization variants of have to and used to.

Figure 6: 
Environment speech rate of realization variants of trying to and need to.

For the following sounds, the data on trying/need to is too sparse to usefully account for the separate categories. We only present a comparison of realizations before pauses or disruptions, which have been shown to disfavor /t/-lenition in have/used to above. Here we can observe a similar trend with trying/need to, albeit on a low level, as there are only a few tokens of this case. As Figure 7 shows, /t/-lenition (as well as elision) is extremely rare before speech disruptions in have to, used to (see Table 4 above) and need to, whereas in trying to a following disruption only disfavors elision but not lenition of /t/. That is, the form “tryin’ [ɾʊ]” freely occurs in this context which otherwise disfavors reduction of any kind.

Figure 7: 
/t/ realization in have/used to and trying/need to with and without following speech disruption.

Since more formal (in particular ‘professional’) speech situations were found to inhibit /t/-lenition in have to and used to, we also test for this on trying to and need to. The distributions are presented in Figure 8. There appears to be a slight trend for fewer reduced realizations in ‘professional’ and ‘public’ settings, but the differences are not statistically significant in either set (Fisher’s exact test; p(trying to)=0.2619, p(need to)=0.0801). Thus, from what can be gleaned from the data at hand, /t/-lenition in trying to and need to does not generally carry a sense of casualness or informality.

Figure 8: 
/t/ realization in trying to and need to by speech situation.

Stress accent also affects the realization of trying to and need to (see Figure 9). The form “tryna” (/t/-elision) makes up 34 % (16/47) of tokens with light stress but only 15 % (9/59) of those with heavy stress accent. The effect of stress accent on realization of trying to is on the threshold of statistical significance (χ 2=5.808, df=2, p=0.055). With need to, the reduced form (“needa”) occurs at a rate of 8 % (3/39) in heavy stress accent, and 33 % (18/36) in light stress. This difference is statistically significant (χ 2=8.012, df=1, p=0.005). Thus, in both trying to and need to heavy stress accent disfavors reduction; again, it is /t/-elision in trying to and /t/-lenition in need to that are subject to this effect, while /t/-lenition in trying to is less affected.
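
The p-values attached to the two chi-squared statistics above can be retraced from the statistic and its degrees of freedom alone. The sketch below uses the closed-form upper-tail probabilities of the chi-squared distribution for one and two degrees of freedom, so that no statistics library is required; the function name is hypothetical and other degrees of freedom are intentionally not covered.

```python
import math


def chi2_upper_tail(stat: float, df: int) -> float:
    """P(X >= stat) for a chi-squared variable with 1 or 2 degrees of freedom."""
    if stat < 0:
        raise ValueError("statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df == 2:
        return math.exp(-stat / 2)
    raise ValueError("this sketch only covers df=1 and df=2")


print(f"trying to: p={chi2_upper_tail(5.808, 2):.3f}")  # prints trying to: p=0.055
print(f"need to:   p={chi2_upper_tail(8.012, 1):.3f}")  # prints need to:   p=0.005
```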

Figure 9: 
/t/ realization in trying to and need to by stress accent.

5 Discussion

We discuss the above results with respect to the three phonological variables in turn – fricative devoicing, /t/-lenition and realization of the final vowel.

The most common realization of have to and used to is with a devoiced /v/ or /z/, that is, with a fricative that is assimilated to the following voiceless /t/. Phonetic assimilation typically occurs within a word rather than across a word boundary. The frequency of assimilation of the fricative in have to and used to is as expected if speakers treat these sequences as single units. [10] The finding that fricative assimilation in have to and used to is most prominent in slow speech indicates that the forms with [f/s] are firmly entrenched pronunciation variants. These variants, however, are used irrespective of the speech situation and thus show no evidence of indexing a social or register category such as the ‘colloquialness’ feature attributed to wanna (Boas 2004). We consider that have to should be more susceptible to register variation, since the modality of obligation/necessity is strongly tied to the relation between speaker and hearer in the given situation. In this respect, have to is the neutral variant in contrast to the less formal, more subjective have got to/gotta and the more formal, authoritative must (Myhill 1996; Tagliamonte and D’Arcy 2007). The main dividing lines seem to run between these items, not between pronunciation variants, as social connotations of the contraction gotta versus full got to are likewise hard to find in spoken language (Lorenz 2013a). If there is a register distinction between “have to” and “haf to”, it is more subtle than what can be inferred from the data presented here.

The variants “haf to” and “uSed to” are an outcome of coalescence – once a sequence like have to or used to is consistently perceived as a single unit, the assimilation of the fricative is a natural phonetic consequence. Interestingly, in the case of trying to and need to, the same argument can be made for /t/-lenition. Like fricative devoicing in have/used to, /t/-lenition is a natural phonetic consequence of fusing trying to or need to into a single unit. Accordingly, this reduction is not (or only very weakly) tied to rapid speech and not restricted to informal speech situations.

We have assumed lenition of /t/ to be the strongest indicator of coalescence in a V-to-Vinf sequence, yielding a contracted item by Broadbent and Sifaki’s (2013) definition (e. g. “havda”, “uzda”). Despite the high frequency of have to and used to (promoting their coalescence) and despite the analogy to gotta and wanna, the phonological ‘rule’ that a preceding fricative is not an environment for /t/-lenition (specifically, flapping) still has a strong impact on the pronunciation of these items, such that forms like “havda” and “uzda” are rare and rather restricted. This dovetails with Pitt et al.’s (2011) finding that both speakers and listeners are aware of the context-dependent variation in /t/ realization. It is noteworthy in this respect that the restrictions on /t/-lenition are different for have to/used to, trying to and need to, in accordance with the different phonological contexts they provide. In have to and used to, where the context inhibits /t/-lenition, it is conditioned by speech rate and stress accent, avoided before pauses and associated with informal speech situations. It thus shows the symptoms of on-line articulatory reduction occurring in a fast flow of speech, and it tends to occur in situations where speakers may be less careful about their pronunciation (i. e. in private conversations). In trying to and need to, on the other hand, lenition occurs across speech rates and speech situations. However, the favoring of /t/-lenition crucially depends on the coalescence of the sequence, such that /t/ is not perceived as being at the beginning of the word to but in the middle of the (single) item trying to or need to. In light of our results, this coalescence appears as a common routine in usage, specifically of trying to. Yet it takes an extra step to reduce the /t/ to zero (rendering [traɪnə]/[ni:ɾə]), which obscures the underlying morphological structure (i. e. the presence of to). This reduction is more restricted. 
Firstly, in terms of speech rate, there is a gradient towards faster speech from /t/-lenition to /t/-lenition + schwa to /t/-elision (see Figure 6 above). Secondly, before pauses, where articulatory reduction is not to be expected, /t/-lenition in trying to (“tryindu”) occurs frequently, but /t/-elision (“tryna”) and lenition in need to (“needa”) are virtually absent. Thirdly, these reduced forms very rarely occur with a heavy stress accent. We conclude from this that /t/-lenition is firmly entrenched in language users’ mental representation of trying to, whereas lenition in need to and elision in trying to are less entrenched and rather tend to occur by articulatory reduction.

The final vowel is the least clear-cut of our dependent variables. Our categorization (/ʊ/ vs /ə/) is somewhat imposed upon the data, vowels being notoriously variable. Yet for this distinction we find a straightforward context effect with the following sound of have to. The vowel adapts to following labial sounds by tending towards /ʊ/. The effects on the final vowel in used to (stress accent and year of birth) are somewhat inconclusive. We may assume that the final vowel has little bearing on the mental representation of pronunciation variants, which we discuss further in the next sections.

5.1 Pronunciation variants and articulatory reduction

Returning now to the different pronunciation variants of have to, used to, trying to and need to (as listed in Table 2 above), the following picture emerges. We find pronunciation variants that indicate coalescence but are not fully contracted: “haf to”/“hafta”, “uSed to”/“uSed ta”, “tryindu”/“tryinda”. They follow from general phonological processes (assimilation, /t/-lenition) and are highly frequent; judging from our data, they are entrenched in language users’ representations and may be targeted in production. [11] We also find reduced forms that qualify as full contractions but that occur as outcomes of on-line articulatory reduction: “havda”/“hava”, “uzda”, “tryna”, “needa”. These forms have a lower degree of entrenchment and are rarely, if ever, targeted in production. We propose the following account of these sets of variants. Language users tend to avoid unnecessary articulatory effort in speech production (cf. e. g. Lindblom 1990); however, this drive for ease of articulation is always balanced against the listener’s need for explicitness (cf. Beckner et al. 2009: 16). Therefore, speakers will initially (and frequently) make articulatory adaptations (assimilation or reduction) that are conservative in terms of transparency. These are adaptations that signal coalescence and reduce the articulatory effort for the speaker but still allow morphological parsing of the structure by the listener. They do not require the listener to have a single-unit (coalesced) representation of the string, as the individual elements can still be identified. This chimes with accounts of complex words in morphology, where “the whole takes precedence over its parts” (Hay and Baayen 2005: 344), but the compositional parts are still activated (Marslen-Wilson 2001). The resulting pronunciation variants are relatively unconstrained in their use, such as “haf to”/“uSed to” (devoiced fricative) and “tryinda” (/t/-lenition in flapping environment).
In contrast, reductions that obscure the original morphological structure can only be communicatively successful if the listener has and applies a non-compositional representation and hence does not expect individual morphemes to be discernible. These reductions are (partly) constrained by speech rate, speech fluency, stress accent or speech situation and thus lead to reduced forms that are restricted in use, such as “havda”, “hava”, “uzda”, “tryna”, “needa”. [12] Thus, when “boundaries are blurred” (Diessel 2007: 116) in coalescence, it is due not only to frequency but to other reduction-favoring conditions as well.

These reduced forms are contractions, analogous to the well-established gonna, wanna, gotta – the difference is that the latter have become emancipated and have largely outgrown the constraints on their use (Lorenz 2013a). It is conceivable for “havda”/“hava”, “uzda”, “tryna”, “needa” to take a similar path, if they get perpetuated and increase in frequency and entrenchment strength. However, our data show no sign of such developments. Lorenz (2013a: 232–235) proposes a diachronic model of contraction and emancipation, in which reduced forms proceed through the stages ‘on-line phonetic reduction’, ‘on-line morpho-phonological fusion’, ‘stored pronunciation variant’, ‘stored lexical variant’ (see Figure 1 above). In this model, we may place forms like “hava” and “tryna” at the stage of ‘on-line morpho-phonological fusion’, where “due to the chunking of a sequence, a morphologically non-transparent form becomes available, but is not yet conventional at this stage” (Lorenz 2013a: 234), and verging on ‘stored pronunciation variant’, where language users have a stored representation of the form.

5.2 Analogy and the network of variants

We asked if there was a common pattern of reduction and contraction across these four types of the V-to-Vinf construction. Hollmann and Siewierska (2011), for instance, find a common pattern of reduction of the definite article following prepositions, which leads them to suggest a constructional schema that licenses these reductions. In the present case, the different phonological conditions have evidently led to different strategies of reduction or assimilation. Thus, we do not find a common pattern that would allow us to posit a single reduction schema for these items or the V-to-Vinf construction in general. But there are commonalities, which we think should be described in terms of analogy. Both have to and used to show frequent fricative devoicing, and /t/-lenition at a low rate. It seems plausible that speakers use these forms in analogy, that is, each use of a given variant form of one type (e. g. “hafta”) also strengthens the representation of the corresponding variant of the other type (e. g. “uSedta”). If this is correct, then the strengthening of representations could also increase the likelihood of the corresponding forms of less frequent types by analogy (e. g. “lof ta” for love to). [13] This is outside the scope of the present study, and to confirm (or reject) these conjectures, more evidence would clearly be needed, as well as a better understanding of how pronunciation variants of different items are interconnected.

Another analogy worth considering is with the conventionalized to-contractions gonna, wanna and gotta. Here, “tryna” and “needa” seem to profit from the analogy. These forms show somewhat higher relative frequencies than “havda” and “uzda”; their occurrence is probably aided by their phonological properties allowing for a smooth transition from [t] to flap to zero. The analogous forms of have to would be “hava” or “haffa” – yet, the former is rare and the latter nonexistent. It seems that /t/ is not easily dropped without the intermediate stage of lenition, which is inhibited in these items. The case of have to shows that this phonological hurdle can be overcome when high frequency and speech-internal factors (light stress, rapid speech) coincide; used to, being less frequent, is one step behind, such that /t/-lenition is possible, but /t/-elision is strongly avoided.

In Figure 10 we summarize the above findings in a sketch of a hypothesized network of constructional types and pronunciation variants, as derived from the results of our study. It incorporates notions from Construction Grammar, in particular that have to/used to/need to/trying to/etc. are all types of one construction (the V-to-Vinf or semi-modal construction), and connected by instantiation links. On the level of instances, it presents the attested pronunciation variants and analogy links between them; the empirical findings of the present study inform this level, which is the crucial part of the network. The items and links are presented by the strength of their mental representation. We follow the convention of using dotted arrows and rounded boxes for weakly entrenched relations and items, while solid arrows and angular boxes stand for firm entrenchment (cf. Langacker 1987; Hollmann and Siewierska 2011: 43). [14] The most frequent realization of each type is in bold face. Forms that are only distinguished by the final vowel are represented as co-variants (sharing the same ‘box’), based on our finding that the variation in the final vowel is phonetically conditioned rather than a matter of variant choice; the form “hava” is seen as a further reduction and thus a sub-variant of “havda”. Curved arrows are used to indicate analogy (and dotted curved arrows for weak analogy). The analogy relations reflect the main results of this study: that have to and used to cluster together, that the devoiced forms “hafta” and “uSedta” are entrenched variants, while “havda” and “uzda” are tied to on-line reduction, and that the conventional contractions gonna/wanna/gotta likely exert a greater influence on trying to and need to than on have to and used to due to their phonological similarities. 
Thus, there is a frequent and entrenched reduced variant “tryinda”; the condensed forms “tryna” and “needa” show usage conditions similar to those of “havda” and “uzda”, and are therefore also classified as ‘weakly entrenched’. It should be noted that this network model is an attempt at pulling together the results of a detailed corpus study and visualizing their interpretation with regard to the mental representation of variants; it involves some necessary simplification (for example, entrenchment is really gradient, not a matter of weak or full representation) as well as conjecture (e. g. on the influence exerted by gonna/wanna/gotta). The network model thus constitutes a hypothesis which may serve as a basis for future research on similar phenomena. Its merit, apart from providing a visual summary, is in proposing relations between variants of different items, which is something that has hitherto received little coverage in usage-based studies.

Figure 10: A hypothesized network of construction types and pronunciation variants.

6 Conclusion

This corpus study has investigated types of the construction V-to-Vinf that are frequent, but for which there are no clearly conventionalized contracted variants (such as gonna for going to). The attested realization variants of have to, used to, trying to and need to provide evidence of the coalescence of these sequences. The study shows that their realizations are not fully predictable, but also far from random. They are conditioned by an interplay of item frequency, analogy, and communicative and speech-internal factors. Phonological properties, such as the presence of a fricative in have to and used to, may inhibit contraction in spite of frequency and analogy. This leads to a few wider conclusions. One is that phonetic reduction and assimilation occur frequently, but the use of reduced forms is ‘conservative’: reduction that obscures an item’s internal morphological structure is restricted and strongly tied to speech-related factors (such as context, speech rate and stress accent). In light of language use as a speaker-hearer interaction, this is an example of how speakers retain a level of explicitness that accommodates to the listener, but will also produce more condensed and morphologically non-explicit forms under reduction-favoring conditions. For the mental representation of pronunciation variants, this means that the ‘conservative’ variants (e. g. the bold-faced ones in Figure 10) are encountered frequently and can become entrenched and stored in memory. More ‘radically’ reduced forms may enter this process only when the sequence is so strongly fused that the original morphological structure becomes irrelevant. This seems to have happened in the cases of, e. g., gonna and wanna. The present study shows that it is not the case with the analogous reductions of other semi-modals, i. e. “hav(d)a”, “uzda”, “tryna”, “needa”. 
Thus, the presented findings on pronunciation variation, and specifically the issue of ‘conservative reduction’, are also relevant to questions of language change – given the restrictions on their occurrence, how can strongly reduced forms make it into the conventional repertoire of a speech community? The role that factors like speech rate or stress accent may play in the diachronic change of items or constructions is a question for future research. Finally, our interpretation of the results assigns an important role to analogy relations of varying strengths. The way we derived these from our empirical findings is admittedly somewhat impressionistic. Yet we do believe that such ‘horizontal’ links between instances of different types need to be taken into account if we want to describe how (pronunciation) variants are represented in the language user’s mind.

Funding statement: Autonomous Government of Galicia (Grant / Award Number: ‘GPC2014/060’, ‘POS-B/2016/029-PR’), Spanish Ministry of Economy and Competitiveness and the European Regional Development Fund (Grant / Award Number: ‘FFI2013-44065-P’, ‘FFI2016-77018-P’).

References

Andrews, Avery. 1978. Remarks on to adjunction. Linguistic Inquiry 9. 261–268.

Beckner, Clay, Richard Blythe, Joan Bybee, Morten H. Christiansen, William Croft, Nick C. Ellis, John Holland, Jinyun Ke, Diane Larsen-Freeman & Tom Schoenemann (a.k.a. “The Five Graces Group”). 2009. Language is a complex adaptive system. Language Learning 59(1). 1–26. https://doi.org/10.1111/j.1467-9922.2009.00533.x

Berglund, Ylva. 2000. Gonna and going to in the spoken component of the British National Corpus. In Christian Mair & Marianne Hundt (eds.), Corpus linguistics and linguistic theory – papers from the twentieth international conference on English language research on computerized corpora (ICAME 20), 35–49. Amsterdam: Rodopi. https://doi.org/10.1163/9789004490758_005

Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad & Edward Finegan. 1999. The Longman grammar of spoken and written English. London: Longman.

Blumenthal-Dramé, Alice. 2012. Entrenchment in usage-based theories: What corpus data do and do not reveal about the mind. Berlin: Mouton de Gruyter. https://doi.org/10.1515/9783110294002

Boas, Hans C. 2004. You wanna consider a constructional approach towards wanna-contraction? In Michel Achard & Suzanne Kemmer (eds.), Language, culture, and mind, 479–491. Stanford: CSLI Publications.

Boersma, Paul & David Weenink. 2014. Praat: Doing phonetics by computer [computer program]. Version 5.4.03. http://www.praat.org/ (accessed 1 December 2013).

Bolinger, Dwight. 1981. Consonance, dissonance and grammaticality: The case of wanna. Language and Communication 1. 189–206. https://doi.org/10.1016/0271-5309(81)90012-4

Broadbent, Judith M. & Evi Sifaki. 2013. To-contract or not to-contract? That is the question. English Language and Linguistics 17(3). 513–535. https://doi.org/10.1017/S1360674313000142

Bürki, Audrey & Ulrich H. Frauenfelder. 2012. Producing and recognizing words with two pronunciation variants: Evidence from novel schwa words. The Quarterly Journal of Experimental Psychology 65(4). 796–824. https://doi.org/10.1080/17470218.2011.634915

Bybee, Joan L. 2002. Phonological evidence for exemplar storage of multiword sequences. Studies in Second Language Acquisition 24(2). 215–221. https://doi.org/10.1017/S0272263102002061

Bybee, Joan L. 2006. From usage to grammar: The mind’s response to repetition. Language 82(4). 711–733. https://doi.org/10.1353/lan.2006.0186

Bybee, Joan L. 2010. Language, usage and cognition. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511750526

Bybee, Joan L. 2013. Usage-based theory and exemplar representations of constructions. In Thomas Hoffmann & Graeme Trousdale (eds.), The Oxford handbook of construction grammar, 49–69. Oxford: Oxford University Press. https://doi.org/10.1093/oxfordhb/9780195396683.013.0004

Byrd, Dani. 1994. Relations of sex and dialect to reduction. Speech Communication 15. 39–54. https://doi.org/10.1016/0167-6393(94)90039-6

Connine, Cynthia M. 2004. It’s not what you hear, but how often you hear it: On the neglected role of phonological variant frequency in auditory word recognition. Psychonomic Bulletin & Review 11. 1084–1089. https://doi.org/10.3758/BF03196741

Connine, Cynthia M. & Eleni Pinnow. 2006. Phonological variation in spoken word recognition: Episodes and abstractions. The Linguistic Review 23. 235–245. https://doi.org/10.1515/TLR.2006.009

Dankel, Philipp. 2015. Strategien unter der Oberfläche: Die Emergenz von Evidentialität im Sprachkontakt Spanisch – Quechua. Freiburg: Rombach.

Diessel, Holger. 2007. Frequency effects in language acquisition, language use, and diachronic change. New Ideas in Psychology 25. 108–127. https://doi.org/10.1016/j.newideapsych.2007.02.002

Diessel, Holger. 2015. Usage-based Construction Grammar. In Ewa Dąbrowska & Dagmar Divjak (eds.), Handbook of Cognitive Linguistics, 296–321. Berlin: Mouton de Gruyter. https://doi.org/10.1515/9783110292022-015

Du Bois, John W., Robert Englebretson, Wallace L. Chafe, Charles Meyer, Sandra A. Thompson & Nii Martey. 2000–2005. Santa Barbara Corpus of Spoken American English, Parts 1–4. Philadelphia. www.linguistics.ucsb.edu/research/sbcorpus.html (accessed 1 December 2013).

Egan, Thomas. 2008. Emotion verbs with to-infinitive complements: From specific to general predication. In Maurizio Gotti, Marina Dossena & Richard Dury (eds.), English historical linguistics 2006. Volume 1: Syntax and morphology, 223–240. Amsterdam: John Benjamins. https://doi.org/10.1075/cilt.295.16ega

Ellis, Nick C. 2002a. Frequency effects in language processing. Studies in Second Language Acquisition 24(2). 143–188. https://doi.org/10.1017/S0272263102002024

Ellis, Nick C. 2002b. Reflections on frequency effects in language processing. Studies in Second Language Acquisition 24(2). 297–339. https://doi.org/10.1017/S0272263102002140

Ernestus, Mirjam & Natasha Warner. 2011. An introduction to reduced pronunciation variants. Journal of Phonetics 39. 253–260. https://doi.org/10.1016/S0095-4470(11)00055-6

Fox Tree, Jean E. & Herbert H. Clark. 1997. Pronouncing ‘the’ as ‘thee’ to signal problems in speaking. Cognition 62. 151–167. https://doi.org/10.1016/S0010-0277(96)00781-0

Goldberg, Adele. 2006. Constructions at work: The nature of generalization in language. Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199268511.001.0001

Greenberg, Steven, Hannah Carvey & Leah Hitchcock. 2002. The relation between stress accent and pronunciation variation in spontaneous American English discourse. Proceedings of the International Speech Communication Association Workshop on Prosody and Speech Processing 2002, 351–354. https://doi.org/10.21437/SpeechProsody.2002-73

Greenberg, Steven & Eric Fosler-Lussier. 2000. The uninvited guest: Information’s role in guiding the production of spontaneous speech. Proceedings of the CREST workshop on models of speech production: Motor planning and articulatory modeling, 129–132.

Gregory, Michelle L., William D. Raymond, Alan Bell, Eric Fosler-Lussier & Daniel Jurafsky. 1999. The effects of collocational strength and contextual predictability in lexical production. Communication and Linguistic Studies 35. 151–166.

Harrell, Frank E. 2015. Regression modeling strategies. 2nd edition. Cham: Springer. https://doi.org/10.1007/978-3-319-19425-7

Hay, Jennifer B. & R. Harald Baayen. 2005. Shifting paradigms: Gradient structure in morphology. Trends in Cognitive Sciences 9(7). 342–348. https://doi.org/10.1016/j.tics.2005.04.002

Hildebrand-Edgar, Nicole. 2016. Disentangling frequency effects and grammaticalization. Working Papers of the Linguistics Circle of the University of Victoria 26(1). 1–23.

Hollmann, Willem B. & Anna Siewierska. 2011. The status of frequency, schemas, and identity in cognitive sociolinguistics: A case study on definite article reduction. Cognitive Linguistics 22(1). 25–54. https://doi.org/10.1515/cogl.2011.002

Hopper, Paul & Elizabeth C. Traugott. 1993. Grammaticalization. Cambridge: Cambridge University Press.

Inhoff, Albrecht, Cynthia M. Connine & Ralph Radach. 2002. A contingent speech technique in eye movement research on reading. Behavior Research Methods, Instruments, and Computers 34. 471–480. https://doi.org/10.3758/BF03195476

Jurafsky, Daniel, Alan Bell, Eric Fosler-Lussier, Cynthia Girand & William Raymond. 1998. Reduction of English function words in Switchboard. Proceedings of ICSLP-98 7. 3111–3114. https://doi.org/10.21437/ICSLP.1998-801

Jurafsky, Daniel, Alan Bell, Michelle Gregory & William D. Raymond. 2001. Probabilistic relations between words: Evidence from reduction in lexical production. In Joan Bybee & Paul Hopper (eds.), Frequency and the emergence of linguistic structure, 229–254. Amsterdam: John Benjamins. https://doi.org/10.1075/tsl.45.13jur

Krug, Manfred. 1998. String frequency: A cognitive motivating factor in coalescence, language processing, and linguistic change. Journal of English Linguistics 26. 286–320. https://doi.org/10.1177/007542429802600402

Krug, Manfred G. 2000. Emerging English modals: A corpus-based study of grammaticalization. Berlin & New York: Mouton de Gruyter. https://doi.org/10.1515/9783110820980

Lakoff, George. 1970. Global rules. Language 46(3). 627–639. https://doi.org/10.2307/412310

Langacker, Ronald W. 1987. Foundations of cognitive grammar. Volume 1: Theoretical prerequisites. Stanford: Stanford University Press.

Lindblom, Björn. 1990. Explaining phonetic variation: A sketch of the H and H theory. In William J. Hardcastle & Alain Marchal (eds.), Speech production and speech modelling, 403–439. Dordrecht: Kluwer Academic Publishers. https://doi.org/10.1007/978-94-009-2037-8_16

Lorenz, David. 2013a. Contractions of English semi-modals: The emancipating effect of frequency. NIHIN Studies. Freiburg: Rombach.

Lorenz, David. 2013b. From reduction to emancipation: Is gonna a word? In Hilde Hasselgård, Jarle Ebeling & Signe Oksefjell Ebeling (eds.), Corpus perspectives on patterns of lexis, 133–152. Amsterdam: John Benjamins. https://doi.org/10.1075/scl.57.11lor

Marslen-Wilson, William D. 2001. Access to lexical representations: Cross-linguistic issues. Language and Cognitive Processes 16(5-6). 699–708. https://doi.org/10.1080/01690960143000164

Marslen-Wilson, William D. & Alan Welsh. 1978. Processing interactions and lexical access during word-recognition in continuous speech. Cognitive Psychology 10. 29–63. https://doi.org/10.1016/0010-0285(78)90018-X

Myhill, John. 1996. The development of the strong obligation system in American English. American Speech 71(4). 339–388. https://doi.org/10.2307/455712

Palmer, Frank R. 2001. Mood and modality. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9781139167178

Patterson, David & Cynthia M. Connine. 2001. Variant frequency in flap production: A corpus analysis of variant frequency in American English flap production. Phonetica 58. 254–275. https://doi.org/10.1159/000046178

Pellegrino, François, Christophe Coupé & Egidio Marsico. 2011. A cross-language perspective on speech information rate. Language 87(3). 539–558. https://doi.org/10.1353/lan.2011.0057

Pichler, Heike. 2009. The functional and social reality of discourse variants in a northern English dialect: I DON’T KNOW and I DON’T THINK compared. Intercultural Pragmatics 6(4). 561–596. https://doi.org/10.1515/IPRG.2009.028

Pitt, Mark A., Laura Dilley & Michael Tat. 2011. Exploring the role of exposure frequency in recognizing pronunciation variants. Journal of Phonetics 39. 304–311. https://doi.org/10.1016/j.wocn.2010.07.004

Pullum, Geoffrey K. 1997. The morpholexical nature of English to-contraction. Language 73. 79–102. https://doi.org/10.2307/416594

Raymond, William D., Robin Dautricourt & Elizabeth Hume. 2006. Word-internal /t,d/ deletion in spontaneous speech: Modeling the effects of extra-linguistic, lexical, and phonological factors. Language Variation and Change 18. 55–97. https://doi.org/10.1017/S0954394506060042

Rimac, Robert & Bruce L. Smith. 1984. Acoustic characteristics of flap productions by American English-speaking children and adults: Implications concerning the development of speech motor control. Journal of Phonetics 12(4). 387–396. https://doi.org/10.1016/S0095-4470(19)30898-8

Scheibman, Joanne. 2000. I dunno: A usage-based account of the phonological reduction of don’t in American English conversation. Journal of Pragmatics 32. 105–124. https://doi.org/10.1016/S0378-2166(99)00032-6

Shockey, Linda. 2003. Sound patterns of spoken English. Oxford: Blackwell. https://doi.org/10.1002/9780470758397

Tagliamonte, Sali A. & Alexandra D’Arcy. 2007. The modals of obligation/necessity in Canadian perspective. English World-Wide 28(1). 47–87. https://doi.org/10.1075/eww.28.1.04tag

Trousdale, Graeme. 2012. Grammaticalization, constructions and the grammaticalization of constructions. In Kristin Davidse, Tine Breban, Lieselotte Brems & Tanja Mortelmans (eds.), Grammaticalization and language change, 167–198. Amsterdam: John Benjamins. https://doi.org/10.1075/slcs.130.07tro

Tucker, Benjamin V. 2007. Spoken word recognition of the reduced American English flap. Tucson, AZ: University of Arizona dissertation.

Tucker, Benjamin V. 2011. The effect of reduction on the processing of flaps and /g/ in isolated words. Journal of Phonetics 39. 312–318. https://doi.org/10.1016/j.wocn.2010.12.001

Tucker, Benjamin V. & Mirjam Ernestus. 2016. Why we need to investigate casual speech to truly understand language production, processing and the mental lexicon. The Mental Lexicon 11(3). 375–400. https://doi.org/10.1075/ml.11.3.03tuc

Tucker, Benjamin V. & Natasha Warner. 2007. Inhibition of processing due to reduction of the American English flap. Proceedings of the 16th International Congress of Phonetic Sciences, 1949–1952.

Umeda, Noriko. 1977. Consonant duration in American English. The Journal of the Acoustical Society of America 61(3). 846–858. https://doi.org/10.1121/1.381374

Wichmann, Anne. 2011. Grammaticalization and prosody. In Bernd Heine & Heiko Narrog (eds.), The Oxford handbook of grammaticalization, 331–341. Oxford: Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199586783.013.0026

Zue, Victor W. & Martha Laferriere. 1979. Acoustic study of medial /t, d/ in American English. The Journal of the Acoustical Society of America 66(4). 1039–1050. https://doi.org/10.1121/1.383323

Published Online: 2017-03-30
Published in Print: 2024-02-26

© 2017 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 13.3.2026 from https://www.degruyterbrill.com/document/doi/10.1515/cllt-2015-0067/html