Grammaticalisation of habitual aspect in World Englishes: assessing trajectories, areal patterns and rates of change with synchronic corpora

Jakob Neels; Sven Leuckert; Arne Lohmann

doi:10.1515/cllt-2024-0064

Published/Copyright: August 15, 2025

From the journal Corpus Linguistics and Linguistic Theory

Abstract

This paper presents a geographically large-scale, yet structurally fine-grained study on grammaticalisation in World Englishes. Attending to the relatively neglected domain of habitual aspect, we explore structural and areal patterns of innovation or conservatism regarding grammaticalisation and its synchronic reflexes in 13 varieties. Drawing on the International Corpus of English, we quantify usage profiles of the habitual auxiliary [use(d) to V], taking into account semantic, morphosyntactic and text-linguistic features. Properties of use(d) to are compared across varieties by means of a behavioural profile analysis employing hierarchical cluster analysis. The corpus results are mixed in that variation and cross-linguistic similarities do not seem to follow one single phylogenetic or areal trend but reveal more individual patterns, which can be partly explained by specific contact situations. The case study illustrates that one and the same synchronic grammatical variant may qualify both as a conservative relic and as an innovation as long as earlier colonial stages remain unknown. This leads us to problematise how, owing to the shortage of corresponding diachronic corpora, even instances of strongly directional types of change like grammaticalisation are open to different interpretations within competing narratives such as colonial lag, contact-induced change and the epicentre hypothesis.

Keywords: colonial lag; contact-induced change; epicentre hypothesis; grammaticalisation; habitual aspect; International Corpus of English

1 Introduction

World Englishes are characterised by a high degree of grammatical variation (see, e.g., Bohmann 2019; Botha and Bernaisch 2025; Collins 2023; Suárez-Gómez and Loureiro-Porto 2020). Variation in the tense-modality-aspect systems of World Englishes in particular has received considerable attention (e.g. Collins 2009; Hundt et al. 2020), possibly motivated by noticeable differences in their historical input varieties, i.e. British English (BrE) for most varieties and American English (AmE) for Philippine English (PhilE). However, only a handful of publications consider core grammaticalisation developments in World Englishes. This constitutes a research gap, since, as Siemund and Davydova (2017: 140) point out, “the study of contact-induced change and the study of grammaticalization can be effectively combined to account for some of their [= World Englishes] specific properties.”

The key problem in any analysis of grammaticalisation patterns in World Englishes is the shortage of diachronic corpora: How can we find evidence of grammaticalisation if all we have for many varieties is contemporary snapshots? The lack of fitting (parallel) diachronic corpora can be partly overcome through synchronic comparisons (cf. Gries et al. 2018), which seem promising for this type of morphosyntactic change in particular. Given that grammaticalisation is an incremental, strongly directional change with cross-linguistically recurrent pathways, it should in principle be possible to reconstruct historical trajectories of structural development also for poorly recorded Englishes and assess the relative degree of grammaticalisation for specific constructions. For our study, we employ one of the main resources in the study of World Englishes, the set of corpora included in the International Corpus of English (ICE; Greenbaum and Nelson 1996). Their spoken components are still often the only available representation of naturally spoken English from a given country. Other studies, such as the ones presented in Collins (2015), are, for the most part, based on comparisons of corpora with different compilation dates to allow for diachronic analysis. In the paper at hand, we analyse grammaticalisation patterns by comparing national components of ICE to each other and by comparing them to what we know from other grammaticalisation studies. Our test case is an interesting addition to previous research on grammatical aspect (and its grammaticalisation) in World Englishes. The construction in focus here is habitual [used to V]. It has attracted some studies in the contexts of BrE and AmE (e.g. Binnick 2005), but to the best of our knowledge it has been investigated only in passing for World Englishes and never with regard to its grammaticalisation.

General narratives about the structural development of postcolonial Englishes leave the grammaticalisation researcher faced with partly conflicting predictions. In particular, three recurrent narratives, i.e. hypothesised trends, from the World Englishes literature will be assessed in the present study. They include colonial lag (e.g. Hundt 2009), contact-induced innovation (e.g. Onysko 2016) and the epicentre hypothesis (e.g. Peters and Bernaisch 2022). These hypothesised trends raise various questions regarding grammaticalisation in World Englishes. For postcolonial national varieties, compared to strongly norm-providing Inner Circle varieties, above all AmE and BrE, should one expect structural patterns of conservatism or of innovation? This question holds for grammaticalisation changes not only in the qualitative sense of how many categorically different grammatical innovations (or unique historical relics) exist in different varieties, but it also holds in a more fine-grained probabilistic sense, given that grammaticalisation progresses gradually, usually along a unidirectional multi-stage cline. So when measuring probabilistic features of one particular grammaticalising construction, how conservative or advanced along the expected grammaticalisation path will each variety be? Will similarities and differences between national Englishes correspond to commonly used classifications of variety types like those by Kachru (1985) and Schneider (2007)? Or will cross-variety differences in grammaticalisation patterns appear to result from factors less global than those suggested by the big-picture narratives of colonial lag or contact-induced change, i.e. factors that are specific to the individual environment of each variety? Moreover, in line with the epicentre hypothesis, will there be areal patterns such that geographically close varieties will exhibit the most similar feature distributions for grammaticalising constructions? 
Examining these questions through quantitative statistical analysis, we will touch upon a growing issue in usage-based studies on World Englishes, recently addressed by Hundt (2021): why the increasingly sophisticated statistical approaches applied in the field often fail to verify major theoretical models and narratives of World Englishes research.

The paper proceeds as follows. In Section 2, we introduce our theoretical framework with an overview of grammaticalisation and grammatical change in World Englishes. Section 3 zooms in on habitual aspect in World Englishes and [used to V] as the construction in focus. We present our data, methods and results in Section 4. Finally, we discuss and summarise our findings in Sections 5 and 6.

2 Grammaticalisation and grammatical change in World Englishes

In order to establish the theoretical framework of our study, we first introduce grammaticalisation theory in this section before commenting on grammatical variation and change in World Englishes. The first subsection summarises essential tenets in the study of grammaticalisation that are relevant for our study. The second subsection, in turn, emphasises the multifactorial nature of variation and change on the basis of previous research, since other studies have repeatedly established that there is no one-size-fits-all solution for most, if not all, linguistic developments in World Englishes.

2.1 Grammaticalisation theory

Grammaticalisation is a pervasive historical process with rich implications also for synchronic language structure. It can be defined as “the gradual unidirectional change that turns lexical items into grammatical items and loose structures into tight structures, subjecting frequent linguistic units to more and more grammatical restrictions and reducing their autonomy” (Haspelmath 1998: 344). In short, grammaticalisation is characterised by functional expansion and formal reduction. That is, meaning and form tend to co-evolve (see Bybee et al. 1994: 20). Accordingly, the natural units of this process are form–meaning pairings, which is why the theoretical framework of construction grammar (e.g. Goldberg 2006) has been particularly fruitful in the study of grammaticalisation phenomena. In line with this, we will adopt a usage-based constructionist approach when operationalising degrees of grammaticalisation for our case study (see Section 4.1).

The diachronic paths that grammaticalising constructions take can be captured by cross-linguistically valid clines. For each type of grammatical marker in the world’s languages, there are only very few possible lexical sources as well as a recurrent order of intermediate stages, which at a cognitive conceptual level tend to be linked through metonymy and/or metaphor (e.g. Kuteva et al. 2019). For example, the future-tense markers of numerous languages have been demonstrated to derive from the source concepts of either volition (e.g. will), obligation (e.g. shall) or movement (e.g. gonna) via a likely intermediate stage of ‘intention’. Analogy is widely recognised as another key mechanism, directing change in newly grammaticalising constructions based on perceived similarities to established grammatical constructions (e.g. Fischer 2008). Existing constructions acting as analogical models may either be part of the same language, or they may stem from another language in cases of contact-induced grammaticalisation (Heine and Kuteva 2020). Crucially, grammaticalisation is a strongly directional change; its irreversibility is a very robust statistical language universal (Hopper and Traugott 2003: Ch. 5).

For synchronic linguistic systems, the nature of grammaticalisation entails, among other things, that grammatical categories are fuzzy rather than discrete, and that grammatical markers are often polysemous, spanning contiguous functions on an established cline. Diachronic grammaticalisation paths hence translate into implicational scales of (poly)functional coverage in synchrony (see the concept of semantic maps, as discussed in the World Englishes context by Siemund and Davydova 2017).

2.2 Grammatical variation and change in World Englishes

Since its inception, the study of World Englishes has been dominated by investigations of linguistic variation. In addition to asking how varieties of English differ from each other and their historical input varieties on all linguistic levels, studies have also put forward a range of explanations helping to answer why they differ. What has emerged from this research is that grammatical variation in World Englishes is a highly complex phenomenon that cannot be explained monofactorially, since factors at various levels of granularity may influence language choices. Key aspects to consider include processes of language acquisition (e.g. Buschfeld 2020; Williams 1987) and language contact (e.g. Sharma 2009). In particular, patterns of variation may be explained by approaches involving regional distance or proximity, such as colonial lag and the epicentre hypothesis. In the following paragraphs, we introduce these key aspects as promising candidates in the explanation of the grammaticalisation of habitual [used to V].

World Englishes are contact varieties of English by definition, and novel features that end up as part of nativised and later endonormatively stabilised varieties are frequently the result of contact. A key aspect in any analysis of language contact scenarios involves the morphological typology of the involved languages, in particular the analytic/synthetic contrast. An example of the effects of language contact repeatedly found in English varieties in contact with Sinitic languages, which are considered isolating or at least heavily analytic, is that affixation tends to be reduced in favour of analytic structures, such as the of-phrase instead of the ’s-genitive (see Heller et al. 2017). Employing L1 structures in this way constitutes what Matras and Sakel (2007) call pattern replication, i.e. the modelling of L2 structures based on a pivot structure (see (replica) contact-induced grammaticalisation in Heine and Kuteva’s 2020 framework). The acquisition of L2 features is highly complex, with various strategies at play at different acquisitional stages. However, certain processes have repeatedly been shown to be of particular importance in World Englishes contexts. In their programmatic article on bridging second language acquisition (SLA) and World Englishes, for instance, Sridhar and Sridhar (1986) stress overgeneralisation as a critical strategy: when learners acquire an L2, certain features may be expanded to grammatical contexts beyond those typical of the language in L1 contexts. As we show later, [use(d) to V] is used more flexibly in some varieties featured in ICE. A potential outcome of overgeneralisation and other SLA processes is the nativisation or fossilisation of features in a variety. The core idea behind either term is that recurring patterns of use may eventually manifest as stable patterns in a variety. 
The factors of contact and L1/L2 status also feature prominently in Kachru’s (1985) model of Englishes, and in this regard Kachru’s typology is relevant to change, including grammaticalisation, despite being a static model: notions such as Inner Circle (aka L1) versus Outer Circle (aka L2) Englishes provide a macro-level classification of which varieties are predicted to behave alike in terms of rates and types of change relative to another group of varieties united by patterns of contact and language acquisition as outlined above. In their large-scale survey of 72 grammaticalisation-relevant features in 52 non-standard Englishes, Kortmann and Schneider (2011) obtained results that are partly compatible with Kachru’s coarse-grained typology: they found that varieties that fall under “contact Englishes”, including the more fine-grained variety types of high-contact L1, L2, as well as pidgins and creoles, exhibit most of the innovative grammaticalisation features. As a reasonable generalisation then, high-contact varieties of English may be assumed to exhibit higher rates of innovation than, and hence more substantial differences to, low-contact L1 Englishes or the group of Inner Circle Englishes as a whole.^[1]

This trend is in diametrical opposition to one of the older narratives in World Englishes research, the hypothesis of colonial lag. According to this hypothesis, which was first proposed in the context of AmE versus BrE (esp. Marckwardt 1958), postcolonial Englishes should change at a lower rate than the norm-providing variety in the original homeland. Some corpus-linguistic findings are in line with this idea, including World Englishes studies that are highly relevant to the present investigation as they deal with differential grammaticalisation: for example, Collins’ (2009) and Loureiro-Porto’s (2019) results on modal auxiliaries can be read as cases of Outer Circle Englishes following, but lagging behind, the probabilistic usage shifts observed in BrE and AmE. However, since its inception in the mid-20th century, the narrative of colonial lag has lost traction overall, as researchers using new empirical resources and methods have been accumulating what seem to be just as many counter-examples, i.e. instances of “home lag” or colonial innovativeness (see Hundt 2009 for a survey).

In addition to spreading supraregionally, a linguistic feature or bundle of features may spread beyond country borders, a process which is at the heart of the epicentre hypothesis. In this hypothesis, “[t]he term epicentre is a metaphor rather than a model for the connectivity between varieties of English in a given region” (Peters and Bernaisch 2022: 321; italics in original). At their core, “linguistic epicentres can represent regionalised transnational varietal developments […], illuminat[ing] relationships between regional varieties, and the spread of areal features from a local lead variety to others” (Peters and Bernaisch 2022: 327). A major candidate and most intensively researched area for the epicentre hypothesis is South Asia, with Indian English (IndE) as the epicentre and neighbouring varieties, such as Bangladeshi English and Sri Lankan English (SLE), changing as a result of their proximity to the much more influential IndE (see Schneider 2022 for an overview). While the concept of linguistic epicentres has gained prominence through World Englishes research only since the mid-2010s, links can be established to a long tradition of grammaticalisation research, where influential typological work has always stressed the impact of areal patterns (see, e.g., Bisang and Malchukov 2020).

3 Habitual [used to V]: What we know

Grammatical aspect in English is generally divided into the categories of perfect, progressive and habitual aspect. In World Englishes research, variation in the former two categories has been studied extensively (e.g. Davydova 2016; Hundt et al. 2020), whereas habituality has received less scholarly attention. Some peculiarities of habitual aspect in World Englishes have been documented, for example, in The electronic World Atlas of Varieties of English (eWAVE; Kortmann et al. 2020), such as the use of do as a habitual auxiliary in Caribbean creoles or extended habitual uses of progressive [be V-ing] in several contact Englishes around the globe. When it comes to used to as the most common English habitual auxiliary, however, interesting variation patterns across varieties have been overlooked so far. Before we explore the variability of the [used to V] construction in our corpus study on 13 national Englishes, this section outlines what is already known about the present-day properties and the grammaticalisation history of this construction, drawing on the following main sources: major reference grammars of English (e.g. Carter and McCarthy 2006; Quirk et al. 1985), the Oxford English Dictionary (OED) Online, Visser (1969) and Neels (2015), a comprehensive diachronic study on used to based on British and American English historical corpora.

Contemporary grammars classify used to as a marker of past habituality, i.e. a marker denoting a situation that was characteristic of an extended period of time. An additional key function of used to is conveying discontinuance, which likely also motivates the occurrence of used to in Example (1).

(1)

What was that building on the corner just past Chapel Street on the right where it used to be Lyon’s (ICE-GB: S1A-010 #1:1:A)

Indicating that an extended situation no longer obtains at the moment, used to signals a past disconnected from the present. Binnick (2005) and Hantson (2005) therefore speak of an “anti-present-perfect” function. In their papers, both authors argue that this function is becoming even more prominent than the meaning component of habitual aspect. Hantson (2005) is, however, also careful to acknowledge that the discontinuance/“anti-present-perfect” meaning has not yet been fully semanticised but is transported as a strong conversational implicature.

The synchronic polyfunctionality of used to is not unusual but in fact expected from the perspective of grammaticalisation theory, as outlined in Section 2.1 above. Neels (2015), analysing quantitative corpus data from the 15th century onwards, proposed the following cline of diachronic meaning shift of the construction [use(d) to V]:

(2)

‘to be in the habit of V-ing’ (= lexical sense) > ‘the situation of V-ing is/was characteristic of an extended period of time’ (= habitual aspect) > ‘the situation of V-ing (which was characteristic of the past) no longer obtains at present’ (= anti-present-perfect)

Today’s meanings of the auxiliary used to evolved from one of the more concrete meanings of the Middle English (ca. 1100–1500) lexical verb use, in particular its sense ‘to follow a custom; to be in the habit of’ (“use, v.” OED Online). The meaning changes depicted in the cline reflect two well-known tendencies of grammaticalisation: first, the generalisation from human habits and customs to more abstract grammatical contexts of habitual aspect instantiates the process of semantic bleaching; second, the foregrounding of the implicature of discontinuance (“anti-present-perfect”) instantiates the process of pragmatic enrichment/strengthening (e.g. Hopper and Traugott 2003: Ch. 4).

At the morphosyntactic level, the used to construction underwent the typical grammaticalisation processes of decategorialisation and specialisation (Hopper 1991). It gradually lost formal behavioural properties of ordinary lexical verbs of English and became inflectionally more fixed. By the end of the 19th century, the present-tense variant of the habitual marker, use to, had become obsolete, at least in BrE and AmE. Present-participle forms disappeared as well, and so did regular infinitive forms in so far as, in dominant usage, the etymological past-tense -(e)d is now a permanent fixture, as in (3).

(3)

Didn’t there used to be deer in Richmond Park (ICE-GB: S1A-006 #230:1:B)

Note that the choice of infinitival <used to> here is that of the corpus transcriber more than that of the speaker because the difference was rendered inaudible by the historical phonological reduction from /juːzd tu(ː)/ to /ˈjuːstə/ (see Lorenz and Tizón-Couto 2017). In negation and question formation, combinations with do-support, as in (3), dominate in present-day usage. However, contemporary British and American corpora, and formal registers especially, occasionally feature minority patterns without do-support as known from canonical English auxiliaries, i.e. patterns with direct not-negation (used not to) and inversion in questions (Used SUBJ to). During the 20th century, there were prescriptive voices claiming that only such uses with do-less auxiliary syntax were correct (see Visser 1969: 1410–1419). Proscription against do-supported patterns and against the irregular infinitive <used to> seems to have caused some insecurity regarding the “proper” morphosyntactic treatment of the auxiliary used to, which is now commonly negated by never, a pattern circumventing problematic choices.

Overall, the intricate grammaticalisation history of used to has given rise to one of the more variable and idiosyncratic auxiliaries of present-day English, even within standard BrE and standard AmE alone.

4 Case study: [use(d) to V] in World Englishes

In this section, we turn to our corpus-based case study. After an overview of our data and methods, we formulate the research questions and working assumptions of our study. We then present our results first with a focus on absolute usage frequencies and then by means of cluster analyses and snake plots.

4.1 Data and methods

For our study, we analysed 13 national components featured in the International Corpus of English (ICE; Greenbaum and Nelson 1996). Eight of these represent Englishes that are widely considered Outer Circle varieties: East Africa (ICE-EA), Hong Kong (ICE-HK), India (ICE-IND), Jamaica (ICE-JA), Nigeria (ICE-NIG), Philippines (ICE-PHI), Singapore (ICE-SIN) and Sri Lanka (ICE-SL). The other five national components sample Inner Circle varieties: Canada (ICE-CAN), Great Britain (ICE-GB), Ireland (ICE-IRE), New Zealand (ICE-NZ) and USA (ICE-USA).^[2] Since there is no spoken part complementing the written section of ICE-USA in ICE itself, we included, as is common practice in the World Englishes context, the Santa Barbara Corpus of Spoken American English (Du Bois et al. 2000–2005).

Due to their easy accessibility, fairly high degree of balancing across spoken and written texts, and the range of included text types, the ICE corpora have been called “the Swiss army knife of world Englishes scholars” (Leuckert and Rüdiger 2021: 485). They are, to date, the only set of corpora that allow for a large-scale comparison of structural features in spoken and written World Englishes; many other World Englishes corpora, such as the Corpus of Global Web-Based English (GloWbE) (Davies and Fuchs 2015), tend to be limited to written contexts and often to specific registers. However, despite their usefulness, there are various limitations and caveats to the ICE family of corpora. Perhaps most importantly, the ICE corpora have been compiled at different points in time (sometimes decades apart), which introduces a diachronic dimension to their comparison and means that they are not fully representative of the varieties in their current state. In addition, as pointed out by Deshors and Götz (2020: 8), there are “differences and potential inconsistencies in corpus compilation procedures of the different ICE-components.”

We used AntConc (Anthony 2022) to extract all relevant forms of use(d) to in the ICE corpora and manually removed any false hits from the corpus output (e.g. passive be used ‘employed’ to do sth.).^[3] We also excluded the habitual, but adjectival, construction be used to sth. Moreover, as some texts are “inconsistent” in that they feature non-variety speakers, such as non-Hong Kongers in ICE-HK, we removed tokens produced by such speakers from the extracted concordances based on existing metadata. After this clean-up process, the resulting 2,338 instances of the auxiliary construction were coded for the nine variables listed in Table 1.^[4] These variables cover semantic, syntactic, morphological and text-linguistic criteria. In order to ensure inter-rater agreement, we took a random sample of 100 concordances which were then coded for all relevant categorical and nominal variables by all coders. We then calculated Cohen’s kappa, which was above 0.9 for all variables; in fact, for most variables, all coders were in perfect agreement.
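The agreement check described above can be illustrated with a minimal sketch of Cohen’s kappa for one pair of coders. The function name, the label strings and the ten example codings are hypothetical and serve illustration only; they are not taken from the study’s data, and the sketch covers only a single coder pair.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders' labels on the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(coder_a)
    # Observed agreement: share of items given the same label by both coders.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: summed products of the coders' marginal label proportions.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical codings of ten concordance lines for subject animacy.
coder_1 = ["animate"] * 7 + ["inanimate", "inanimate", "expletive"]
coder_2 = ["animate"] * 7 + ["inanimate", "animate", "expletive"]
print(round(cohens_kappa(coder_1, coder_2), 3))
```

In this toy example the coders disagree on one of ten items, so the observed agreement of 0.9 is corrected for the agreement expected by chance from each coder’s label distribution.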

Table 1: Dependent variables and main variable levels in the analysis of the use(d) to construction.

Structural type	Variable	Variable levels	Main codes as seen in snake plots
Semantic	Subject animacy	Animate	subj_animate
		Inanimate	subj_inanimate
		Expletive (it, there)	subj_expletive
	Aktionsart	Dynamic	verb_dynamic
		Stative	verb_stative
Syntactic	Negation	Do-support	neg_do
		Not	neg_not
		Contracted usen’t	neg_not_contracted
		Never	neg_never
		Affirmative clause	neg_affirmative
	Question formation	Do-support	quest_do
		Inversion without do-support	quest_aux_inversion
		Declarative sentence	quest_declarative
	Aux. combinations	Modals (e.g. used to could)	comb_modal
		Passive (e.g. used to be run)	comb_passive
		Perfect (e.g. has used to)	comb_perfect
		Progressive (e.g. used to be walking)	comb_progressive
		No multiple auxiliaries	comb_simple
Morphological	Time reference	Past	tense_past
		Present	tense_present
		Future (will use to)	tense_future
	Suffixation	Standard	morph_standard
		Surplus (e.g. didn’t used to)	morph_surplus
		Missing (e.g. I formerly use_to)	morph_missing
Text-linguistic	Mode	Spoken	–
		Written	–
	Token frequency	(Numeric)	–

The nine variables coded for reflect rich behavioural profiles (Gries 2010) of use(d) to, which were fed into a cluster analysis as described in Levshina (2015: Ch. 15). More specifically, we excluded the two text-linguistic variables of mode and token frequency from the analysis because of confounding details of corpus sampling and size and subjected the remaining seven variables to a hierarchical agglomerative cluster analysis (see Divjak and Gries 2009). The cluster analysis was conducted using the hclust function in R (R Core Team 2021). In the process, attested ratios of the variable levels were converted into a distance matrix for pairwise variety comparisons. The final cluster solution uses Euclidean distance and the merge algorithm “complete”, which we arrived at by comparing the agglomerative coefficients of competing solutions (incl. one with the merge algorithm “ward.D2”). The clusters were validated via multiscale bootstrap resampling, and the optimal number of “meaningful” clusters was determined via average silhouette width (Levshina 2015: 311–317). In short, the method makes varieties and variety bundles cluster together based on the degree of similarity of their use(d) to profiles.^[5]
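The core of this procedure, i.e. pairwise Euclidean distances combined with complete-linkage merging, can be sketched in a few lines. The sketch below is a plain-Python illustration, not the R workflow used in the study; the variety labels A to D and their three-value profiles are invented placeholders, and bootstrap validation and silhouette widths are left out.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equally long profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def complete_linkage(profiles):
    """Agglomerative clustering; returns merges as (members_a, members_b, height)."""
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest
                # pairwise distance between members of the two clusters.
                d = max(euclidean(profiles[a], profiles[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Hypothetical ratios of three variable levels for four placeholder varieties.
profiles = {
    "A": (0.10, 0.30, 0.05),
    "B": (0.12, 0.32, 0.05),
    "C": (0.30, 0.50, 0.20),
    "D": (0.28, 0.55, 0.18),
}
merges = complete_linkage(profiles)
for left, right, height in merges:
    print(left, "+", right, "at height", round(height, 3))
```

With these placeholder values, A and B merge first, C and D second, and the two pairs are joined last at the largest height, which is the kind of nested grouping a dendrogram displays.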

A subsequent more fine-grained analysis of the resulting clusters employs so-called snake plots (again following recommendations by Levshina 2015: 313–315; see Divjak and Gries 2009). This visualisation method allowed us to zoom into and compare specific variety clusters. Essentially, the snake plots reveal how strongly specific variable levels contribute to the cluster patterns, thus highlighting distinctive structural features of the use(d) to construction in individual varieties or variety bundles.
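The values behind such a plot can be sketched as sorted differences between the mean profiles of two clusters. This is one common way of preparing a snake plot and is assumed here rather than taken from the study; the function name and all ratios are hypothetical, with variable codes borrowed from Table 1 only as labels.

```python
def snake_values(cluster_a, cluster_b, levels):
    """Per-level difference of cluster means (a minus b), sorted ascending."""
    def mean_profile(cluster):
        # Average each variable level over the varieties in the cluster.
        return [sum(v[k] for v in cluster) / len(cluster)
                for k in range(len(levels))]
    mean_a, mean_b = mean_profile(cluster_a), mean_profile(cluster_b)
    diffs = [(level, a - b) for level, a, b in zip(levels, mean_a, mean_b)]
    # Sorting produces the "snake": strongly negative values are more
    # typical of cluster b, strongly positive ones of cluster a.
    return sorted(diffs, key=lambda pair: pair[1])

# Hypothetical ratios for two clusters of two varieties each.
levels = ["subj_inanimate", "verb_stative", "neg_never", "tense_present"]
cluster_a = [(0.10, 0.30, 0.06, 0.00), (0.12, 0.32, 0.04, 0.00)]
cluster_b = [(0.30, 0.50, 0.02, 0.04), (0.28, 0.54, 0.02, 0.06)]
snake = snake_values(cluster_a, cluster_b, levels)
for level, diff in snake:
    print(f"{level:>15} {diff:+.2f}")
```

Levels at either end of the sorted list are the ones that distinguish the two clusters most strongly, while values near zero contribute little to the split.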

When it comes to measuring degrees of grammaticalisation, two of the variables we coded for lend themselves to objectively representing a scale of grammaticalisation of the use(d) to construction. These are the two semantic variables of subject animacy and verb aspect/aktionsart. Recall from Section 3 that over the past six centuries the use(d) to construction has been grammaticalising along a functional cline: from habits in the literal sense, to grammatical habitual aspect, and most recently to an even more abstract ‘discontinuance’ meaning as its primary contribution. Especially from a construction grammar perspective (e.g. Coussé et al. 2018), it should become apparent how such semantic shifts manifest themselves through changes/differences in collocational patterns. Grammaticalising constructions essentially undergo a process of context expansion (Himmelmann 2004). The Middle English source of today’s use(d) to construction was confined to animate (typically human) subjects, which can actively follow a particular habit, and mostly to dynamic verbs, which qualify as habits to be performed (Neels 2015). The diachronic expansion to inanimate fillers in the subject slot is symptomatic of the semantic bleaching of use(d) to into a marker of habitual aspect; and so is the expansion to stative verbs, but stative verbs are even more characteristic of the discontinuance function of used to. Habituality is often no longer the primary meaning contribution of used to when combined with stative verbs, considering that states tend to have a long indefinite duration anyway. What typically motivates these combinations is the resulting meaning that the states no longer obtain at present. Since grammaticalisation is known to restructure linguistic categories in a highly gradual manner, novel fillers in the open slots of a construction are informative not only in terms of their first occurrences but also in terms of their increasing rates among other slot fillers.
This is therefore how we measure semantic degrees of grammaticalisation of use(d) to, i.e. via the ratios of stative verbs and of non-animate subjects, including expletive it and there.
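As a minimal sketch of this measure, the two ratios can be computed directly from coded tokens. The dictionary keys, label strings and the four example tokens below are hypothetical stand-ins for the coding scheme in Table 1, not the study’s actual data format.

```python
def grammaticalisation_ratios(tokens):
    """Share of stative verbs and of non-animate subjects among coded tokens."""
    n = len(tokens)
    if n == 0:
        raise ValueError("no tokens to evaluate")
    stative = sum(t["aktionsart"] == "stative" for t in tokens)
    # Non-animate subjects comprise inanimate and expletive (it, there) subjects.
    non_animate = sum(t["subject"] in ("inanimate", "expletive") for t in tokens)
    return {"stative_ratio": stative / n, "non_animate_ratio": non_animate / n}

# Hypothetical coded tokens for a single variety.
tokens = [
    {"subject": "animate", "aktionsart": "dynamic"},
    {"subject": "animate", "aktionsart": "stative"},
    {"subject": "expletive", "aktionsart": "stative"},
    {"subject": "inanimate", "aktionsart": "dynamic"},
]
ratios = grammaticalisation_ratios(tokens)
print(ratios)
```

On this reading, a variety with higher values on both ratios would count as further advanced along the cline, all else being equal.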

In essence, while only some of the structural variables enable a truly objective assessment in terms of “more” or “less” grammaticalised, all variables together can be used to measure overall degrees of similarities and differences regarding the probabilistic grammatical properties of the use(d) to construction across the 13 national Englishes.

4.2 Research questions and working assumptions for the corpus study

In the corpus study, we wish to investigate (i) whether synchronically World Englishes differ notably in the degree of grammaticalisation and overall usage profile of habitual [use(d) to V], and (ii) to what extent such differences pattern in ways that support any of the three general but partly conflicting trends regarding the development of World Englishes, i.e. colonial lag, contact-induced innovation and epicentres. The present section outlines how these three supposed trends translate into working assumptions for the concrete case of [use(d) to V] and how the corpus results should differ depending on which, if any, of the three trends or “narratives” outlined in Section 2.2 turn(s) out to manifest and dominate empirically.

The narratives of colonial lag and contact-induced innovation both predict differential usage profiles of use(d) to based on lineage or variety type. Most generally, the five Inner Circle varieties, on the one hand, and the eight Outer Circle varieties, on the other hand, should each cluster together, reflecting also the basic distinction between L1 Englishes as opposed to L2 Englishes with more potential for contact-induced transfer. Within this hypothesised pattern, lineage may still have a certain impact at a more fine-grained level, such that PhilE, for example, clusters sooner with AmE than a British-dominated L2 like IndE. More importantly, the competing narratives of colonial lag versus contact-induced innovation also create opposite expectations, albeit for other parts of the statistical analysis. Empirically teasing the two narratives apart in the case study will be possible as follows. The direction of the variety differences in terms of conservatism or innovativeness can be differentiated primarily thanks to the more specific analysis of the constructional semantics of use(d) to via subject and verb types, i.e. animacy and aktionsart. In addition, an inspection of grammatical structures that are categorically (aka qualitatively) unique or idiosyncratic (e.g. future-time will use to) will highlight innovativeness as well. Such structures can and will be pointed out in the analyses of snake plots. If idiosyncratic use(d) to variants unknown to the British (and American) norm are found primarily in Outer Circle (L2) Englishes rather than in other members of the Inner Circle (L1), this lends support to the narrative of contact-induced innovation.

The epicentre hypothesis is, in principle, compatible both with signs of innovativeness and with signs of conservatism. What distinguishes this hypothesised trend from the other two in the present case study is that, if epicentres turn out to affect the behaviour of use(d) to, the overall clustering results should (at least partly) reflect geographical proximity or the influence of a dominant variety in a region, such as IndE in South Asia (see Schneider 2022 for a summary of studies on India as a potential South Asian epicentre). In our test case specifically, epicentres should thus lead to a greater similarity of the usage data from ICE-IND and ICE-SL or from ICE-NIG and ICE-EA, for instance.

4.3 Results

The results from the procedure described above are presented in this section. After an overview of absolute token frequencies (4.3.1), we present findings on the semantic degree of grammaticalisation (4.3.2), a cluster analysis (4.3.3) and a more detailed look at variety clusters and individual varieties using snake plots (4.3.4).

4.3.1 Absolute token frequencies

Early corpus-linguistic studies on grammaticalisation phenomena in national varieties of English tended to treat absolute token frequency as prima facie evidence of varying degrees of grammaticalisation. For example, Mair and Leech (2006) and Collins (2009) employ this frequentist line of thinking, demonstrating how a younger generation of grammaticalising English modals such as have to, be going to and want to has recently been on the rise in Inner Circle and Outer Circle Englishes, partly at the expense of older canonical modals such as must, shall and should (cf. the grammaticalisation principle of layering; Hopper 1991). In the same vein, absolute token frequency may serve as a rough proxy for the relative success of use(d) to in the domain of habitual aspect, relative to paradigmatic competitors such as would. By this rationale, it is worth comparing the token frequency of the use(d) to construction across varieties, although two additional, somewhat simplistic assumptions have to be made for this comparison: first, that the overall frequency with which different speech communities wish to express the concept of habituality does not differ considerably; second, that the corpora compared are alike in their balancing of text types.

Table 2 lists the token frequencies of habitual [use(d) to V] in the national components of ICE, ranking the 13 national varieties by normalised frequencies. The mean usage frequency of the construction in our data is ca. 16 occurrences per 100,000 words. The ranking in Table 2 shows some core Inner Circle Englishes, like BrE, AmE and CanE, very close to that average while a few Englishes from varying status groups deviate notably. In particular, this simple operationalisation of degrees of grammaticalisation via absolute token frequency would suggest that the use(d) to construction is most grammaticalised in NZE, JamE and IndE, and least grammaticalised in the two Asian Englishes of HKE and SinE.

Table 2:

Overall token frequencies of habitual [use(d) to V] in 13 national varieties of English.

Corpus/variety	N	Written/spoken ratio	Per 100,000 words
ICE-NZ	292	0.14	24.21
ICE-JA	235	0.18	21.93
ICE-IND	239	0.30	21.24
ICE-IRE	205	0.15	19.41
ICE-NIG	193	0.17	19.10
ICE-SL	193	0.24	16.54
ICE-EA	232	0.47	16.16
ICE-CAN	169	0.13	16.08
ICE-USA & SBCSAE	100	0.30	15.28
ICE-GB	150	0.14	13.93
ICE-PHI	131	0.27	11.51
ICE-SIN	111	0.39	10.77
ICE-HK	88	0.96	7.25
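
The normalisation underlying the last column can be sketched as follows. The token counts and rates are copied from Table 2; the helper names are our own, and the unweighted mean over the 13 per-variety rates is only one plausible way of arriving at the reported average, so it should be read as an assumption rather than as the procedure actually used.

```python
# Table 2 values: corpus -> (N tokens, frequency per 100,000 words).
table2 = {
    "ICE-NZ": (292, 24.21), "ICE-JA": (235, 21.93), "ICE-IND": (239, 21.24),
    "ICE-IRE": (205, 19.41), "ICE-NIG": (193, 19.10), "ICE-SL": (193, 16.54),
    "ICE-EA": (232, 16.16), "ICE-CAN": (169, 16.08),
    "ICE-USA & SBCSAE": (100, 15.28), "ICE-GB": (150, 13.93),
    "ICE-PHI": (131, 11.51), "ICE-SIN": (111, 10.77), "ICE-HK": (88, 7.25),
}

def per_100k(n_tokens, corpus_words):
    """Normalised frequency: tokens per 100,000 words."""
    return n_tokens / corpus_words * 100_000

def implied_corpus_words(n_tokens, rate_per_100k):
    """Approximate corpus size implied by a row of the table."""
    return n_tokens / rate_per_100k * 100_000

# Unweighted mean of the per-variety rates (an assumption about
# how the average reported in the text may have been obtained).
mean_rate = sum(rate for _, rate in table2.values()) / len(table2)
```

Because the rates are relative to corpus size, a corpus with fewer raw tokens can still rank above one with more, which is why the table is ordered by the normalised column rather than by N.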

Although recent statistical modelling by Correia Saavedra (2021) has offered reassurance that simple token-frequency measures can be a meaningful index of grammaticalisation, these measures seem not fully reliable when comparing multiple corpora that are not completely identical in their designs. Apparently, the national components differ somewhat in their inclusion of text types/topics favouring the use of habitual markers. This is tentatively suggested by the diversity of the written/spoken ratios of use(d) to across the ICE collection (Table 2). Above all, the meaningfulness of the absolute token-frequency differences is called into question by another, more sophisticated index of grammaticalisation, namely semantic context expansion, as described next.

4.3.2 Semantic degree of grammaticalisation

The shifting semantics of the auxiliary use(d) to is reflected in differences in collocational patterns. In particular, as introduced in Sections 3 and 4.1, meaning shifts from literal habits to grammatical habitual aspect and to the even more abstract meaning of discontinuance (“anti-present-perfect”) are reflected in increasing rates of non-animate subjects as well as stative verbs. Figure 1 provides a synchronic snapshot of these rates in each of the 13 English varieties under scrutiny. The plot can be read to depict a semantic space where the upper right corner represents semantically innovative, i.e. highly grammaticalised, uses and the lower left corner stands for conservative uses of the use(d) to construction. In this semantic analysis, just like in the above results on the absolute token frequencies, BrE and AmE turn out to constitute “mean” values around which the other varieties pattern. What is somewhat surprising, however, is that the ways in which the postcolonial Englishes pattern based on the present semantic grammaticalisation index are at times quite opposite to the results of the more simplistic grammaticalisation index of token frequency. Most notably, IndE and JamE are now found to be by far the most conservative varieties, whereas HKE exhibits uses of the auxiliary that are semantically most advanced, in particular regarding the verb slot: as plotted in Figure 1 and exemplified in (4), about 45 % of all verbs combining with use(d) to in HKE are stative verbs, which are associated with the foregrounded discontinuance meaning of the construction.

Figure 1:

Probabilistic semantic context expansion of [SUBJ use(d) to V] in 13 national varieties of English.

(4)

Yeah I used to like him in Top Gun (ICE-HK: S1A-071#806:1:A)

Other varieties in which use(d) to is also highly grammaticalised (semantically speaking) include PhilE and EAE, although in the latter variety the two variables of aktionsart and subject animacy seem not to be correlated in the same fashion they are in the other Englishes.

These reported differences in grammatical patterns concern probabilistic usage preferences rather than discrete categorical differences. Still, as advocated in frameworks such as probabilistic grammar (Bresnan 2007; Szmrecsanyi et al. 2016), distributional differences in grammatical usage can be assumed to be mentally reflected in a probabilistic component of linguistic knowledge. And with variation across varieties as broad as, for instance, 17 % versus 45 % of stative verbs in IndE versus HKE, it is not unreasonable to assume that structural nativisation has occurred regarding the functional properties of the use(d) to construction.

Overall, the clearest trend emerging from the semantic analysis is that L2 aka Outer Circle Englishes exhibit more extreme values and differences than the Inner Circle Englishes. Being already quite apparent from eyeballing Figure 1, this contrast between the two groups of varieties manifests itself in different mean Euclidean distance measures between the varieties in each group. While the mean Euclidean distance between the L1 varieties is 0.0503, the mean Euclidean distance between the L2 varieties is 0.1437. The difference in mean distances is statistically significant based on a Mann–Whitney U-test (U = 31, p < 0.001). These grammaticalisation data hence underscore the differentiation of varieties made by Kachru’s contact-related model. However, it is also apparent from Figure 1 that the Outer Circle Englishes are scattered in multiple directions relative to the more homogeneous group of Inner Circle Englishes. Consequently, the data do not consistently align with either of the two opposing narratives, i.e. colonial lag or contact-induced innovation.
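
The group comparison can be illustrated with a short sketch that derives all pairwise Euclidean distances within each group and then computes the Mann–Whitney U statistic by direct pair counting. The coordinates below are hypothetical and do not reproduce the values plotted in Figure 1; whether the reported test was run on exactly these pairwise distances is likewise an assumption. If all pairs are used, five L1 and eight L2 varieties yield 10 and 28 distances respectively.

```python
from itertools import combinations
from math import dist

# Hypothetical (non-animate ratio, stative ratio) coordinates per variety.
# These are NOT the values underlying Figure 1.
l1 = {"A": (0.10, 0.30), "B": (0.12, 0.32), "C": (0.11, 0.28)}
l2 = {"D": (0.05, 0.17), "E": (0.15, 0.45),
      "F": (0.20, 0.30), "G": (0.08, 0.40)}

def pairwise_distances(points):
    """Euclidean distance for every unordered pair of varieties."""
    return [dist(p, q) for p, q in combinations(points.values(), 2)]

def mann_whitney_u(xs, ys):
    """U for xs: count of (x, y) pairs with x > y; ties count as 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in xs for y in ys)

d1 = pairwise_distances(l1)
d2 = pairwise_distances(l2)
mean_l1 = sum(d1) / len(d1)
mean_l2 = sum(d2) / len(d2)

# The conventionally reported U is the smaller of the two possible values.
u_l1 = mann_whitney_u(d1, d2)
u = min(u_l1, len(d1) * len(d2) - u_l1)
```

The sketch stops at the test statistic; obtaining a p-value would require the null distribution of U, for which a statistics library such as SciPy (`scipy.stats.mannwhitneyu`) could be used.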

4.3.3 Clustering all ICE varieties

To the semantic features presented above, we added morphological and syntactic variables (see Table 1 above) to create fuller profiles of the characteristics of use(d) to in each national variety. Clustering the 13 varieties based on these behavioural profiles gives the picture plotted in Figure 2. In the depicted tree, the further down two varieties or variety bundles connect, the more alike they are regarding the overall usage profile of the use(d) to construction.

Figure 2:

Cluster analysis of the behaviour profiles of habitual [use(d) to V] in 13 national varieties of English; rectangles signal the optimal number of clusters.

Figure 2 shows two main high-level splits, with the split on the left-hand side creating a cluster containing IndE and JamE. This pairing confirms the similarity of these varieties that was already noticeable in the semantic analysis in Figure 1, but it does so in consideration of their larger behaviour profiles. The second split creates two further clusters. The first of these clusters pairs HKE and PhilE, which, in a similar fashion to IndE and JamE, were already close to each other in Figure 1. Finally, the right-most cluster groups all remaining varieties. This cluster contains EAE, NigE, BrE, CanE, NZE, SLE, AmE, IrE and SinE. What becomes evident by looking at this selection is that all L1 aka Inner Circle varieties are part of this cluster. While there are well-known differences in how these varieties express habitual aspect (see, e.g., Hickey 2007 on the habitual in IrE), they appear similar enough in their use of use(d) to to form a cluster. The lower-level clusters, however, represent pairs of varieties whose proximity in the cluster proves challenging to explain. None of the varieties that form low-level immediate neighbours in the cluster analysis are geographically close. The higher-order adjacency of EAE and NigE does not constitute evidence for epicentre-like influences either. Overall, while some larger tendencies are observable (esp. all L1 varieties being part of the same large cluster), differences and similarities occur at finer levels of granularity, which we investigate in the form of snake plots in the next section.
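
For readers less familiar with the technique, the bottom-up logic of such a tree can be sketched in a few lines. The profiles are hypothetical ratio vectors, and both the Euclidean distance and the average-linkage criterion are assumptions made for the sake of the illustration; the sketch does not claim to replicate the distance measure or amalgamation rule behind Figure 2.

```python
from math import dist

# Hypothetical behavioural profiles: each variety is a vector of ratios
# (e.g. stative verbs, non-animate subjects, negated tokens).
profiles = {
    "V1": [0.17, 0.05, 0.02],
    "V2": [0.19, 0.06, 0.03],
    "V3": [0.45, 0.15, 0.02],
    "V4": [0.42, 0.14, 0.04],
    "V5": [0.30, 0.10, 0.03],
}

def average_linkage(c1, c2, profiles):
    """Mean Euclidean distance over all cross-cluster pairs (assumed rule)."""
    total = sum(dist(profiles[a], profiles[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))

def agglomerate(profiles):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [(name,) for name in profiles]
    history = []
    while len(clusters) > 1:
        pairs = [(average_linkage(a, b, profiles), a, b)
                 for i, a in enumerate(clusters)
                 for b in clusters[i + 1:]]
        height, a, b = min(pairs)
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        history.append((a + b, height))
    return history

history = agglomerate(profiles)
```

Each entry of the merge history pairs a newly formed cluster with the height at which it was joined, which corresponds to how far down the tree two branches connect.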

4.3.4 Comparing and zooming into variety clusters and individual varieties

While the dendrogram examined in the previous section is, by itself, a “black box” when it comes to knowing what structural features from the behaviour profiles are most distinctive, snake plots are a convenient means for adding this information. For two clusters of interest, a snake plot visually contrasts relative differences in the attested ratios of variable levels. These ratio differences show up as transparent descriptive values on the x-axis of the plot.
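
The values behind a snake plot can be sketched as simple differences between the mean ratio of a variable level in a focus cluster and its mean ratio in the comparison group, sorted from most negative to most positive. The ratios and most level names below are hypothetical, and averaging per-variety ratios (rather than pooling tokens) is an assumption of this illustration.

```python
# Hypothetical ratios of variable levels per variety (illustrative only).
ratios = {
    "V1": {"verb_stative": 0.45, "subj_inanimate": 0.15, "morph_miss": 0.01},
    "V2": {"verb_stative": 0.40, "subj_inanimate": 0.13, "morph_miss": 0.02},
    "V3": {"verb_stative": 0.25, "subj_inanimate": 0.10, "morph_miss": 0.03},
    "V4": {"verb_stative": 0.30, "subj_inanimate": 0.12, "morph_miss": 0.02},
}

def mean_ratio(varieties, level):
    """Unweighted mean of one variable level across a group of varieties."""
    return sum(ratios[v][level] for v in varieties) / len(varieties)

def snake_values(focus, rest):
    """Sorted (level, difference) pairs: focus-cluster mean minus rest mean."""
    levels = ratios[focus[0]].keys()
    diffs = {lv: mean_ratio(focus, lv) - mean_ratio(rest, lv)
             for lv in levels}
    return sorted(diffs.items(), key=lambda item: item[1])

values = snake_values(["V1", "V2"], ["V3", "V4"])
```

Plotting the sorted differences along one axis produces the characteristic winding line, with the most distinctive variable levels appearing at the two extremes.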

Figures 3 and 4 zoom in on the two small main clusters established earlier, specifically on how each of them differs from the remaining 11 varieties taken together. Most prominently, both figures reaffirm the findings on semantic degrees of grammaticalisation presented in Section 4.3.2. That is, the ratio of stative verbs is 15 % higher in HKE and PhilE than in the other Englishes on average (Figure 3). For the cluster uniting IndE and JamE, the conservatively high ratios of dynamic verbs and animate subjects are shown to be the most distinctive features, creating a 13 % (aktionsart) and a 9 % difference (animacy) from the other varieties in the analysis (Figure 4). Other differences per variable turn out to be more subtle. To some extent, this means that the behaviour of the use(d) to construction does not vary that strongly across national Englishes. However, this impression is also partly conditioned (i) by some of the inter-variety differentials being levelled out in the larger bundles of 9–11 varieties and (ii) by certain grammatical environments and non-standard features being infrequent in the data overall (e.g. negating use(d) to in general, and hence also negation via usen’t contraction in particular). Among the variable levels “exposed” in Figure 4, consider the new insight that IndE and JamE are characterised by a slightly higher rate of use(d) to with non-standard morphology, specifically absent agreement or tense suffixes (+2.77 % “morph_miss”), as illustrated in (5).

Figure 3:

Distinctive variable levels of main clusters: ratio differences of HKE and PhilE versus 11 other national Englishes.

Figure 4:

Distinctive variable levels of main clusters: ratio differences of IndE and JamE versus 11 other national Englishes.

(5)

Luna use to work for this man and his wife and Luna’s sister started having a relationship with him so in a way he use to be Luna’s boss. (ICE-JA: W1B-011#22:2)

Note, however, that the percentage points of the morphology features have to be taken with a grain of salt since the absence or presence of the past-tense suffix in particular should mostly be inaudible in the phonetic context of this auxiliary (see Section 3). Occurrences in the written mode, like Example (5), help attest such morphological idiosyncrasies with certainty, but the vast majority of tokens of the use(d) to construction come from the spoken mode, where transcription uncertainties apply. Still, what is at times more noteworthy about such non-standard features than the exact percentage points is their mere existence as deviations from Inner Circle/BrE/AmE norms. Consider tense-related properties of the use(d) to construction in HKE. Whereas the construction is restricted to past-tense contexts in standard varieties, HKE exhibits present-tense (6a) as well as will-combinations (6b).

(6)

a. Yeah just like I I I type the Chinese word is even more faster than type the English / Because I use to type Chinese (ICE-HK: S1A-086#239:1:B)

b. Will you use to be more authoritative than than Martin and I … [?] (ICE-HK: S1B-033#67:1:B)

These two unusual time-reference features make up only a small percentage of usage data from ICE-HK. Still, the existence of categorically different grammatical structures is meaningful from a qualitative perspective.

In fact, several, mostly Outer Circle varieties in the data exhibit such categorically distinct but rather infrequent structures. Below, we list some examples without additional visualisations. Besides HKE, present-tense versions of the use(d) to auxiliary are attested in the ICE data also for EAE, IndE, JamE, PhilE (hapax), SinE (hapax) and, above all, in the NigE data, where 15 tokens of present-tense use to make up 7.8 % of the sample. This is still a low relative frequency, but certainly enough to consider this non-standard structure an established feature of NigE rather than isolated idiolectal “slips”. Among the Inner Circle varieties, IrE stands out as the only variety in the entire sample that negates use(d) to exclusively as usen’t (7), hence preferring direct not-negation over the do-supported negation pattern otherwise prevalent across varieties.

(7)

I used to always get on well with my Mam but I usen’t get on well with my Dad (ICE-IRE: S1A-048$B)

At face value, the presence of the contracted form usen’t can be taken to indicate an above-average degree of grammaticalisation in IrE on the morphosyntactic and phonological dimensions since this negation pattern is a “mature” structure largely restricted to canonical English modals and primary auxiliaries.

Overall, however, translating the above-mentioned non-standard features into higher or lower degrees of grammaticalisation of the use(d) to construction is no trivial task on the basis of a synchronic snapshot. When it comes to the future and/or present-tense forms attested in EAE, HKE, IndE, JamE, NigE, PhilE and SinE, different grammaticalisation tendencies could come into play that impact one’s verdict in terms of “innovative” or “conservative” features. While the expansion to more tense forms seems to nicely match the grammaticalisation tendency of context expansion (Himmelmann 2004), the presence of these forms might also be read as lagging specialisation in the sense of Hopper’s (1991) grammaticalisation principle. Therefore, a full interpretation of the present-day grammatical differences between the national Englishes can only be reached when adding available historical evidence, as will be discussed further in Section 5.3.

5 General discussion

At least for the present case study on one grammatical marker, we devised an analysis that has allowed us to tease out, if not fully evaluate, the empirical values of the narratives of colonial lag, contact-induced innovation and epicentres for grammaticalisation research on World Englishes. Essentially, we asked: which, if any, of these three common narratives account for the attested usage data on the grammatical(ising) construction [use(d) to V] most consistently?

The quantitative analysis has produced inconclusive, “mixed” results. On the one hand, the generated variety clusters are partly compatible with Kachru’s typology. In particular, a big main cluster emerged that comprises all L1 aka Inner Circle Englishes, though not exclusively so. On the other hand, the results do not consistently align with any of the three macro-level trends that have been postulated in the literature and that may shape expectations about grammaticalisation patterns in World Englishes. In our purely semantic analysis specifically, some Outer Circle varieties – above all IndE and JamE – were found to be conservative, but just as many varieties are more advanced than their colonial parents, most notably HKE and PhilE. In this sense, there is thus just as much evidence for colonial lag as there is for contact-induced innovation. Similarly, for the epicentre hypothesis, little robust evidence was found since none of the variety clusters could be predicted geographically.^[6]

In essence, big-picture narratives of change in World Englishes appear to largely fail to offer adequate explanations of the attested grammaticalisation and usage patterns of the use(d) to construction. This is why in the remainder of this paper, we want to direct our discussion to the following three issues. First, in Section 5.1, we shift perspectives from a big-picture approach to a more local, fine-grained approach probing the added explanatory value of taking into account the individual sociolinguistic contexts of each variety. Second, Section 5.2 contextualises our case study within the growing concern in World Englishes research that theoretical models and statistical modelling fail to converge. Lastly, Section 5.3 stresses the importance of initiatives for new diachronic World Englishes resources, using the case of use(d) to to illustrate where reconstructions of grammaticalisation trajectories are error-prone without more historical evidence from New Englishes.

5.1 Language contact and sociolinguistic contexts

Major (potential) explanatory factors in variation are language contact and the impact of sociolinguistic variables, such as age, gender and educational level. As briefly discussed earlier, the effects of language contact do not influence a feature in isolation but, instead, are intertwined with processes of SLA, target variety input and other factors. However, a closer look at the characteristics of the key contact languages of the varieties under consideration may potentially illuminate the reasons behind the variational patterns identified above.

Contact-induced transfer could be one of the reasons behind the HKE + PhilE cluster. According to Matthews and Yip (1994: 200), aspect markers are optional in Cantonese, the dominant language spoken in Hong Kong, but there is the aspect marker hōi that may express habituality. However, “[u]nlike the [standard] English used to construction, it is not restricted to past time, and in fact is typically used of the present” (Matthews and Yip 1994: 209). The presence of this temporally flexible habitual marker in Cantonese is a reasonable source of grammatical innovation in HKE through analogical thinking and pattern replication in the bilingual mind (see Sections 2.1 and 2.2). Tagalog as the main contact language of PhilE, in turn, marks aspect on the verb but, like Cantonese, requires additional lexical material to indicate tense. While this does not reveal much about the higher frequency of stative verbs occurring with use(d) to in the two varieties, it could be one of the reasons why present-tense use to is attested in PhilE and more frequent in HKE compared to other varieties in the dataset.

The varieties in the largest cluster are closely linked mainly by sharing BrE as their input variety (AmE, CanE, IrE, NZE), being (highly) advanced in their development in Schneider’s (2007) Dynamic Model of Postcolonial Englishes (AmE, CanE, IrE, NZE, SinE) and being in contact with inflectional languages with affixal habitual markers (EAE [contact language Swahili, Mpiranya 2015: 102], SLE [contact language Tamil, Schiffman 1999]).^[7]

While language contact may explain the data to some extent, there are obvious limitations. First, comprehensive grammatical descriptions of contact languages are not always easy to find, particularly when a specific feature is in focus. Second, contact does not act in a vacuum, and its impact is notoriously difficult to quantify. However, as the examples above have shown, some impact of contact appears likely in the usage patterns of use(d) to.

In sum, then, how do macro-level and micro-level analyses of varieties complement each other? A macro-level take-away from our corpus study was that constructional variants that categorically differ from the BrE and AmE norms were found almost exclusively in the L2 aka Outer Circle Englishes; and it is also members of this variety type that exhibit the greatest probabilistic differences in the semantic context expansion of use(d) to to non-animate subjects and stative verbs. Still, researchers should not be content with the coarse-grained potential explanation that language contact and SLA per se trigger extended, “overgeneralised” (see Section 2.2) uses of grammatical markers. The sociolinguistic specifics and intensity of contact certainly differ also within the group of Outer Circle Englishes. For example, take SLE as the odd one out in our dataset: why does SLE differ from the other seven Outer Circle varieties in not showing any categorically non-standard extensions of used to? A likely explanation lies in the sociolinguistic reality that meso- or acrolectal uses of English are less common as standard SLE is only used by a (typically highly educated) minority in the country (Ekanayaka 2020: 341; see also Kortmann et al. 2020) – a bias that the compilers of ICE-SL acknowledge to be reinforced by the over-representation of academics in their corpus (Bernaisch et al. 2019: 8). Therefore, at a local level of analysis, taking into account such sociolinguistic details as well as the grammatical systems of major contact languages in order to identify structures likely involved in cross-linguistic transfer appears to hold greater explanatory value than hypothesised macro-level trends.

5.2 Theoretical models and statistical modelling

In recent usage-based research on World Englishes, a gap has become apparent between major theoretical models, such as those by Kachru (1985) and Schneider (2007), and the increasingly sophisticated statistical models that seek to verify these theories but often fail to do so. Hundt (2021), in her survey on the issue, argues that this discrepancy partly derives from the fact that they target language structure at different levels of granularity. Indeed, only aggregate studies combining many structural features seem to replicate theoretical models, such as Kachru’s variety types, reasonably well (e.g. Szmrecsanyi and Kortmann 2009). For studies zooming in on single features, like habitual use(d) to here, more fine-grained “multifactorial” approaches have proved necessary that move past big-picture narratives and consider individual sociolinguistic constellations and contact languages more closely (e.g. Werner 2016, Leuckert 2019). For our own case study, this perspective has been sketched out briefly in the previous section. Moreover, we agree with Hundt in that:

[W]hat matters to speakers in terms of identity construction are not frequently used patterns and local patterns of co-variation with language-internal predictor variables but the use of salient patterns (often below the level of statistical significance and thus beyond the reach of probabilistic modelling) that are closely associated with language use in a particular region. (Hundt 2021: 15)

Categorically unique structures such as future-time will use to in HKE or contracted usen’t in IrE may be considered indicators of structural nativisation. These unique and non-standard features, however, occur with very low token frequencies, which makes them prone to having little impact on statistical clusters, despite potentially being salient shibboleths for speech-community membership. In other words, such structural features may well be the linguistic reflexes of identity constructions shaping newer Englishes, but they can easily escape statistical modelling.

In a critical response to Hundt’s (2021) survey, Bernaisch et al. (2022) defend a quantitative approach for testing World Englishes models that focuses on constructions qualifying as “linguistic ants”. The metaphor of “linguistic ants” stands for largely schematic grammatical constructions with high usage frequencies and subtle, non-salient probabilistic variation. According to Bernaisch et al.’s argument, since these kinds of constructions bear the brunt of grammatical expression in a language, they, rather than “linguistic butterflies”, should serve as the major test cases for verifying theories. Butterfly collecting in linguistics pertains to non-canonical structures with low frequencies but typically high salience. Applied to the present case of habitual use(d) to, one may consider aforementioned structures like contracted usen’t and future-time will use to butterflies in this sense, whereas the common pattern of past-tense [used to V], including probabilistic variation regarding the main verb’s aktionsart, could be said to do the work of a linguistic ant. How to do justice to both the qualitative value of linguistic butterflies and the statistical impact of linguistic ants remains an open issue in World Englishes research as well as in usage-based linguistics more broadly.

To summarise, whether statistical analyses truly test general World Englishes theories seems to depend on whether they operate at a fitting level of granularity, including factors such as the number of varieties, the detail of classifying their sociolinguistic constellations, the number of structural features, and possibly also the “ant” or “butterfly” status of these features.

5.3 A matter of data resources: the limits of assessing change with synchronic data

The present case study can also serve to demonstrate that, although grammaticalisation processes tend to be fairly predictable in the sense of their strong inherent directionality of change, purely synchronic reconstructions of how conservative or innovative a grammatical marker of a given variety is may give rise to errors in interpretation. The concerns that Gries et al. (2018) raise about the uncritical application of the apparent-time method in contrastive World Englishes research, and which they substantiate for the case of the genitive alternation, therefore fully apply to investigating grammaticalisation processes.

Not all categorically deviant, non-standard variants of used to continue to appear as innovative once more detailed historical information is added, which is available for BrE and AmE at least thanks to rich diachronic records as reported in the OED Online, Visser (1969) and Neels (2015). For example, thanks to these sources, we know that present-tense use to has coexisted with past-tense used to for the longer part of the usage history of this auxiliary. Present-tense use to became obsolete in (standard) BrE and AmE only during the 19th century. Before that, it was far from being a minor variant: diachronic corpus data discussed in Neels (2015) reveal that it was just as frequent as its past-tense counterpart during the Early Modern English period (ca. 1500–1700) before fading from usage during the subsequent centuries. For present-day L2 Englishes exhibiting present-tense use to, such as HKE and NigE among others, this raises the open question of whether it is the outcome of innovation in the sense of a reinvention or whether it was historically exported as part of the L1 founder dialects and survived until the present day. The latter scenario of transplanted, surviving dialect features (see Kortmann and Schneider 2011) would support the idea of colonial lag.

Similarly, in the ICE data, the negative contraction usen’t was identified as a formally highly grammaticalised variant exclusive to IrE. However, when consulting historical records, it turns out that it is far from clear, and perhaps rather unlikely, that IrE took this grammaticalisation step “on its own”. The OED Online lists several 19th-century and early 20th-century instances of usen’t from BrE prose (“use, v.” OED Online), suggesting that usen’t was rendered an “Irishism” from a synchronic present-day perspective only once this variant had disappeared from BrE usage.

To conclude, only through historical records do we learn that some variants that look like colonial innovations, and that possibly represent more advanced stages of grammaticalisation, in fact have a more complex history that allows for a different interpretation, even an opposite one in the case of colonial lag. It is also conceivable that present-day grammatical differences between national varieties result not so much from different rates and kinds of change in the new Englishes as from differences in the original founder dialects (i.e. historical varieties that must not be equated with standard English). This underscores the methodological need for more diachronic/historical corpora for researching postcolonial Englishes. Creating such corpora, both as complements of synchronic ICE corpora (e.g. HiCE-Ghana, Brato 2019) and as independent corpora (e.g. The Diachronic Corpus of Hong Kong English; DC-HKE; Biewer et al. 2014), has become a recent trend in World Englishes research. For example, Evans (2014) has shown for HKE that data such as council proceedings, in his case prepared as the Corpus of Legislative Council Proceedings (1858–2012), often reach back quite far in time, allowing not only for historical but genuinely diachronic perspectives. Such corpora will provide critical evidence for studies of grammaticalisation and other processes of language change.

6 Conclusions

In the present study, we set out to investigate how well grammaticalisation processes align with recurrent big-picture narratives about World Englishes. The three narratives of colonial lag, contact-induced innovation and the epicentre hypothesis would lead researchers to make different, partly opposing predictions about grammaticalisation patterns depending on variety type. Our corpus results suggest that none of these three hypothesised trends is borne out consistently in the test case of habitual [use(d) to V] across 13 national Englishes. The most robust finding appears to be that Outer Circle Englishes deviate more strongly from BrE and AmE standards than other Inner Circle Englishes. Quantitatively speaking, most differences in the behavioural profiles of use(d) to were found to be small (single-digit ratio differences). From a qualitative angle, however, the attested presence of categorically distinct grammatical variants such as contracted usen’t, present-tense use to or future will use to constitutes a meaningful difference from dominant BrE and AmE standards. This angle should not be neglected when low usage frequencies prevent such features from showing up prominently in clustering analyses or other statistical models.

It is tempting to assign the greater presence of categorically distinct grammatical variants in Outer Circle Englishes to the factor of contact-induced innovation by L2 learners. On the one hand, we were able to substantiate this idea in some cases by identifying similarities and hence concrete transfer potential in the habituality systems of relevant contact languages. On the other hand, for some other features that appear innovative, we stressed that on closer inspection they could equally be “conservative” variants transplanted and surviving from historical founder dialects. An important realisation from this case study is therefore that, even for strongly directional changes like grammaticalisation, reconstructed trajectories remain speculative as long as the earlier colonial records remain unknown (see Schneider 2007: 85).

Corresponding author: Jakob Neels, Leipzig University, Leipzig, Germany, E-mail: jakob.neels@uni-leipzig.de

Acknowledgments

We would like to thank Dennis Schmechel and Ulrike Schönfeld for their assistance in the coding process. We also thank the anonymous reviewers and Stefan Th. Gries for their invaluable feedback on earlier versions of this article.

References

Anthony, Laurence. 2022. AntConc (Version 4.2.0) [computer software]. Tokyo, Japan: Waseda University. https://www.laurenceanthony.net/software (accessed 16 May 2024).

Bernaisch, Tobias, Stefan Th. Gries & Benedikt Heller. 2022. Theoretical models and statistical modelling of linguistic epicentres. World Englishes 41(3). 333–346. https://doi.org/10.1111/weng.12580.

Bernaisch, Tobias, Dushyanthi Mendis & Joybrato Mukherjee. 2019. Manual to the International Corpus of English – Sri Lanka. Giessen: Justus Liebig University, Department of English.

Biewer, Carolin, Tobias Bernaisch, Mike Berger & Benedikt Heller. 2014. Compiling the diachronic corpus of Hong Kong English (DC-HKE): Motivation, progress and challenges. In Poster presentation at the 35th annual conference of the international computer archive of modern and medieval English (ICAME 35), University of Nottingham, April 30–May 4, 2014.

Binnick, Robert I. 2005. The markers of habitual aspect in English. Journal of English Linguistics 33(4). 339–369. https://doi.org/10.1177/0075424205286006.

Bisang, Walter & Andrej Malchukov (eds.). 2020. Grammaticalization scenarios: Cross-linguistic variation and universal tendencies (Comparative Handbooks of Linguistics 4.1–4.2). Berlin: Mouton de Gruyter.

Bohmann, Axel. 2019. Variation in English worldwide: Registers and global varieties (Studies in English Language). Cambridge: Cambridge University Press. https://doi.org/10.1017/9781108751339.

Botha, Werner & Tobias Bernaisch. 2025. World Englishes and sociolinguistic variation. World Englishes 44(1–2). 2–11. https://doi.org/10.1111/weng.12695.

Brato, Thorsten. 2019. The historical Corpus of English in Ghana (HiCE Ghana): Motivation, compilation, opportunities. In Alexandra Esimaje, Ulrike Gut & Bassey E. Antia (eds.), Corpus linguistics and African Englishes (Studies in Corpus Linguistics 88), 119–141. Amsterdam: John Benjamins. https://doi.org/10.1075/scl.88.06bra.

Bresnan, Joan. 2007. Is syntactic knowledge probabilistic? Experiments with the English dative alternation. In Sam Featherston & Wolfgang Sternefeld (eds.), Roots: Linguistics in search of its evidential base, 75–96. Berlin: Mouton de Gruyter. https://doi.org/10.1515/9783110198621.75.

Bruthiaux, Paul. 2003. Squaring the circles: Issues in modeling English worldwide. International Journal of Applied Linguistics 13(2). 159–178. https://doi.org/10.1111/1473-4192.00042.

Buschfeld, Sarah. 2020. Children’s English in Singapore: acquisition, properties, and use (Routledge Studies in World Englishes). London: Routledge. https://doi.org/10.4324/9781315201030-2.

Bybee, Joan, Revere Perkins & William Pagliuca. 1994. The evolution of grammar: Tense, aspect and modality in the languages of the world. Chicago: University of Chicago Press.

Carter, Ronald & Michael McCarthy. 2006. Cambridge grammar of English: A comprehensive guide; spoken and written English grammar and usage. Cambridge: Cambridge University Press.

Collins, Peter. 2009. Modals and quasi-modals in World Englishes. World Englishes 28(3). 281–292. https://doi.org/10.1111/j.1467-971x.2009.01593.x.

Collins, Peter (ed.). 2015. Grammatical change in English world-wide (Studies in Corpus Linguistics 67). Amsterdam: John Benjamins. https://doi.org/10.1075/scl.67.

Collins, Peter. 2023. Grammatical variation in World Englishes: An onomasiological study. English World-Wide 44(2). 184–218. https://doi.org/10.1075/eww.21055.col.

Correia Saavedra, David. 2021. Measurements of grammaticalization: Developing a quantitative index for the study of grammatical change (Trends in Linguistics, Studies and Monographs 366). Berlin: Mouton de Gruyter. https://doi.org/10.1515/9783110753073.

Coussé, Evie, Peter Andersson & Joel Olofsson (eds.). 2018. Grammaticalization meets construction grammar (Constructional Approaches to Language 21). Amsterdam: John Benjamins. https://doi.org/10.1075/cal.21.

Davies, Mark & Robert Fuchs. 2015. Expanding horizons in the study of World Englishes with the 1.9 billion word Global Web-based English Corpus (GloWbE). English World-Wide 36(1). 1–28. https://doi.org/10.1075/eww.36.1.01dav.

Davydova, Julia. 2016. The present perfect in New Englishes: Common patterns in situations of language contact. In Valentin Werner, Elena Seoane & Cristina Suárez-Gómez (eds.), Re-Assessing the present perfect: Corpus studies and beyond (Topics in English Linguistics 91), 169–194. Berlin: Mouton de Gruyter. https://doi.org/10.1515/9783110443530-009.

Deshors, Sandra C. & Sandra Götz. 2020. Common ground across globalized English varieties: A multivariate exploration of mental predicates in World Englishes. Corpus Linguistics and Linguistic Theory 16(1). 1–28. https://doi.org/10.1515/cllt-2016-0052.

Divjak, Dagmar & Stefan Th. Gries. 2009. Corpus-based cognitive semantics: A contrastive study of phasal verbs in English and Russian. In Barbara Lewandowska-Tomaszczyk & Katarzyna Dziwirek (eds.), Studies in cognitive corpus linguistics (Łódź Studies in Language 18), 273–296. Frankfurt am Main: Peter Lang.

Du Bois, John W., Wallace L. Chafe, Charles Meyer, Sandra A. Thompson, Robert Englebretson & Nii Martey. 2000–2005. Santa Barbara Corpus of spoken American English, parts 1–4. Philadelphia: Linguistic Data Consortium.

Ekanayaka, Tanya N. I. 2020. Sri Lankan English. In Kingsley Bolton, Werner Botha & Andy Kirkpatrick (eds.), The handbook of Asian Englishes (Blackwell Handbooks in Linguistics), 337–354. Hoboken, NJ: Wiley-Blackwell. https://doi.org/10.1002/9781118791882.ch14.

Evans, Stephen. 2014. The evolutionary dynamics of postcolonial Englishes: A Hong Kong case study. Journal of Sociolinguistics 18(5). 571–603. https://doi.org/10.1111/josl.12104.

Fischer, Olga. 2008. On analogy as the motivation for grammaticalization. Studies in Language 32(2). 336–382. https://doi.org/10.1075/sl.32.2.04fis.

Goldberg, Adele E. 2006. Constructions at work: The nature of generalization in language. Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199268511.001.0001.

Greenbaum, Sidney & Gerald Nelson. 1996. The International Corpus of English (ICE) project. World Englishes 15(1). 3–15. https://doi.org/10.1111/j.1467-971x.1996.tb00088.x.

Gries, Stefan Th. 2010. Behavioral profiles: A fine-grained and quantitative approach in corpus-based lexical semantics. The Mental Lexicon 5(3). 323–346. https://doi.org/10.1075/ml.5.3.04gri.

Gries, Stefan Th., Tobias Bernaisch & Benedikt Heller. 2018. A corpus-linguistic account of the history of the genitive alternation in Singapore English. In Sandra C. Deshors (ed.), Modeling World Englishes: Assessing the interplay of emancipation and globalization of ESL varieties (Varieties of English Around the World 61), 245–279. Amsterdam: John Benjamins. https://doi.org/10.1075/veaw.g61.10gri.

Hantson, André. 2005. The English perfect and the anti-perfect used to viewed from a comparative perspective. English Studies 86(3). 245–268. https://doi.org/10.1080/0013838042000339862.

Haspelmath, Martin. 1998. Does grammaticalization need reanalysis? Studies in Language 22(2). 315–351. https://doi.org/10.1075/sl.22.2.03has.

Heine, Bernd & Tania Kuteva. 2020. Contact and grammaticalization. In Raymond Hickey (ed.), The handbook of language contact (Blackwell Handbooks in Linguistics), 2nd edn. 93–112. Hoboken, NJ: Wiley-Blackwell. https://doi.org/10.1002/9781119485094.ch4.

Heller, Benedikt, Tobias Bernaisch & Stefan Th. Gries. 2017. Empirical perspectives on two potential epicenters: The genitive alternation in Asian Englishes. ICAME Journal 41(1). 111–144. https://doi.org/10.1515/icame-2017-0005.

Hickey, Raymond. 2007. Irish English: History and present-day forms (Studies in English Language). Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511551048.

Himmelmann, Nikolaus P. 2004. Lexicalization and grammaticization: Opposite or orthogonal? In Walter Bisang, Nikolaus P. Himmelmann & Björn Wiemer (eds.), What makes grammaticalization? A look from its fringes and its components (Trends in Linguistics, Studies and Monographs 158), 21–42. Berlin: Mouton de Gruyter. https://doi.org/10.1515/9783110197440.1.21.

Hopper, Paul J. 1991. On some principles of grammaticization. In Elizabeth Closs Traugott & Bernd Heine (eds.), Approaches to grammaticalization, vol. 1 (Typological Studies in Language 19:1), 17–35. Amsterdam: John Benjamins.

Hopper, Paul J. & Elizabeth Closs Traugott. 2003. Grammaticalization (Cambridge Textbooks in Linguistics), 2nd edn. Cambridge: Cambridge University Press.

Hundt, Marianne. 2009. Colonial lag, colonial innovation, or simply language change? In Günter Rohdenburg & Julia Schlüter (eds.), One language, two grammars? Differences between British and American English (Studies in English Language), 13–37. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511551970.002.

Hundt, Marianne. 2021. On models and modelling. World Englishes 40(3). 298–317. https://doi.org/10.1111/weng.12467.

Hundt, Marianne, Paula Rautionaho & Carolin Strobl. 2020. Progressive or simple? A corpus-based study of aspect in World Englishes. Corpora 15(1). 77–106. https://doi.org/10.3366/cor.2020.0186.

Kachru, Braj B. 1985. Standards, codification, and sociolinguistic realism: The English language in the outer circle. In Randolph Quirk & Henry Widdowson (eds.), English in the world: Teaching and learning the language and literatures, 11–30. Cambridge: Cambridge University Press.

Kortmann, Bernd, Kerstin Lunkenheimer & Katharina Ehret (eds.). 2020. The electronic world Atlas of varieties of English. Zenodo.

Kortmann, Bernd & Agnes Schneider. 2011. Grammaticalization in non-standard varieties of English. In Bernd Heine & Heiko Narrog (eds.), The Oxford handbook of grammaticalization (Oxford Handbooks), 263–278. Oxford: Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199586783.013.0021.

Kuteva, Tania, Bernd Heine, Bo Hong, Haiping Long, Heiko Narrog & Seongha Rhee. 2019. World lexicon of grammaticalization, 2nd edn. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781316479704.

Leuckert, Sven. 2019. Topicalization in Asian Englishes: Forms, functions, and frequencies of a fronting construction (Routledge Studies in World Englishes). London: Routledge. https://doi.org/10.4324/9781351000437.

Leuckert, Sven & Sofia Rüdiger. 2021. Discourse markers and World Englishes. World Englishes 40(1). 482–487. https://doi.org/10.1111/weng.12535.

Levshina, Natalia. 2015. How to do linguistics with R: Data exploration and statistical analysis. Amsterdam: John Benjamins. https://doi.org/10.1075/z.195.

Lorenz, David & David Tizón-Couto. 2017. Coalescence and contraction of V-to-Vinf sequences in American English – evidence from spoken language. Corpus Linguistics and Linguistic Theory 20(1). 1–36. https://doi.org/10.1515/cllt-2015-0067.

Loureiro-Porto, Lucía. 2019. Grammaticalization of semi-modals of necessity in Asian Englishes. English World-Wide 40(2). 115–142. https://doi.org/10.1075/eww.00025.lou.

Mair, Christian & Geoffrey Leech. 2006. Current changes in English syntax. In Bas Aarts & April McMahon (eds.), The handbook of English linguistics (Blackwell Handbooks in Linguistics), 318–342. Oxford: Blackwell. https://doi.org/10.1002/9780470753002.ch14.

Marckwardt, Albert H. 1958. American English. New York: Oxford University Press.

Matras, Yaron & Jeanette Sakel. 2007. Investigating the mechanisms of pattern replication in language convergence. Studies in Language 31(4). 829–865. https://doi.org/10.1075/sl.31.4.05mat.

Matthews, Stephen & Virginia Yip. 1994. Cantonese: A comprehensive grammar. London: Routledge.

Mpiranya, Fidèle. 2015. Swahili grammar and workbook. London: Routledge. https://doi.org/10.4324/9781315750699.

Neels, Jakob. 2015. The history of the quasi-auxiliary use(d) to: A usage-based account. Journal of Historical Linguistics 5(2). 177–234. https://doi.org/10.1075/jhl.5.2.01nee.

Onysko, Alexander. 2016. Modeling World Englishes from the perspective of language contact. World Englishes 35(2). 196–220. https://doi.org/10.1111/weng.12191.

Peters, Pam & Tobias Bernaisch. 2022. The current state of research into linguistic epicentres. World Englishes 41(3). 320–332. https://doi.org/10.1111/weng.12581.

Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech & Jan Svartvik. 1985. A comprehensive grammar of the English language. London: Longman.

R Core Team. 2021. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Schiffman, Harold F. 1999. A reference grammar of spoken Tamil. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511519925.

Schneider, Edgar W. 2007. Postcolonial English: Varieties around the world (Cambridge Approaches to Language Contact). Cambridge: Cambridge University Press.

Schneider, Edgar W. 2022. Parameters of epicentral status. World Englishes 41(3). 462–474. https://doi.org/10.1111/weng.12589.

Sharma, Devyani. 2009. Typological diversity in new Englishes. English World-Wide 30(2). 170–195. https://doi.org/10.1075/eww.30.2.04sha.

Siemund, Peter & Julia Davydova. 2017. World Englishes and the study of typology and universals. In Markku Filppula, Juhani Klemola & Devyani Sharma (eds.), The Oxford handbook of World Englishes (Oxford Handbooks), 123–146. Oxford: Oxford University Press.

Sridhar, Kamal K. & Shikaripur N. Sridhar. 1986. Bridging the paradigm gap: second language acquisition theory and indigenized varieties of English. World Englishes 5(1). 3–14. https://doi.org/10.1111/j.1467-971x.1986.tb00636.x.

Suárez-Gómez, Cristina & Lucía Loureiro-Porto. 2020. World Englishes and grammatical variation. World Englishes 39(3). https://doi.org/10.1111/weng.12478.

Szmrecsanyi, Benedikt & Bernd Kortmann. 2009. The morphosyntax of varieties of English: A quantitative perspective. Lingua 119(11). 1643–1663. https://doi.org/10.1016/j.lingua.2007.09.016.

Szmrecsanyi, Benedikt, Jason Grafmiller, Benedikt Heller & Melanie Röthlisberger. 2016. Around the world in three alternations: Modeling syntactic variation in varieties of English. English World-Wide 37(2). 109–137. https://doi.org/10.1075/eww.37.2.01szm.

“Use, v.” 2004. Oxford English dictionary, March 2024. Oxford: Oxford University Press.

Visser, Fredericus Th. 1969. An historical syntax of the English language, vol. 3: Syntactical units with two verbs. Leiden: Brill.

Werner, Valentin. 2016. Overlap and divergence – aspects of the present perfect in World Englishes. In Elena Seoane & Cristina Suárez-Gómez (eds.), World Englishes: New theoretical and methodological considerations (Varieties of English Around the World 57), 113–142. Amsterdam: John Benjamins. https://doi.org/10.1075/veaw.g57.06wer.

Williams, Jessica. 1987. Non-native varieties of English. A special case of language acquisition. English World-Wide 8(2). 161–199. https://doi.org/10.1075/eww.8.2.02wil.

Received: 2024-06-06

Accepted: 2025-07-15

Published Online: 2025-08-15

This work is licensed under the Creative Commons Attribution 4.0 International License.

https://doi.org/10.1515/cllt-2024-0064

Keywords for this article

colonial lag; contact-induced change; epicentre hypothesis; grammaticalisation; habitual aspect; International Corpus of English
