The diachronic change of English relativizers: a case study in the State of the Union addresses across two centuries

Tingyu Zhang; Jinman Li; Lei Lei

doi:10.1515/cllt-2023-0114

Article Open Access

Published/Copyright: September 27, 2024

From the journal Corpus Linguistics and Linguistic Theory Volume 21 Issue 2

Abstract

Different types of relativizers are used to introduce relative clauses, and the change in the use of relativizers has attracted attention in recent decades. Despite progress in this area, the challenge of extracting relative clauses, especially those with zero relativizers, has limited our understanding of relativizer change. To this end, we investigated the evolving patterns of relativizers in the State of the Union addresses spanning two centuries, employing novel methodologies developed for this purpose. In contrast to the findings of previous studies, our results showed an increase of that and a decrease of which in subject relative clauses, and an increase of zero relativizers in object relative clauses. The change could be attributed to changes in factors concerning text complexity and styles. The faster rate of change around the 1940s can be explained by factors such as prescriptivism and the transition in speech styles. Methodologically, our study has confirmed dependency parsing as a reliable method for automatic extraction of relative clauses.

Keywords: English relativizers; zero relativizers; diachronic change; language simplification; the State of the Union addresses

1 Introduction

A relative clause is a type of clause introduced by a relativizer, which serves to modify a noun phrase (Comrie 1989 [1981]; Croft 1990). For example, in the sentence “This is the letter that the man gave me.”, the relative clause (the man gave me) is introduced by the relativizer (that) to modify the head noun (the letter). The relativizer that in this case can be used interchangeably with which (i.e., “This is the letter which the man gave me.”) or the zero relativizer (“This is the letter ∅ the man gave me.”). Due to the functional overlap of the three relativizer variants, competition arises (Sauerland 2003), resulting in the phenomenon of relativizer variation (Cheshire et al. 2013). This variation has garnered considerable interest in recent linguistic research (Ball 1996; Biber and Conrad 2014; Fox and Thompson 2007; Grafmiller et al. 2018; Hinrichs et al. 2015).

One line of research investigates how the variation of relativizers is affected by their explicitness of referents (Biber and Conrad 2014). Which is an explicit and elaborated marker of a relative clause, directly signaling its presence. By contrast, that is a more versatile element, serving as a relativizer, complementizer, demonstrative, or determiner depending on the context (Biber and Conrad 2014: 215; Grafmiller et al. 2018). The contextual flexibility makes its role in registering an upcoming relative clause less explicit. In addition, a null or zero relativizer, lacking any overt form, is much less explicit in indicating the subsequent relative clause (Fox and Thompson 2007; Quirk 1957). The referential explicitness of relativizers affects their frequency of use and their distribution patterns (Hawkins 1990). To be specific, within a group of competing structures, more explicit grammatical alternatives are favored in contexts that exhibited higher linguistic complexity (Rohdenburg 1996: 149).

Several studies have thus explored how linguistic complexity affected the variation of relativizers (Gibson 2000; Hinrichs et al. 2015; Liu and Liu 2021; Quirk 1957; Zhang and Zhang 2021). First, the length of a clause was proposed as a factor to account for the distribution pattern of English relativizers (Quirk 1957). Shorter clauses preferred zero relativizers, whereas longer ones were less likely to use them (Quirk 1957: 108). Second, the dependency distance has been identified as a predictor of the complexity of relative clauses, particularly concerning the use of relativizers (Gibson 2000; Liu 2008). A shorter mean dependency distance in relative clauses may lead to the omission of relativizers (Liu and Liu 2021). Furthermore, Hinrichs et al. (2015) investigated a series of factors that may affect the use of relativizers, such as the length of the head noun, the length of the relative clause, the mean word length (by letter), the sentence length (by word), and type-token ratios in texts. Their findings revealed a significant correlation between the complexity in structures and the use of explicit relativizers. To be specific, which was preferred over that and zero in a more complex context.

In addition, relativizer variation is also affected by stylistic differences in texts (Biber 1986; Romaine 1982; Tagliamonte et al. 2005). Romaine (1982) found that wh-pronouns gradually infiltrated the relative system of modern written English but had little effect on spoken English. Over time, that has come to be recognized for its informal or participatory stylistic value, whereas which has been associated with formal stylistic value (D’Arcy and Tagliamonte 2010). Formal English writing, such as academic prose, exhibited a distribution of relativizers (which > that > zero) distinct from that in spoken varieties such as conversations (that > zero > which) (Biber 1986; Tagliamonte et al. 2005). Even within the same written register, fiction exhibited a higher frequency of that compared to learned prose, because fiction more closely resembled colloquial texts (Hinrichs et al. 2015).

Recently, an emerging line of research has focused on the diachronic change in the use of relativizers, which has yielded mixed findings (Ball 1996; Lee 2020; Leech and Smith 2006). Ball (1996) was one of the pioneers who investigated the changing trajectory of English relativizers from a quantitative perspective. Ball’s study revealed that from the 17th century to the 19th century, the use of which increased, whereas that decreased, and zero remained stable at 1 % of nonpersonal subject relative clauses (Ball 1996: 249). In the 20th century, which decreased, that increased, and zero fell to 0 % (Ball 1996: 249). However, probably due to the absence of techniques suitable for large-scale data analysis, Ball (1996) manually examined texts and analyzed a limited sample of relative clauses. Thanks to techniques such as part-of-speech tagging and regular expressions, Leech and Smith (2006) analyzed the use of relativizers in the Lancaster-Oslo/Bergen (LOB) and Brown Corpora (1960s), comparing it with that in the Freiburg-Lancaster-Oslo/Bergen (FLOB) and Frown Corpora (1990s). Their study revealed an increase of that, a decrease of which, and no change of zero. Their finding was further verified in follow-up studies analyzing the same corpora, such as Leech et al. (2009), Hinrichs et al. (2015), and Grafmiller et al. (2018). However, Lee (2020) observed a divergent trend when comparing the use of relativizers in three corpora collected in 1993, 2005, and 2016. Lee (2020) noted a decrease of which from 1993 to 2005, followed by an increase of which from 2005 to 2016. The inconsistency with the findings of Leech and Smith (2006) suggests a more complex pattern of relativizers across different time periods.

Although previous studies have offered valuable insights into the change of English relativizers and possible factors behind them, they are limited as follows. First, existing diachronic studies on the use of relativizers have yielded contradictory results. The discrepancy likely stemmed from the cross-sectional nature of their research design and different languages or genres in their data. On the one hand, data at discrete time points and across different time spans were examined (e.g., data at two time points [i.e., 1960 and 1990] in Leech and Smith [2006], data at three time points [i.e., 1993, 2005, and 2016] in Lee [2020], and data at five time points [i.e., 1700, 1750, 1800, 1850, and 1900] in Krielke [2021]). On the other hand, different dialects of English (e.g., American and British English [Leech and Smith 2006] and Singaporean English [Lee 2020]) and various genres (e.g., general and scientific genres [Krielke 2021]) were explored. The reliance on a limited number of time points across various periods restricts the ability to offer a comprehensive and continuous account of long-term trends in relativizer usage. Therefore, variations in the use of relativizers may be observed across different time periods and/or languages and genres (Lee 2020; Leech and Smith 2006).

Second, there is a need for a reliable method for the efficient extraction of relative clauses, particularly those with zero relativizers. Earlier researchers, such as Ball (1996), had to manually tag and extract relative clauses and relativizers, which is labor-intensive and time-consuming and yields a very limited sample size. Subsequent researchers have employed techniques such as regular expressions (Leech and Smith 2006) or machine learning algorithms based on the tags of relative clauses defined in Penn Treebank (Grafmiller et al. 2018; Hinrichs et al. 2015). Although such methods are more sophisticated and can be used to process large samples of texts, their performance appears unsatisfactory. For example, Frazee et al. (2015) reported a recall rate of only 0.85 and a precision value of 0.60 in the research of Hinrichs et al. (2015). The challenges in automatic detection of relative clauses and relativizers have led many studies to exclude zero relativizers, leaving the question of diachronic change of relativizers incompletely answered (Fajri and Okwar 2020; Krielke 2021; Lee 2020). For example, when Lee (2020) used the tag WDT (wh-determiner) to extract relativizers, they found zero “was difficult to locate, let alone extract tokens of their occurrence” (Lee 2020: 35).

Third, it is important to note the possible roles that two factors, i.e., language simplification and colloquialization, play in the diachronic evolution of relativizers. Language simplification is one of the key driving forces in shaping language structures for efficient communication. It has been examined in terms of metrics such as dependency distance minimization (Futrell et al. 2015; Lei and Wen 2020; Liu 2008) and inflection decay (Millward and Hayes 2012 [1988]; Zhu and Lei 2018). Colloquialization, on the other hand, refers to the process by which written English prose incorporated elements of spoken language, such as certain morphosyntactic and discourse features, in the 20th century (Mair and Hundt 1995). The trend is marked by an increase of various colloquial features such as the use of colloquial lexicons, semi-modals (e.g., be going to), and personal pronouns (Leech et al. 2009). However, research on colloquialization has often overlooked the role of relativizers, despite a notable shift towards the more informal variant that, which has been overtaking its more formal counterpart which in written English (Leech and Smith 2006).

To summarize, previous studies on the diachronic evolution of relativizers are limited as follows. First, while investigations have mostly found an increase of that and a decrease of which in the 20th century (Ball 1996; Grafmiller et al. 2018; Hinrichs et al. 2015; Leech et al. 2009), inconsistent results were found due to differences in genres and time spans (Lee 2020; Leech et al. 2009). To date, no diachronic research has investigated relativizer usage within a consistently defined genre across a long span. Second, challenges in the automatic extraction of relative clauses may have caused a scarcity of large-scale studies on relativizers and relative clauses, particularly zero relativizers (Fajri and Okwar 2020; Krielke 2021; Lee 2020). Third, limited research has examined the effect of factors such as simplification (Krielke 2021) and colloquialization (Mair and Hundt 1995) on the use of relativizers. As one of our external reviewers correctly argued, the change in the use of relativizers is not solely due to the change of relativizers per se but is also conditioned by changes in the textual habitat (Szmrecsanyi 2016).

To address the foregoing issues, this study endeavors to examine the factors underlying the relativizer change and their potential effect on the change in the use of relativizers. To be specific, the study aims to investigate the use of English relativizers in the State of the Union (SOTU) addresses over the past two centuries. First, we aim to develop a novel method for automatic and accurate extraction of English relative clauses. Second, we examine the diachronic change in the use of relativizers and the underlying factors within the context of the SOTU addresses. Three research questions are to be addressed as follows.

Research Question 1:

Did the use of relativizers change in the SOTU addresses over the past two centuries? If yes, what was the changing trend of the use of relativizers?

Research Question 2:

If there is a change, what are the factors that have affected the use of relativizers in the SOTU addresses and how do they affect the change in the use of relativizers?

Research Question 3:

If there is a change, does the changing speed remain consistent or vary in different periods of time in the SOTU addresses?

2 Methods

2.1 Data

This study utilized the corpus of the State of the Union (SOTU) addresses delivered by American presidents. The dataset has been widely used in previous research concerning the change of linguistic features (e.g., Lei and Wen [2020] and Burgers and Ahrens [2020]). The corpus is composed of 232 SOTU addresses delivered by 44 presidents,^[1] from President George Washington’s first address in 1790 to President Joseph Biden’s address in 2023.^[2] It consists of a total of 2,079,563 tokens and 74,174 sentences, with an average of approximately 28 tokens per sentence.

The corpus was chosen for our study for the following reasons. First, the timespan of the SOTU addresses covers 234 years, which provides sufficient temporal coverage to observe the possible evolution of linguistic features. In addition, the one-year intervals enable a fine-grained analysis of potential change trajectories. Second, all texts are freely available and have already undergone proofreading prior to release (Lei and Wen 2020; Savoy 2015). Third, it should be noted that despite the comparable registers, i.e., political speeches (Burgers and Ahrens 2020), the delivery method of the SOTU addresses has varied. As shown in Figure 1, before 1933, the SOTU messages were primarily written reports. After President Franklin D. Roosevelt’s first speech in 1934, the modern practice of delivering spoken addresses began. That is, from 1934 onwards, the SOTU addresses can be considered primarily as oral speeches rather than written reports.^[3] Given that the method of delivery (spoken vs. written) may affect the use of relativizers (Biber 1986; Romaine 1982), the delivery method was considered as a factor in our study, which gives us an opportunity to examine the effect of genres (i.e., written reports vs. oral speeches) on the changes of relativizers.

Figure 1:

Delivery methods of SOTU (1790–2023).

2.2 Extraction of relative clauses: a new method^[4]

The corpus texts were first tokenized with the Natural Language Toolkit (NLTK) and then parsed with SuPar. SuPar is a highly efficient dependency parser, with a reported unlabeled attachment score (UAS) of 93.37 % and a labeled attachment score (LAS) of 91.27 % (Zhang et al. 2020). It parses the texts and identifies 37 universal syntactic relations between words in a sentence based on the Universal Dependency (UD) Scheme v2 (De Marneffe et al. 2021). To extract relative clauses from our data, we identified sentences containing the relation acl:relcl, which refers to a finite clause modifying a noun phrase (Chen et al. 2021). For example, in sentence (1), the relative clause that I proposed was successfully detected by the parser, represented by the subtree of the label acl:relcl (see Figure 2 for the dependency relations of word pairs in the sentence).

Figure 2:

Dependency relations of sentence (1).

(1)

I cared about the programs that I proposed.

Our proposed method also effectively detected zero relative clauses. For example, in sentence (2) where the relativizer is omitted, the acl:relcl relation between saw and man was accurately identified (see Figure 3). Therefore, regardless of the presence of relativizers, all possible relative clause candidates can be extracted from the texts as long as sentences with the acl:relcl relations were identified.

Figure 3:

Dependency relations of sentence (2).

(2)

The man I saw yesterday left this morning.
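
As a minimal sketch of this detection step, the snippet below scans a dependency parse for the acl:relcl relation and returns the head noun and clause verb it connects. The parses of sentences (1) and (2) are hand-written in a simplified (id, form, head, relation) format for illustration; they stand in for parser output and are not actual SuPar output, and find_relative_clauses is a hypothetical helper name.

```python
# Hand-written toy parses (assumed for illustration, not real parser output).
# Each token is (id, form, head_id, deprel); head_id 0 marks the root.
SENT_1 = [
    (1, "I", 2, "nsubj"),
    (2, "cared", 0, "root"),
    (3, "about", 5, "case"),
    (4, "the", 5, "det"),
    (5, "programs", 2, "obl"),
    (6, "that", 8, "obj"),
    (7, "I", 8, "nsubj"),
    (8, "proposed", 5, "acl:relcl"),
    (9, ".", 2, "punct"),
]
SENT_2 = [
    (1, "The", 2, "det"),
    (2, "man", 6, "nsubj"),
    (3, "I", 4, "nsubj"),
    (4, "saw", 2, "acl:relcl"),
    (5, "yesterday", 4, "obl:tmod"),
    (6, "left", 0, "root"),
    (7, "this", 8, "det"),
    (8, "morning", 6, "obl:tmod"),
    (9, ".", 6, "punct"),
]


def find_relative_clauses(tokens):
    """Return (head_noun, clause_verb) pairs linked by acl:relcl."""
    forms = {tok_id: form for tok_id, form, _, _ in tokens}
    return [
        (forms[head_id], form)
        for _, form, head_id, deprel in tokens
        if deprel == "acl:relcl"
    ]


print(find_relative_clauses(SENT_1))  # [('programs', 'proposed')]
print(find_relative_clauses(SENT_2))  # [('man', 'saw')]
```

Because the lookup depends only on the relation label, sentence (2) is found in the same way as sentence (1) even though it contains no overt relativizer.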

After the extraction of the relative clauses, we also extracted other linguistic features, i.e., the head nouns, the clause verbs, the relativizers, the restrictiveness of relative clauses, the role of head nouns, and the animacy of head nouns. The flowchart for the extraction of candidate relative clauses and linguistic features is depicted in Figure 4.

Head noun and clause verb: the dependency relation acl:relcl connected the head noun and the clause verb. The identification of the dependency relation acl:relcl allowed for the extraction of both head nouns and clause verbs.
Relativizer: we extracted the relativizers by matching them with a predefined list of known relativizers. The list includes which, whichever, that, who, whom, whose, whoever, whomever, however, when, whence, whenever, where, wherein, whereon, whereof, whereby, wherever, why, what, whatever, whatsoever, than, as, and in case. To ensure accuracy in our analysis, we required that the identified relativizers directly connect with the clause verbs. For example, in “…programs which that president proposed”, only which was extracted, because that functioned as a pronoun referring to president. In instances where no relativizers were identified, the absence was considered a relativizer omission. Consequently, the relativizers were designated as zero.
Restrictiveness of relative clauses: the restrictiveness of a relative clause was determined based on the presence or absence of a comma separating the relativizer from the head noun. A comma indicated a non-restrictive relative clause, while its absence suggested a restrictive clause.
Role of head nouns: the roles of head nouns were extracted by means of the dependency relation of relativizers (e.g., nsubj, nsubjpass, obj), because the relativizer assumed the role of the head noun in the relative clause.
Animacy of head nouns: The head noun was assigned one of the binary animacy values (animate or inanimate) by referring to an animacy dictionary (Ji and Lin 2009). Words lacking entries in the dictionary, such as pronouns (e.g., those, all, and some) and infrequent words (e.g., Qaida), were manually checked and assigned animacy values.
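
The feature rules listed above can be sketched as follows. This is a simplified illustration under stated assumptions rather than the authors' implementation: tokens are hand-written (id, form, head, relation) tuples, the multiword relativizer in case is left out of the list, a zero relativizer is given no role and is treated as restrictive, and extract_features is a hypothetical name.

```python
# Single-word relativizers from the list above ("in case" omitted here).
RELATIVIZERS = {
    "which", "whichever", "that", "who", "whom", "whose", "whoever",
    "whomever", "however", "when", "whence", "whenever", "where",
    "wherein", "whereon", "whereof", "whereby", "wherever", "why",
    "what", "whatever", "whatsoever", "than", "as",
}


def extract_features(tokens):
    """tokens: list of (id, form, head_id, deprel); one record per acl:relcl."""
    forms = {tok_id: form for tok_id, form, _, _ in tokens}
    records = []
    for verb_id, verb, noun_id, deprel in tokens:
        if deprel != "acl:relcl":
            continue
        # Relativizer: a listed word attached directly to the clause verb
        # and located between the head noun and that verb.
        rel = next(
            (t for t in tokens
             if t[2] == verb_id and noun_id < t[0] < verb_id
             and t[1].lower() in RELATIVIZERS),
            None,
        )
        # Restrictiveness: a comma between head noun and relativizer
        # marks the clause as non-restrictive (assumed restrictive for zero).
        boundary = rel[0] if rel else noun_id
        has_comma = any(
            form == "," and noun_id < tok_id < boundary
            for tok_id, form, _, _ in tokens
        )
        records.append({
            "head_noun": forms[noun_id],
            "clause_verb": verb,
            "relativizer": rel[1].lower() if rel else "zero",
            "role": rel[3] if rel else None,
            "restrictive": not has_comma,
        })
    return records


# "...programs which that president proposed": 'that' attaches to
# 'president', not to the clause verb, so only 'which' is extracted.
WHICH_THAT = [
    (1, "programs", 0, "root"),
    (2, "which", 5, "obj"),
    (3, "that", 4, "det"),
    (4, "president", 5, "nsubj"),
    (5, "proposed", 1, "acl:relcl"),
]
ZERO = [(1, "man", 0, "root"), (2, "I", 3, "nsubj"), (3, "saw", 1, "acl:relcl")]
COMMA = [
    (1, "concert", 0, "root"),
    (2, ",", 5, "punct"),
    (3, "which", 5, "nsubj:pass"),
    (4, "was", 5, "aux:pass"),
    (5, "held", 1, "acl:relcl"),
]

for fragment in (WHICH_THAT, ZERO, COMMA):
    print(extract_features(fragment))
```

Requiring the relativizer to be a direct dependent of the clause verb is what keeps the demonstrative that in the first fragment from being counted.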

Figure 4:

Flowchart for the extraction of candidate relative clauses.

After extracting all candidate relative clauses, we identified prototype relative clauses, defined as definite and restrictive relative clauses (Comrie 1989 [1981]; Givón 2001 [1984]: 24). Non-restrictive relative clauses were excluded due to their exclusive use of wh-relativizers. Among prototype relative clauses, we also excluded adverbial relative clauses and nominal relative clauses because this study did not consider adverbial relativizers^[5] (e.g., when, where, why) or nominal relativizers (e.g., what). Cleft sentences were also excluded because they only employed that. After excluding the aforementioned types of clauses, our study narrowed its focus to subject and object relative clauses.

We restricted the analysis to inanimate head nouns, since animate head nouns employ zero and that but not which. For subject relative clauses, we focused on the use of which and that as in sentence (3), since zero relativizers are not allowed to introduce subject relative clauses. For object relative clauses, we focused on the use of which, that, and zero as in sentence (4). The selection of subject and object relative clauses was driven by two considerations. First, they have been extensively examined and constitute the largest proportion of all relative clauses according to the noun phrase accessibility hypothesis (Comrie 1989 [1981]). Our study was no exception, with subject relative clauses accounting for 56.34 % and object relative clauses for 21.93 % of all relative clauses. Second, only under such circumstances could these relativizers be used interchangeably without grammatical restriction (Sauerland 2003). The interchangeability allows for the observation of the natural changes in relativizers, facilitating the detection of other diachronic effects.

(3)

They saw the movie	*which*	received favorable reviews last night.
They saw the movie	*that*	received favorable reviews last night.

(4)

The movie	*which*	they saw last night received favorable reviews.
The movie	*that*	they saw last night received favorable reviews.
The movie	*zero*	they saw last night received favorable reviews.

The detailed procedures for exclusion are listed in Table 1.

Table 1:

The process for the exclusion of non-target relative clauses.

Objective	Exclusion type	Example sentence	Exclusion rationale	Exclusion method	Count
To identify prototypical relative clauses	Noise	–	–	Excluded (clause verb, head nouns are nonalphanumeric characters)	25
	Nonrestrictive relative clauses	The concert, *which was held at the stadium* , was a great success.	Not prototype relative clauses	Excluded (restrictiveness = “non-restrictive”)	3,153
	Nonfinite relative clauses	…might begin on models *the most approved by experience.*	Not prototype relative clauses	Excluded (relativizer = “zero” and role of relativizer = “nsubj”)	53

To exclude types of relative clauses	Adverbial relative clauses	This is the historic museum *where I have been* .	Adverbial relativizers not considered in the study	Excluded (relativizer = “when”, “where”, “why”, “as”, “whereby”, “whatever”, “whence”, “however”, “wherein”, “in case”, “wherever”, “whenever”, “whichever”, “whoever”, “whereof”, “whereon”, role of head noun = “advcl”, “advmod”)	1,067
	Adverbial relative clauses	The moment *our peace was threatened* …	Adverbial relativizers not considered in the study	Excluded with manual check (head nouns = “way(s)”, “means”, “manner”, “time”, “day(s)”, “year(s)”, “moment(s)”, “minute(s)”, “hour(s)”, relativizer = “zero”)	246
	Nominal relative clauses	I enjoyed *what he strongly recommended* .	Only use what	Excluded (relativizer = “what”, “whatsoever”, “whatever”, or role of relativizer = “appos”)	842
	Cleft sentences	*It is* under these circumstances only *that the elector can feel* …	Only use that	Excluded (role of relativizer = “mark”)	49

To guarantee the interchangeability of the relativizers	After the above exclusion, 19,695 relative clauses were obtained.
	The proportion of the roles of head nouns in relative clauses: subject (11,097, 56.34 %) > object (4,319, 21.93 %) > oblique (3,850, 19.55 %) > genitive (376, 1.91 %) > others (53, 0.27 %)
	Oblique relative clauses	The city *from which she comes* is very beautiful.	Not use that, zero	Excluded (role of relativizer = “obl”, “obl:npmod”, “det”, “nmod”, “case”)	3,873
	Genitive relative clauses	…a power *whose friendship we have uniformly and sincerely desired to cultivate* .	Only use whose	Excluded (role of relativizer = “nmod:poss”)	376
	Other types of relative clauses	…oil production is the highest *that it’s been in 8 years* .	Other relations according to UD	Excluded (role of relativizer not in “nsubj”, “obj”, such as “acl”)	45
	Relative clauses with animate head nouns	…people *that tries to walk in these difficult paths of independence and right* .	Not use which	Excluded (animacy = “animate”, relativizer = “who”, “whom”)	2,972
	12,429 relative clauses were obtained, with 8,255 subject relative clauses introduced by which and that and 4,174 object relative clauses introduced by which, that, and zero.

The role of relativizers (according to the Universal Dependency Framework v2.3^a):
NSUBJ	The syntactic subject and the proto-agent of a clause.
OBJ	The (accusative) object of a verb.
ADVMOD	A (non-clausal) adverb or adverbial phrase modifier.
MARK	The word introducing a clause subordinate to another clause.
OBL	A non-core (oblique) argument or adjunct.
OBL:NPMOD	A noun (phrase) (e.g., a measure phrase, extent, certain other absolutive nominal constructions) used as an adverbial modifier.
DET	The determiner of a noun phrase.
NMOD	A nominal dependent of another noun (phrase) and functionally corresponds to an attribute, or genitive complement.
NMOD:POSS	A possessive modifier preceding its nominal head.
APPOS	An appositional modifier of an NP to define or modify that NP.
CASE	A marker in an extended clausal projection.
ACL	A finite or non-finite clause that modifies a noun.

^aSee a detailed description of English Dependency Relations: https://universaldependencies.org/en/dep/.
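
The automatic rules in Table 1 can be read as an ordered filter. The sketch below is one possible reading, not the authors' code: it assumes each clause is a record that already carries a relativizer, a role (including for zero relativizers), a restrictiveness flag, and an animacy value; it applies the rules in table order; it treats the animacy condition as "animate head noun or relativizer who/whom"; and it leaves out the rule that requires a manual check. The name exclusion_reason is hypothetical. The final lines re-check the counts reported in the table.

```python
ADVERBIAL = {
    "when", "where", "why", "as", "whereby", "whatever", "whence", "however",
    "wherein", "in case", "wherever", "whenever", "whichever", "whoever",
    "whereof", "whereon",
}
NOMINAL = {"what", "whatsoever", "whatever"}
OBLIQUE_ROLES = {"obl", "obl:npmod", "det", "nmod", "case"}
# Table 1 lists only nsubj and obj as target roles; how passive subjects
# (nsubjpass) are mapped is not stated there, so the table is followed literally.
TARGET_ROLES = {"nsubj", "obj"}


def exclusion_reason(rc):
    """Return the first Table 1 rule that excludes rc, or None to keep it."""
    rel, role = rc["relativizer"], rc["role"]
    if not rc["restrictive"]:
        return "nonrestrictive"
    if rel == "zero" and role == "nsubj":
        return "nonfinite"
    if rel in ADVERBIAL or role in {"advcl", "advmod"}:
        return "adverbial"
    if rel in NOMINAL or role == "appos":
        return "nominal"
    if role == "mark":
        return "cleft"
    if role in OBLIQUE_ROLES:
        return "oblique"
    if role == "nmod:poss":
        return "genitive"
    if role not in TARGET_ROLES:
        return "other"
    if rc["animacy"] == "animate" or rel in {"who", "whom"}:
        return "animate head noun"
    return None


kept = {"relativizer": "that", "role": "obj", "restrictive": True, "animacy": "inanimate"}
dropped = {"relativizer": "whose", "role": "nmod:poss", "restrictive": True, "animacy": "inanimate"}
print(exclusion_reason(kept), exclusion_reason(dropped))

# Count check for the last stage of Table 1.
after_first_stages = 19_695
removed = {"oblique": 3_873, "genitive": 376, "other": 45, "animate": 2_972}
remaining = after_first_stages - sum(removed.values())
print(remaining)  # 12429
```

The remainder matches the 12,429 clauses reported in the table, which in turn equals 8,255 subject plus 4,174 object relative clauses.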

To validate our proposed method, we manually checked the recall rate and the precision value of the extracted relative clauses. First, we selected six sample texts from the SOTU corpus (1,237 sentences, 27,805 tokens). The samples were randomly chosen from different periods, roughly one per half century, namely 1792, 1809, 1871, 1927, 1965, and 2002. Next, two of our authors independently read the texts and manually identified the relative clauses. Any discrepancies in manual identification were discussed and resolved to reach an agreement. Our manual identification yielded 140 eligible relative clauses, while our proposed method automatically extracted 143 relative clauses. The proposed method missed one relative clause and mistakenly extracted four relative clauses, resulting in a recall rate of 99.29 % and a precision value of 97.20 %.
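
The reported recall and precision follow directly from these counts, as the short calculation below shows. The variable names are chosen here for illustration only.

```python
manual = 140      # relative clauses identified by hand
extracted = 143   # relative clauses returned by the automatic method
missed = 1        # manual clauses the method failed to find
spurious = 4      # extracted items that were not relative clauses

true_positives = manual - missed
# Both routes to the number of correct extractions must agree.
assert true_positives == extracted - spurious

recall = true_positives / manual        # 139 / 140
precision = true_positives / extracted  # 139 / 143
print(f"recall = {recall:.2%}, precision = {precision:.2%}")
# recall = 99.29%, precision = 97.20%
```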

Meanwhile, we manually verified the accuracy of the extracted features in sample texts. The extraction of head nouns, relativizers, roles of head nouns in relative clauses (e.g., subject and object), restrictiveness (restrictive vs. nonrestrictive relative clauses), and clause verbs all achieved a precision value of 100 %. The detection of animacy achieved a precision rate of 97.84 % when relying solely on the animacy dictionary (Ji and Lin 2009). The precision increased to 100 % when incorporating both dictionary-based assignment and manual annotation of words not found in the dictionary. In conclusion, the manual verification of the automatic extraction confirmed the reliability and validity of our proposed method.

2.3 Data processing and analysis^[6]

We processed and analyzed the data in the following steps, using the programming language R. First, we calculated the proportions of which and that in subject relative clauses per year and the proportions of which, that, and zero in object relative clauses per year. The proportions from 1790 to 2023 were then visualized to illustrate the trends of relativizers within subject and object relative clauses. Then, simple linear regression (Wen and Lei 2022) was used to assess whether the proportions of relativizers were subject to significant changes over the two hundred years.^[7]
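
The first step can be illustrated with a small sketch: yearly proportions of one relativizer, followed by an ordinary least squares fit of proportion on year. The study itself used R; the Python below is only an illustration, the yearly counts are invented toy data, and the function names are hypothetical.

```python
import math


def proportions_by_year(records, target):
    """records: iterable of (year, relativizer); returns {year: share of target}."""
    totals, hits = {}, {}
    for year, rel in records:
        totals[year] = totals.get(year, 0) + 1
        hits[year] = hits.get(year, 0) + (rel == target)
    return {year: hits[year] / totals[year] for year in sorted(totals)}


def simple_linear_regression(xs, ys):
    """OLS fit of y = a + b*x; returns (intercept, slope, t statistic of slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    std_err = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return intercept, slope, slope / std_err


# Invented toy counts per year: (number of which, number of that).
toy_counts = {1800: (9, 1), 1850: (8, 2), 1900: (6, 4), 1950: (5, 5), 2000: (1, 9)}
records = [
    (year, rel)
    for year, (n_which, n_that) in toy_counts.items()
    for rel in ["which"] * n_which + ["that"] * n_that
]

props = proportions_by_year(records, "which")
intercept, slope, t_stat = simple_linear_regression(list(props), list(props.values()))
print(props)
print(slope, t_stat)
```

With a single predictor, the F statistic of the regression equals the square of the slope's t statistic, so the two describe the same test.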

Second, to investigate the factors that may affect the use of relativizers, we identified 11 factors based on prior research, including studies by Hinrichs et al. (2015) and Grafmiller et al. (2018). The factors can be categorized into three types, i.e., those concerning linguistic complexity, those pertaining to text styles, and other external factors (see Table 2). To be specific, factors such as relative clause length, filler-gap dependency distance, and head noun length were the complexity indicators of relative clauses (Fox and Thompson 2007; Tagliamonte et al. 2005). Preceding relativizers are related to structural persistence or priming, which affects the ease of production (Szmrecsanyi 2006). Mean word length and mean sentence length serve as context indicators, with the former reflecting lexical complexity and the latter measuring syntactic complexity (Flesch 1948; Lei and Shi 2023). Text style factors included type-token ratio reflecting lexical richness (Biber 1986) and personal pronouns indicating involved or less formal styles (Biber 1986). It is worth noting that mean word length also pertains to text styles, as longer words often convey more specific, specialized meanings than shorter words (Zipf 1949). Precise lexical choice is not easy and rarely fully accomplished in speech (Chafe and Danielewicz 1987), leading to stylistic variation in mean word length (Biber 1986).

Table 2:

Factors that may affect the use of relativizers included in the multivariate modelling.

Factors	Description
Linguistic complexity

Relative clause length	The number of words in a relative clause (logarithm-converted).
Filler-gap dependency distance	The dependency distance between the filler (head noun) and the gap (clausal verb) of a relative clause.
Head noun length	The number of letters in a head noun.
Preceding relativizer	The relativizer used in the previous context in the same text (default: None).
Mean word length	The mean number of letters of a word in the text.
Mean sentence length	The mean number of words of a sentence in the text (log-converted).

Style

Delivery method	There are two types of delivery method, i.e., written reports and spoken speeches.
Type-token ratio	The number of unique word types divided by the number of individual words (tokens) in the text.
Personal pronouns	The mean number of personal pronouns per 10,000 words in the text (logarithm-converted).

External factors

President	There are 46 presidents delivering the State of the Union addresses.
Year	There are 234 years (1790–2023) in the State of the Union addresses.
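
Several of the text-level factors in Table 2 are simple counts. The sketch below computes four of them for a short string; it is an illustration under stated assumptions rather than the study's pipeline. Tokenization uses a plain regular expression instead of NLTK, the pronoun list is an assumed short list, and text_factors is a hypothetical name. The values are returned untransformed; the log conversion is applied afterwards (natural log assumed).

```python
import math
import re

# Assumed short list of personal pronouns, for illustration only.
PERSONAL_PRONOUNS = {
    "i", "me", "we", "us", "you", "he", "him", "she", "her", "it", "they", "them",
}


def text_factors(text):
    """Return untransformed text-level factors for one text."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = re.findall(r"[A-Za-z']+", text.lower())
    n_words = len(words)
    n_pronouns = sum(word in PERSONAL_PRONOUNS for word in words)
    return {
        "mean_word_length": sum(len(w) for w in words) / n_words,
        "mean_sentence_length": n_words / len(sentences),
        "type_token_ratio": len(set(words)) / n_words,
        "pronouns_per_10k": n_pronouns / n_words * 10_000,
    }


factors = text_factors("We saw the movie. It was good.")
print(factors)
print(math.log(factors["mean_sentence_length"]))  # log-converted sentence length
```

A text with no personal pronouns would need separate handling before the log conversion, since the logarithm of zero is undefined.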

To investigate the factors that may affect the use of relativizers, we conducted a series of simple logistic regressions, examining the significance and effect size of each independent factor following the steps of the minimal adequate model.^[8] Factors with a p-value below 0.05 were considered relevant and included in the subsequent multivariate modelling.

We employed a mixed-effects multivariate logistic regression analysis to predict the use of which and that in subject relative clauses and the use of which/that and zero in object relative clauses.^[9] It should be noted that although object relative clauses exhibited a ternary alternation (which, that, and zero), we aggregated which and that to examine the alternation between which/that and zero. This aggregation of which and that follows the approach of Hinrichs et al. (2015). It also facilitated the distinction between explicit and zero relativizers. To ensure comparability across factors, the factors relative clause length, mean sentence length, and personal pronouns were logarithm-converted. The transformation can improve the interpretability of the models and prevent variables from dominating others due to larger eigenvalue ratios (Tomaschek et al. 2018). During the modelling process, the external factor year was excluded due to its strong correlation with other factors (e.g., relative clause length, filler-gap dependency distance, and mean sentence length), which introduced multicollinearity issues. After the multivariate modelling, we analyzed the evolution of the predictive factors to determine the effect of time on these factors and their subsequent effects on relativizers.

Third, we examined whether the rate of change was consistent throughout the time span or varied across periods. To identify possible abrupt changes, we employed change point detection, specifically the Pettitt test for single change point detection (Lanzante 1996; Shi and Lei 2022). The Pettitt test is a widely used statistical test, applicable across various disciplines, for detecting significant change points or abrupt changes in time-series data (Verstraeten et al. 2006).
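For readers unfamiliar with the Pettitt test, the following is a minimal sketch of the standard rank-based statistic and its usual approximate p-value, 2·exp(−6K² / (n³ + n²)). The function name `pettitt_test` and the toy series are hypothetical; the study's own software is not specified here and may differ in detail.

```python
import math

def sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)

def pettitt_test(series):
    """Return (index, K, p): last index before the shift, statistic, approx. p."""
    n = len(series)
    u = 0
    k_stat, k_index = 0, 0
    for t in range(n - 1):
        # Recursion: U_t = U_{t-1} + sum_j sign(x_t - x_j).
        u += sum(sign(series[t] - other) for other in series)
        if abs(u) > k_stat:
            k_stat, k_index = abs(u), t
    p_value = min(1.0, 2.0 * math.exp(-6.0 * k_stat ** 2 / (n ** 3 + n ** 2)))
    return k_index, k_stat, p_value

# Hypothetical toy series with an obvious level shift after the sixth value.
toy_series = [1, 2, 1, 2, 1, 2, 9, 8, 9, 8, 9, 8]
change_index, k_stat, p_value = pettitt_test(toy_series)
print(change_index, k_stat, round(p_value, 4))
```

Applied to yearly proportions, the returned index would be mapped back to the corresponding year to obtain the change point.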

3 Results

In this section, we report the results concerning the diachronic change of relativizers, the factors affecting the change, and the change points in their diachronic trajectory.

3.1 Diachronic change of relativizers

To investigate the diachronic change of relativizers, we first analyzed the use of which and that in subject relative clauses, and then analyzed that of which, that and zero in object relative clauses.

Table 3 presents the proportions of relativizers in subject and object relative clauses. The results showed that approximately two thirds of the subject relative clauses were introduced by which, while approximately half of the object relative clauses were zero relative clauses.

Table 3:

Proportions of relativizers in subject and object relative clauses.

	which	that	zero	Total
Subject relative clause	5,376 (65 %)	2,879 (35 %)	–	8,255
Object relative clause	1,551 (37 %)	561 (13 %)	2,062 (49 %)	4,174
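The percentages in Table 3 follow directly from the raw counts, as the short sketch below shows. The counts are those reported in the table; the dictionary layout and function name are only illustrative.

```python
# Counts as reported in Table 3.
counts = {
    "subject": {"which": 5376, "that": 2879},
    "object": {"which": 1551, "that": 561, "zero": 2062},
}

def proportions(row):
    """Convert raw counts for one clause type into proportions."""
    total = sum(row.values())
    return {relativizer: n / total for relativizer, n in row.items()}

for clause_type, row in counts.items():
    shares = proportions(row)
    summary = ", ".join(f"{rel}: {100 * share:.0f} %" for rel, share in shares.items())
    print(f"{clause_type} (n = {sum(row.values())}): {summary}")
```

Because each percentage is rounded separately, the object row sums to 99 % rather than 100 %.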

The proportions of which and that in subject relative clauses from 1790 to 2023 are shown in Figure 5. The linear regression analysis revealed a significant decrease in the use of which (t = −24.18, F(1,230) = 584.8, p < 2.2e⁻¹⁶) and a significant increase in the use of that (t = 24.18, F(1,230) = 584.8, p < 2.2e⁻¹⁶) in subject relative clauses. The results suggested that that gradually replaced which in subject relative clauses in the State of the Union (SOTU) addresses.

Figure 5:

Diachronic change of relativizers in subject relative clauses. The dots represent the proportion of which (blue) and that (red) used in the subject relative clauses. The solid line represents smoothed trends of the dots using the locally weighted scatterplot smoothing, with the shadow indicating the 95 % confidence interval.

The proportions of which, that and zero in object relative clauses from 1790 to 2023 are reported in Figure 6. The results showed a significant downtrend of which (t = −16.59, F(1,229) = 275.4, p < 2.2e⁻¹⁶). In contrast, upward trends were observed for that (t = 8.84, F(1,229) = 78.09, p < 2.2e⁻¹⁶) and zero (t = 10.58, F(1,229) = 111.9, p < 2.2e⁻¹⁶) in object relative clauses. By the end of the observed period, zero emerged as the dominant relativizer in place of which and that in object relative clauses in the SOTU addresses.
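In a linear regression with a single predictor (here, year), the F statistic equals the square of the slope's t statistic, which is why each trend is reported with a matching pair; for instance, 24.18² ≈ 584.7, in line with the reported F of 584.8 up to rounding. The sketch below demonstrates the identity on hypothetical proportions (not corpus values), using a hand-written ordinary least squares fit.

```python
import math

def simple_ols(x, y):
    """Fit y = a + b*x; return (slope, t statistic, F statistic, residual df)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ssr = slope * sxy                 # regression sum of squares (1 df)
    df = n - 2                        # residual degrees of freedom
    mse = sse / df
    t_value = slope / math.sqrt(mse / sxx)
    f_value = ssr / mse
    return slope, t_value, f_value, df

# Hypothetical yearly proportions of "which" (NOT corpus values).
years = [1790, 1840, 1890, 1940, 1990, 2020]
share_which = [0.95, 0.90, 0.85, 0.60, 0.30, 0.20]

slope, t_value, f_value, df = simple_ols(years, share_which)
print(f"slope = {slope:.5f}, t = {t_value:.2f}, F(1,{df}) = {f_value:.2f}")
```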

Figure 6:

Diachronic change of relativizers in object relative clauses. The dots represent the proportion of which (blue), that (red) and zero (green) used in the object relative clauses. The solid line represents smoothed trends of dots using the locally weighted scatterplot smoothing, with the shadow indicating the 95 % confidence interval.

A comparison of the proportions of zero relative clauses at two time points, 1960 and 1990, revealed inconsistencies with Hinrichs et al.’s (2015) findings. While their dataset demonstrated stability in the proportions of zero (69 % in 1960 vs. 70 % in 1990), our dataset showed an increase (57 % in 1960 vs. 88 % in 1990). Notably, our data revealed fluctuations in the use of zero relative clauses from the 1960s to the 1990s, characterized by a plunge from 1961 to 1969 followed by a rebound (see Figure 7). The discrepancy between Hinrichs et al.’s (2015) findings and ours underscored the importance of a finer-grained investigation. Without it, the complex changes in the use of relativizers would have remained unnoticed (Lee 2020; Leech et al. 2009).

Figure 7:

The fluctuation in the use of zero relative clauses from the 1960s to the 1990s. The dots represent the proportion of zero used in the object relative clauses.

3.2 Factors affecting the diachronic change of relativizers

The results of the mixed-effects multivariate logistic regression model concerning subject relative clauses are summarized in Table 4. The model achieved an 80.6 % prediction accuracy of relativizers with a concordance value (C) of 0.81. The result indicated the model’s ability to discriminate between which and that in subject relative clauses (Hinrichs et al. 2015). Mean word length was favored for inclusion in the multivariate model because of its strong negative correlation (R = −0.82) with personal pronouns.^[10] Personal pronouns were subsequently excluded from the model, resulting in improved predictive performance. In addition, the variance inflation factors^[11] of all factors were lower than 1.01, well under the threshold of 5 for detecting collinearity issues (Tomaschek et al. 2018). Factors that did not reach significance were excluded from the model. It should be noted that although the delivery method of the addresses could predict relativizers in the simple logistic regression (z = 37.28, p < 2e⁻¹⁶), it failed to reach statistical significance (p = 0.99) in the multivariate model. A possible explanation is that the effect of delivery method was absorbed by related factors such as mean word length. In other words, its effect may be mediated by, or operate indirectly through, its relationship with mean word length (MacKinnon et al. 2007).

Table 4:

The mixed-effects multivariate logistic regression model for the alternation between which and that in subject relative clauses. The predicted relativizer was which.

	Odds ratio	Estimate	p
Relative clause length	1.48	0.394	0.000***
Filler-gap dependency distance	1.07	0.068	0.000***
Head noun length	1.06	0.066	0.000***
Preceding relativizer (default: None)
that	0.37	−0.986	0.000***
which	0.85	−0.167	0.529
zero	0.63	−0.472	0.068
Mean word length	1.09	2.387	0.000***
Random factors
1 | President	Intercept, N = 43, variance = 3.13
Summary statistics	N	8,255
	Correctly predicted	80.6 % (baseline: 65.1 %)
	Somers’ D_xy	0.61
	C	0.80

*** denotes a level of statistical significance with a p-value less than 0.001.
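Two of the summary statistics can be related to quantities already reported. The concordance index and Somers' D_xy are linked by C = (1 + D_xy) / 2, and the baseline accuracy presumably corresponds to always predicting the more frequent variant (an assumption, though it reproduces the reported 65.1 % and, for object relative clauses, 50.6 % from the Table 3 counts). The helper names below are illustrative only.

```python
def concordance_from_somers_d(d_xy):
    """Convert Somers' D_xy into the concordance index C."""
    return 0.5 + d_xy / 2.0

def majority_baseline(class_counts):
    """Accuracy obtained by always predicting the most frequent class."""
    return max(class_counts) / sum(class_counts)

# Subject relative clauses: which vs. that (counts from Table 3).
print(concordance_from_somers_d(0.61))
print(round(100 * majority_baseline([5376, 2879]), 1))

# Object relative clauses: which/that combined vs. zero (counts from Table 3).
print(concordance_from_somers_d(0.65))
print(round(100 * majority_baseline([1551 + 561, 2062]), 1))
```

A D_xy of 0.61 corresponds to a C of about 0.805, which may explain why the value appears both as 0.80 and as 0.81 depending on rounding.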

To summarize, relativizer alternations (which vs. that) in subject relative clauses were affected by factors related to linguistic complexity and text styles. First, factors such as relative clause length, filler-gap dependency distance, and head noun length are indicators of the linguistic complexity of relative clauses. Greater length or distance indicated higher complexity of relative clauses, leading to a greater probability of using the explicit relativizer which. Second, preceding relativizers served as an indicator of structural priming and persistence (Szmrecsanyi 2006). It was of interest to find that the use of that predicted subsequent use of that, but which failed to predict itself. The finding suggested that the effect of structural priming remained for that, but was mitigated by other complexity factors for which. For example, when a relative clause was short enough to favor that, which was unlikely to be used irrespective of the preceding relativizer. Zero did not exhibit a priming effect, consistent with findings in the syntactic priming literature (Gries 2005; Hinrichs et al. 2015). Third, mean word length served as an indicator of lexical complexity within contexts, affecting the use of which and that. The reason may be that mean word length not only reflected lexical complexity (Flesch 1948) but also captured stylistic variation, given its strong correlation with personal pronouns.

The findings of the mixed-effects multivariate logistic regression model for object relative clauses are presented in Table 5. The model achieved an 82.3 % correct prediction rate and a concordance value of 0.82 in the use of relativizers in object relative clauses. The factors affecting relativizer alternations (which/that vs. zero) were related to linguistic complexity. For example, relative clause length and filler-gap dependency distance were measures of the complexity of relative clauses. Longer lengths and distances were associated with the use of explicit relativizers (which/that) and discouraged the use of zero. In addition, the significance of mean sentence length, an indicator of the overall complexity of context, suggested that contextual complexity also predicted the use of relativizers. Contexts characterized by higher linguistic complexity were likely to use explicit relativizers and prevent the use of zero.

Table 5:

The mixed-effects multivariate logistic regression model for the alternation between which/that and zero in object relative clauses. The predicted relativizer was zero.

	Odds ratio	Estimate	p
Relative clause length	0.566	−0.570	0.000***
Filler-gap dependency distance	0.490	−0.713	0.000***
Mean sentence length	0.252	−1.380	0.000***
Random factors
1 | President	Intercept, N = 43, variance = 0.872
Summary statistics	N	4,174
	Correctly predicted	82.3 % (baseline: 50.6 %)
	Somers’ D_xy	0.65
	C	0.82

*** denotes a level of statistical significance with a p-value less than 0.001.
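The odds ratios in Table 5 are the exponentiated estimates (log-odds coefficients). The short check below recomputes them from the reported estimates; an odds ratio below 1 means that a one-unit increase in the factor lowers the odds of zero. The dictionary is only an illustrative container for the published values.

```python
import math

# Estimates (log-odds) as reported in Table 5; predicted variant: zero.
estimates = {
    "Relative clause length": -0.570,
    "Filler-gap dependency distance": -0.713,
    "Mean sentence length": -1.380,
}

# Odds ratio = exp(estimate).
odds_ratios = {factor: math.exp(beta) for factor, beta in estimates.items()}

for factor, ratio in odds_ratios.items():
    print(f"{factor}: odds ratio = {ratio:.3f}")
```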

Diachronic changes of the predictive factors are listed in Table 6. Factors concerning linguistic complexity, i.e., relative clause length, filler-gap dependency distance, head noun length, mean word length, and mean sentence length, showed a decreasing trend. The trend suggested that language simplification occurred in several aspects of the SOTU addresses (Lei and Wen 2020). Besides, factors associated with language styles such as mean word length showed a decreasing trend, particularly after the 1950s. The trend aligned with the general trend in English towards colloquialization (Leech et al. 2009). To be specific, the trend toward a more colloquial style affected factors such as mean word length, and potentially increased the use of zero relativizers. Also, the use of personal pronouns, which followed a V-shaped curve, largely mirrored the change in delivery methods of the addresses. During the first decade (1790–1800), the addresses were delivered as spoken speeches, accompanied by a large number of personal pronouns. From 1801 to 1912, a span featuring a shift to written reports, the use of personal pronouns steadily declined. After 1933, with the return to spoken speeches, the number of personal pronouns surged again. It may be President Franklin Roosevelt (presidency: 1933–1945) who ushered in the modern tradition of the frequent use of we and our (Shogan 2016). The increased use of personal pronouns can also be attributed to the effect of colloquialization (Hundt and Mair 1999; Leech et al. 2009) and the need to establish a closer relation with the audience.

Table 6:

The diachronic changes of predictive factors of relativizer alternation.

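Factors	Diachronic changes
Relative clause length	[trend plot not reproduced]
Filler-gap dependency distance	[trend plot not reproduced]
Head noun length	[trend plot not reproduced]
Mean word length	[trend plot not reproduced]
Mean sentence length	[trend plot not reproduced]
Personal pronouns	[trend plot not reproduced]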

3.3 The abrupt change in the use of relativizers

Although a general upward trend of that and zero and a downward trend of which have been observed in the examined period, it remains unclear whether the rate of change was consistent throughout the time series or whether sudden changes occurred at specific phases. The Pettitt test for single change point detection (Verstraeten et al. 2006) revealed significant shifts of which, that, and zero relative clauses (p < 2.2e⁻¹⁶) in the examined timespan. The change points for the three types of relative clauses are listed in Table 7. It is of interest that the turning points for all three types of relative clauses clustered around the year 1940: the decline of which relative clauses accelerated in 1939, as did the growth of that relative clauses in 1939 and of zero relative clauses in 1942.

Table 7:

The change points in different types of relative clauses.

Type of relative clauses	Change points	Figures
Which relative clauses	1939	[plot not reproduced]
That relative clauses	1939	[plot not reproduced]
Zero relative clauses	1942	[plot not reproduced]

4 Discussion

Our study revealed the diachronic change of relativizers in the State of the Union (SOTU) addresses, which lent support to findings in previous studies (Grafmiller et al. 2018; Hinrichs et al. 2015; Krielke 2021; Leech and Smith 2006). To be specific, the examined two centuries witnessed a decline of which and an increase of that in subject relative clauses (Hinrichs et al. 2015; Leech et al. 2009), alongside a rise of zero in object relative clauses. In the following sections, we discuss the factors underlying the diachronic change of relativizers and the significance of the present research.

4.1 Diachronic changes of relativizers

The first finding of our study pertains to the diachronic change of relativizers in the SOTU addresses across the past 234 years. The study revealed an increase of that over which and an increase of zero over which and that. The results of the multivariate model suggested that the observed changes may not solely reflect grammatical alternations in relativizer forms. Instead, these shifts could be indicative of broader external factors influencing the textual habitat (Szmrecsanyi 2016). To be specific, the textual habitat has undergone a decrease in linguistic complexity across various aspects, indicating a trend towards simplification in the SOTU addresses. The complexity principle posited a preference for explicit relativizers in complex linguistic contexts (Rohdenburg 1996), because explicit relativizers could help clarify the structure and meaning of complex relative clauses (Tagliamonte et al. 2005). In other words, in simpler relative clauses, explicit relativizers may not be necessary, which results in their reduced frequency. Hence, due to the trend of linguistic simplification in the textual habitat, the occurrence of explicit relativizers has become less frequent, resulting in less which and more that, and more zero than which/that. In addition, style factors such as personal pronouns also exhibited a trend towards a more colloquial style in the SOTU addresses. Considering the stylistic difference of the relativizers (Biber 1986, 1995), the colloquialization in texts led to the increase of that and the decrease of which. In summary, relativizers evolved with an increase of that over which and zero over which/that, which appeared to be influenced by factors such as simplification and colloquialization in the textual habitat.

In addition, our findings provided valuable insights concerning the change of various linguistic features from the perspective of the American presidents’ addresses. First, the sentence length has undergone a decreasing trend, which corroborated the previous findings in public speaking (Tsizhmovska and Martyushev 2021) and scientific discourses (Hundt et al. 2012). Second, the decrease in filler-gap dependency distance, which is related to the integration and retrieval costs in processing relative clauses (Futrell et al. 2015; Gibson 2000), reflected the trend of cost saving in human communication. The finding provided further evidence for dependency distance minimization (Futrell et al. 2020; Lei and Wen 2020) from the perspective of non-adjacent dependency. Last, with more zero relativizers, noun phrases with modifier construction (Nouns + relative clauses) (Hoffmann and Trousdale 2013) tended to be compressed, allowing speakers to convey the same message with the minimum number of words (Baker 2011; Zhu and Lei 2018). The results support the claim that sentence structure is optimized for efficiency, similar to observed patterns at both lexical (e.g., omission of infinitive marker) (Berdicevskis et al. 2024) and phonetic levels (e.g., omission of explosive sounds) (Mahowald et al. 2018; Manin 2008).

The findings of diachronic change of relativizers also reflected a trend towards explicitness reduction (Biber and Gray 2016). The trend manifested in the increased frequency of less explicit relativizers. For example, the less explicit relativizer that was increasingly favored over which in subject relative clauses. Similarly, within object relative clauses, the least explicit option zero exhibited a growing prevalence, surpassing both which and that. In 1790, both zero and that only held a small proportion, but they increased gradually and overtook which around 1960. The post-1960 distribution (zero > that > which) inverted the degree of their explicitness in the referents (which > that > zero). The observed trend of explicitness reduction aligned with the findings of Biber and Gray (2016), who reported the decrease of explicitness across various linguistic elements, such as connectors (e.g., from linking adverbials to colons) and modifiers (e.g., from clausal modifiers to phrasal modifiers). Our finding, showing a decrease in relativizer explicitness, corroborated Biber and Gray’s (2016) observations regarding linguistic change.

4.2 What may accelerate language change?

The growth pattern of language change is also of interest. According to the Piotrowski Law (Piotrowskaja and Piotrowski 1974), the development of linguistic units (e.g., vocabulary growth) usually follows a path of initial slow growth, subsequent acceleration, and eventual deceleration (Sun and Baayen 2021; Wu et al. 2016). The path shape is similar to an “S”, also known as an S-curve model (Sun and Baayen 2021). The S-curve model describes changes in frequency at morphological, semantic, and syntactic levels (Sun and Baayen 2021). Our study indicated that the development of relativizers also adhered to the S-curve model, exhibiting a gradual start and subsequent acceleration. However, the change of relativizers appeared not to have reached the final stable phase yet. The change points of the relativizers which, that, and zero have revealed an acceleration in their diachronic change around 1940. If the principle of least effort is considered as a universal constraint on language development (Millward and Hayes 2012 [1988]; Zipf 1949), the abrupt change in the rate of change of the relativizers around 1940 can be explained as follows.
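The S-curve described above is commonly modelled with a logistic function, whose rate of change is small at first, peaks at the midpoint, and then declines. The sketch below only illustrates this shape: the midpoint (1940) and growth rate (0.05) are arbitrary illustrative values, not parameters fitted to the SOTU data, and the function names are hypothetical.

```python
import math

def logistic_share(year, midpoint, rate):
    """Share of the incoming variant under a logistic (S-curve) model."""
    return 1.0 / (1.0 + math.exp(-rate * (year - midpoint)))

def rate_of_change(year, midpoint, rate):
    """First derivative of the logistic curve with respect to time."""
    share = logistic_share(year, midpoint, rate)
    return rate * share * (1.0 - share)

# Purely illustrative parameters, NOT fitted to the corpus.
MIDPOINT, RATE = 1940, 0.05

for year in (1790, 1890, 1940, 1990, 2023):
    share = logistic_share(year, MIDPOINT, RATE)
    speed = rate_of_change(year, MIDPOINT, RATE)
    print(year, round(share, 3), round(speed, 4))
```

Under such a model, a series that is still accelerating or has only recently passed its fastest growth has not yet reached the upper plateau, which is the situation the text describes for relativizers.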

The first possible explanation pertains to the issue of prescriptivism. In 1926, Henry Fowler, renowned for A Dictionary of Modern English Usage, advocated the use of that in restrictive relative clauses to balance which, the sole permissible relativizer in non-restrictive relative clauses (Fowler 1926). The prescriptive rule, disseminated through education, dictionaries, and proofreading tools, gradually permeated everyday language use (Hinrichs et al. 2015). However, the widespread adoption of such a prescriptive rule can be a protracted process (Schmidt and McCreary 1977), which may explain the time lag between the proposal of prescriptive rules in 1926 and the accelerated shift in the relativizer change in the 1940s.

Technological innovations in media around the 1940s may also have sped up the change of relativizers. One pivotal development was the invention of color television in 1940, which profoundly impacted the life of the American public (Murray 2018). This groundbreaking technology broke down geographical barriers, facilitated language communication, and fostered extensive language contact. The surge in language contact, in turn, could provide opportunities for audiences to encounter diverse linguistic structures and usage patterns (Blount and Sanches 1977), thereby expediting the formation of new relativizer distributions.

Another possible catalyst that could accelerate the change of the relativizers is the transition in delivery styles of the SOTU addresses. The year 1940 was in the middle of the administration of President Franklin D. Roosevelt, who significantly shaped the modern oral practice of the Annual Message (Shogan 2016). The stylistic analyses by Savoy (2015) revealed that since Roosevelt, the presidents tended to adopt a style distinct from their predecessors, such as the use of more adjectives (e.g., Eisenhower), pronouns (e.g., Clinton), or verbs (e.g., Obama). Consequently, the shift towards a modern oral speech could constitute another possible factor that accelerated the change of relativizers.

While we have discussed the aforementioned factors, it is important to acknowledge that other factors may also have contributed to the acceleration, such as the popularization of education after the Second World War. However, the factors which can affect language change are so complex and intricate that we cannot exhaust all of them in this study.

4.3 Extracting zero relative clauses based on dependency grammar

Methodologically, our analysis highlighted the benefits of using a long-term diachronic corpus and an improved method of automatic extraction of relative clauses.

First, the long-term diachronic corpus is useful for revealing both the general trend and the nuanced fluctuations. An important finding is that the increase of zero in our datasets appeared to contradict the stable state of zero reported in previous studies such as Hinrichs et al. (2015). The observed difference may arise from disparities in time span and sampling intervals. Hinrichs et al. (2015) focused on a comparison of two time points (1960 and 1990), a timeframe of 30 years that may not be sufficient for a comprehensive observation of relativizer change. While the use of zero may appear consistent at both ends of the 30-year period, fluctuations in its occurrence could have transpired that were not captured by the two designated time points. Therefore, for one thing, the long-term corpus in our study, encompassing successive periods over more than 200 years, enriches our understanding of the long-term development of American presidents’ speeches (Hilpert and Cuyckens 2016; Hilpert and Gries 2016). For another, the intervals are also of importance because the dynamics of diachronic change can range from relative stability to substantial reorganization (Buchstaller 2009), necessitating a fine level of observational granularity (Hilpert and Gries 2016: 44). These even and fine-grained intervals enable us to scrutinize ups and downs, even when the proportions at either end appear similar (Ball 1996).

Second, the automatic retrieval and annotation of relativizers and relative clauses, particularly zero ones, have proven invaluable for large-scale research on relative clauses. Researchers previously classified and extracted relative clauses manually from a limited number of texts (Olofsson 1981). With the advent of treebanks, an exhaustive compilation of regular expressions was used to search for specific forms of relative clauses (Roland et al. 2007). Subsequently, supervised machine learning was introduced to extract relative clauses utilizing pretrained data. The method, nevertheless, has yielded barely satisfactory recall and precision values of 0.85 and 0.60, respectively (Frazee et al. 2015). This issue has been addressed in our study with the use of a dependency parser to identify the relation labeled as acl:relcl (clausal modifiers of nouns). The approach facilitates the extraction of all relative clauses with high accuracy and efficiency (with recall and precision rates both above 0.9). Hopefully, this efficient retrieval approach can be adopted to explore large corpora and to investigate the use and change of other sophisticated syntactic structures.
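The idea of locating relative clauses through the acl:relcl relation can be sketched on hand-written parses. The example below is an assumption-laden illustration, not the authors' pipeline: the token tuples are manually constructed Universal Dependencies-style analyses rather than parser output, the function names are hypothetical, and the heuristic only inspects direct dependents of the relative clause verb (so it would miss, for example, a relativizer embedded in a prepositional phrase).

```python
RELATIVIZERS = {"which", "that"}

# Hand-written UD-style parses as (id, form, head id, relation); hypothetical.
ZERO_SENTENCE = [
    (1, "This", 4, "nsubj"), (2, "is", 4, "cop"), (3, "the", 4, "det"),
    (4, "letter", 0, "root"), (5, "the", 6, "det"), (6, "man", 7, "nsubj"),
    (7, "gave", 4, "acl:relcl"), (8, "me", 7, "iobj"),
]
THAT_SENTENCE = [
    (1, "This", 4, "nsubj"), (2, "is", 4, "cop"), (3, "the", 4, "det"),
    (4, "letter", 0, "root"), (5, "that", 8, "obj"), (6, "the", 7, "det"),
    (7, "man", 8, "nsubj"), (8, "gave", 4, "acl:relcl"), (9, "me", 8, "iobj"),
]

def extract_relative_clauses(tokens):
    """Return (head noun, clause verb, relativizer) for each acl:relcl arc."""
    results = []
    for token_id, form, head, relation in tokens:
        if relation != "acl:relcl":
            continue
        head_noun = next(f for i, f, _, _ in tokens if i == head)
        relativizer = "zero"  # default when no overt relativizer is found
        for _, dep_form, dep_head, dep_relation in tokens:
            if (dep_head == token_id and dep_form.lower() in RELATIVIZERS
                    and dep_relation in ("nsubj", "obj")):
                relativizer = dep_form.lower()
        results.append((head_noun, form, relativizer))
    return results

def precision_recall(predicted, gold):
    """Precision and recall of predicted items against a gold-standard set."""
    true_positives = len(predicted & gold)
    return true_positives / len(predicted), true_positives / len(gold)

print(extract_relative_clauses(ZERO_SENTENCE))
print(extract_relative_clauses(THAT_SENTENCE))
print(precision_recall({1, 2, 3}, {2, 3, 4}))
```

In practice, the token tuples would come from a dependency parser, and precision and recall would be estimated against a manually annotated sample.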

5 Conclusions

Based on a large corpus of texts over a long span of fine-grained intervals, this study has first traced the diachronic change of relativizers in the State of the Union (SOTU) addresses across the past 234 years. The findings revealed a general upward trend in the use of that over which in subject relative clauses and an increasing preference for zero over either of the two overt relativizers (which and that) in object relative clauses. In addition, the diachronic change of relativizers is not only a change in relativizers per se, but also arises from the effect of simplification and colloquialization on factors predictive of relativizers. We also discussed the social or educational effect of factors such as prescriptivism, media innovation, and the transition in speech delivery methods, which could account for the acceleration during the S-curve growth of relativizers. Our study also provides methodological implications for the use of dependency parsers in the automatic extraction of complex structures and the utilization of diachronic corpus data in research concerning language change.

Future research may focus on addressing the following limitation. Due to the study’s reliance on the SOTU addresses, which are different from spontaneous, everyday use of language, we should be cautious about the representativeness of the data and generalizability of the findings to general English. As one of the anonymous reviewers pointed out, future investigation can employ other diachronic data (either of a larger size or of different genres and languages) to examine how relativizers may change.

Corresponding author: Lei Lei, Shanghai International Studies University, Shanghai, China, E-mail: leileicn@126.com

Tingyu Zhang and Jinman Li contributed equally to the work and should be regarded as co-first authors.

Research funding: This work was supported by “The Major Research Grant of Shanghai International Studies University” under the grant no. 23ZD011.

References

Baker, Paul. 2011. Times may change, but we will always have money: Diachronic variation in recent British English. Journal of English Linguistics 39(1). 65–88. https://doi.org/10.1177/0075424210368368.Search in Google Scholar

Ball, Catherine N. 1996. A diachronic study of relative markers in spoken and written English. Language Variation and Change 8(2). 227–258. https://doi.org/10.1017/s0954394500001150.Search in Google Scholar

Berdicevskis, Aleksandrs, Evie Coussé, Alexander Koplenig & Yvonne Adesam. 2024. To drop or not to drop? Predicting the omission of the infinitival marker in a Swedish future construction. Corpus Linguistics and Linguistic Theory 20(1). 219–261. https://doi.org/10.1515/cllt-2022-0101.Search in Google Scholar

Biber, Douglas. 1986. Spoken and written textual dimensions in English: Resolving the contradictory findings. Language 62(2). 384–414. https://doi.org/10.2307/414678.Search in Google Scholar

Biber, Douglas. 1995. Dimensions of register variation: A cross-linguistic comparison. Cambridge: Cambridge University Press.10.1017/CBO9780511519871Search in Google Scholar

Biber, Douglas & Susan Conrad. 2014. Variation in English: Multi-dimensional studies. London & New York: Routledge.10.4324/9781315840888Search in Google Scholar

Biber, Douglas & Bethany Gray. 2016. Grammatical complexity in academic English: Linguistic change in writing. Cambridge: Cambridge University Press.10.1017/CBO9780511920776Search in Google Scholar

Blount, Ben & Mary Sanches (eds.). 1977. Sociocultural dimensions of language change. New York: Academic Press.10.1016/B978-0-12-107450-0.50007-3Search in Google Scholar

Buchstaller, Isabelle. 2009. The quantitative analysis of morphosyntactic variation: Constructing and quantifying the denominator. Language and Linguistics Compass 3(4). 1010–1033. https://doi.org/10.1111/j.1749-818x.2009.00142.x.Search in Google Scholar

Burgers, Christian & Kathleen Ahrens. 2020. Change in metaphorical framing: Metaphors of TRADE in 225 years of State of the Union addresses (1790–2014). Applied Linguistics 41(2). 260–279. https://doi.org/10.1093/applin/amy055.Search in Google Scholar

Chafe, Wallace & Jane Danielewicz. 1987. Properties of spoken and written language. In Horowitz Rosalind & Samuels Jay (eds.), Comprehending oral and written language, 83–113. San Diego: Academic Press.10.1163/9789004653436_007Search in Google Scholar

Chen, Xiaobin, Theodora Alexopoulou & Ianthi Tsimpli. 2021. Automatic extraction of subordinate clauses and its application in second language acquisition research. Behavior Research Methods 53. 803–817. https://doi.org/10.3758/s13428-020-01456-7.Search in Google Scholar

Cheshire, Jenny, David Adger & Sue Fox. 2013. Relative who and the actuation problem. Lingua 126. 51–77. https://doi.org/10.1016/j.lingua.2012.11.014.Search in Google Scholar

Comrie, Bernard. 1989 [1981]. Language universals and linguistic typology: Syntax and morphology, 2nd edn. Chicago: University of Chicago Press.Search in Google Scholar

Croft, William. 1990. Typology and universals. Cambridge: Cambridge University Press.Search in Google Scholar

D’Arcy, Alexandra & Sali A. Tagliamonte. 2010. Prestige, accommodation, and the legacy of relative who. Language in Society 39(3). 383–410. https://doi.org/10.1017/s0047404510000205.Search in Google Scholar

De Marneffe, Marie-Catherine, Christopher D. Manning, Joakim Nivre & Daniel Zeman. 2021. Universal dependencies. Computational Linguistics 47(2). 255–308.10.1162/coli_a_00402Search in Google Scholar

Fajri, Muchamad & Victoria Okwar. 2020. Exploring a diachronic change in the use of English relative clauses: A corpus-based study and its implication for pedagogy. SAGE Open 10(4). 1–10. https://doi.org/10.1177/2158244020975027.Search in Google Scholar

Flesch, Rudolph. 1948. A new readability yardstick. Journal of Applied Psychology 32(3). 221–233. https://doi.org/10.1037/h0057532.Search in Google Scholar

Fowler, Henry. 1926. A dictionary of Modern English usage. Oxford: Clarendon Press.Search in Google Scholar

Fox, Barbara A. & Sandra A. Thompson. 2007. Relative clauses in English conversation: Relativizers, frequency, and the notion of construction. Studies in Language 31(2). 293–326. https://doi.org/10.1075/sl.31.2.03fox.Search in Google Scholar

Frazee, Joseph, Lars Hinrichs, Benedikt Szmrecsanyi & Axel Bohmann. 2015. Which-hunting and the Standard English relative clause: Online supplement: Automatic zero-relative detection. Language 91(4). s1–s3. https://doi.org/10.1353/lan.2015.0070.Search in Google Scholar

Futrell, Richard, Kyle Mahowald & Edward Gibson. 2015. Large-scale evidence of dependency length minimization in 37 languages. Proceedings of the National Academy of Sciences 112(33). 10336–10341. https://doi.org/10.1073/pnas.1502134112.Search in Google Scholar

Futrell, Richard, Roger P. Levy & Edward Gibson. 2020. Dependency locality as an explanatory principle for word order. Language 96(2). 371–412. https://doi.org/10.1353/lan.2020.0024.Search in Google Scholar

Gibson, Edward. 2000. The dependency locality theory: A distance-based theory of linguistic complexity. In Marantz Alec, Miyashita Yasushi & O’Neil Wayne (eds.), Image, language, brain: Papers from the first mind articulation project symposium, 95–126. Cambridge & Massachusetts: MIT Press.10.7551/mitpress/3654.003.0008Search in Google Scholar

Givón, Talmy. 2001 [1984]. Syntax: A functional-typological introduction, 2nd edn. Amsterdam: John Benjamins Publishing Company.Search in Google Scholar

Grafmiller, Jason, Benedikt Szmrecsanyi & Lars Hinrichs. 2018. Restricting the restrictive relativizer: Constraints on subject and non-subject English relative clauses. Corpus Linguistics and Linguistic Theory 14(2). 309–355. https://doi.org/10.1515/cllt-2016-0015.Search in Google Scholar

Gries, Stefan Th. 2005. Syntactic priming: A corpus-based approach. Journal of Psycholinguistic Research 34. 365–399. https://doi.org/10.1007/s10936-005-6139-3.Search in Google Scholar

Hawkins, John A. 1990. A parsing theory of word order universals. Linguistic Inquiry 21(2). 223–261.Search in Google Scholar

Hilpert, Martin & Hubert Cuyckens. 2016. How do corpus-based techniques advance description and theory in English historical linguistics? An introduction to the special issue. Corpus Linguistics and Linguistic Theory 12(1). 1–5. https://doi.org/10.1515/cllt-2015-0065.Search in Google Scholar

Hilpert, Martin & Stefan Gries. 2016. Quantitative approaches to diachronic corpus linguistics. In Merja Kytö & Päivi Pahta (eds.), The Cambridge handbook of English historical linguistics, 36–53. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9781139600231.003.

Hinrichs, Lars, Benedikt Szmrecsanyi & Axel Bohmann. 2015. Which-hunting and the Standard English relative clause. Language 91(4). 806–836. https://doi.org/10.1353/lan.2015.0062.

Hoffmann, Thomas & Graeme Trousdale (eds.). 2013. The Oxford handbook of construction grammar. Oxford & New York: Oxford University Press. https://doi.org/10.1093/oxfordhb/9780195396683.001.0001.

Hundt, Marianne & Christian Mair. 1999. “Agile” and “uptight” genres: The corpus-based approach to language change in progress. International Journal of Corpus Linguistics 4(2). 221–242. https://doi.org/10.1075/ijcl.4.2.02hun.

Hundt, Marianne, David Denison & Gerold Schneider. 2012. Relative complexity in scientific discourse. English Language and Linguistics 16(2). 209–240. https://doi.org/10.1017/s1360674312000032.

Ji, Heng & Dekang Lin. 2009. Gender and animacy knowledge discovery from web-scale n-grams for unsupervised person mention detection. In Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Hong Kong, December.

Krielke, Marie-Pauline. 2021. Relativizers as markers of grammatical complexity: A diachronic, cross-register study of English and German. Bergen Language and Linguistics Studies 11(1). 91–120. https://doi.org/10.15845/bells.v11i1.3440.

Lanzante, John R. 1996. Resistant, robust, and non-parametric techniques for the analysis of climate data: Theory and examples, including applications to historical radiosonde station data. International Journal of Climatology 16(11). 1197–1226. https://doi.org/10.1002/(SICI)1097-0088(199611)16:11<1197::AID-JOC89>3.0.CO;2-L.

Lee, Kit Mun. 2020. Relative clauses in a modern diachronic corpus of Singapore English. Asia Pacific Journal of Corpus Research 1(1). 31–60.

Leech, Geoffrey & Nicholas Smith. 2006. Recent grammatical change in written English 1961–1992: Some preliminary findings of a comparison of American with British English. In Antoinette Renouf & Andrew Kehoe (eds.), The changing face of corpus linguistics, 185–204. Netherlands: Brill. https://doi.org/10.1163/9789401201797_013.

Leech, Geoffrey, Marianne Hundt, Christian Mair & Nicholas Smith. 2009. Change in contemporary English: A grammatical study. Cambridge & New York: Cambridge University Press. https://doi.org/10.1017/CBO9780511642210.

Lei, Lei & Yaqian Shi. 2023. Syntactic complexity in adapted extracurricular reading materials. System 113. 103002. https://doi.org/10.1016/j.system.2023.103002.

Lei, Lei & Ju Wen. 2020. Is dependency distance experiencing a process of minimization? A diachronic study based on the State of the Union addresses. Lingua 239. 102762. https://doi.org/10.1016/j.lingua.2019.102762.

Liu, Haitao. 2008. Dependency distance as a metric of language comprehension difficulty. Journal of Cognitive Science 9(2). 159–191. https://doi.org/10.17791/jcs.2008.9.2.159.

Liu, Jinlu & Haitao Liu. 2021. A quantitative investigation of the ellipsis of English relativizers. Linguistics Vanguard 7(1). 20210020. https://doi.org/10.1515/lingvan-2021-0020.

MacKinnon, David P., Amanda J. Fairchild & Matthew S. Fritz. 2007. Mediation analysis. Annual Review of Psychology 58. 593–614. https://doi.org/10.1146/annurev.psych.58.110405.085542.

Mahowald, Kyle, Isabelle Dautriche, Edward Gibson & Steven T. Piantadosi. 2018. Word forms are structured for efficient use. Cognitive Science 42(8). 3116–3134. https://doi.org/10.1111/cogs.12689.

Mair, Christian & Marianne Hundt. 1995. Why is the progressive becoming more frequent in English? A corpus-based investigation of language change in progress. Zeitschrift für Anglistik und Amerikanistik [Journal of English and American Studies] 43(2). 111–122.

Manin, Dmitrii Y. 2008. Zipf’s law and avoidance of excessive synonymy. Cognitive Science 32(7). 1075–1098. https://doi.org/10.1080/03640210802020003.

Millward, Celia M. & Mary Hayes. 2012 [1988]. A biography of the English language, 3rd edn. Boston: Cengage Learning.

Murray, Susan. 2018. Bright signals: A history of color television. Durham, NC & London: Duke University Press. https://doi.org/10.1215/9780822371700.

Olofsson, Arne. 1981. Relative junctions in written American English. Göteborg: University of Gothenburg dissertation.

Piotrowskaja, Anna & Raimund Piotrowski. 1974. Matematičeskie modeli v diachronii i tekstoobrazovanii [Mathematical models in diachronism and text imaging]. In Raimund Piotrowski (ed.), Statistika reči i avtomatičeskij analiz teksta [Statistics of speech and automatic analysis of text], 361–400. Leningrad: Nauka.

Quirk, Randolph. 1957. Relative clauses in educated spoken English. English Studies 38(1–6). 97–109. https://doi.org/10.1080/00138385708596993.

Rohdenburg, Günter. 1996. Cognitive complexity and increased grammatical explicitness in English. Cognitive Linguistics 7(2). 149–182. https://doi.org/10.1515/cogl.1996.7.2.149.

Roland, Douglas, Frederic Dick & Jeffrey L. Elman. 2007. Frequency of basic English grammatical structures: A corpus analysis. Journal of Memory and Language 57(3). 348–379. https://doi.org/10.1016/j.jml.2007.03.002.

Romaine, Suzanne. 1982. Socio-historical linguistics: Its status and methodology. Cambridge & New York: Cambridge University Press. https://doi.org/10.1017/CBO9780511720130.

Sauerland, Uli. 2003. Unpronounced heads in relative clauses. In Kerstin Schwabe & Susanne Winkler (eds.), The interfaces: Deriving and interpreting omitted structures, 205–226. Amsterdam: John Benjamins Publishing Company. https://doi.org/10.1075/la.61.10sau.

Savoy, Jacques. 2015. Text clustering: An application with the State of the Union addresses. Journal of the Association for Information Science and Technology 66(8). 1645–1654. https://doi.org/10.1002/asi.23283.

Schmidt, Richard W. & Carol F. McCreary. 1977. Standard and super-standard English: Recognition and use of prescriptive rules by native and non-native speakers. TESOL Quarterly 11(4). 415–429. https://doi.org/10.2307/3585738.

Shi, Yaqian & Lei Lei. 2022. Lexical richness and text length: An entropy-based perspective. Journal of Quantitative Linguistics 29(1). 62–79. https://doi.org/10.1080/09296174.2020.1766346.

Shogan, Colleen J. 2016. The president’s State of the Union address: Tradition, function, and policy implications. Congressional Research Service R40132. https://crsreports.congress.gov/product/pdf/R/R40132 (accessed 27 August 2024).

Sun, Kun & Harald Baayen. 2021. Hyphenation as a compounding technique in English. Language Sciences 83. 101326. https://doi.org/10.1016/j.langsci.2020.101326.

Szmrecsanyi, Benedikt. 2006. Morphosyntactic persistence in spoken English: A corpus study at the intersection of variationist sociolinguistics, psycholinguistics, and discourse analysis. Berlin & New York: Mouton de Gruyter. https://doi.org/10.1515/9783110197808.

Szmrecsanyi, Benedikt. 2016. About text frequencies in historical linguistics: Disentangling environmental and grammatical change. Corpus Linguistics and Linguistic Theory 12(1). 153–171. https://doi.org/10.1515/cllt-2015-0068.

Tagliamonte, Sali, Jennifer Smith & Helen Lawrence. 2005. No taming the vernacular! Insights from the relatives in northern Britain. Language Variation and Change 17(1). 75–112. https://doi.org/10.1017/s0954394505050040.

Tomaschek, Fabian, Peter Hendrix & R. Harald Baayen. 2018. Strategies for addressing collinearity in multivariate linguistic data. Journal of Phonetics 71. 249–267. https://doi.org/10.1016/j.wocn.2018.09.004.

Tsizhmovska, Natalia L. & Leonid M. Martyushev. 2021. Principle of least effort and sentence length in public speaking. Entropy 23(8). 1023. https://doi.org/10.3390/e23081023.

Verstraeten, Gert, Jean Poesen, Gaston Demarée & Christian Salles. 2006. Long-term (105 years) variability in rain erosivity as derived from 10-min rainfall depth data for Ukkel (Brussels, Belgium): Implications for assessing soil erosion rates. Journal of Geophysical Research 111(D22). 1–11. https://doi.org/10.1029/2006jd007169.

Wen, Ju & Lei Lei. 2022. Linguistic positivity bias in academic writing: A large-scale diachronic study in life sciences across 50 years. Applied Linguistics 43(2). 340–364. https://doi.org/10.1093/applin/amab037.

Wu, Junhui, Qingshun He & Guangwu Feng. 2016. Rethinking the grammaticalization of future be going to: A corpus-based approach. Journal of Quantitative Linguistics 23(4). 317–341. https://doi.org/10.1080/09296174.2016.1226427.

Zhang, Lili & Chun Zhang. 2021. Why do L2 learners choose the implicit marker for object relative clauses when it is optionally explicit? Lingua 258. 103097. https://doi.org/10.1016/j.lingua.2021.103097.

Zhang, Yu, Zhenghua Li & Min Zhang. 2020. Efficient second-order TreeCRF for neural dependency parsing. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, July. https://doi.org/10.18653/v1/2020.acl-main.302.

Zhu, Haoran & Lei Lei. 2018. Is Modern English becoming less inflectionally diversified? Evidence from entropy-based algorithm. Lingua 216. 10–27. https://doi.org/10.1016/j.lingua.2018.10.006.

Zipf, George. 1949. Human behavior and the principle of least effort. Cambridge, MA: Addison-Wesley Press.

Received: 2023-11-30

Accepted: 2024-09-06

Published Online: 2024-09-27

Published in Print: 2025-05-26

This work is licensed under the Creative Commons Attribution 4.0 International License.

