Article Open Access

A cross-modal analysis of lexical sophistication: EFL and ESL learners in written and spoken production

  • Hyunbin Yoo and Hyunwoo Kim
Published/Copyright: December 12, 2023

Abstract

The present study investigated the effects of two usage-related factors, modality (written vs. spoken) and language learning context (EFL vs. ESL), on lexical sophistication in second language (L2) production. We measured 14 features of lexical sophistication in written and spoken texts produced by EFL and ESL learners of matched proficiency. The results showed significant interactions between modality and L2 learning context in several indices. In three indices, the EFL learners used more sophisticated words in writing than in speaking. In six indices, the gap in lexical sophistication scores between writing and speaking was greater for the EFL than for the ESL group. We argue that the conjoined effects of these factors stem from disparities in the amount and type of L2 input provided in EFL and ESL contexts.

1 Introduction

In the field of second language (L2) production, lexical sophistication has been widely adopted as a core construct for evaluating the depth and breadth of L2 vocabulary knowledge, as well as overall production abilities (e.g., Kim et al. 2018; Kyle and Crossley 2015; Read 2000). This concept primarily pertains to the frequency of language use, with its indices measuring the range of uses of uncommon and advanced words (Crossley et al. 2016; Lu 2012). Lexical sophistication has predominantly been investigated in terms of its explanatory validity in L2 proficiency, specifically focusing on how the consistent use of sophisticated words contributes to explaining variations in learner proficiency (Crossley and McNamara 2012; Eguchi and Kyle 2020; Kyle and Crossley 2015; Lu 2012). However, a noticeable gap exists in understanding how factors associated with language experiences, such as modality and L2 learning contexts, modulate lexical sophistication in learner production.

The current study attempts to expand the research scope on lexical sophistication by exploring the effects of two usage-related factors on L2 lexical sophistication. Specifically, it aims to investigate the roles of modality and L2 learning contexts within the theoretical framework of the usage-based approach (Bybee 2008; Ellis 2006). Modality, including writing and speaking, emerges as a crucial factor affecting L2 production performance. Previous studies have reported convergent findings that L2 learners exhibit greater linguistic complexity, including vocabulary usage and structural complexity, in written production compared to spoken production (e.g., Ellis and Yuan 2005; Hwang et al. 2020; Kormos 2014; Vasylets et al. 2017). The context of L2 learning is another factor that can influence L2 production performance. It is broadly acknowledged that learners engaged in learning English as a foreign language (EFL) differ in various aspects from those learning English as a second language (ESL), particularly in the quantity and types of linguistic input they encounter. EFL learners, for instance, receive a limited amount of L2 input, mostly in written form from textbooks, with minimal access to spoken input (e.g., Saito 2017). Conversely, ESL learners can have extensive exposure to both written and spoken input in various social contexts, both inside and outside of the classroom (Barrot and Gabinete 2021). These distinct learning contexts may significantly impact L2 production performance, including lexical sophistication, among EFL and ESL learners.

While modality and L2 learning contexts have been investigated independently in diverse research areas, their combined contribution to L2 vocabulary production remains less explored. We predict that modality interacts with L2 learning contexts, resulting in differing levels of lexical sophistication between written and spoken production among EFL and ESL learners due to variations in the quantity and quality of linguistic input they receive. The findings from this study are expected to shed light on the interacting roles of production modes and L2 learning contexts in L2 vocabulary production, thereby advancing our understanding of how L2 learners acquire and use sophisticated vocabulary in diverse settings.

2 Literature review

2.1 Lexical sophistication

Lexical sophistication is operationalized as a metric that measures “the proportion of relatively unusual or advanced words in the learner’s text” (Read 2000: 203). Traditional indices assessing lexical sophistication include the ratio of high-grade words (e.g., those introduced at grade 9 or later) to the total word count (Linnarud 1986) and the proportion of low-frequency words relative to all words (Laufer and Nation 1995). With the advancements in computational programs and text analysis techniques, recent research has introduced finer-grained lexical sophistication indices, capturing the multidimensional aspects of L2 lexical knowledge. In this section, we discuss 14 widely adopted lexical sophistication indices, which fall into four dimensions (Kyle and Crossley 2015).

Word frequency and range. Word frequency has been recognized as a key indicator of L2 production abilities. It is widely accepted that high-frequency words are typically learned earlier and more easily than low-frequency words (Ellis 2002). The underlying assumption of this frequency effect is that repeated exposure to a word strengthens the association between its form and meaning, allowing for more rapid and efficient access and retrieval during production (Ellis 2006; Gollan et al. 2008). Previous studies have commonly assessed word frequency by either counting the number of words falling within specific frequency bands determined by their ranking in a reference corpus (e.g., Laufer and Nation 1995; Morris and Cobb 2004), or by tallying the frequency of a word’s appearance in a reference corpus (e.g., Kyle and Crossley 2015).

However, the word frequency index has limited reach, since a single corpus-wide count cannot distinguish words that are common everywhere from words that are concentrated in a few contexts. For instance, some technical words may be highly frequent in a narrow set of contexts but appear less frequently in general language use (Kyle et al. 2018). To address this issue, the concept of word range (known variably as dispersion, entropy, and contextual diversity) has been introduced. Word range counts the number of documents in a reference corpus in which a target word occurs; a lower range value indicates a word that appears in fewer contexts and is therefore considered more specialized and sophisticated (Kyle et al. 2018). Previous research on L2 lexical sophistication has measured this index by counting the number of documents in which a word appears, in both spoken contexts (e.g., Crossley et al. 2013; Kyle and Crossley 2015) and written contexts (e.g., Kim et al. 2018).
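The contrast between frequency and range can be illustrated with a minimal sketch. The three-document reference corpus below is a hypothetical toy example, not the BNC, and the function name is introduced here only for illustration.

```python
from collections import Counter

# Hypothetical mini reference corpus: each inner list is one "document".
reference_docs = [
    ["the", "enzyme", "binds", "the", "enzyme", "substrate"],
    ["the", "student", "has", "a", "job"],
    ["a", "job", "helps", "the", "student"],
]

def frequency_and_range(docs):
    """Return (token frequency, document range) counters for a corpus."""
    freq = Counter()
    doc_range = Counter()
    for doc in docs:
        freq.update(doc)            # every occurrence counts toward frequency
        doc_range.update(set(doc))  # each document counts at most once toward range
    return freq, doc_range

freq, doc_range = frequency_and_range(reference_docs)
print(freq["enzyme"], doc_range["enzyme"])  # 2 1
print(freq["job"], doc_range["job"])        # 2 2
```

In this toy corpus "enzyme" and "job" have the same frequency, but "enzyme" occurs in only one document, so a range-based index would treat it as the more specialized word.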

Academic language. Sophisticated lexical items also include words and phrases commonly encountered in academic contexts (Kyle et al. 2018). Two notable academic lists serve as key indices of lexical sophistication: the Academic Word List (AWL; Coxhead 2000) and the Academic Formulas List (AFL; Simpson-Vlach and Ellis 2010). The AWL includes academic words identified in written academic corpora, while the AFL contains formulaic sequences documented in both written and spoken academic corpora. Previous studies computed frequency scores of academic language based on these lists, revealing significant associations with L2 speaking proficiency (Kyle and Crossley 2015) and writing proficiency (Kim et al. 2018).

Psycholinguistic word information. The psycholinguistic properties of words are associated with words’ learnability (Salsbury et al. 2011). Specific indices within this dimension include word concreteness, imageability, meaningfulness, familiarity, and age of acquisition (Coltheart 1981; Kuperman et al. 2012). Concreteness refers to the extent to which a word is concrete or abstract. Words evoking more tangible and perceptible mental representations (e.g., orange, doctor) are deemed more concrete and, consequently, considered less sophisticated compared to abstract words (e.g., impossible, truth). Imageability reflects the ease with which a mental image of a word can be created. Meaningfulness measures the degree to which a word is associated with other words. Familiarity assesses how commonly a word is experienced (e.g., paper vs. encephalon), with this index strongly correlated with word frequency (Kim et al. 2018). Age of acquisition refers to the age at which a particular word is learned.

Previous studies have utilized native speakers’ judgment ratings from the Medical Research Council Psycholinguistic Database (Coltheart 1981) to evaluate words’ concreteness, imageability, meaningfulness, and familiarity (e.g., Crossley and McNamara 2012; Kim et al. 2018). In addition, native speaker judgment scores reported by Kuperman et al. (2012) have been employed to determine the age of acquisition of words (Kim et al. 2018). These psycholinguistic word indices have been shown to account for variance in the quality of written texts (Crossley and McNamara 2012; Kim et al. 2018) and to reflect the longitudinal development of L2 vocabulary in spoken language (Crossley et al. 2016; Salsbury et al. 2011).

N-gram frequency, range, and association strength. N-grams, referring to sequences of multiword units (Hyland 2008), encompass bigrams (e.g., each other, thank you) and trigrams (e.g., out of the, a lot of). These n-grams serve as valuable indicators of phraseological complexity in learner production, revealing how learners combine different lexical items to create a variety of word combinations (Paquot 2019). Several studies have adopted n-gram features as constructs of lexical sophistication, which is a subset of phraseological sophistication (e.g., Kim et al. 2018; Kyle and Crossley 2015).

Indices related to n-grams include frequency, range, and association strength. Similar to the assessment of word frequency and range, previous research computed n-gram frequency and range by tallying the occurrences of n-grams and determining the number of documents in which they appear in a native-speaker reference corpus (Kim et al. 2018; Kyle and Crossley 2015). These indices have been identified as strong indicators of L2 proficiency in both written (e.g., Kim et al. 2018) and spoken production (e.g., Kyle and Crossley 2015).

N-gram association strength measures the likelihood of a sequence of words occurring together. A widely used index for this purpose is Delta P, which is calculated as the difference between the probability of word A occurring with word B and the probability of word A occurring with any word other than word B (Kyle et al. 2018). Because Delta P is directional, it can yield different values depending on which word in the sequence is treated as the cue, and it is therefore sensitive to word order. Texts exhibiting higher proficiency tended to include bigrams and trigrams with stronger associations (Kim et al. 2018; Kyle et al. 2018; Paquot 2019). Consequently, higher Delta P scores are interpreted as indicative of a higher degree of lexical sophistication.
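As a rough illustration of the directionality of Delta P, the sketch below computes both directions for a bigram from a 2 × 2 contingency table of bigram counts. This is one common formulation and is an assumption here; the exact computation in TAALES may differ. The token sequence and the function name are hypothetical.

```python
from collections import Counter

def delta_p(bigrams, first, second):
    """Return (dP(second | first), dP(first | second)) from a list of bigram tuples.

    Assumes every cell total used as a denominator is non-zero.
    """
    counts = Counter(bigrams)
    total = sum(counts.values())
    a = counts[(first, second)]                                      # first + second
    b = sum(n for (w1, _), n in counts.items() if w1 == first) - a   # first + other
    c = sum(n for (_, w2), n in counts.items() if w2 == second) - a  # other + second
    d = total - a - b - c                                            # other + other
    forward = a / (a + b) - c / (c + d)   # P(second | first) - P(second | not first)
    backward = a / (a + c) - b / (b + d)  # P(first | second) - P(first | not second)
    return forward, backward

# Hypothetical token sequence standing in for a reference corpus.
tokens = "thank you very much thank you so much of course thank him".split()
bigrams = list(zip(tokens, tokens[1:]))

forward, backward = delta_p(bigrams, "thank", "you")
print(round(forward, 3), round(backward, 3))  # 0.667 0.889
```

The two values differ because "you" is always preceded by "thank" in this toy sequence, whereas "thank" is not always followed by "you". The second value corresponds to the wording in the text (word A given word B versus word A given any other word).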

2.2 Effect of modality on L2 production

It is well established that L2 learners show varying degrees of linguistic complexity in writing versus speaking (e.g., Biber et al. 2016; Ellis and Yuan 2005; Hwang et al. 2020; Kim and Hwang 2022; Kormos 2014; Vasylets et al. 2017). However, the majority of studies have focused on lexical diversity and/or syntactic complexity, with few exploring lexical sophistication. For example, Kormos (2014) examined various linguistic complexity measures in the written and spoken texts produced by secondary-school foreign language learners with upper-intermediate proficiency. Her findings indicated higher lexical accuracy, greater lexical variation, and more complexity in noun phrases in written output compared to spoken output. Similarly, Ellis and Yuan (2005) found that adult Chinese EFL learners demonstrated greater vocabulary accuracy in writing than in speaking.

The impact of production modality, as observed in previous research, can be captured in terms of different psycholinguistic mechanisms underlying the two production modes (Kormos 2014). In general, writing is considered a cognitively less demanding process than speaking. For instance, Vasylets et al. (2017: 397) characterized writing as “offline, permanent, and visible” while describing speaking as “online and evanescent”. According to their descriptions, writing involves a cyclical process in which a writer can move back and forth between message planning, linguistic encoding, and writing processes (Halliday 2002; Kellogg 1996). In contrast, speaking is a linear, time-constrained process that places significant processing demands on the speaker, particularly for L2 speakers who are more constrained than native speakers in the automatic formulation and articulation of speech (Kormos 2014).

The distinct nature of the writing and speaking processes suggests that modality can directly influence L2 learners’ vocabulary production. However, there is a gap in research specifically investigating the effect of modality on L2 lexical sophistication. Instead, studies have examined L2 lexical sophistication separately in the domains of written and spoken production. For example, Kim et al. (2018) analyzed college students’ written essays using various lexical sophistication indices, as mentioned in the previous section. Their results showed that several indices, including word frequency and range, age of acquisition, and n-gram properties (frequency, range, and association), significantly contributed to the variance in L2 writing proficiency. In another study, Eguchi and Kyle (2020) explored lexical sophistication in oral interviews of Japanese-speaking learners, demonstrating that indices such as word concreteness, word frequency, orthographic/phonological neighbors, word accessibility, and bigram association significantly explained the variance in the learners’ proficiency scores.

While each of these studies identified different lexical sophistication indices as significant components of L2 proficiency, it is impossible to make direct comparisons of these outcomes due to differences in data and analysis methods. This limitation underscores the need for further research that examines both written and spoken production using consistent methodologies. Moreover, it remains unclear whether the modality effect interacts with other factors such as L2 learning contexts. In what follows, we outline differences between EFL and ESL contexts and consider how these differences may interact with modality to influence L2 lexical sophistication.

2.3 Differences between EFL and ESL contexts

The defining characteristics of EFL and ESL contexts are best captured by Kachru’s (1985) landmark descriptions of the three major English use contexts: the inner circle (L1 varieties, e.g., Australia, the United States, United Kingdom), the outer circle (ESL varieties, e.g., India, Malaysia, Pakistan, Philippines, Singapore), and the expanding circle (EFL varieties, e.g., China, Greece, Indonesia, Japan, Korea). As members of the outer circle, speakers in the ESL context use English for daily communication (Tarnopolsky 2000), either as the official language or as an additional language of the government. As a result, ESL learners encounter a variety of linguistic and sociolinguistic features of English in diverse, realistic situations beyond classroom settings. In contrast, the use of English in the EFL context is confined to formal instructional settings, offering limited opportunities for students to use the language for communicative purposes in real-world situations (Tarnopolsky 2000).

These differences imply a significant divergence in both the quantity and the type of L2 input provided to ESL and EFL students. Given that ESL students are exposed to English in various social settings, they tend to have ample opportunities to receive a multiplicity of written and spoken linguistic input on a daily basis (Barrot and Gabinete 2021). Conversely, students in the EFL context encounter English predominantly within a classroom setting, with limited exposure to real-world communicative situations (Barrot and Gabinete 2021; Tarnopolsky 2000). Moreover, textbooks serve as the primary source of input in the EFL context, resulting in students mostly receiving written input, with few experiences of spoken language through in-class practices (Saito 2017).

Based on the distinctive features of these two English learning contexts, several studies have compared vocabulary usage between EFL and ESL students, with a primary focus on examining written production. The findings of these studies have been varied. For example, Barrot and Gabinete (2021) observed that ESL learners tend to use a greater number of words compared to their EFL counterparts. Similarly, in their analyses of thesis abstracts written by EFL learners, ESL learners, and English native speakers, Nasseri and Thompson (2021) found that ESL learners and native speakers exhibited a higher number of lexical words and a wider variety of vocabulary compared to EFL learners. In contrast, Zhang and Kang (2022) showed that Chinese EFL students displayed a higher degree of lexical diversity in written production compared to Hong Kong ESL learners.

Regarding lexical sophistication, to the best of our knowledge, only one study has conducted comparative analyses between EFL and ESL learners. Choemue and Bram (2021) examined scientific journal articles written by EFL authors from Indonesia and Thailand and ESL authors from the Philippines and Malaysia. Their assessment of lexical sophistication included calculations based on the number of advanced word tokens as defined by Laufer and Nation (1995), along with measuring lexical density and lexical diversity. Their results indicated no significant differences between the two learner groups across all three domains.

While these studies have contributed to our understanding of diverse aspects of lexical complexity influenced by learning contexts, several questions remain unanswered. Although Choemue and Bram (2021) found little difference between EFL and ESL learners, this finding may not necessarily apply to non-scientific contexts of production. Moreover, their focus on lexical sophistication primarily centered on word frequency, such as the use of advanced words. It thus remains unclear whether differences exist between EFL and ESL learners in other aspects of lexical sophistication. Crucially, most of the previous investigations have focused on written production, leaving open the question of how learning contexts interact with production modality in affecting L2 lexical sophistication.

Examining the influence of L2 learning contexts provides a valuable avenue for testing various theoretical models that establish a link between language experience and L2 vocabulary usage. One noteworthy model in this context is the usage-based approach to L2 acquisition (Bybee 2008; Ellis 2006). This approach emphasizes the role of a learner’s language usage experience as a driving force behind language acquisition. It posits that the memories of prior utterances within an individual’s language use profile, combined with frequency-based statistical learning mechanisms, serve as the primary sources of their linguistic knowledge (Ellis 2006). Of particular importance in the context of usage-based vocabulary learning is the role of word exposure frequency. According to this approach, words that are experienced frequently become firmly entrenched and automatized in a learner’s mental lexicon, enabling more efficient processing, storage, and retrieval of lexical information (Gollan et al. 2008; Verspoor and Schmitt 2013).

In line with this perspective, we hypothesize that L2 learners will exhibit varying degrees of lexical sophistication contingent upon their language experience. Specifically, within the usage-based approach, when a language learner is exposed to a lexical item with greater frequency, their knowledge of the word, including its form and meaning, becomes more deeply entrenched, facilitating the acquisition and retrieval of the lexical information (Ellis 2002, 2006; Ellis and Wulff 2015). From this standpoint, one might anticipate that ESL learners, who encounter English in natural settings, exhibit higher degrees of lexical sophistication compared to EFL learners, who predominantly learn English vocabulary through classroom instruction. Despite this prediction, however, empirical evidence supporting the impact of learning contexts on L2 lexical sophistication, particularly within the theoretical framework of the usage-based approach, remains scarce.

3 The current study

This study aims to extend previous work on L2 lexical sophistication by investigating the effects of two variables: modality and L2 learning contexts, both assumed to play significant roles in shaping L2 vocabulary usage. Additionally, we expand the work of Choemue and Bram (2021), which primarily focused on the frequency of advanced words as a measure of lexical sophistication in EFL and ESL learners, by incorporating more diverse and nuanced measures of lexical sophistication.

Recent advancements in the field of natural language processing have enabled a detailed and nuanced assessment of lexical sophistication. While earlier approaches predominantly relied on word frequency data (Laufer and Nation 1995; Linnarud 1986), recent studies have utilized computer programs, such as the Tool for the Automatic Analysis of Lexical Sophistication (TAALES; Kyle and Crossley 2015), to evaluate various aspects of lexical sophistication in L2 production (e.g., Crossley et al. 2013; Eguchi and Kyle 2020; Kim et al. 2018; Kyle et al. 2018; Kyle and Crossley 2015). The lexical sophistication indices assessed in these studies cover diverse dimensions, including word and n-gram frequency, contextual range of words and n-grams, the use of academic language expressions, psycholinguistic properties, and semantic relationships with other words. Among them, some indices exhibit strong correlations, both conceptually and mathematically. Moreover, not all indices demonstrate reliable validity in accounting for variations in learner proficiency. In this study, we focus on 14 lexical sophistication indices that do not conceptually overlap and have been validated as significant indicators of L2 vocabulary abilities and proficiency (Crossley et al. 2013; Kim et al. 2018; Kyle et al. 2018; Kyle and Crossley 2015).

To explore how modality and L2 learning contexts interact to affect L2 lexical sophistication, the current study formulated the following research questions.

  1. Do L2 learners show different degrees of lexical sophistication in written versus spoken production?

  2. Do L2 learning contexts modulate the extent to which modality affects lexical sophistication in L2 production?

Given the well-established effect of modality on L2 production, we expect L2 learners to demonstrate a higher level of lexical sophistication in written compared to spoken production. Crucially, we predict that this modality effect will interact with L2 learning contexts. Specifically, the gap between written and spoken production is expected to be more pronounced in EFL learners compared to ESL learners, as EFL learners are assumed to have had a more limited exposure to spoken language in their learning contexts, whereas ESL learners are presumed to have received a substantial amount of both written and spoken input.

4 Methods

4.1 Corpus selection

The dataset utilized in this study comprised 400 written essays (200 from EFL learners and 200 from ESL learners) and 400 speech samples (200 from EFL learners and 200 from ESL learners) extracted from the International Corpus Network of Asian Learners of English (ICNALE; Ishikawa 2014). According to the corpus developer’s descriptions, the written and spoken samples were independently collected from college students in various regions of Asia, including six EFL regions (China, Indonesia, Japan, South Korea, Taiwan, and Thailand) and four ESL regions (Hong Kong, Pakistan, the Philippines, and Singapore).

Although different individuals participated in the written and spoken tasks, the written and spoken samples in ICNALE were closely matched in terms of genre and topic (Ishikawa 2013, 2014). In highly controlled production tasks, students provided written essays of 200–300 words within a time limit of 40 min and spoken monologues within a time limit of 60 s. Each task presented students with two topics: part-time jobs for college students and banning smoking at restaurants. Each topic was accompanied by a question prompting students to provide their opinions (e.g., Do you agree or disagree with the following statements? Use reasons and specific details to support your opinion. “It is important for college students to have a part-time job”). In the writing task, each student produced one essay per topic, while in the speaking task, two speech monologues were generated per topic. To mitigate potential practice effects associated with repetition, we selected the first round of speech samples from each student for analysis. To maintain consistency in the topic, our data analysis focused on written and spoken samples produced under the topic of part-time jobs for college students.

Furthermore, we controlled for participants’ English proficiency in our sample data. The ICNALE categorizes each student’s proficiency level in alignment with the Common European Framework of Reference (CEFR), classifying learners into four levels: A2 (Waystage), B1_1 (Threshold lower), B1_2 (Threshold upper), and B2 (Vantage or Higher). In our analysis, we specifically focused on data from the students in the B1_2 proficiency level, as this category covered a sufficient number of samples across both EFL and ESL groups. Table 1 provides a summary of information regarding the samples in each group.

Table 1:

Summary of sample data.

Context  Modality  Sample size  Words (mean)  Words (SD)  Words (range)
EFL      Written   200          234.340       29.521      187–338
EFL      Spoken    200          88.035        26.815      28–165
ESL      Written   200          239.445       38.108      186–444
ESL      Spoken    200          130.995       27.017      69–238

4.2 Lexical sophistication indices

The present study measured the following 14 lexical sophistication indices using TAALES (Kyle et al. 2018) version 2.2. The list of indices is presented in Table 2.

Table 2:

Lexical sophistication indices.

Index                         Index code
Word frequency                BNC_Written_Freq_AW
                              BNC_Spoken_Freq_AW
Word range                    BNC_Written_Range_AW
                              BNC_Spoken_Range_AW

Academic language             All_AFL_Normed

Concreteness                  MRC_Concreteness_AW
Imageability                  MRC_Imageability_AW
Meaningfulness                MRC_Meaningfulness_AW
Familiarity                   MRC_Familiarity_CW
Age of acquisition            Kuperman_AoA_AW

Bigram frequency              COCA_Academic_Bigram_Frequency
                              COCA_spoken_Bigram_Frequency
Bigram range                  COCA_Academic_Bigram_Range
                              COCA_spoken_Bigram_Range
Bigram association strength   COCA_academic_bi_DP
                              COCA_spoken_bi_DP

Trigram frequency             COCA_Academic_Trigram_Frequency
                              COCA_spoken_Trigram_Frequency
Trigram range                 COCA_Academic_Trigram_Range
                              COCA_spoken_Trigram_Range
Trigram association strength  COCA_academic_tri_DP
                              COCA_spoken_tri_DP

Word frequency and range. The word frequency and range scores were computed using either the BNC written corpus or the BNC spoken corpus. It is noteworthy that content words and function words exhibit distinct features, thus requiring separate analyses (Durrant et al. 2019). However, in our analysis, we found that word frequency and range scores for both content and function words were significantly correlated with those of all words. Consequently, we reported outcomes based on all words (see Appendix B for separate analyses for content and function words). TAALES calculates the average score of each index in a text by dividing the sum of the frequency/range scores for all words by the total number of words in the text (Kim et al. 2018; Kyle and Crossley 2015). To meet the assumption of normality, we applied a logarithmic transformation to the frequency and range scores (e.g., Eguchi and Kyle 2020).
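The averaging step can be sketched as follows. The frequency values are hypothetical placeholders rather than BNC figures, and two details are assumptions made only for this illustration: the log transformation is applied to each word's score before averaging, and words missing from the reference list are skipped.

```python
import math

# Hypothetical reference frequencies; real values would come from the BNC lists.
reference_freq = {"students": 120.0, "need": 450.0, "a": 21000.0, "job": 95.0}

def mean_log_frequency(tokens, freq_table):
    """Average log10 frequency over the tokens found in freq_table."""
    # Assumption: tokens absent from the reference list are left out of the average.
    scores = [math.log10(freq_table[t]) for t in tokens if t in freq_table]
    if not scores:
        return None  # no token was covered by the reference list
    return sum(scores) / len(scores)

text = ["students", "need", "a", "job", "nowadays"]
score = mean_log_frequency(text, reference_freq)
print(round(score, 3))
```

A lower text-level score indicates that the text relies on less frequent words, which is read as greater lexical sophistication.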

Academic language. Of the two academic language lists provided by TAALES, namely the AWL (Coxhead 2000) and the AFL (Simpson-Vlach and Ellis 2010), we used the AFL, which includes formulaic sequences used in both written and spoken contexts. In our analysis, we utilized the normed scores, calculated by dividing the number of tokens of academic sequences by the total number of word tokens (Kyle and Crossley 2015).
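The normed score described above can be sketched in a few lines. The two formulas below are hypothetical stand-ins for AFL entries, and the matching procedure (exact token-sequence matching) is an assumption rather than a description of how TAALES identifies formulas.

```python
def normed_formula_score(tokens, formulas):
    """Count occurrences of listed formulas and divide by the number of word tokens."""
    if not tokens:
        return 0.0
    hits = 0
    for formula in formulas:
        n = len(formula)
        # Slide a window of the formula's length across the text.
        hits += sum(
            1 for i in range(len(tokens) - n + 1) if tuple(tokens[i:i + n]) == formula
        )
    return hits / len(tokens)

# Hypothetical stand-ins for entries in an academic formulas list.
formulas = [("in", "order", "to"), ("on", "the", "other", "hand")]
tokens = "in order to save money students work on the other hand they study".split()

score = normed_formula_score(tokens, formulas)
print(round(score, 3))  # 0.154
```

Dividing by the token count keeps the score comparable across texts of different lengths, which matters here because the spoken samples are much shorter than the written essays.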

Psycholinguistic word information. TAALES provides scores for word concreteness, imageability, meaningfulness, and familiarity, which are based on native speakers’ judgment ratings derived from the Medical Research Council Psycholinguistic Database (Coltheart 1981). Additionally, the tool offers scores for the age of acquisition based on native speaker judgment, as reported by Kuperman et al. (2012). In our analysis, we normalized the scores from each psycholinguistic index using log transformation.

N-gram frequency, range, and association strength. Using TAALES, we computed the frequency and range scores of bigrams and trigrams with reference to two subcorpora in the Corpus of Contemporary American English (COCA; Davies 2009). Since TAALES employed either the written academic or spoken section as a reference corpus, we conducted separate analyses using each reference corpus for the written and spoken data. The n-gram frequency and range scores were log-transformed for normality of the data. Regarding n-gram association strength, we relied on the Delta P scores derived from the COCA.

4.3 Data analyses

Prior to data analysis, we conducted correlation analyses to examine collinearity among indices within each category. The outcomes of these correlation analyses can be found in Appendix C. In cases where indices showed a strong correlation (r ≥ 0.7), we retained only one of them (e.g., Kyle et al. 2018). To examine the modulating effects of modality and language learning contexts on each lexical sophistication index, we constructed linear mixed-effects models using the lme4 package (Bates et al. 2015) in R version 4.2.1 (R Core Team 2022). P values were calculated using the lmerTest package (Kuznetsova et al. 2017). Each model included two fixed effects of Modality (spoken, written) and Context (EFL, ESL), together with their interaction, alongside a random effect of participant. The fixed factors were centered around the means and deviation coded by assigning –0.5 to spoken and EFL conditions and 0.5 to written and ESL conditions. When the model revealed a significant interaction between Modality and Context, we conducted post-hoc analyses using the emmeans function in R (Lenth 2018), investigating the effect of Modality in each context group. The results of these post-hoc tests are presented with the corrected alpha level implemented via Tukey adjustments. To determine the effect size for each comparison, we calculated Cohen’s d using the eff_size function from the emmeans package.
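To make the deviation coding concrete, the sketch below shows how the fixed-effect estimates relate to cell means in a balanced 2 × 2 design. It is written in Python rather than R, it covers only the fixed-effect part of the model (no random effect of participant), and the cell means are hypothetical values, not data from the study.

```python
# Deviation codes as described in the text: -0.5 for spoken/EFL, 0.5 for written/ESL.
MODALITY = {"spoken": -0.5, "written": 0.5}
CONTEXT = {"EFL": -0.5, "ESL": 0.5}

# Hypothetical cell means for one index (illustrative only, not study data).
cell_means = {
    ("EFL", "spoken"): 0.040,
    ("EFL", "written"): 0.060,
    ("ESL", "spoken"): 0.055,
    ("ESL", "written"): 0.050,
}

def fixed_effects(means):
    """Solve the saturated model y = b0 + b1*M + b2*C + b3*M*C for the four cell means."""
    efl_gap = means[("EFL", "written")] - means[("EFL", "spoken")]
    esl_gap = means[("ESL", "written")] - means[("ESL", "spoken")]
    written_gap = means[("ESL", "written")] - means[("EFL", "written")]
    spoken_gap = means[("ESL", "spoken")] - means[("EFL", "spoken")]
    return {
        "Intercept": sum(means.values()) / 4,        # grand mean of the four cells
        "Modality": (efl_gap + esl_gap) / 2,         # written minus spoken, averaged over contexts
        "Context": (written_gap + spoken_gap) / 2,   # ESL minus EFL, averaged over modalities
        "Modality x Context": esl_gap - efl_gap,     # difference between the two modality gaps
    }

def predict(b, context, modality):
    """Reconstruct a cell mean from the coefficients and the deviation codes."""
    m, c = MODALITY[modality], CONTEXT[context]
    return b["Intercept"] + b["Modality"] * m + b["Context"] * c + b["Modality x Context"] * m * c

coefs = fixed_effects(cell_means)
for name, value in coefs.items():
    print(f"{name}: {value:.4f}")
```

With this coding, each main effect is evaluated at the average of the other factor, so a negative Modality estimate means lower scores in writing than in speaking across both groups, and the interaction estimate is the difference between the ESL and EFL writing-versus-speaking gaps.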

5 Results

In this section, we present model outcomes and graphs for each index by dimension, followed by our interpretation of the results. Descriptive statistics of the index scores can be found in Appendix A.

5.1 Word frequency and range

Due to strong correlations between the word frequency and range scores obtained from the written and spoken reference corpora (see Appendix C), we chose to present only the results of word frequency derived from the written reference corpus as a representative measure. The model outcomes are presented in Table 3, with a visual representation provided in Figure 1. The model revealed a significant effect of Modality, indicating higher scores in speaking than in writing. Additionally, there was a significant effect of Context, with higher scores in the EFL than the ESL group. However, no interaction was observed between the two factors, suggesting that the frequency score difference in spoken and written production remained consistent between the EFL and ESL groups. These results suggest that while both groups produced more frequent words in speaking than in writing, the EFL group generally produced higher-frequency words compared to the ESL group.

Table 3: Model outputs for word frequency.

| Index | Fixed factor | Estimate | SE | t value | p value |
|---|---|---|---|---|---|
| Frequency | Intercept | 0.046 | 0.004 | 11.049 | <0.001*** |
| Frequency | Modality | −0.058 | 0.008 | −7.040 | <0.001*** |
| Frequency | Context | −0.024 | 0.008 | −2.972 | 0.003** |
| Frequency | Modality × Context | −0.010 | 0.016 | −0.615 | 0.539 |

***p < 0.001; **p < 0.01.

Figure 1: 
Results of word frequency (log-transformed raw scores).

5.2 Academic language

As shown in Table 4, the model demonstrated a main effect of Context, indicating that the ESL group produced more academic phrases than the EFL group. This outcome suggests an enhanced ability of the ESL group to produce formulaic sequences frequently appearing in academic contexts. While there was no main effect of Modality, an interaction was found between Modality and Context. Post-hoc tests using emmeans revealed that this interaction was driven by the two groups’ distinct production patterns depending on the production mode. Specifically, the EFL group produced more academic language in written than in spoken production (b = −0.013, SE = 0.004, t = −2.960, p = 0.017, Cohen’s d = −0.296). In contrast, the ESL group exhibited the opposite pattern, producing more academic language in spoken than in written production (b = 0.014, SE = 0.004, t = 3.241, p = 0.007, Cohen’s d = 0.324). As visible in Figure 2, the academic phrases produced by the ESL group in spoken production outnumbered not only those in writing but also those produced by the EFL group in spoken and written production. These findings highlight the strong impact of L2 learning experience on the production of formulaic sequences used in academic contexts.

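Table 4: Model output for academic language.

| Index | Fixed factor | Estimate | SE | t value | p value |
|---|---|---|---|---|---|
| Academic language | Intercept | 0.051 | 0.002 | 33.517 | <0.001*** |
| Academic language | Modality | −0.001 | 0.003 | −0.198 | 0.843 |
| Academic language | Context | 0.015 | 0.003 | 4.933 | <0.001*** |
| Academic language | Modality × Context | −0.027 | 0.006 | −4.385 | <0.001*** |

***p < 0.001.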
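The by-group modality effects can be approximated directly from the estimates in Table 4. Under the −0.5/0.5 coding, the written-minus-spoken difference within a group equals the Modality estimate plus the interaction estimate multiplied by that group's Context code. The Python sketch below applies this arithmetic; the function name is hypothetical, and the calculation ignores the Tukey adjustment and any model details beyond the reported fixed effects.

```python
def simple_modality_effects(b_modality, b_interaction):
    """Written-minus-spoken difference within each context group.

    With Context coded EFL = -0.5 and ESL = +0.5, the modality slope in a
    group is b_modality + b_interaction * context_code.
    """
    return {
        "EFL": b_modality + b_interaction * -0.5,
        "ESL": b_modality + b_interaction * 0.5,
    }


# Estimates for academic language as reported in Table 4.
effects = simple_modality_effects(b_modality=-0.001, b_interaction=-0.027)
for group, diff in effects.items():
    print(f"{group}: written - spoken = {diff:+.4f}")
```

The resulting values, 0.0125 for the EFL group and −0.0145 for the ESL group, match the reported post-hoc estimates (b = −0.013 and b = 0.014) in magnitude but with opposite signs. This would be consistent with post-hoc contrasts computed as spoken minus written, although that reading of the sign convention is an assumption rather than something stated in the text.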

Figure 2: 
Result of academic language (normalized raw scores).

5.3 Psycholinguistic word information

We excluded two indices: (a) Imageability due to its strong correlation with Concreteness and Meaningfulness and (b) Concreteness, which was strongly associated with Meaningfulness (see Appendix C). The model outcomes are presented in Table 5, followed by graphical illustrations in Figure 3. We first focus on Meaningfulness and Familiarity, which exhibited similar results, and subsequently report the result for Age of acquisition. The models for the first two indices yielded significant effects of Context and a significant interaction with Modality, indicating a differential effect of modality on the EFL and ESL groups. In light of the interaction, we conducted separate by-group comparisons, breaking down these interactions for each of the two indices.

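Table 5: Model outputs for psycholinguistic word information.

| Index | Fixed factor | Estimate | SE | t value | p value |
|---|---|---|---|---|---|
| Meaningfulness | Intercept | 5.889 | 0.001 | 4543.862 | <0.001*** |
| Meaningfulness | Modality | −0.003 | 0.002 | −1.225 | 0.221 |
| Meaningfulness | Context | −0.033 | 0.003 | −12.900 | <0.001*** |
| Meaningfulness | Modality × Context | 0.012 | 0.005 | 2.505 | 0.013* |
| Familiarity | Intercept | 6.395 | 0.0002 | 29914.231 | <0.001*** |
| Familiarity | Modality | −0.005 | 0.0004 | −12.596 | <0.001*** |
| Familiarity | Context | −0.006 | 0.0004 | −14.634 | <0.001*** |
| Familiarity | Modality × Context | 0.002 | 0.001 | 2.702 | 0.003** |
| Age of acquisition | Intercept | 1.658 | 0.002 | 958.801 | <0.001*** |
| Age of acquisition | Modality | 0.031 | 0.003 | 8.852 | <0.001*** |
| Age of acquisition | Context | 0.045 | 0.003 | 12.966 | <0.001*** |
| Age of acquisition | Modality × Context | −0.001 | 0.007 | −0.186 | 0.852 |

***p < 0.001; **p < 0.01; *p < 0.05.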

Figure 3: 
Result of psycholinguistic information (log-transformed raw scores).

For Meaningfulness, the EFL group had a significantly lower score in written than in spoken production (b = 0.009, SE = 0.003, p = 0.043, Cohen’s d = 0.264), while modality did not affect the ESL group’s production (b = −0.003, SE = 0.003, p = 0.802, Cohen’s d = −0.106).

For Familiarity, we found significantly lower scores in written than in spoken production for both the EFL (b = 0.007, SE = 0.001, p < 0.001, Cohen’s d = 1.082) and the ESL group (b = 0.004, SE = 0.001, p < 0.001, Cohen’s d = 0.700). However, as indicated by the effect sizes and the graph in Figure 3, the score difference between written and spoken production was greater for the EFL than the ESL group, suggesting a more pronounced impact of modality on the EFL group.
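To make the effect-size comparison concrete, the sketch below computes Cohen's d as a standardized mean difference. It uses the classical pooled-standard-deviation form on raw scores, which is a simplification: the eff_size function in emmeans instead divides a model-based contrast by a residual standard deviation supplied from the fitted model. The scores are invented for illustration and are not taken from the study's data.

```python
from math import sqrt
from statistics import mean, variance


def pooled_sd(a, b):
    """Pooled standard deviation of two independent samples."""
    na, nb = len(a), len(b)
    return sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))


def cohens_d(a, b):
    """Standardized mean difference (a minus b), pooled-SD form."""
    return (mean(a) - mean(b)) / pooled_sd(a, b)


# Hypothetical familiarity-like scores, invented for illustration only.
spoken = [6.41, 6.40, 6.42, 6.39, 6.41]
written = [6.40, 6.39, 6.40, 6.38, 6.40]
print(round(cohens_d(spoken, written), 3))
```

Because d divides the difference by a standard deviation, two groups with similar raw differences can still yield different effect sizes when their score variability differs.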

Turning to the results of Age of acquisition, the model revealed a main effect of Context, indicating significantly higher scores for the ESL than the EFL group, and a main effect of Modality, driven by significantly higher scores in written than in spoken production. Unlike the previous two indices, no significant interaction emerged between the two factors, indicating similar patterns across the two production modes for both groups. However, as shown in Figure 3, the ESL group’s score in spoken production was even higher than the EFL group’s score in written production. These results suggest the ESL group’s increased ability to produce later acquired words.

5.4 N-gram information

We observed a strong correlation between range and frequency in both bigrams and trigrams (see Appendix C). Given that frequency is a more commonly used measure, we excluded n-gram range from our analyses. Additionally, we identified strong correlations between Delta P scores derived from the written reference corpus and those from the spoken reference corpus for both bigrams and trigrams. Therefore, we only used bigram and trigram association strength derived from the written reference corpus.
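The screening step described above can be sketched as a simple greedy filter: indices are visited in order of priority, and an index is kept only if it is not strongly correlated with any index already retained. The Python sketch below is a minimal illustration under several assumptions: the index names and scores are hypothetical, the priority order is encoded by dictionary order, and the absolute value of r is compared against the 0.7 threshold.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def screen_indices(scores, threshold=0.7):
    """Keep an index unless |r| >= threshold with an index already kept.

    `scores` maps index name -> list of per-text scores; earlier entries
    take priority (an assumption of this sketch).
    """
    kept = []
    for name, values in scores.items():
        if all(abs(pearson_r(values, scores[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical scores for five texts, invented for illustration only.
scores = {
    "bigram_frequency": [1.0, 2.0, 3.0, 4.0, 5.0],
    "bigram_range": [1.1, 2.1, 2.9, 4.2, 5.0],
    "bigram_association": [2.0, 1.0, 4.0, 1.5, 3.0],
}
print(screen_indices(scores))
```

Listing frequency first mirrors the rationale given in the text, where frequency is retained over range because it is the more commonly used measure.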

Table 6 presents the model outputs for bigram information, followed by graphical illustrations of the mean scores in Figure 4. Our analyses revealed different outcomes depending on the type of reference corpus used. First, when analyzing bigram frequency using the written reference corpus, we observed a main effect of Modality and a main effect of Context, but no significant interaction between them. On the other hand, when analyzing bigram frequency using the spoken reference corpus, we found a main effect of Modality, a main effect of Context, and a significant interaction between them. Post-hoc tests using emmeans showed a significantly lower score in written than in spoken production for both the EFL group (b = 0.223, SE = 0.036, p < 0.001, Cohen’s d = 0.616) and the ESL group (b = 0.116, SE = 0.036, p = 0.008, Cohen’s d = 0.321), yet the effect size was greater for the EFL group.

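Table 6: Model outputs for bigram information.

| Index | Fixed factor | Estimate | SE | t value | p value |
|---|---|---|---|---|---|
| Bigram frequency (Written reference corpus) | Intercept | 4.854 | 0.020 | 244.387 | <0.001*** |
| Bigram frequency (Written reference corpus) | Modality | 0.152 | 0.038 | 3.976 | <0.001*** |
| Bigram frequency (Written reference corpus) | Context | 0.227 | 0.040 | 5.703 | <0.001*** |
| Bigram frequency (Written reference corpus) | Modality × Context | 0.032 | 0.076 | 0.418 | 0.676 |
| Bigram frequency (Spoken reference corpus) | Intercept | 5.175 | 0.013 | 399.940 | <0.001*** |
| Bigram frequency (Spoken reference corpus) | Modality | −0.169 | 0.026 | −6.628 | <0.001*** |
| Bigram frequency (Spoken reference corpus) | Context | −0.097 | 0.026 | −3.749 | <0.001*** |
| Bigram frequency (Spoken reference corpus) | Modality × Context | 0.107 | 0.051 | 2.089 | 0.037* |
| Bigram association | Intercept | −3.266 | 0.011 | −309.167 | <0.001*** |
| Bigram association | Modality | 0.174 | 0.021 | 8.249 | <0.001*** |
| Bigram association | Context | 0.093 | 0.021 | 4.468 | <0.001*** |
| Bigram association | Modality × Context | −0.156 | 0.042 | −3.727 | <0.001*** |

***p < 0.001; *p < 0.05.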

Figure 4: 
Result of bigram frequency (log-transformed raw scores) for written reference corpus (left) and for spoken reference corpus (middle) and bigram association (right).

When analyzing the bigram association using the written reference corpus, we observed a main effect of Modality, with higher scores in written than in spoken production. There was also a main effect of Context, with higher scores in the ESL than the EFL group, and a significant interaction between the two factors. Post-hoc tests revealed a significantly higher score in written than in spoken production for the EFL group (b = −0.172, SE = 0.030, p < 0.001) but not for the ESL group (b = −0.016, SE = 0.030, p = 0.953). These results indicate that the EFL group produced more strongly associated bigrams in written than in spoken production, whereas the ESL group produced strongly associated bigrams equally in both written and spoken production.

Turning to the results of the trigram information, Table 7 presents the model outcomes, and Figure 5 displays visual representations. The analysis using the written reference corpus revealed a main effect of Modality, a main effect of Context, and their significant interaction. Post-hoc tests indicated that the EFL group had a significantly higher frequency score in written production than in spoken production (b = −0.336, SE = 0.053, p < 0.001, Cohen’s d = −0.641), whereas the ESL group had similar scores in both modes (b = −0.107, SE = 0.053, p = 0.174, Cohen’s d = −0.204). When using the spoken reference corpus, the model only showed a main effect of Context, with the EFL group producing more frequent trigrams than the ESL group. These results suggest that while the EFL learners tended to produce more frequent trigrams in written than spoken production, the ESL learners did not show this difference.

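Table 7: Model outputs for trigram information.

| Index | Fixed factor | Estimate | SE | t value | p value |
|---|---|---|---|---|---|
| Trigram frequency (Written reference corpus) | Intercept | 2.071 | 0.019 | 111.574 | <0.001*** |
| Trigram frequency (Written reference corpus) | Modality | 0.222 | 0.037 | 5.976 | <0.001*** |
| Trigram frequency (Written reference corpus) | Context | 0.273 | 0.037 | 7.350 | <0.001*** |
| Trigram frequency (Written reference corpus) | Modality × Context | −0.229 | 0.074 | −3.084 | 0.002** |
| Trigram frequency (Spoken reference corpus) | Intercept | 2.928 | 0.024 | 119.982 | <0.001*** |
| Trigram frequency (Spoken reference corpus) | Modality | −0.084 | 0.049 | −1.721 | 0.086 |
| Trigram frequency (Spoken reference corpus) | Context | −0.219 | 0.049 | −4.494 | <0.001*** |
| Trigram frequency (Spoken reference corpus) | Modality × Context | −0.169 | 0.098 | −1.731 | 0.084 |
| Trigram association | Intercept | −5.622 | 0.025 | −227.133 | <0.001*** |
| Trigram association | Modality | 0.295 | 0.050 | 5.956 | <0.001*** |
| Trigram association | Context | 0.256 | 0.047 | 5.424 | <0.001*** |
| Trigram association | Modality × Context | −0.159 | 0.094 | −1.620 | 0.106 |

***p < 0.001; **p < 0.01; *p < 0.05.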

Figure 5: 
Result of trigram frequency (log-transformed raw scores) for written reference corpus (left) and for spoken reference corpus (middle) and trigram association (right).

When analyzing the trigram association using the written reference corpus, we found a main effect of Modality, indicating higher scores in written than in spoken production, and a main effect of Context, with higher scores in the ESL group than the EFL group. However, there was no significant interaction between the two factors. These results suggest that while both groups produced more strongly associated trigrams in written than spoken production, the ESL group exhibited a higher propensity to produce more strongly associated trigrams than the EFL group.

6 Discussion

The purpose of the present study was to investigate the impact of production modality on L2 learners’ lexical sophistication and to explore whether this effect is modulated by L2 learning contexts. Our analysis involved written and spoken samples from proficiency-matched groups of ESL and EFL learners, assessing them across 14 lexical sophistication indices. In this section, we discuss our findings in the context of the two research questions (RQs) formulated for this study.

6.1 RQ1: Effect of modality

Building on previous findings highlighting the influence of modality on L2 production, our study explored whether this modality effect extends to L2 lexical sophistication. The results of our analyses revealed a robust effect of modality, with no interaction with the group factor, in three indices: word frequency, age of acquisition, and trigram association. Learners in our study demonstrated a tendency to produce less frequent words, later acquired words, and more strongly associated trigrams in writing compared to speaking. These outcomes align with previous findings indicating that writing enables learners to produce more complex and accurate linguistic units compared to speaking (e.g., Ellis and Yuan 2005; Hwang et al. 2020; Kim and Hwang 2022; Kormos 2014; Vasylets et al. 2017). Our findings go further in unveiling the intimate link between production mode and one’s ability to retrieve infrequent and advanced words.

In keeping with the line of research documenting the effect of modality in L2 production, we invoke the different psycholinguistic processes underlying writing and speaking as the major explanation for the modality effect emerging in our sample. As reviewed earlier, the recursive nature of writing confers more cognitive advantages on language users compared to the linear, time-constrained process of speaking (Halliday 2002; Kellogg 1996). From this vantage point, the cognitive burden imposed on our L2 learners may have been less demanding in writing than in speaking, allowing them to produce more sophisticated words in the writing task. In contrast, due to the high cognitive demands during speaking, the learners may have relied more heavily on high-frequency items in the speaking task, which require relatively fewer memory resources for lexical retrieval, leading to reduced lexical sophistication in the speech data.

6.2 RQ2: Interaction of modality and context

Another crucial question we asked was whether the effect of modality would be influenced by L2 learning contexts. For six indices, namely academic language, meaningfulness, familiarity, bigram frequency (when analyzed with the spoken reference corpus), bigram association, and trigram frequency (when analyzed with the written reference corpus), we found significant interactions between modality and L2 learning contexts.

We observed similar patterns in terms of meaningfulness, familiarity, bigram frequency, and bigram association, which showed a greater score gap between writing and speaking for the EFL compared to the ESL group. Specifically, the ESL group produced words with similar degrees of meaningfulness and bigrams with similar degrees of association in both writing and speaking, while the EFL group produced significantly less meaningful words and more strongly associated bigrams in writing than in speaking. Additionally, although both groups produced less familiar words and less frequent bigrams in writing than in speaking, this tendency was more pronounced for the EFL group. The EFL learners’ greater lexical sophistication in writing may be attributed to their increased experience with written language. As previously discussed, EFL learners have limited exposure to L2 input, mostly from textbooks, and have limited access to spoken input (Saito 2017). This imbalanced and restricted L2 experience may have resulted in fewer opportunities for EFL learners to encounter and use a variety of words in spoken language, making it more challenging to produce sophisticated words in speaking.

However, we also found an interesting pattern where the EFL group produced more frequent bigrams and trigrams in written production than in spoken production when analyzed with the written reference corpus. This pattern appears inconsistent with the group’s increased sophistication in written production for various indices, as more frequent n-grams suggest reduced sophistication. We speculate that these outcomes may be due to the EFL group’s greater exposure to written input, which includes a variety of n-grams, both frequent and infrequent. In contrast, this group may have employed less frequent n-grams in speech due to their limited experience with spoken language, which restricts them to a small number of infrequent n-grams. Further research is needed to validate this speculation by examining the specific usage of n-grams among EFL learners.

On the other hand, we observed a different pattern of interaction regarding academic language. In this case, the ESL group tended to use more academic phrases in speaking compared to writing, even outnumbering the EFL group’s usage of academic phrases in both written and spoken production. This outcome might be attributed to the ESL group’s language experience, reflecting the influence of their exposure to a significant amount of written and auditory input on a daily basis. It is widely assumed that ESL learners have substantial exposure to both written and spoken language in their everyday lives (Barrot and Gabinete 2021; Tarnopolsky 2000). In some cases, ESL learners may have greater exposure to spoken language, extensively used in various social contexts, compared to written language, which is primarily employed in classroom settings. The ESL group in our study demonstrated a higher use of academic language in spoken production, presumably because they had encountered academic language in diverse social contexts and had numerous opportunities to use it in oral communication.

6.3 Implications of the findings

The findings of the current study, demonstrating the robust effect of modality and its interaction with L2 learning experience in lexical sophistication, contribute novel evidence to the existing literature in the field of L2 lexical production. The previously attested effect of modality has primarily focused on lexical complexity measures such as the number of word tokens and types (e.g., Kormos 2014; Vasylets et al. 2017). While these indices provide useful information with respect to the breadth of L2 vocabulary knowledge (i.e., how many different words one can produce), more in-depth, finer-grained analyses of L2 lexical knowledge may require additional information, such as lexical sophistication, which informs learners’ frequency-based usage of words (Crossley et al. 2016). Our findings thus expand and refine previous work by demonstrating that the degree of L2 lexical sophistication is closely aligned with the usage-based factors related to learners’ language experience, such as modality and L2 learning context. Future research exploring the relationship between L2 learners’ lexical sophistication and the amount of input they receive will advance our understanding of the role of language input in L2 lexical development.

The present study provides empirical evidence supporting the usage-based approach to L2 acquisition (Bybee 2008; Ellis 2006). As outlined earlier, this approach places significant emphasis on the role of L2 experiences in the development of lexical sophistication. Based on this approach, we predicted that the ESL group, benefiting from frequent exposure to words in natural settings, would exhibit a higher level of lexical sophistication compared to the EFL learners, who primarily acquired vocabulary in instructional settings. Consistent with this prediction, our findings furnish evidence of the effect of learning contexts in several lexical sophistication indices. It appears that the ESL learners, through their repeated use of words in diverse contexts, enhanced their ability to process and retrieve words, leading to the use of sophisticated vocabulary during both writing and speaking tasks. In contrast, the EFL learners may have encountered challenges in efficiently entrenching vocabulary knowledge, potentially impeding their ability to produce sophisticated words. Notably, the EFL learners’ restricted experience with spoken language may have contributed to their reduced degree of lexical sophistication in spoken production compared to written production. While these results align with the usage-based approach, further research is required to generalize these findings by exploring the effect of L2 experiences on different aspects of lexical knowledge among learners with diverse L2 learning histories and backgrounds.

6.4 Limitations and suggestions for further work

We acknowledge some limitations that need to be addressed in future research. First, although we controlled for several factors influencing L2 vocabulary use, such as L2 proficiency, genre, and topics, the written and speech samples we analyzed did not come from a homogeneous group, raising the possibility that individual differences might have affected the results. Future studies should address this concern by analyzing written and spoken texts produced by the same learner group to provide a clearer confirmation of the effect of modality.

An additional concern is raised regarding the potential influence of the learners’ L1. As a reviewer pointed out, the learners within the EFL and ESL groups originated from different regions. Given this diversity, we conducted additional analyses where we included the learners’ region as a random factor in our models. However, the model outcomes consistently showed similar results across all the analyzed indices, suggesting that the learners’ region did not significantly affect the observed effects of modality and learning contexts. Nevertheless, it is important to note that this approach does not consider the specific influence of the learners’ individual L1, which can significantly influence L2 production (e.g., Lu and Ai 2015). To address this concern, future studies should incorporate information about the learners’ specific L1 and examine how this factor interacts with modality and learning contexts in shaping L2 lexical sophistication.

In addition, there is a concern regarding the notable difference in the number of words between the written and spoken data. The unbalanced data size reflects the inherently more challenging nature of speaking compared to writing. As pointed out by a reviewer, this disparity could potentially impact the validity of comparisons between these two corpora and introduce biases into the analysis. To address this concern, it is crucial to explore methods that can either balance the data or account for these differences. For instance, future research could consider providing task instructions that limit the number of words in written production, thereby aligning it more closely with the word counts in spoken production.

Another limitation is related to our selection of the ESL and EFL groups. Despite the variable nature of L2 learning experiences, our sample populations were confined to learners in EFL and ESL contexts. While we identified a robust interaction of L2 learning contexts and modality within this specific classification, these learner groups offer a limited representation of L2 learners. Therefore, future work should capture individual variation in language learning experiences by considering individuals’ L2 learning profiles such as the amount of input they receive in diverse contexts.

Furthermore, the exclusive focus on a single topic (i.e., part-time job) in our dataset may have restricted the learners’ specific vocabulary, potentially affecting the outcomes. To enhance the flexibility in vocabulary selection, future studies should involve learner data derived from diverse prompts.

In addition, we recognize a limitation in our study related to the proficiency levels of our participants. In the ICNALE data, students were categorized into different proficiency groups primarily based on their scores in a vocabulary size test. As this test assesses participants’ vocabulary sizes rather than their overall proficiency, students in the same proficiency level may exhibit similar lexical abilities, possibly obscuring variations in lexical sophistication among them. To address this issue, further research should recruit participants with their proficiency fully controlled based on standardized test scores.

7 Conclusions

The current study demonstrates a robust effect of modality and its interplay with L2 learning contexts on the assessment of L2 lexical sophistication. By going beyond traditional lexical measures, this research provides novel insights into the intricate dynamics of L2 lexical production. Moreover, our findings align with the usage-based approach, underscoring the pivotal role of L2 experiences. To advance our understanding of L2 lexical development, future investigations should address concerns related to data homogeneity, L1 influence, and disparities in sample size while exploring diverse linguistic dimensions.


Corresponding author: Hyunwoo Kim, Department of English Language and Literature, Yonsei University, Seoul, South Korea, E-mail:

References

Barrot, Jessie & Mari Karen Gabinete. 2021. Complexity, accuracy, and fluency in the argumentative writing of ESL and EFL learners. International Review of Applied Linguistics in Language Teaching 59(2). 209–232. https://doi.org/10.1515/iral-2017-0012.

Bates, Douglas, Martin Mächler, Ben Bolker & Steve Walker. 2015. Fitting linear mixed-effects models using lme4. Journal of Statistical Software 67. 1–48. https://doi.org/10.18637/jss.v067.i01.

Biber, Douglas, Bethany Gray & Shelley Staples. 2016. Predicting patterns of grammatical complexity across language exam task types and proficiency levels. Applied Linguistics 37. 639–668. https://doi.org/10.1093/applin/amu059.

Bybee, Joan. 2008. Usage-based grammar and second language acquisition. In Peter Robinson & Nick C. Ellis (eds.), Handbook of cognitive linguistics and second language acquisition, 216–236. New York: Routledge.

Choemue, Sumit & Barli Bram. 2021. Lexical richness in scientific journal articles: A comparison between ESL and EFL writers. Indonesian Journal of EFL and Linguistics 6(1). 147–167. https://doi.org/10.21462/ijefl.v6i1.349.

Coltheart, Max. 1981. The MRC psycholinguistic database. The Quarterly Journal of Experimental Psychology Section A 33(4). 497–505. https://doi.org/10.1080/14640748108400805.

Coxhead, Averil. 2000. A new academic word list. TESOL Quarterly 34. 213–238. https://doi.org/10.2307/3587951.

Crossley, Scott A., Kristopher Kyle & Tom Salsbury. 2016. A usage-based investigation of L2 lexical acquisition: The role of input and output. The Modern Language Journal 100(3). 702–715. https://doi.org/10.1111/modl.12344.

Crossley, Scott A. & Danielle S. McNamara. 2012. Predicting second language writing proficiency: The roles of cohesion and linguistic sophistication. Journal of Research in Reading 35(2). 115–135. https://doi.org/10.1111/j.1467-9817.2010.01449.x.

Crossley, Scott A., Nicholas Subtirelu & Tom Salsbury. 2013. Frequency effects or context effects in second language word learning: What predicts early lexical production? Studies in Second Language Acquisition 35(4). 727–755. https://doi.org/10.1017/s0272263113000375.

Davies, Mark. 2009. The 385+ million word Corpus of Contemporary American English (1990–2008+): Design, architecture, and linguistic insights. International Journal of Corpus Linguistics 14(2). 159–190. https://doi.org/10.1075/ijcl.14.2.02dav.

Durrant, Philip, Joseph Moxley & Lee McCallum. 2019. Vocabulary sophistication in first-year composition assignments. International Journal of Corpus Linguistics 24(1). 33–66. https://doi.org/10.1075/ijcl.17052.dur.

Eguchi, Masaki & Kristopher Kyle. 2020. Continuing to explore the multidimensional nature of lexical sophistication: The case of oral proficiency interviews. The Modern Language Journal 104(2). 381–400. https://doi.org/10.1111/modl.12637.

Ellis, Nick C. 2002. Frequency effects in language processing: A review with implications for theories of implicit and explicit language acquisition. Studies in Second Language Acquisition 24(2). 143–188. https://doi.org/10.1017/s0272263102002024.

Ellis, Nick C. 2006. Cognitive perspectives on SLA: The associative–cognitive CREED. AILA Review 19. 100–121. https://doi.org/10.1075/aila.19.08ell.

Ellis, Nick C. & Stefanie Wulff. 2015. Usage-based approaches to SLA. In Bill VanPatten & Jessica Williams (eds.), Theories in second language acquisition: An introduction, 75–93. London, UK: Routledge.

Ellis, Rod & Fangyuan Yuan. 2005. The effects of careful within-task planning on oral and written task performance. In Rod Ellis (ed.), Planning and task performance in a second language, 167–193. Amsterdam: John Benjamins. https://doi.org/10.1075/lllt.11.11ell.

Gollan, Tamar H., Rosa I. Montoya, Cynthia Cera & Tiffany C. Sandoval. 2008. More use almost always means a smaller frequency effect: Aging, bilingualism, and the weaker links hypothesis. Journal of Memory and Language 58. 787–814. https://doi.org/10.1016/j.jml.2007.07.001.

Halliday, Michael Alexander Kirkwood. 2002. On grammar. London, UK: Continuum.

Hwang, Haerim, Hyeyoung Jung & Hyunwoo Kim. 2020. Effects of written versus spoken production modalities on syntactic complexity measures in beginning-level child EFL learners. The Modern Language Journal 104(1). 267–283. https://doi.org/10.1111/modl.12626.

Hyland, Ken. 2008. As can be seen: Lexical bundles and disciplinary variation. English for Specific Purposes 27(1). 4–21. https://doi.org/10.1016/j.esp.2007.06.001.

Ishikawa, Shinichiro. 2013. The ICNALE and sophisticated contrastive interlanguage analysis of Asian learners of English. Learner Corpus Studies in Asia and the World 1. 91–118.

Ishikawa, Shinichiro. 2014. Design of the ICNALE-spoken: A new database for multi-modal contrastive interlanguage analysis. Learner Corpus Studies in Asia and the World 2. 63–75.

Kachru, Braj B. 1985. Standards, codification and sociolinguistic realism: The English language in the outer circle. In Randolph Quirk & Henry Widdowson (eds.), English in the world: Teaching and learning the language and literatures, 11–30. Cambridge: Cambridge University Press.

Kellogg, Ronald T. 1996. A model of working memory in writing. In C. Michael Levy & Sarah Ransdell (eds.), The science of writing: Theories, methods, individual differences and applications, 57–71. Mahwah, NJ: Lawrence Erlbaum.

Kim, Min-Kyung, Scott A. Crossley & Kristopher Kyle. 2018. Lexical sophistication as a multidimensional phenomenon: Relations to second language lexical proficiency, development, and writing quality. The Modern Language Journal 102(1). 120–141. https://doi.org/10.1111/modl.12447.

Kim, Hyunwoo & Haerim Hwang. 2022. Assessing verb-construction integration in young learners of English as a foreign language: Analyses of written and spoken production. Language Learning 72(2). 497–533. https://doi.org/10.1111/lang.12480.

Kormos, Judit. 2014. Differences across modalities of performance. In Heidi Byrnes & Rosa M. Manchón (eds.), Task-based language learning: Insights from and for L2 writing, 193–216. Amsterdam: John Benjamins. https://doi.org/10.1075/tblt.7.08kor.

Kuperman, Victor, Hans Stadthagen-Gonzalez & Marc Brysbaert. 2012. Age-of-acquisition ratings for 30 thousand English words. Behavior Research Methods 44. 978–990. https://doi.org/10.3758/s13428-012-0210-4.

Kuznetsova, Alexandra, Per B. Brockhoff & Rune H. B. Christensen. 2017. lmerTest package: Tests in linear mixed effects models. Journal of Statistical Software 82. 1–26. https://doi.org/10.18637/jss.v082.i13.

Kyle, Kristopher & Scott A. Crossley. 2015. Automatically assessing lexical sophistication: Indices, tools, findings, and application. TESOL Quarterly 49(4). 757–786. https://doi.org/10.1002/tesq.194.

Kyle, Kristopher, Scott A. Crossley & Cynthia Berger. 2018. The tool for the automatic analysis of lexical sophistication (TAALES): Version 2.0. Behavior Research Methods 50(3). 1030–1046. https://doi.org/10.3758/s13428-017-0924-4.

Laufer, Batia & Paul Nation. 1995. Vocabulary size and use: Lexical richness in L2 written production. Applied Linguistics 16(3). 307–322. https://doi.org/10.1093/applin/16.3.307.

Lenth, Russell. 2018. emmeans: Estimated marginal means, aka least-square means. R package version 1.2.4 [Computer software]. Available at: https://cran.r-project.org/package=emmeans. https://doi.org/10.32614/CRAN.package.emmeans.

Linnarud, Moira. 1986. Lexis in composition: A performance analysis of Swedish learners’ written English. Lund, Sweden: CWK Gleerup.

Lu, Xiaofei. 2012. The relationship of lexical richness to the quality of ESL learners’ oral narratives. The Modern Language Journal 96(2). 190–208. https://doi.org/10.1111/j.1540-4781.2011.01232_1.x.

Lu, Xiaofei & Haiyang Ai. 2015. Syntactic complexity in college-level English writing: Differences among writers with diverse L1 backgrounds. Journal of Second Language Writing 29. 16–27. https://doi.org/10.1016/j.jslw.2015.06.003.

Morris, Lori & Tom Cobb. 2004. Vocabulary profiles as predictors of the academic performance of Teaching English as a Second Language trainees. System 32(1). 75–87. https://doi.org/10.1016/j.system.2003.05.001.

Nasseri, Maryam & Paul Thompson. 2021. Lexical density and diversity in dissertation abstracts: Revisiting English L1 vs. L2 text differences. Assessing Writing 47. 100511. https://doi.org/10.1016/j.asw.2020.100511.

Paquot, Magali. 2019. The phraseological dimension in interlanguage complexity research. Second Language Research 35(1). 121–145. https://doi.org/10.1177/0267658317694221.

R Core Team. 2022. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Available at: https://www.R-project.org/.

Read, John. 2000. Assessing vocabulary. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511732942.

Saito, Kazuya. 2017. Effects of sound, vocabulary, and grammar learning aptitude on adult second language speech attainment in foreign language classrooms. Language Learning 67(3). 665–693. https://doi.org/10.1111/lang.12244.

Salsbury, Tom, Scott A. Crossley & Danielle S. McNamara. 2011. Psycholinguistic word information in second language oral discourse. Second Language Research 27(3). 343–360. https://doi.org/10.1177/0267658310395851.

Simpson-Vlach, Rita & Nick C. Ellis. 2010. An academic formulas list: New methods in phraseology research. Applied Linguistics 31(4). 487–512. https://doi.org/10.1093/applin/amp058.

Tarnopolsky, Oleg. 2000. EFL teaching and EFL teachers in the global expansion of English. Working Papers in Educational Linguistics 16(2). 25–42.

Vasylets, Olena, Roger Gilabert & Rosa M. Manchón. 2017. The effects of mode and task complexity on second language production. Language Learning 67(2). 394–430. https://doi.org/10.1111/lang.12228.

Verspoor, Marjolijn & Norbert Schmitt. 2013. Language and the lexicon in SLA. In Peter Robinson (ed.), The Routledge encyclopedia of second language acquisition, 353–360. New York: Routledge/Taylor & Francis.

Zhang, Chao & Shumin Kang. 2022. A comparative study on lexical and syntactic features of ESL versus EFL learners’ writing. Frontiers in Psychology 13. 1002090. https://doi.org/10.3389/fpsyg.2022.1002090.


Supplementary Material

This article contains supplementary material (https://doi.org/10.1515/iral-2023-0204).


Received: 2023-08-26
Accepted: 2023-11-27
Published Online: 2023-12-12
Published in Print: 2025-09-25

© 2023 the author(s), published by De Gruyter, Berlin/Boston

This work is licensed under the Creative Commons Attribution 4.0 International License.
