Abstract
This study investigates longitudinal changes in linguistic complexity and holistic scores in EFL timed argumentative writing among beginning-level learners using Complex Dynamic Systems Theory. It also explores relationships among linguistic complexity over time and with writing scores. Over nine months, 42 Japanese EFL learners completed six timed essays. Linguistic complexity was assessed through lexical, phraseological, and syntactic features. Findings indicate linear growth in writing scores, lexical sophistication, and noun phrase (NP) complexity, alongside stable phrasal complexity and non-linear subordination. Interrelated growth patterns included connections between lexical sophistication and phrasal complexity, and phraseological complexity and subordination. Competitive dynamics were observed between lexical diversity and NP complexity, and phraseological complexity and phrasal sophistication. Linear associations emerged between writing scores and lexical sophistication or syntactic complexity, while a non-linear relationship occurred with phraseological sophistication. This study illuminates the intricate dynamics of linguistic complexity and L2 writing scores among beginning-level learners in argumentative writing.
1 Introduction
The ongoing development of language systems among second language (L2) learners is one of the focal points in the field of L2 acquisition. One theoretical approach guiding this inquiry is Complex Dynamic Systems Theory (CDST), a meta-theory that aims to provide a comprehensive framework for understanding language use and language development (Han et al. 2022; Hiver et al. 2022). From the CDST perspective, L2 studies over the last two decades have demonstrated longitudinal development patterns in written and oral language production, particularly focusing on complexity, accuracy, and fluency (CAF; e.g. Bulté and Housen 2020; Duan and Shi 2021; Larsen-Freeman 2006; Pfenninger 2020; Rokoszewska 2022; Vyatkina 2012; Zheng 2016). Within the CAF framework, linguistic complexity, defined as “the complexity directly arising from the number of linguistic elements and their interrelationships” (Pallotti 2015: 117), is of importance as it can mirror the complexity inherent in a learner’s language systems (Norris and Manchón 2012). Previous CDST-oriented studies have explored the developmental patterns of linguistic complexity, including lexical and syntactic complexity in L2 writing, underscoring the dynamic and non-linear nature of growth trajectories (e.g. Bulté and Housen 2018; Duan and Shi 2021; Köylü et al. 2023; Pfenninger 2020; Rokoszewska 2022; Verspoor et al. 2008; Vyatkina 2012; Wang 2022; Zheng 2016). Prior studies have also investigated the interrelated dynamics between lexical and syntactic complexity in L2 writing (e.g. Bulté and Housen 2020; Caspi 2010; Larsen-Freeman 2006; Spoelman and Verspoor 2010; Verspoor et al. 2012; Vyatkina 2012).
Situated within the CDST and CAF frameworks and building on prior studies, the current study explores the developmental patterns of linguistic complexity in L2 timed argumentative writing produced by beginning-level Japanese English as a foreign language (EFL) learners enrolled in a university-level general English course, where minimal L2 writing instruction was offered. Specifically, students received a brief introduction to argumentative writing essays at the beginning of the semester, but no additional L2 argumentative writing instruction was provided. In addition, we examine longitudinal changes in holistic L2 writing scores, which serve as indicators of the development of L2 writing ability, broadly defined as an individual’s capacity to effectively convey written content in an L2 (e.g. Cumming et al. 2000). Furthermore, this study investigates not only the developmental relationships among linguistic complexity features (i.e. lexical, phraseological, and syntactic complexity features) but also those between linguistic complexity features and L2 holistic writing scores. Through this approach, the main goal of this study is to provide a more comprehensive understanding of the dynamic patterns of changes in L2 linguistic complexity and writing scores in beginning-level EFL argumentative writing over a nine-month duration.
2 Literature review
2.1 Dynamic patterns of linguistic complexity development in L2 writing
In essence, CDST perceives language learning as an emergent, non-linear process within a dynamic and interconnected system. Two main principles govern the CDST framework: the relational principle and the adaptive principle (Hiver et al. 2022). The relational principle assumes that language systems consist of multiple subsystems, which are interconnected and interact with each other. This means that any changes in one subsystem may lead to changes in other subsystems, and even the entire system (Larsen-Freeman 2006). According to the adaptive principle, language systems exhibit both stability and dynamic change, and this relationship is characterized by non-linear development (Hiver et al. 2022). The non-linear nature of language development reflects that changes may emerge unexpectedly, rather than occurring in a predictable, linear manner. For example, L2 learners may undergo rapid progress in using certain linguistic features, followed by periods of slower development or even regression.
A wealth of CDST-oriented L2 writing research has examined written output using the CAF framework (e.g. Bulté and Housen 2020; Larsen-Freeman 2006; Verspoor et al. 2008; Vyatkina 2012). This research specifically centers on linguistic complexity, which has been extensively explored using the CAF dimensions of linguistic performance. In the remaining text, we use the term “linguistic complexity” in an absolute sense, referring to the objective quantitative aspects of a language feature (Bulté and Housen 2012). The main components of linguistic complexity include lexical, phraseological, and syntactic complexity (e.g. Bulté and Housen 2020; Paquot 2019). In the subsections that follow, we explore how CDST-oriented research has examined lexical, phraseological, and syntactic complexity in L2 writing.
2.1.1 Lexical complexity development in L2 writing
Lexical complexity encompasses various dimensions, including, but not limited to, two key aspects: lexical diversity and lexical sophistication (Read 2000).[1] Lexical diversity involves the use of a variety of word types, while lexical sophistication pertains to the use of advanced vocabulary. Several measures have been proposed to quantify lexical diversity, including the type-token ratio (TTR) and the measure of textual lexical diversity (MTLD; McCarthy and Jarvis 2010). Commonly employed metrics for assessing lexical sophistication include word frequency and age-of-acquisition norms (e.g. Kyle et al. 2018). These measures operate under the assumption that words with lower frequency and those acquired later are indicative of greater sophistication and difficulty (e.g. Brysbaert et al. 2018). Research consistently demonstrates a close association between lexical complexity and L2 writing quality. Highly rated L2 texts often incorporate a broader range of diverse words (e.g. Crossley and McNamara 2012), low-frequency words, and words acquired at a later age (e.g. Kim et al. 2018).
Recent studies have employed a CDST approach to investigate the development of lexical complexity in L2 writing (e.g. Bulté and Housen 2020; Pfenninger 2020; Rokoszewska 2022; Verspoor et al. 2008; Vyatkina 2012; Zheng 2016). Zheng (2016) examined the one-year development of English lexical use in L2 argumentative writing composed by 15 Chinese university students with an upper-intermediate English proficiency level. The findings revealed that over the course of the year, there was a general increase in lexical sophistication, measured by the proportion of words beyond the most frequent 2,000 words, as well as in lexical diversity, measured by Uber’s index. Pfenninger (2020) conducted a study on the L2 (English) development of 91 children who underwent content and language integrated learning (CLIL) instruction in both German and English in Switzerland for a duration of up to eight school years (ages 5–12). In terms of lexical diversity in written essays, assessed using MTLD, the results revealed no noticeable change during the initial 5.5 years but a significant increase during the last 2.5 years. This suggests that while lexical diversity tends to plateau during the early developmental stages, it has the potential to undergo substantial growth as L2 proficiency levels advance.
2.1.2 Phraseological complexity development in L2 writing
In line with the CAF framework, there has been a growing body of research focused on exploring phraseological complexity in terms of diversity and sophistication (e.g. Paquot 2019; Siyanova-Chanturia and Spina 2020; Vandeweerd et al. 2022). Although a consensus regarding the measurement of phraseological diversity has not been reached (e.g. Vandeweerd et al. 2022), a substantial amount of research has assessed phraseological sophistication through mutual information (MI) scores (e.g. Paquot 2019). This metric gauges the co-occurrence of words, emphasizing combinations that include less commonly used words as well as words paired with a limited pool of potential partners. Accordingly, infrequent combinations of strongly associated words (e.g. artificial intelligence) are related to higher MI scores. MI scores have been found to predict L2 writing scores (e.g. Bestgen and Granger 2014; Kim et al. 2018; Paquot 2019), suggesting that higher-rated L2 texts tend to include more closely associated phraseological units attested by MI scores.
In the surveyed literature, the development of phraseological units among L2 learners from a CDST perspective has been underexplored and thus warrants further investigation (cf. Duan and Shi 2021; Verspoor and Smiskova 2012; Zheng 2016). In one of the few studies adopting a CDST perspective, Duan and Shi (2021) investigated the use of formulaic sequences in argumentative essays written by 11 Chinese upper-intermediate level EFL learners over two and a half years. The findings revealed a trend towards the increased use of formulaic sequences with higher MI scores over time, indicating that upper-intermediate L2 learners tended to employ progressively more advanced formulaic sequences as time went on.
2.1.3 Syntactic complexity development in L2 writing
Syntactic complexity refers to the sophistication of grammatical structures (Lu 2010; Wolfe-Quintero et al. 1998), and consists of four main components: global complexity, subordination, coordination, and phrasal (subclausal) complexity (Norris and Ortega 2009). Global complexity is often assessed by measures such as mean sentence length and mean length of T-units (MLT), with each T-unit consisting of a main clause along with its embedded subordinate clauses. Subordination is evaluated based on the presence of dependent clauses, often quantified as dependent clauses per T-unit (DC/T). Coordination involves the utilization of coordinate phrases or clauses. Phrasal complexity is gauged using various metrics including mean length of clause (MLC), complex nominals per T-unit (CN/T), and mean length of noun phrases (NP). Note that MLC functions as an indicator of phrasal complexity since it signifies heightened complexity through the inclusion of phrase-level elements, including pre- or postmodification within a phrase, the use of nominalization, or the transformation of clauses into phrases to condense information (Norris and Ortega 2009: 561).
An increasing body of research has focused on investigating the development of syntactic complexity through the lens of CDST (e.g. Bulté and Housen 2018; Köylü et al. 2023; Wang 2022). Bulté and Housen (2018) examined the creative writing assignments of ten beginning-level Dutch-speaking L2 (English) learners in a secondary school spanning a two-year period. The study revealed a progressive trend in global syntactic complexity as measured by MLT, subordination indicated by the proportion of subclauses, and NP complexity as assessed through mean length of NPs. Notably, these developmental trajectories exhibited non-linear, irregular patterns with periods of advancement and regression. The study also found that phrasal complexity as measured by MLC remained stable. Köylü et al. (2023) examined syntactic complexity within the context of L2 (English) diaries produced by 26 Catalan/Spanish bilingual upper-intermediate-to-advanced learners during study abroad. The findings revealed that several measures of syntactic complexity, including MLC and DC/T, exhibited no significant changes during the semester. This might be because the informal nature of the genre (diaries) potentially failed to evoke complex sentence structures.
2.2 Interconnected dynamics of lexical and syntactic complexity in L2 writing
Although lexical, phraseological, and syntactic complexity in L2 writing display dynamic changes over time, it is important to recognize that they are intricately interconnected (e.g. Bulté and Housen 2020; Larsen-Freeman 2006). Research from a CDST perspective has explored possible interactions among various dimensions of language subsystems, uncovering four distinct longitudinal relationships (i.e., the connections between at least two variables of interest observed over an extended period of time): support (features evolving together, also called connected growers), competition (features developing in opposite directions), precursor (one feature’s growth preceding another’s), and asymmetry (changing relationships between features over time; Verspoor and van Dijk 2011).
Prior studies in the context of CDST and L2 writing have documented longitudinal connections between lexical and syntactic complexity (e.g. Bulté and Housen 2020; Caspi 2010; Larsen-Freeman 2006; Spoelman and Verspoor 2010; Verspoor et al. 2008, 2012; Vyatkina 2012). Multiple studies have reported a supportive relationship between lexical and syntactic complexity (e.g. Spoelman and Verspoor 2010; Vyatkina 2012). Spoelman and Verspoor (2010) conducted an analysis of L2 (Finnish) academic writing produced by a Dutch learner over three years. Their findings reported a supportive relationship between lexical complexity, quantified by the number of morphemes per word, and NP complexity, assessed by the number of words per NP. Vyatkina (2012) delved into a cross-sectional corpus of L2 (German) essays produced by L2 beginning-level learners, supplemented by the writings of two focal students with four-semester longitudinal data. The findings indicated a positive relationship between lexical diversity, measured using the corrected type-token ratio, and global syntactic complexity, measured by the mean length of sentence. This finding suggests that, over time, L2 learners at the beginning level tend to produce longer sentences while also incorporating a more diverse range of vocabulary, thus reflecting a pattern of connected growth.
A competitive relationship between lexical and syntactic complexity has also been documented by O’Leary and Steinkrauss (2022). Their research examined L2 (English) essays produced by three Dutch-speaking advanced-level university students over a span of four years. Their findings revealed a negative correlation between lexical diversity (measured using Vocd-D and the Guiraud index) and phrasal complexity (assessed through MLC and CN per sentence). This observation suggests a competitive dynamic in which limited resources are divided between lexical diversification and phrasal sophistication.
2.3 Improvement of L2 holistic writing scores over time
In the context of L2 writing, holistic or analytic writing scores evaluated by expert raters are important because they can be indicative of an individual’s L2 writing ability, which often hinges on “the selection of appropriate words and phrases; on facility with the conventions of grammar, punctuation, and spelling; and on the competent use of logic and rhetorical devices to sustain a reader’s attention and direction” (Cumming et al. 2000: 14). Consequently, improvements in writing scores, as determined by scoring rubrics that present “implicit assumptions … about the development of L2 writing skills” can signify the advancement of L2 writing ability (Valdes et al. 1992: 334–335).
From a CDST perspective, prior studies have explored longitudinal changes in L2 holistic (rather than analytic) writing scores (e.g. Huang et al. 2021; Lowie and Verspoor 2019). Lowie and Verspoor (2019) examined timed L2 narrative writing among 22 Dutch learners of English throughout one academic year in a CLIL setting in a secondary school, noting an increase in holistic writing scores, which was expected due to the students’ immersion in a CLIL environment. Huang et al. (2021) investigated untimed L2 writing assignments of various genres completed by 22 Chinese EFL learners at an intermediate level over an academic year at a university. The results showed a significant improvement in L2 holistic writing scores over time, even though the learners had limited exposure to English and did not receive writing-focused instruction. These collective findings suggest that repetitive writing practice can play a pivotal role in enhancing L2 writing scores.
2.4 Current study
Collectively, a growing body of CDST-oriented studies has revealed the dynamic development of lexical, phraseological, and syntactic complexity in L2 writing (e.g. Bulté and Housen 2020; Duan and Shi 2021; Larsen-Freeman 2006; Verspoor et al. 2008; Vyatkina 2012), the interrelated dynamics between lexical and syntactic complexity in L2 writing (e.g. Spoelman and Verspoor 2010; Vyatkina 2012), and longitudinal changes in L2 holistic writing scores (e.g. Huang et al. 2021; Lowie and Verspoor 2019).
However, there exist at least four research gaps in our understanding of the development of linguistic complexity requiring more scrutiny. First, the scope of complexity measures used in previous studies and the scope of contexts in which linguistic complexity has been examined appear to be somewhat limited. For instance, although phraseological complexity constitutes an important component of linguistic complexity (e.g. Paquot 2019), the developmental patterns in phraseological complexity features among beginning-level L2 learners have not been extensively explored. In addition, there have not been thorough examinations of the developmental patterns of linguistic complexity in the context of argumentative essays produced by L2 beginning-level learners in a university-level general English course, an area of importance within the academic setting (e.g. Gardner and Nesi 2013) that is investigated in this study. Second, it remains uncertain whether scores in timed L2 argumentative writing, often deemed more challenging than narrative or descriptive writing, would exhibit improvement over time when produced by beginning-level L2 learners with limited L2 exposure and limited writing instruction. Third, it appears that further research into the longitudinal connections among linguistic complexity features, including the relationships of lexical and phraseological complexity with syntactic complexity, is essential to gain a comprehensive understanding of the interconnected development of language subsystems. Finally, the relationship between linguistic complexity features, which can reflect a learner’s language systems, and L2 holistic writing scores, which can reflect a learner’s L2 writing ability, over time, has not been systematically examined. Such examination may be of particular interest among writing assessment researchers and writing teachers.
To fill these gaps, the current study extends the existing research on CDST-oriented L2 writing by examining (1) the longitudinal changes in L2 lexical, phraseological, and syntactic complexity, as well as L2 holistic writing scores, (2) the relationships among lexical, phraseological, and syntactic complexity over time, and (3) the relationships of lexical, phraseological, and syntactic complexity with L2 holistic writing scores in beginning-level Japanese university EFL learners in the context of argumentative writing. The following three questions are addressed:
1. How do lexical, phraseological, and syntactic complexity features and holistic writing scores in EFL beginning-level learners’ argumentative writing change over two academic semesters?
2. How are lexical and phraseological complexity features related to syntactic complexity features in EFL beginning-level learners’ argumentative writing that is produced over time?
3. How are lexical, phraseological, and syntactic complexity features related to holistic writing scores in EFL beginning-level learners’ argumentative writing that is produced over time?
In addressing the research questions, it is important to note that our focus is on examining “the ‘grand sweep of development’ where global structure and similarities across participants can be seen” (Larsen-Freeman 2006: 613). We operate on the assumption that, at the group level, development over time may exhibit relatively regular trends leading towards target-like language use (Bulté and Housen 2018). While this perspective is considered essential in CDST-oriented research (e.g. Bulté and Housen 2020; Han et al. 2022), we are aware of the limitations of applying group-level statistics to individual learners (Lowie and Verspoor 2019). In addition, to address the research questions, we employed generalized additive mixed models (GAMMs), which are capable of modeling not only linear but also non-linear and more dynamic developmental trajectories.
3 Methods
3.1 Research context and participants
Data were collected from two intact English-medium classes in a business-oriented university in Japan over two semesters (nine months). The course was mandatory for students majoring in International Studies in the university. The course covered all four language skills, with a particular emphasis on grammar, listening, and reading. Its goal was to develop students’ English proficiency to reach intermediate levels. Each week, the class gathered for one meeting lasting 200 min (two 100 min sessions in a day). Due to the Covid-19 pandemic, all but one class met fully online; the university partly allowed one class to have face-to-face class meetings during the last seven weeks of the second semester.
Participants were 42 Japanese beginning-level EFL students (mean age = 19.10 years, SD = 0.55) who were majoring in International Studies at a university in Japan and consented to participate in this study. All of the participants had taken the institutional TOEIC Listening and Reading test as a requirement from the department. Their TOEIC scores ranged from 275 to 560 out of 990, with a mean of 410.56 (SD = 71.38). Given that TOEIC scores ranging from 226 to 550 are mapped onto the A2 level (beginning level) of the Common European Framework of Reference (Tannenbaum and Wylie 2008), participants were considered beginning-level English learners.[2] As for English learning backgrounds, participants had learned English for an average of 9.00 years (SD = 3.03), and most of them aimed to improve their English skills and TOEIC scores. Although labeling L2 learners who had studied English for an average of nine years as beginners might seem less appropriate, this classification was deemed unavoidable given their TOEIC scores and the corresponding proficiency level mappings (Tannenbaum and Wylie 2008).
In relation to writing practice, most students reported that prior to taking the course, they had not learned how to write an argumentative essay. During the two semesters, writing was not a primary focus. However, the instructor placed an emphasis on the genre of argumentative writing, using prompts similar to those for the writing test of the TOEIC. This choice was motivated by the recognition of the significance of argumentative writing within the academic context of higher education, along with the potential for students to take the TOEIC writing test in the future. To facilitate this, students received concise instructions on the organization of English argumentative essays at the beginning of the first semester. It is important to note that these instructions were provided solely at the commencement of the data collection. Subsequently, students were asked to write argumentative essays at intervals across the two semesters.
3.2 Instruments for data collection: writing prompts
Students produced six argumentative essays in response to prompts similar to those of the TOEIC writing opinion test over a span of nine months in class. TOEIC prompts were used because the participants’ course focused on business-related English. Writing directions were taken from the TOEIC writing opinion task. Each prompt asked students to write an essay in response to a controversial statement by stating, explaining, and supporting their opinion. Students were given 30 min to write an essay. While no specified word limit was given, following the TOEIC writing test instructions, our prompt stated that a well-crafted essay should typically contain a minimum of 300 words. A total of nine prompts with three different topics (i.e., business, education, and technology) were used (see Table 1). The topics were counterbalanced by using three different prompts (one prompt from each topic) at each writing occasion to minimize potential prompt effects.
Table 1: Writing prompts.
Topic | Prompt |
---|---|
Business 1 | Small companies today have a harder time being successful than those in the past. |
Business 2 | Retirement from a company should be required at the age of 65. |
Business 3 | Working for a large company has more benefits than working for a small company. |
Education 1 | Study abroad should be required for all university students. |
Education 2 | All high school students should go to college. |
Education 3 | Children should begin learning a foreign language as soon as they start school. |
Technology 1 | With the help of technology, life is easier today than in the past. |
Technology 2 | Social Networking Services (Facebook, Instagram, Line) have a negative impact on people’s social life. |
Technology 3 | People are less creative now than in the past because of technology. |
3.3 Data collection procedure
As shown in Table 2, using the first week of the first semester as a starting time point, a total of six essays were collected on the second, fifth, tenth, fifteenth, twenty-fourth, and thirty-eighth weeks. Across the six writing occasions, participants responded to six different prompts. Between the first and second writing occasions, the instructor gave a 40 min in-class lecture on how to structure argumentative essays in English because many of the students had not learned how to write essays before. A total of 259 argumentative essays were collected. We removed 11 essays that did not meet a minimum requirement of 50 words needed to calculate lexical diversity measures (Zenker and Kyle 2021). Then, the number of essays that each participant produced was counted. Two students produced only three essays, which we considered insufficient to track changes over time, and the essays were thus removed. Also, one essay was off topic and was consequently removed. The remaining 241 essays produced by 42 students were included in the analysis. On average, students produced 5.74 essays (SD = 0.59), ranging from four to six. The number of essays produced on each occasion is provided in Table 2. All the essays were written on computers.
Table 2: Number of essays produced on each occasion.
Writing order (week) | Writing 1 (week 2) | Writing 2 (week 5) | Writing 3 (week 10) | Writing 4 (week 15) | Writing 5 (week 24) | Writing 6 (week 38) |
---|---|---|---|---|---|---|
Number of essays | 38 | 41 | 41 | 41 | 38 | 42 |
Number of essays for each prompt | Business 1 (n = 12) | Business 2 (n = 14) | Business 3 (n = 11) | Business 2 (n = 14) | Business 3 (n = 14) | Business 3 (n = 15) |
Number of essays for each prompt | Education 1 (n = 14) | Education 2 (n = 12) | Education 3 (n = 15) | Education 2 (n = 15) | Education 3 (n = 12) | Education 3 (n = 15) |
Number of essays for each prompt | Technology 1 (n = 12) | Technology 2 (n = 15) | Technology 3 (n = 15) | Technology 2 (n = 12) | Technology 3 (n = 12) | Technology 3 (n = 12) |
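The exclusion steps described in Section 3.3 can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure the authors ran: the record keys (`student`, `text`, `off_topic`), whitespace tokenization, and the `MIN_ESSAYS` threshold of four are assumptions made for the example.

```python
from collections import Counter

MIN_WORDS = 50    # length floor reported in the text
MIN_ESSAYS = 4    # assumption: writers left with three or fewer essays are dropped


def filter_essays(essays):
    """Return the essays that survive the three exclusion steps.

    `essays` is a list of dicts with hypothetical keys:
    'student' (id), 'text' (str), and 'off_topic' (bool).
    """
    # Step 1: drop essays too short for lexical diversity measures.
    # Words are counted by whitespace splitting (an assumption).
    long_enough = [e for e in essays if len(e["text"].split()) >= MIN_WORDS]
    # Step 2: count remaining essays per writer and drop sparse writers.
    per_student = Counter(e["student"] for e in long_enough)
    enough_data = [e for e in long_enough
                   if per_student[e["student"]] >= MIN_ESSAYS]
    # Step 3: drop off-topic essays.
    return [e for e in enough_data if not e["off_topic"]]


# Consistency check on the reported counts: 259 collected, 11 too short,
# 2 writers x 3 essays dropped, 1 off topic.
remaining = 259 - 11 - 2 * 3 - 1
```

The final line simply re-derives the 241 essays retained for analysis from the counts given in the text.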
3.4 Data analysis
3.4.1 Essay scoring
The essays were scored by two raters: one rater held a PhD degree in applied linguistics, and the other was an MA student in applied linguistics. Both raters had varied experience teaching English as a second or foreign language, including English writing. The rubric was holistic and was concerned with four main aspects of writing: task fulfilment, organization with the appropriate use of supporting details, coherence, and language use. Scores ranged from zero (lowest) to five (highest).[3] Prior to rating the essays, the raters had a training session. After the training session, the two raters independently evaluated the essays collected for this study without knowing the order in which the essays were written. When the scores of the two raters differed by more than one point, the raters adjudicated the final score through discussion. Using the adjudicated scores for 242 essays, inter-rater reliability was acceptable (Cohen’s Kappa = 0.792). Average holistic scores between the raters for each essay were used for the data analysis.
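The scoring workflow above (flagging large disagreements, averaging the two ratings, and checking agreement) can be illustrated with a minimal Python sketch. The function names are hypothetical, and the kappa shown is the unweighted form; the text does not state whether a weighted variant was used, so this is an assumption.

```python
from collections import Counter


def needs_adjudication(score_a, score_b, threshold=1):
    """True when two ratings differ by more than `threshold` points."""
    return abs(score_a - score_b) > threshold


def final_score(score_a, score_b):
    """Average of the two raters' (possibly adjudicated) scores."""
    return (score_a + score_b) / 2


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters scoring the same essays.

    Undefined (division by zero) when both raters use a single
    identical category for every essay.
    """
    n = len(ratings_a)
    # Proportion of essays on which the raters agree exactly.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rater's marginal counts.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[k] * counts_b[k] for k in categories) / (n * n)
    return (observed - expected) / (1 - expected)
```

For instance, ratings of 2 and 4 on the same essay would be flagged for discussion, whereas ratings of 3 and 4 would simply be averaged to 3.5.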
3.4.2 Linguistic complexity measures
We used six linguistic complexity measures. Each index is described below (see Table 3 for a summary of the measures).
Table 3: Linguistic complexity measures used in this study.
Complexity construct | Sub-construct | Measure |
---|---|---|
Lexical complexity | Lexical diversity | Measure of textual lexical diversity (MTLD) |
Lexical complexity | Lexical sophistication | CW age of acquisition (no prompt words) |
Phraseological complexity | Phraseological sophistication | Bigram mutual information (MI) |
Syntactic complexity | Noun phrasal complexity | Complex nominals per t-unit (CN/T) |
Syntactic complexity | Phrasal complexity | Mean length of clause (MLC) |
Syntactic complexity | Subordination complexity | Dependent clauses per t-unit (DC/T) |
For lexical complexity, we focused on lexical diversity and sophistication measures. Lexical diversity was measured using the measure of textual lexical diversity (MTLD), defined as the mean length of sequential word strings in a text that maintain a predetermined type-token ratio (TTR) value of 0.72 (McCarthy and Jarvis 2010). MTLD was chosen because it tends not to be influenced by text length in texts containing a minimum of 50 tokens (Zenker and Kyle 2021) and has been widely used in L2 research (e.g. Pfenninger 2020). MTLD was calculated using the tool for the automatic analysis of lexical diversity (TAALED; version 1.4.1; Kyle et al. 2021). Lexical sophistication for content words (CWs) was measured using age-of-acquisition scores because they are widely used as a proxy measure for word difficulty (e.g. Brysbaert et al. 2018). Lexical sophistication for function words was not considered because function word use is more related to grammatical-syntactic functions (Read 2000). In addition, CW lemmas (i.e., base forms of words, such as go for went and gone) were used because we were not interested in the use of inflections. Furthermore, to control for lexical use influenced by prompt wording, CWs (lemmas) that appeared in a prompt were removed from each text written for that prompt, using R. Age-of-acquisition scores were based on Kuperman et al. (2012), in which more than 30,000 English words were assessed. Age-of-acquisition scores are expressed as the mean ages in years (mostly ranging from five to 14) at which native speakers of English thought that they had acquired the word. The age-of-acquisition scores for each sample were computed using the tool for the automatic analysis of lexical sophistication (TAALES; version 2.2; Kyle et al. 2018). This involved calculating the average age-of-acquisition score by dividing the sum of the age-of-acquisition scores for the lemmas in a sample by the number of lemmas in that sample that were assessed. Lemmas not found in Kuperman et al.’s (2012) age-of-acquisition word list (e.g. misspelled words) were not counted toward the mean scores.
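The two lexical measures described above can be sketched in a few lines of Python. This is a simplified illustration, not the TAALED or TAALES implementation: the function names, the convention of closing a factor when the TTR is at or below the threshold, and the toy age-of-acquisition values are all assumptions made for the example.

```python
def mtld_one_pass(tokens, threshold=0.72):
    """Run one directional MTLD pass and return tokens per factor."""
    factors = 0.0
    types, count = set(), 0
    for token in tokens:
        count += 1
        types.add(token)
        if len(types) / count <= threshold:
            # TTR has dropped to the threshold: close a full factor and reset.
            factors += 1.0
            types, count = set(), 0
    if count > 0:
        # Leftover segment: credit a partial factor in proportion to how far
        # its TTR has fallen from 1.0 toward the threshold.
        ttr = len(types) / count
        factors += (1.0 - ttr) / (1.0 - threshold)
    # A text with no repeated words never completes any factor.
    return len(tokens) / factors if factors > 0 else float("nan")


def mtld(tokens, threshold=0.72):
    """Average the forward and backward passes over the token list."""
    forward = mtld_one_pass(tokens, threshold)
    backward = mtld_one_pass(tokens[::-1], threshold)
    return (forward + backward) / 2


def mean_aoa(lemmas, aoa_norms, prompt_lemmas=()):
    """Mean age-of-acquisition over lemmas that are in the norms.

    Lemmas that appear in the prompt, and lemmas missing from the norms
    (e.g. misspellings), are left out of both the sum and the count.
    """
    excluded = set(prompt_lemmas)
    scores = [aoa_norms[lemma] for lemma in lemmas
              if lemma not in excluded and lemma in aoa_norms]
    return sum(scores) / len(scores) if scores else None


# Invented norms for illustration only; real values come from Kuperman et al. (2012).
toy_norms = {"dog": 3.0, "argue": 7.0, "policy": 9.0}
print(mean_aoa(["dog", "argue", "polcy", "policy"], toy_norms, prompt_lemmas=["policy"]))
print(mtld(["a", "b", "a", "b"]))
```

In the first call, the misspelled "polcy" is skipped because it has no norm and "policy" is skipped because it is treated as a prompt word, so only two lemmas contribute to the mean.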
For phraseological complexity, we focused on phraseological sophistication, which was operationalized using bigram mutual information (MI) scores (Paquot 2019). We included all bigrams irrespective of the parts of speech of individual words, following previous studies (Bestgen and Granger 2014; Saito and Liu 2021). MI-based scores were chosen because they have been reported to be useful in distinguishing learners at different L2 writing proficiency levels (Kim et al. 2018; Paquot 2019). MI scores were computed by taking the logarithm of the observed co-occurrence frequency of two items divided by their expected co-occurrence frequency (see Kyle et al. 2018: 1036 for the equation). We used MI scores calculated based on the academic subsection of the Corpus of Contemporary American English (COCA; Davies 2008), which were provided in the TAALES output (Kyle et al. 2018). Considering differences between spoken and written discourses, we chose the academic subsection, which is more closely related to L2 learners’ argumentative essays (which are part of academic writing). MI scores typically emphasize the significance of less common items, as n-grams containing lower-frequency words tend to yield higher MI scores. For example, using MI scores calculated based on the COCA academic subsection, the MI score of emotionally disturbed is 9.00, while that of only as is 0.25. The MI score for each text was computed as the average score, obtained by dividing the sum of the MI scores for the n-grams in a text by the number of n-grams in that text. N-grams not found in the TAALES n-gram list were not counted towards the mean scores.
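As a rough illustration of the MI calculation and the text-level averaging just described, the sketch below uses the common formulation MI = log2(observed / expected), with the expected frequency derived from the two word frequencies under independence. The function names and the toy reference frequencies are hypothetical; the exact equation used by TAALES is the one given in Kyle et al. (2018).

```python
import math


def mutual_information(bigram_freq, w1_freq, w2_freq, corpus_size):
    """MI = log2(observed / expected) for a single bigram."""
    # Expected co-occurrence if the two words were independent.
    expected = w1_freq * w2_freq / corpus_size
    return math.log2(bigram_freq / expected)


def mean_bigram_mi(tokens, mi_table):
    """Average MI over the bigrams of a text that appear in mi_table."""
    bigrams = zip(tokens, tokens[1:])
    # Bigrams missing from the reference table are skipped entirely.
    scores = [mi_table[bigram] for bigram in bigrams if bigram in mi_table]
    return sum(scores) / len(scores) if scores else None


# Invented reference frequencies: a bigram observed 8 times where
# independence would predict a single occurrence.
print(mutual_information(bigram_freq=8, w1_freq=100, w2_freq=100, corpus_size=10_000))

# The two MI values quoted in the text; ("as", "emotionally") is not in the table.
mi_table = {("only", "as"): 0.25, ("emotionally", "disturbed"): 9.00}
print(mean_bigram_mi(["only", "as", "emotionally", "disturbed"], mi_table))
```

Because the unlisted bigram is dropped from both the numerator and the denominator, the text-level mean reflects only the bigrams attested in the reference corpus.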
Syntactic complexity was operationalized using three syntactic measures computed by the syntactic complexity analyzer (SCA; Lu 2010): complex nominals per t-unit (CN/T), mean length of clause (MLC), and dependent clauses per t-unit (DC/T). These measures were chosen on the basis of previous studies: they were developed from L2 writing development research (Wolfe-Quintero et al. 1998) and are considered important in assessing L2 writing (Bulté and Housen 2018; Norris and Ortega 2009). In addition, we included both phrase-level (i.e. MLC and CN/T) and clause-level (i.e. DC/T) complexity measures (e.g. Bulté and Housen 2018). CN/T was chosen as a proxy measure of noun phrase complexity because the use of complex noun phrases is favored in academic writing (Biber et al. 2011). These measures were calculated using an R implementation of the SCA (Gaillat and Ballier 2019).
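Each of the three syntactic measures is a simple ratio of structure counts. The sketch below shows the arithmetic only, assuming the counts (words, clauses, t-units, dependent clauses, complex nominals) have already been extracted by a parser-based tool such as the SCA. The dictionary keys and the example counts are hypothetical.

```python
def syntactic_ratios(counts):
    """Compute CN/T, MLC, and DC/T from raw structure counts for one text."""
    t_units = counts["t_units"]
    return {
        # Complex nominals per t-unit (noun phrase complexity).
        "CN/T": counts["complex_nominals"] / t_units,
        # Mean length of clause: words divided by clauses (phrasal complexity).
        "MLC": counts["words"] / counts["clauses"],
        # Dependent clauses per t-unit (subordination).
        "DC/T": counts["dependent_clauses"] / t_units,
    }


# Invented counts for a short essay.
example = {
    "words": 70,
    "clauses": 10,
    "t_units": 5,
    "dependent_clauses": 3,
    "complex_nominals": 6,
}
print(syntactic_ratios(example))
```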
3.4.3 Statistical analysis
To answer the research questions, we constructed generalized additive mixed models (GAMMs) using the mgcv package (Wood 2006) in R. While the use of GAMMs is a relatively new method in applied linguistics (Pfenninger 2020), we used GAMMs for two main reasons. First, GAMMs can model both linear and nonlinear developmental trajectories. Following the principles of CDST, which presumes that language development is nonlinear and variable, it was crucial to model nonlinear trajectories. Second, GAMMs include random effects, such as participants, which allows for fitting differing smooths to the trajectories of the participants. This is also related to CDST, which assumes that trajectories for each learner are different (inter-individual variation).
To answer research question 1 (how holistic writing scores and lexical, phraseological, and syntactic complexity features changed over time), we created a total of seven GAMMs. Each GAMM shared a common fixed-effects variable, i.e., time in weeks (treated as a continuous variable), while having different response variables. Among these GAMMs, one was formulated with holistic writing scores as the response variable, while the remaining six were tailored to include each of the six chosen linguistic complexity features as their respective response variables. In each GAMM, random effects were participants.
To answer research question 2 (how lexical and phraseological complexity features were related to syntactic complexity features), we formulated three GAMMs. Within each GAMM, the fixed-effects variables consisted of three lexical and phraseological features (i.e., MTLD, CW age-of-acquisition, and bigram MI). The response variable in each GAMM was one of the chosen syntactic complexity features (i.e., CN/T, MLC, and DC/T).
To answer research question 3 (how lexical, phraseological, and syntactic complexity features were related to holistic writing scores), a set of six GAMMs was developed. Each GAMM shared a common response variable, namely holistic writing scores. Within this framework, the six GAMMs were structured, each utilizing one of the six chosen linguistic complexity features as its respective fixed-effect variable. Random effects were participants.
To illustrate the creation of GAMMs in general, the following R code was utilized for model specification, employing the mgcv package in R:
In this R code, a fixed-effect variable was wrapped in the smooth function, s(), which allows for estimating a nonlinear effect. For each model, random effects (participants) were estimated using bs (standing for smoothing basis) and re (standing for random effects). The specification of a random intercept was s(Participant, bs = “re”), while that of a random smooth for the fixed-effects variable per participant was s(Fixed effect, Participant, bs = “re”). Random smooths allowed for the possibility that individual participants have idiosyncratic trajectories. We used restricted maximum likelihood (REML) to estimate smoothing parameters (Wood 2006).
The wiggliness of a smooth in a GAMM is governed by two parameters: a smoothing parameter and the number of basis functions (k). The smoothing parameter primarily determines the degree of wiggliness. In contrast, k sets an upper limit on the effective degrees of freedom (EDF), and its specific value has a minimal impact on the fitted model (Wood 2006). The maximum allowable value for k is equal to the number of unique data points, so a greater number of data points allows for a greater degree of wiggliness.[4]
Smoothing parameters, which determine the degree of wiggliness, were automatically estimated by the mgcv package (Wood 2006). The significance of each smooth was assessed using the effective degrees of freedom (EDF) value along with the corresponding F- and p-values. An EDF value of one indicates a straight line, and higher EDF values indicate wigglier curves. A significant smooth term (p < 0.05) indicates that a horizontal line cannot be drawn through the 95% confidence interval. In addition, the value of k (representing basis dimensions) bounds the wiggliness, with higher k values allowing greater wiggliness.
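The model structure and the role of the smoothing parameter described in the preceding paragraphs can be summarised in a generic form. The notation below is illustrative and is not taken from the authors: i indexes participants, j indexes writing occasions, and the B_m are spline basis functions. The term that the text calls a random smooth is written here as a participant-specific slope, on the assumption that a bs = "re" term combining a covariate with a grouping factor is handled that way.

```latex
% Illustrative notation only; not the authors' own formulation.
\begin{align}
y_{ij} &= \beta_0 + f(x_{ij}) + b_{0i} + b_{1i}\,x_{ij} + \varepsilon_{ij},
  & \varepsilon_{ij} &\sim \mathcal{N}(0, \sigma^2), \\
f(x) &= \sum_{m=1}^{k} \beta_m B_m(x), \\
\hat{\boldsymbol{\beta}} &= \arg\min_{\boldsymbol{\beta}}
  \sum_{i,j} \bigl( y_{ij} - \hat{y}_{ij} \bigr)^2
  + \lambda \int f''(x)^2 \, dx .
\end{align}
```

Here k caps how flexible f can be, while the smoothing parameter \lambda, estimated by REML, controls how strongly curvature is penalised. As \lambda grows, f approaches a straight line and the EDF approaches one. The random-effect terms b_{0i} and b_{1i} carry their own penalties, which are omitted from the last line for brevity.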
For research question 1, the value of k was set to six, the maximum possible value given the amount of our data (i.e., in GAMMs, the fixed-effects variable had six data points of time). For research questions 2 and 3, since GAMMs were created with fixed-effects variables having wider ranges of values, the value of k was automatically estimated by the mgcv package. To check whether the k value was sufficient to capture wiggliness, the gam.check() function in the mgcv R package was run on the GAMMs.[5] This function reports a k-index along with its p-value for each spline, with significant p-values indicating that k is too low. Assumptions about residual independence (non-autocorrelation) were tested using the Durbin-Watson test (Field 2009). Model assumptions related to residuals (i.e., normality of the residuals and homoscedasticity) were checked through visual inspection.
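For readers unfamiliar with the Durbin-Watson test mentioned above, the statistic itself is straightforward to compute from a sequence of ordered residuals. The sketch below shows only the statistic, not the associated significance test, and the residual values are invented. How the residuals were ordered across participants and occasions in the present study is not specified here, so the ordering is an assumption.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of ordered residuals.

    Values near 2 suggest no first-order autocorrelation; values toward 0
    suggest positive and values toward 4 suggest negative autocorrelation.
    """
    # Sum of squared differences between consecutive residuals.
    numerator = sum((curr - prev) ** 2
                    for prev, curr in zip(residuals, residuals[1:]))
    # Sum of squared residuals.
    denominator = sum(r ** 2 for r in residuals)
    return numerator / denominator


# Invented residuals that alternate in sign (strong negative autocorrelation).
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))
```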
4 Results
Table 4 shows descriptive statistics for writing scores, essay lengths, and linguistic complexity features over time. We provide means and SDs for essay lengths to give an overview of the data set. For the GAMMs created to answer all of the research questions, note that (1) the number of basis dimensions (k) set for each GAMM was tested and considered sufficient to capture wiggliness; and (2) assumptions for each GAMM (i.e., non-autocorrelation, normality of the residuals, and homoscedasticity) were met. In addition, no concurvity was observed in the GAMMs created to answer research question 2.
Table 4. Means (SDs) of writing scores, essay lengths, and linguistic complexity variables over time.
Variable | Writing 1 | Writing 2 | Writing 3 | Writing 4 | Writing 5 | Writing 6 |
---|---|---|---|---|---|---|
Writing score | 2.49 (0.69) | 2.80 (0.71) | 2.60 (0.63) | 2.89 (0.84) | 3.16 (0.93) | 3.26 (0.90) |
Essay length | 97.13 (37.34) | 130.22 (41.77) | 128.39 (44.36) | 135.27 (41.26) | 107.97 (34.95) | 136.69 (39.15) |
MTLD | 50.29 (16.05) | 48.73 (13.21) | 47.32 (14.96) | 52.96 (15.62) | 49.61 (13.45) | 46.15 (15.51) |
CW age of acquisition | 5.57 (0.43) | 5.75 (0.44) | 5.73 (0.35) | 5.83 (0.34) | 5.81 (0.41) | 5.87 (0.39) |
Bigram MI | 1.70 (0.26) | 1.74 (0.21) | 1.67 (0.27) | 1.75 (0.23) | 1.78 (0.27) | 1.78 (0.27) |
CN/T | 0.95 (0.37) | 1.16 (0.42) | 1.27 (0.60) | 1.12 (0.36) | 1.37 (0.55) | 1.42 (0.54) |
MLC | 7.18 (1.26) | 7.01 (0.92) | 6.78 (1.01) | 6.90 (0.96) | 6.86 (0.93) | 7.25 (1.18) |
DC/T | 0.38 (0.22) | 0.53 (0.28) | 0.59 (0.26) | 0.57 (0.25) | 0.56 (0.26) | 0.57 (0.30) |
4.1 Changes in linguistic complexity features and holistic writing scores
Research question 1 addressed how holistic L2 writing scores and lexical, phraseological, and syntactic complexity features in EFL beginning-level learners’ argumentative writing changed over time. The results of the seven GAMMs, in each of which the response variable was writing score or one of the linguistic complexity features and the fixed-effect variable was time in weeks, are presented in Table 5. To illustrate, Table 5 first presents a fitted intercept for the response variable. Next, estimates for three terms are presented: (1) the fixed effects, s(fixed effect); (2) the random intercepts, s(participant); and (3) the random smooths, s(fixed effect, participant). For each of these terms, each model provided an EDF value along with the reference degrees of freedom (in the Ref.df column), which were used to calculate the F- and p-values. The two most important pieces of information to answer research question 1 (as well as research questions 2 and 3) were the EDF and its p-value for the fixed effects (which were shown in bold in Table 5). This was because EDF values indicated whether the fixed effects were linear (EDF = 1.00) or curvy (EDF > 1.00), and p-values showed whether the estimated effects (line or curve) were statistically significant. On the other hand, the estimates of random intercepts and smooths indicate the variance of random effects. Significant p-values for random effects indicate that there is evidence in support of the random effects. However, we do not discuss random effects further: we included them to account for inter-individual variation in the models, and whether random effects were evident was beyond the scope of this study.
Table 5. Results of seven GAMMs to track changes in writing scores and linguistic complexity features.
Response variable | Fixed-effect variable | Intercept | s(Fixed effect): EDF | s(Fixed effect): Ref.df | s(Fixed effect): F | s(Participant): EDF | s(Participant): Ref.df | s(Participant): F | s(Fixed effect, participant): EDF | s(Fixed effect, participant): Ref.df | s(Fixed effect, participant): F |
---|---|---|---|---|---|---|---|---|---|---|---|
Writing score | Time | 2.86 | 1.00 | 1.00 | 25.84*** | 25.43 | 41.00 | 3.90*** | 18.99 | 41.00 | 2.45** |
MTLD | Time | 49.06 | 1.30 | 1.53 | 1.03 | 14.19 | 41.00 | 0.58* | 5.96 | 41.00 | 0.23 |
CW age of acquisition | Time | 5.76 | 1.57 | 1.91 | 5.22** | 22.87 | 41.00 | 1.30*** | 2.10 | 41.00 | 0.06 |
Bigram MI | Time | 1.74 | 1.00 | 1.00 | 2.20 | 12.07 | 41.00 | 0.49 | 15.94 | 41.00 | 0.77* |
CN/T | Time | 1.21 | 1.00 | 1.00 | 16.75*** | 0.01 | 41.00 | 0.51 | 11.42 | 41.00 | 0.39 |
MLC | Time | 7.00 | 1.95 | 2.36 | 2.30 | 24.06 | 41.00 | 2.26*** | 14.13 | 41.00 | 0.94 |
DC/T | Time | 0.53 | 3.48 | 4.01 | 3.93** | 8.97 | 41.00 | 0.38 | 19.67 | 41.00 | 1.18** |
Note. EDF = effective degrees of freedom; Ref.df = reference number of degrees of freedom; ***p < 0.001; **p < 0.01; *p < 0.05. The fixed-effects variables are shown in bold.
Based on the GAMM results and the corresponding visualizations for research question 1 (see Figure 1), the following observations were made. Writing scores significantly increased over time in a linear manner (EDF = 1.00, Ref.df = 1.00, F = 25.84, p < 0.001). In terms of lexical complexity, no longitudinal change was found in MTLD (EDF = 1.30, Ref.df = 1.53, F = 1.03, p > 0.05), while CW age-of-acquisition scores increased significantly in a nearly linear manner (EDF = 1.57, Ref.df = 1.91, F = 5.22, p < 0.01). For phraseological complexity, bigram MI scores showed no significant change (EDF = 1.00, Ref.df = 1.00, F = 2.20, p > 0.05). With respect to syntactic complexity, CN/T significantly increased over time in a linear manner (EDF = 1.00, Ref.df = 1.00, F = 16.75, p < 0.001); MLC showed no significant change (EDF = 1.95, Ref.df = 2.36, F = 2.30, p > 0.05); and DC/T showed significant non-linear changes (EDF = 3.48, Ref.df = 4.01, F = 3.93, p < 0.01), such that DC/T tended to increase until week 10 (Writing 3) and stagnate afterwards.

Figure 1. GAMM plots showing changes in writing scores and linguistic complexity features. Notes. Red lines indicate significant fitted lines or curves of time effects, while red shades represent their corresponding confidence intervals. Dashed blue lines connect mean scores for each time point. Thin colored lines connect each participant’s scores across time.
To illustrate the linear increases in writing scores, Figure 2 presents writing samples from one student (Student A) alongside their corresponding linguistic complexity and holistic L2 writing scores across three occasions: the first essay, second essay, and seventh essay. In the initial writing session, limited content development and language use were observed, resulting in a holistic score of one. Following brief instruction on English essays between the first and second writing occasions, the second essay showed enhanced organization by incorporating three reasons to support the ideas, although content development and language use remained limited. Consequently, the second essay received a holistic score of two. The seventh essay exhibited progress in content, language use, and organization, resulting in a holistic writing score of four. Hence, it appears that the student improved their writing ability, as evidenced by the increasing holistic writing scores over time. Notably, this improvement occurred even though the L2 courses taken did not specifically emphasize L2 argumentative writing.

Figure 2. Student A’s writing samples with linguistic complexity and writing scores over time.
4.2 Relationships between lexical and phraseological complexity features and syntactic complexity features
Research question 2 addressed how lexical and phraseological complexity features were related to syntactic complexity features in EFL beginning-level learners’ argumentative writing. Three GAMMs (in which one of the three syntactic features [CN/T, MLC, and DC/T] was the response variable) were created. Fixed-effect variables were MTLD, CW age of acquisition, and bigram MI. The results of the GAMM in which the response variable was CN/T are presented in Table 6 and Figure 3. Results indicated that CN/T was significantly related to MTLD in a nearly linear manner (EDF = 1.18, Ref.df = 1.33, F = 7.15, p < 0.01), such that higher MTLD scores tended to relate to lower CN/T. Results also indicated that CN/T was significantly related to CW age-of-acquisition scores in a non-linear manner (EDF = 2.39, Ref.df = 3.03, F = 4.99, p < 0.01), such that CW age-of-acquisition scores below around 5.7 were not associated with CN/T, whereas at around 5.7 or higher, higher CW age-of-acquisition scores tended to relate to greater CN/T. On the other hand, CN/T was not related to bigram MI scores (EDF = 1.58, Ref.df = 1.98, F = 0.58, p > 0.05).
Table 6. Results of the GAMM to predict CN/T using lexical and phraseological features.
Smooth terms | EDF | Ref.df | F |
---|---|---|---|
Fixed effects | | | |
s(MTLD) | 1.18 | 1.33 | 7.15** |
s(CW age of acquisition) | 2.39 | 3.03 | 4.99** |
s(Bigram MI) | 1.58 | 1.98 | 0.58 |
Random effects | | | |
s(participant) | 0.01 | 41.00 | 0.49 |
s(MTLD, participant) | 0.01 | 41.00 | 0.70 |
s(CW age of acquisition, participant) | 0.01 | 41.00 | 0.49 |
s(Bigram MI, participant) | 5.78 | 41.00 | 0.23 |
Note. Intercept = 1.21; EDF = effective degrees of freedom; Ref.df = reference number of degrees of freedom; **p < 0.01. The fixed effects are shown in bold.

Figure 3. GAMM plots to predict CN/T using lexical and phraseological complexity features. Notes. Red lines indicate significant fitted lines or curves of the fixed effects, while red shades represent their corresponding confidence intervals. Thin colored lines connect each participant’s scores across time.
The results of the GAMM whose response variable was MLC are presented in Table 7 and Figure 4. Results indicated that MLC was not related to MTLD (EDF = 1.00, Ref.df = 1.00, F = 2.25, p > 0.05). On the other hand, MLC was significantly related to CW age-of-acquisition scores in a linear manner (EDF = 1.00, Ref.df = 1.00, F = 10.41, p < 0.01), such that higher CW age-of-acquisition scores tended to relate to longer MLC. MLC was also significantly related to bigram MI scores in a linear manner (EDF = 1.00, Ref.df = 1.00, F = 14.78, p < 0.01), but inversely, such that higher MI scores tended to relate to shorter MLC.
Table 7. Results of the GAMM to predict MLC using lexical and phraseological features.
Smooth terms | EDF | Ref.df | F |
---|---|---|---|
Fixed effects | | | |
s(MTLD) | 1.00 | 1.00 | 2.25 |
s(CW age of acquisition) | 1.00 | 1.00 | 10.41** |
s(Bigram MI) | 1.00 | 1.00 | 14.78*** |
Random effects | | | |
s(participant) | 25.04 | 41.00 | 1.57*** |
s(MTLD, participant) | 0.01 | 41.00 | 0.49 |
s(CW age of acquisition, participant) | 0.01 | 41.00 | 0.54 |
s(Bigram MI, participant) | 0.01 | 41.00 | 0.62 |
Note. Intercept = 6.99; EDF = effective degrees of freedom; Ref.df = reference number of degrees of freedom; **p < 0.01, ***p < 0.001. The fixed-effects are shown in bold.

Figure 4. GAMM plots to predict MLC using lexical and phraseological complexity features. Notes. Red lines indicate significant fitted lines or curves of the fixed effects, while red shades represent their corresponding confidence intervals. Thin colored lines connect each participant’s scores across time.
The results of the GAMM in which a response variable was DC/T are presented in Table 8 and Figure 5. Results indicated that DC/T was not related to either MTLD (EDF = 1.00, Ref.df = 1.00, F = 1.12, p > 0.05) or CW age of acquisition (EDF = 1.00, Ref.df = 1.00, F = 0.00, p > 0.05). Results, however, indicated that DC/T was significantly related to bigram MI scores in a non-linear manner (EDF = 2.30, Ref.df = 2.89, F = 4.91, p < 0.01), such that greater DC/T tended to relate to higher bigram MI scores until MI scores reached around two, but afterwards, as MI scores increased, DC/T tended to decrease slightly.
Table 8. Results of the GAMM to predict DC/T using lexical and phraseological features.
Smooth terms | EDF | Ref.df | F |
---|---|---|---|
Fixed effects | | | |
s(MTLD) | 1.00 | 1.00 | 1.12 |
s(CW age of acquisition) | 1.00 | 1.00 | 0.00 |
s(Bigram MI) | 2.30 | 2.89 | 4.91** |
Random effects | | | |
s(participant) | 0.01 | 41.00 | 0.01 |
s(MTLD, participant) | 0.01 | 41.00 | 0.01 |
s(CW age of acquisition, participant) | 0.01 | 41.00 | 0.01 |
s(Bigram MI, participant) | 19.84 | 41.00 | 0.97*** |
Note. Intercept = 6.99; EDF = effective degrees of freedom; Ref.df = reference number of degrees of freedom; **p < 0.01, ***p < 0.001. The fixed effects are shown in bold.

Figure 5. GAMM plots to predict DC/T using lexical and phraseological complexity features. Notes. Red lines indicate significant fitted lines or curves of the fixed effects, while red shades represent their corresponding confidence intervals. Thin colored lines connect each participant’s scores across time.
4.3 Relationships between linguistic complexity features and writing scores
Research question 3 addressed how lexical, phraseological, and syntactic complexity features were related to writing scores in EFL beginning-level learners’ argumentative writing. The results of the six GAMMs, in each of which the response variable was writing score and the fixed-effect variable was one of the six lexical, phraseological, and syntactic complexity features, are presented in Table 9 and Figure 6. With respect to lexical complexity, writing scores were not related to MTLD (EDF = 1.00, Ref.df = 1.00, F = 0.85, p > 0.05). On the other hand, writing scores were significantly related to CW age-of-acquisition scores in a slightly non-linear manner (EDF = 1.95, Ref.df = 2.47, F = 3.61, p < 0.05), such that overall, writing scores tended to increase as CW age-of-acquisition scores increased, but the rate of increase in writing scores slowed slightly after CW age-of-acquisition scores reached around 5.70. For phraseological complexity, bigram MI scores were significantly associated with writing scores in a non-linear manner (EDF = 3.30, Ref.df = 4.13, F = 5.03, p < 0.001), such that writing scores tended to increase until bigram MI scores reached around 1.80 but decrease afterwards. In terms of syntactic complexity, writing scores were significantly and linearly related to both CN/T (EDF = 1.00, Ref.df = 1.00, F = 20.45, p < 0.001) and MLC (EDF = 1.00, Ref.df = 1.00, F = 7.84, p < 0.01), such that as CN/T and MLC increased, writing scores also tended to increase. Writing scores were also significantly related to DC/T (EDF = 2.54, Ref.df = 3.29, F = 2.47, p < 0.05), but in a non-linear manner, such that writing scores tended to increase until DC/T reached around 0.40, stagnate when DC/T was between around 0.40 and 0.70, and increase again after DC/T reached around 0.70.
Table 9. Results of six GAMMs to predict writing scores using linguistic complexity features.
Response variable | Fixed-effect variable | Intercept | s(Fixed effect): EDF | s(Fixed effect): Ref.df | s(Fixed effect): F | s(Participant): EDF | s(Participant): Ref.df | s(Participant): F | s(Fixed effect, participant): EDF | s(Fixed effect, participant): Ref.df | s(Fixed effect, participant): F |
---|---|---|---|---|---|---|---|---|---|---|---|
Writing score | MTLD | 2.86 | 1.00 | 1.00 | 0.85 | 18.14 | 41.00 | 1.67** | 12.05 | 41.00 | 1.04 |
Writing score | CW age of acquisition | 2.86 | 1.95 | 2.47 | 3.61* | 0.01 | 41.00 | 0.01 | 27.71 | 41.00 | 2.16*** |
Writing score | Bigram MI | 2.87 | 3.30 | 4.13 | 5.03*** | 26.78 | 41.00 | 1.93*** | 0.01 | 41.00 | 0.01 |
Writing score | CN/T | 2.86 | 1.00 | 1.00 | 20.45*** | 22.53 | 41.00 | 2.02*** | 7.65 | 41.00 | 0.17 |
Writing score | MLC | 2.86 | 1.00 | 1.00 | 7.84** | 27.46 | 41.00 | 2.06*** | 0.01 | 41.00 | 0.01 |
Writing score | DC/T | 2.85 | 2.54 | 3.29 | 2.47* | 16.62 | 41.00 | 1.06* | 11.41 | 41.00 | 0.74 |
Note. EDF = effective degrees of freedom; Ref.df = reference number of degrees of freedom; ***p < 0.001; **p < 0.01, *p < 0.05. The fixed effects are shown in bold.

Figure 6. GAMM plots to predict writing scores using linguistic complexity features. Notes. Red lines indicate significant fitted lines or curves of the fixed effects, while red shades represent their corresponding confidence intervals. Thin colored lines connect each participant’s scores across time.
5 Discussion
5.1 Changes in linguistic complexity and writing scores over time
The first research question aimed to understand the changes in lexical, phraseological, and syntactic complexity features and writing scores in the argumentative writing of EFL beginning-level learners over two academic semesters (nine months). Despite the considerable intra- and inter-individual variability and the fluctuations in the changing patterns of linguistic complexity measures, our primary aim was to identify shared developmental patterns among L2 beginning-level learners. Overall, significant changes were observed in holistic writing scores, CW age-of-acquisition scores, CN/T, and DC/T, while no changing patterns were noted in MTLD, bigram MI scores, or MLC.
The observation that L2 holistic scores for argumentative writing tended to increase over time, signifying the progression of learners’ L2 writing ability (e.g. Cumming et al. 2000; Valdes et al. 1992), is noteworthy, given that this study was conducted in a general English class with limited instruction about argumentative writing. It is possible that, through repetitive engagement in the same genre of writing, L2 learners became more familiar with the task and gained more knowledge of English words and structures in class, thereby pushing themselves to produce better-written essays with more logical arguments and more complex linguistic features. This could explain the observed increases in holistic scores for argumentative writing, broadening the scope beyond prior studies that focused on improvements in a different genre, such as L2 narrative writing, achieved through repetitive practice (e.g. Lowie and Verspoor 2019).
Regarding lexical complexity, L2 beginning-level students displayed improvements in their lexical sophistication, as indicated by increases in CW age-of-acquisition scores. This observation aligns with prior CDST-oriented research (e.g. Verspoor et al. 2008; Zheng 2016), suggesting that L2 beginning-level learners tended to use more sophisticated words as they produced L2 argumentative writing over time. It is possible that L2 learners aimed to convey more intricate ideas in their arguments by employing more sophisticated words, although further research is needed to substantiate this hypothesis. In contrast, L2 beginning-level students did not exhibit improvement in their lexical diversity, as evidenced by the absence of changes in MTLD over time. This finding contrasts with Zheng (2016), who reported a general increase in lexical diversity over one year in L2 intermediate-level learners’ argumentative writing. One plausible explanation for this divergence is that L2 beginning-level learners might not prioritize diversifying their vocabulary in their texts due to their focus on other aspects of writing and their limited L2 lexicon. As their proficiency advances to an intermediate level, L2 learners may become more capable of incorporating a wider range of words over time, as observed in Zheng (2016) and Pfenninger (2020). Therefore, the notion that L2 beginning-level learners are “especially busy learning words” (Verspoor et al. 2012: 257) could potentially be framed more specifically as them being occupied with complexifying their lexicon rather than diversifying it.
In terms of phraseological complexity, as measured by bigram MI scores, there was no improvement observed. This suggests that L2 beginning-level learners did not increasingly employ more strongly associated bigrams. This finding complements earlier research indicating that the development of phraseological complexity tends to occur in at least intermediate-level learners (Duan and Shi 2021; Vandeweerd et al. 2022). Therefore, given that the “growth of phraseological sophistication is very gradual” (Vandeweerd et al. 2022: 18), it is possible that L2 beginning-level learners may not yet be developmentally prepared to notice and use more sophisticated phraseological units.
In terms of syntactic complexity, three measures employed to track the development of syntactic complexity, namely CN/T, MLC, and DC/T, exhibited distinct developmental trajectories over time. Firstly, CN/T, which assesses NP complexity, a crucial aspect of academic writing (Biber et al. 2020), demonstrated a linear increase over time. This observation of a progressive trend in NP complexity aligns with Bulté and Housen (2018), but not with Wang (2022) where changes were not observed. It is worth noting that while Bulté and Housen (2018), Wang (2022), and the present study focused on L2 beginning-level learners’ writing, they differed in terms of writing genres. Bulté and Housen (2018) and the present study delved into school-based writing genres, specifically creative writing and argumentative writing, respectively. These genres fall within the domain of formal academic writing (Gardner and Nesi 2013). In contrast, Wang (2022) investigated informal diaries that might not necessarily encourage the use of academic language, including complex NPs. Thus, it is conceivable that L2 NP complexity may increase even in the writing of L2 beginning-level learners, but this trend seems to be pronounced in the context of academic writing.
Next, MLC, serving as an indicator of phrasal complexity, which is another crucial aspect of academic writing (Biber et al. 2020), exhibited stability over time. This observation of a lack of change in MLC aligns with prior research that investigated the writing of L2 beginning-level learners (Bulté and Housen 2018; Wang 2022). It is plausible that, due to their limited L2 proficiency, L2 beginning-level learners’ use of phrasal elaboration remains within “attractors, regions in state space” (Larsen-Freeman 2019: 64).
Lastly, DC/T, used as a measure of subordination, displayed noteworthy non-linear changes in this study. Specifically, it exhibited a trend of increasing subordination up to the first two months, followed by a plateau. This finding partially aligns with earlier studies that reported an increase in subordination over time (Bulté and Housen 2018; Köylü et al. 2023; Wang 2022), but this study contributes to the existing body of research by highlighting the non-linear nature of this developmental trajectory. Possibly, L2 beginning-level learners may have initially made efforts to incorporate more subordination into their early writing attempts. Still, over time, they may have become trapped within an attractor state (Larsen-Freeman 2019), potentially lacking the need to displace the system from this attractor.
5.2 Relationships between lexical and phraseological complexity and syntactic complexity
The second research question investigated the relationship between lexical and phraseological complexity features and syntactic complexity features in argumentative writing over time by EFL beginning-level learners. To achieve this, the three syntactic complexity measures, CN/T, MLC, and DC/T, were predicted, respectively, with the predictors being MTLD, CW age-of-acquisition scores, and bigram MI scores.
The findings revealed that CN/T, as a measure of NP complexity, exhibited a negative linear relationship with MTLD, which aligns with the findings of a prior case study (O’Leary and Steinkrauss 2022). The competition observed between MTLD and CN/T suggests a trade-off effect between lexical diversity and NP complexity. In their texts, if L2 beginning-level learners incorporated a wider range of words (resulting in an increase in MTLD), they might have used fewer CNs (leading to a decrease in CN/T). This could be because L2 beginning-level learners encounter challenges in employing both diverse words and CNs simultaneously in their writing, potentially due to limitations in their available cognitive resources. Conversely, CN/T had a positive non-linear relationship with CW age-of-acquisition scores. Specifically, the connection between CW age-of-acquisition and CN/T exhibited a plateau until a certain point, followed by a positive relationship. This finding indicates that, as L2 beginning-level learners employed more sophisticated CWs in their texts, they also tended to use more CNs once they surpassed a particular threshold. This finding underscores the partially supportive relationship between lexical sophistication and NP complexity, corroborating the results of a previous case study (Spoelman and Verspoor 2010). On the other hand, CN/T did not exhibit a significant relationship with bigram MI scores. This absence of a relationship may indicate that phraseological sophistication and NP complexity represent relatively independent aspects of L2 beginning-level learners’ argumentative writing.
The findings demonstrated that MLC, as a measure of phrasal complexity, exhibited no relationship with MTLD. The absence of a relationship between MTLD and MLC implies that the utilization of diverse words and phrasal elaboration are distinct and separate aspects of linguistic complexity within the context of argumentative writing by L2 beginning-level learners. Next, MLC had a positive linear relationship with CW age-of-acquisition scores. This observed positive relationship suggests that, among L2 beginning-level learners, incorporating more sophisticated CWs into their writing was associated with a tendency to produce longer clauses via phrasal elaboration. Furthermore, since CW age-of-acquisition scores also showed a positive relationship with CN/T, it implies that using more sophisticated CWs may facilitate the creation of more complex NPs as well as longer clauses. This underscores a pattern of “connected growth” between lexical sophistication and phrasal complexity (Verspoor and van Dijk 2011). Conversely, MLC showed a negative linear relationship with bigram MI scores. This competitive relationship suggests that when L2 beginning-level learners strived to use more strongly associated bigrams, it may have led to the creation of shorter clauses with limited additions of phrase-level elements. There could potentially be a trade-off effect between phraseological sophistication, assessed through bigram MI scores, and phrasal complexity, evaluated via MLC. However, considering that, to our knowledge, there is no previous research on the relationship between phraseological sophistication and phrasal complexity, further research is needed to substantiate this finding.
The findings revealed that DC/T, a measure of subordination, showed no relationship with MTLD or CW age-of-acquisition scores. This absence of a relationship suggests that, in the context of L2 beginning-level learners’ argumentative writing, subordination, which reflects clause-level complexity, is not associated with lexical complexity. On the other hand, DC/T demonstrated a non-linear relationship with bigram MI scores. Among L2 beginning-level learners, using more strongly associated bigrams was associated with greater subordination, but beyond a certain point, this relationship reached a plateau. It is possible that employing more strongly associated bigrams may facilitate increased subordination up to a certain threshold. However, the precise mechanisms underlying this finding are beyond the scope of our study, and further research is warranted. In any case, this study contributes to our understanding by highlighting a supportive relationship between phraseological sophistication and subordination in the argumentative writing of L2 beginning-level learners.
5.3 Relationships between linguistic complexity and writing scores
The third research question aimed to explore the relationship between lexical, phraseological, and syntactic complexity features and holistic writing scores in argumentative writing by EFL beginning-level learners over time. Overall, the results indicated that L2 holistic writing scores exhibited significant associations with CW age-of-acquisition scores, bigram MI scores, CN/T, MLC, and DC/T, but not with MTLD.
In terms of lexical complexity, our findings indicate that over time, higher-rated L2 texts tended to incorporate more sophisticated words, but not necessarily a more diverse range of words. This observation concerning lexical sophistication aligns with prior studies that have reported a positive relationship between lexical sophistication and L2 writing scores (e.g. Kim et al. 2018). However, the finding related to lexical diversity presents a perplexing contrast, especially considering that the scoring rubric emphasizes the use of a wide range of vocabulary. Our findings did not support the notion, as suggested in previous research (e.g. Crossley and McNamara 2012), that greater lexical diversity is associated with higher-rated L2 texts. One possible explanation for this divergence could be that, in line with our finding showing no improvement in lexical diversity among L2 beginning-level learners, the limited diversification of their lexical choices may not have had a noticeable impact on the raters’ assessments of their texts.
In the realm of phraseological complexity, the significant association between bigram MI scores and holistic writing scores displayed a non-linear pattern, characterized by an initial increase until reaching a certain point followed by a decrease, forming an omega-shaped relationship. This finding is intriguing, particularly in light of previous research that has established a positive relationship between the utilization of phraseological units with higher MI scores and higher L2 writing scores (e.g. Bestgen and Granger 2014; Paquot 2019). The unexpected negative relationship between MI scores and L2 writing scores, identified in our study after a certain threshold, prompts questions. As L2 beginning-level learners made efforts to use more strongly associated bigrams in their writing, it is conceivable that beyond a certain point, this effort might have introduced a trade-off with other aspects of writing, such as syntactic complexity and idea development. A more specific explanation is that, given the negative relationship between bigram MI scores and MLC, L2 beginning-level learners may have produced shorter clauses once their use of strongly associated bigrams surpassed a certain MI threshold. This, in turn, could have negatively impacted raters’ evaluation of their texts. However, verifying these hypotheses would require additional research.
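To illustrate why a rise-then-fall pattern of this kind calls for non-linear modeling, the following minimal sketch uses synthetic values invented purely for illustration (they are not the study's data). It uses a simple Pearson correlation as a stand-in, not the GAMMs used in the study. The function names `pearson` and `correlations_around_peak` are hypothetical helpers, and the x-values are assumed to be sorted in ascending order.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlations_around_peak(xs, ys):
    """Return (overall, rising, falling) correlations, splitting at the maximum of ys.

    Assumes xs is sorted in ascending order.
    """
    peak = max(range(len(ys)), key=lambda i: ys[i])
    overall = pearson(xs, ys)
    rising = pearson(xs[: peak + 1], ys[: peak + 1])
    falling = pearson(xs[peak:], ys[peak:])
    return overall, rising, falling


# SYNTHETIC values, invented for illustration only (not the study's data):
# scores rise with bigram MI up to MI = 1.5 and fall afterwards.
mi_scores = [0.5 + 0.1 * i for i in range(21)]
writing_scores = [3.0 - (mi - 1.5) ** 2 for mi in mi_scores]

overall, rising, falling = correlations_around_peak(mi_scores, writing_scores)
print(f"overall={overall:.2f}, rising={rising:.2f}, falling={falling:.2f}")
```

In this toy case the overall linear correlation is close to zero even though the two halves show strong positive and negative associations. A purely linear analysis would therefore miss a threshold effect that a smooth term can capture.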
Regarding syntactic complexity, our study found that holistic writing scores exhibited a relatively linear increase as CN/T, MLC, and DC/T increased. This finding is consistent with prior research that has demonstrated that L2 writing scores have a positive relationship with NP complexity, phrasal complexity, and subordination (e.g. Lu 2010).
6 Conclusions
Adopting a CDST perspective, this study examined the longitudinal changes in L2 lexical, phraseological, and syntactic complexity, along with holistic L2 writing scores, investigating their interconnected dynamics over time among beginning-level Japanese university English learners, particularly in the context of argumentative writing. The first research question, which examined longitudinal changes in lexical, phraseological, and syntactic complexity features and holistic writing scores, revealed linear growth in writing scores, lexical sophistication, and NP complexity. Additionally, subordination exhibited a non-linear trajectory. The second research question, which explored relationships among different complexity measures, identified interrelated growth patterns between lexical sophistication and NP complexity, between lexical sophistication and phrasal complexity, and between phraseological sophistication and subordination. Conversely, competitive dynamics were observed between lexical diversity and NP complexity, as well as between phraseological sophistication and phrasal complexity. The findings of the third research question revealed linear associations between writing scores and lexical sophistication or syntactic features, but a non-linear relationship with phraseological sophistication.
In essence, this study offers insights into the complex dynamics of linguistic complexity and holistic writing scores among beginning-level EFL learners engaged in argumentative writing over time. Specifically, the present study contributes to our understanding of developmental patterns across various dimensions of linguistic complexity (lexical, phraseological, and syntactic complexity) in L2 writing among beginning-level Japanese EFL learners. In addition, by demonstrating developmental patterns using GAMMs, this study offers methodological implications for modeling the development of linguistic complexity and holistic L2 writing scores, as well as their interrelationships, in a non-linear manner. Ultimately, further examination of both non-linear and linear developmental patterns in L2s may be necessary to enhance our understanding of the dynamic development in L2s.
The findings of the current study also bear pedagogical implications. First, the results suggest that L2 (English) beginning-level students demonstrated improvement over time in holistic writing scores, as well as in lexical sophistication and phrasal complexity, both critical components of academic writing (Biber et al. 2020), despite the absence of explicit emphasis on academic writing in the students’ course. Consequently, L2 practitioners, particularly those working with adult beginning-level L2 learners, could potentially enhance their students’ linguistic complexity and writing ability by providing regular writing opportunities as part of the general English curriculum. Second, the absence of evidence for the development of lexical diversity and phraseological sophistication implies either that a significant amount of time is required to incorporate more diverse words and acquire more strongly associated bigrams in academic writing, or that these features may need explicit instruction. Therefore, explicit integration of these aspects into the course curriculum may be necessary. Lastly, considering the interrelated growth patterns found between lexical sophistication and phrasal complexity (including NP complexity), instructors of L2 beginning-level learners could elicit synergistic effects on students’ learning of more complex L2 usage by introducing more sophisticated, advanced content words and, simultaneously, linking them at larger phrasal levels, such as noun phrases and verb phrases.
Despite the contributions of the current study, it has limitations that should be acknowledged and addressed in future research. Although we collected a total of six essays over nine months, a longer data collection period would be needed to fully understand the longitudinal dynamics of L2 writing development, especially given that L2 writing development takes time. Additionally, the current study focused on a single writing genre. Therefore, the findings should be interpreted with this limitation in mind. Subsequent studies could collect writing samples in at least two different genres to examine potential differences in L2 developmental patterns across genres. Furthermore, we did not conduct an in-depth analysis of individual learners. Future studies could examine both group-based statistics and individual learner case studies to complement the findings. Finally, although the writing samples were collected in classroom contexts, we did not examine the role of instruction in writing development. Thus, future research is warranted to investigate the effects of L2 instruction on L2 complexity development. A more systematic examination is required to understand how instruction influences the patterns of L2 complexity development in L2 writing.
References
Bestgen, Yves & Sylviane Granger. 2014. Quantifying the development of phraseological competence in L2 English writing: An automated approach. Journal of Second Language Writing 26. 28–41. https://doi.org/10.1016/j.jslw.2014.09.004.
Biber, Douglas, Bethany Gray & Kornwipa Poonpon. 2011. Should we use characteristics of conversation to measure grammatical complexity in L2 writing development? TESOL Quarterly 45(1). 5–35. https://doi.org/10.5054/tq.2011.244483.
Biber, Douglas, Shelley Staples & Jesse Egbert. 2020. Investigating grammatical complexity in L2 English writing research: Linguistic description versus predictive measurement. Journal of English for Academic Purposes 46. 100869-1-100869-15. https://doi.org/10.1016/j.jeap.2020.100869.
Brysbaert, Marc, Paweł Mandera & Emmanuel Keuleers. 2018. The word frequency effect in word processing: An updated review. Current Directions in Psychological Science 27(1). 45–50. https://doi.org/10.1177/0963721417727521.
Bulté, Bram & Alex Housen. 2012. Defining and operationalising L2 complexity. In Alex Housen, Folkert Kuiken & Ineke Vedder (eds.), Dimensions of L2 performance and proficiency: Complexity, accuracy and fluency in SLA, 21–46. Amsterdam: John Benjamins Publishing Company. https://doi.org/10.1075/lllt.32.02bul.
Bulté, Bram & Alex Housen. 2018. Syntactic complexity in L2 writing: Individual pathways and emerging group trends. International Journal of Applied Linguistics 28(1). 147–164. https://doi.org/10.1111/ijal.12196.
Bulté, Bram & Alex Housen. 2020. A DUB-inspired case study of multidimensional L2 complexity development: Competing or connecting growers? In Wander Lowie, Marije Michel, Merel Keijzer & Rasmus Steinkrauss (eds.), Usage-based dynamics in second language development, 50–86. Bristol, Blue Ridge Summit: Multilingual Matters. https://doi.org/10.21832/9781788925259-006.
Caspi, Tal. 2010. A dynamic perspective on second language development. University of Groningen dissertation.
Crossley, Scott A. & Danielle S. McNamara. 2012. Predicting second language writing proficiency: The roles of cohesion and linguistic sophistication. Journal of Research in Reading 35(2). 115–135. https://doi.org/10.1111/j.1467-9817.2010.01449.x.
Cumming, Alister, Robert Kantor, Donald Powers, Terry Santos & Carol Taylor. 2000. TOEFL 2000 writing framework: A working paper. Princeton: Educational Testing Service.
Davies, Mark. 2008. The corpus of contemporary American English: 520 million words. Available at: http://corpus.byu.edu/coca/.
Duan, Shiping & Zhiliang Shi. 2021. A longitudinal study of formulaic sequence use in second language writing: Complex dynamic systems perspective. Language Teaching Research 28(2). 497–530. https://doi.org/10.1177/13621688211002942.
Field, Andy. 2009. Discovering statistics using SPSS, 3rd edn. London: Sage.
Gaillat, Thomas & Nicolas Ballier. 2019. Expérimentation de Feedback Visuel Des Productions Écrites d’apprenants Francophones de l’anglais Sous MOODLE [Experimentation with visual feedback of written productions by French-speaking English learners using MOODLE]. In Actes de La Conférence EIAH2019 [Proceedings of the EIAH2019 Conference]. Paris, France: Association des Technologies de l’Information pour l’Education et la Formation [Association of Information Technologies for Education and Training].
Gardner, Sheena & Hilary Nesi. 2013. A classification of genre families in university student writing. Applied Linguistics 34(1). 25–52. https://doi.org/10.1093/applin/ams024.
Han, ZhaoHong, Eun Young Kang & Sarah Sok. 2022. The complexity epistemology and ontology in second language acquisition: A critical review. Studies in Second Language Acquisition 45. 1388–1412. https://doi.org/10.1017/S0272263122000420.
Hiver, Phil, Ali H. Al-Hoorie & Reid Evans. 2022. Complex dynamic systems theory in language learning: A scoping review of 25 years of research. Studies in Second Language Acquisition 44(4). 913–941. https://doi.org/10.1017/S0272263121000553.
Huang, Ting, Rasmus Steinkrauss & Marjolijn Verspoor. 2021. Variability as predictor in L2 writing proficiency. Journal of Second Language Writing 52. 100787. https://doi.org/10.1016/j.jslw.2020.100787.
Kim, Minkyung, Scott Crossley & Kristopher Kyle. 2018. Lexical sophistication as a multidimensional phenomenon: Relations to second language lexical proficiency, development, and writing quality. The Modern Language Journal 102(1). 120–141. https://doi.org/10.1111/modl.12447.
Köylü, Zeynep, Nurullah Eryılmaz & Carmen Pérez-Vidal. 2023. A dynamic usage-based analysis of L2 written complexity development of sojourners. Journal of Second Language Writing 60. 101002. https://doi.org/10.1016/j.jslw.2023.101002.
Kuperman, Victor, Hans Stadthagen-Gonzalez & Marc Brysbaert. 2012. Age-of-acquisition ratings for 30,000 English words. Behavior Research Methods 44(4). 978–990. https://doi.org/10.3758/s13428-012-0210-4.
Kyle, Kristopher, Scott A. Crossley & Scott Jarvis. 2021. Assessing the validity of lexical diversity indices using direct judgements. Language Assessment Quarterly 18(2). 154–170. https://doi.org/10.1080/15434303.2020.1844205.
Kyle, Kristopher, Scott A. Crossley & Cindy Berger. 2018. The tool for the automatic analysis of lexical sophistication (TAALES): Version 2.0. Behavior Research Methods 50(3). 1030–1046. https://doi.org/10.3758/s13428-017-0924-4.
Larsen-Freeman, Diane. 2006. The emergence of complexity, fluency, and accuracy in the oral and written production of five Chinese learners of English. Applied Linguistics 27(4). 590–619. https://doi.org/10.1093/applin/aml029.
Larsen-Freeman, Diane. 2019. On language learner agency: A complex dynamic systems theory perspective. The Modern Language Journal 103. 61–79. https://doi.org/10.1111/modl.12536.
Lowie, Wander M. & Marjolijn H. Verspoor. 2019. Individual differences and the ergodicity problem: Individual differences and ergodicity. Language Learning 69. 184–206. https://doi.org/10.1111/lang.12324.
Lu, Xiaofei. 2010. Automatic analysis of syntactic complexity in second language writing. International Journal of Corpus Linguistics 15(4). 474–496. https://doi.org/10.1075/ijcl.15.4.02lu.
McCarthy, Philip M. & Scott Jarvis. 2010. MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment. Behavior Research Methods 42(2). 381–392. https://doi.org/10.3758/brm.42.2.381.
Norris, John M. & Rosa M. Manchón. 2012. Investigating L2 writing development from multiple perspectives: Issues in theory and research. In Rosa Manchón (ed.), L2 writing development: Multiple perspectives, 221–244. Berlin: de Gruyter. https://doi.org/10.1515/9781934078303.221.
Norris, John M. & Lourdes Ortega. 2009. Towards an organic approach to investigating CAF in instructed SLA: The case of complexity. Applied Linguistics 30(4). 555–578. https://doi.org/10.1093/applin/amp044.
O’Leary, John & Rasmus Steinkrauss. 2022. Syntactic and lexical complexity in L2 English academic writing: Development and competition. Ampersand 9. 100096-1-100096-10. https://doi.org/10.1016/j.amper.2022.100096.
Pallotti, Gabriele. 2015. A simple view of linguistic complexity. Second Language Research 31(1). 117–134. https://doi.org/10.1177/0267658314536435.
Paquot, Magali. 2019. The phraseological dimension in interlanguage complexity research. Second Language Research 35(1). 121–145. https://doi.org/10.1177/0267658317694221.
Pfenninger, Simone E. 2020. The dynamic multicausality of age of first bilingual language exposure: Evidence from a longitudinal content and language integrated learning study with dense time serial measurements. The Modern Language Journal 104(3). 662–686. https://doi.org/10.1111/modl.12666.
Read, John A. 2000. Assessing vocabulary. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511732942.
Rokoszewska, Katarzyna J. 2022. The dynamics of monthly growth rates in the emergence of complexity, accuracy, and fluency in L2 English writing at secondary school – A learner corpus analysis. System 106. 102775. https://doi.org/10.1016/j.system.2022.102775.
Saito, Kazuya & Yuwei Liu. 2021. Roles of collocation in L2 oral proficiency revisited: Different tasks, L1 vs. L2 raters, and cross-sectional vs. longitudinal analyses. Second Language Research 38(3). 531–554. https://doi.org/10.1177/0267658320988055.
Siyanova-Chanturia, Anna & Stefania Spina. 2020. Multi‐word expressions in second language writing: A large‐scale longitudinal learner corpus study. Language Learning 70(2). 420–463. https://doi.org/10.1111/lang.12383.
Spoelman, Marianne & Marjolijn Verspoor. 2010. Dynamic patterns in development of accuracy and complexity: A longitudinal case study in the acquisition of Finnish. Applied Linguistics 31(4). 532–553. https://doi.org/10.1093/applin/amq001.
Tannenbaum, Richard J. & E. Caroline Wylie. 2008. Linking English‐language test scores onto the common European framework of reference: An application of standard‐setting methodology. ETS Research Report Series 2008(1). i–75. https://doi.org/10.1002/j.2333-8504.2008.tb02120.x.
Valdés, Guadalupe, Paz Haro & Maria Paz Echevarriarza. 1992. The development of writing abilities in a foreign language: Contributions toward a general theory of L2 writing. The Modern Language Journal. 333–352. https://doi.org/10.2307/330163.
Vandeweerd, Nathan, Alex Housen & Magali Paquot. 2022. Comparing the longitudinal development of phraseological complexity across oral and written tasks. Studies in Second Language Acquisition. 1–25. https://doi.org/10.1017/S0272263122000389.
Verspoor, Marjolijn & Hana Smiskova. 2012. Foreign language writing development from a dynamic usage based perspective. In Rosa Manchón (ed.), L2 writing development: Multiple perspectives, 17–46. Berlin: De Gruyter. https://doi.org/10.1515/9781934078303.17.
Verspoor, Marjolijn H. & Marijn Van Dijk. 2011. Visualizing interaction between variables. In Marjolijn H. Verspoor, Kees de Bot & Wander Lowie (eds.), A dynamic approach to second language development: Methods and techniques, 85–98. Amsterdam: John Benjamins Publishers. https://doi.org/10.1075/lllt.29.05ver.
Verspoor, Marjolijn, Wander Lowie & Marijn Van Dijk. 2008. Variability in second language development from a dynamic systems perspective. The Modern Language Journal 92(2). 214–231. https://doi.org/10.1111/j.1540-4781.2008.00715.x.
Verspoor, Marjolijn, Monika S. Schmid & Xiaoyan Xu. 2012. A dynamic usage based perspective on L2 writing. Journal of Second Language Writing 21(3). 239–263. https://doi.org/10.1016/j.jslw.2012.03.007.
Vyatkina, Nina. 2012. The development of second language writing complexity in groups and individuals: A longitudinal learner corpus study. The Modern Language Journal 96(4). 576–598. https://doi.org/10.1111/j.1540-4781.2012.01401.x.
Wang, Zhihong. 2022. Dynamic development of syntactic complexity in second language writing: A longitudinal case study of a young Chinese EFL learner. Frontiers in Psychology 13. 974481. https://doi.org/10.3389/fpsyg.2022.974481.
Wieling, M. 2018. Analyzing dynamic phonetic data using generalized additive mixed modeling: A tutorial focusing on articulatory differences between L1 and L2 speakers of English. Journal of Phonetics 70. 86–116. https://doi.org/10.1016/j.wocn.2018.03.002.
Wolfe-Quintero, Kate, Kathryn Elizabeth, Shunji Inagaki & Hae-Young Kim. 1998. Second language development in writing: Measures of fluency, accuracy and complexity. Honolulu: Second Language Teaching and Curriculum Center, University of Hawaii.
Wood, Simon. 2006. R-manual: The MGCV package. Technical Report.
Zenker, Fred & Kristopher Kyle. 2021. Investigating minimum text lengths for lexical diversity indices. Assessing Writing 47. 100505. https://doi.org/10.1016/j.asw.2020.100505.
Zheng, Yongyan. 2016. The complex, dynamic development of L2 lexical use: A longitudinal study on Chinese learners of English. System 56. 40–53. https://doi.org/10.1016/j.system.2015.11.007.
© 2024 the author(s), published by De Gruyter, Berlin/Boston
This work is licensed under the Creative Commons Attribution 4.0 International License.