The impact of using DeepL Translator on Chinese EFL students’ story writing

Lijin Liang

doi:10.1515/jccall-2024-0009

Article Open Access

Lijin Liang earned her MA in Applied Linguistics from Beijing Foreign Studies University. Her research interests focus on language acquisition and AI-assisted learning. In her spare time, she enjoys learning new languages and crafting stories.

Published/Copyright: November 1, 2024


From the journal Journal of China Computer-Assisted Language Learning Volume 5 Issue 1

Abstract

Story writing is a complex task for EFL students. With the rise of various writing assistance tools enabled by the development of AI, it has become crucial to explore how AI-assisted tools can alleviate the challenges students face in story writing. This study investigated the impact of using DeepL Translator, a cutting-edge AI-assisted tool, on Chinese EFL students’ story writing, evaluating both form and content, and analyzing students’ attitudes toward these tools. Based on the translanguaging approach, thirty university students first wrote their stories in Chinese, then drafted them in English. They used DeepL Translator to translate their Chinese stories into English and compared these translations with their original English drafts to create revised versions. Computational analysis and human raters were employed to evaluate students’ writing products. Surveys and interviews were used to obtain students’ attitudes. Results showed significant improvements in linguistic form, including syntactic and lexical complexity, fluency, and accuracy, assessed using the CALF framework. Additionally, content quality was significantly improved. This study also found that DeepL Translator provided significant support for lower proficiency students. Participants expressed overall satisfaction with DeepL Translator despite some challenges. This study highlights the role of AI-assisted tools in language learning and offers practical suggestions for language pedagogy and future research.

Keywords: computer-assisted language learning; AI-assisted tools; story writing; second language acquisition; DeepL Translator

1 Introduction

Story writing is a multifaceted task that demands not only language skills like vocabulary and grammar but also organizational abilities, creativity, critical thinking, and cultural understanding (Graham and Perin 2007). This complexity is particularly challenging for EFL students. They face numerous obstacles, including difficulty using dictionaries effectively (Rahmat et al. 2021), limited access to instructor support, and cognitive overload from having to handle multiple challenging elements at once (Lee 2019). Consequently, there is a significant need for assistive tools to support students in their story writing efforts.

EFL students typically cannot write in the same manner as native speakers. However, there is a natural connection between first language (L1) and second language (L2) in the minds of EFL learners, and this cognitive link should be utilized rather than ignored (Tsai 2020). The translanguaging approach proposes that multilingual individuals can effectively combine and utilize different linguistic resources to develop their unique voice (Canagarajah 2011). Earlier studies have also shown that incorporating L1 during L2 writing can reduce cognitive overload and assist in idea generation (Lee 2019).

Given the recent advancements in AI technology, which have significantly improved the accuracy of machine translation (MT) tools and their ability to differentiate nuances in text, it is imperative for research to investigate how this new generation of MT contributes to more challenging aspects of language learning. This study aims to fill this gap by investigating the impact of using DeepL Translator, a cutting-edge AI-assisted translation tool, on Chinese EFL students’ story writing, assessing both the form and the content of students’ writing products before and after DeepL use, as well as investigating students’ attitudes toward using AI-assisted tools in EFL story writing.

2 Literature review

EFL students often find story writing challenging due to the multifaceted skills and cognitive requirements it demands, thus highlighting the importance of integrating AI-assisted tools to help them with story writing. This section begins with an overview of story writing and the challenges embedded in this task for EFL learners, followed by an introduction to the translanguaging approach. Subsequently, it delves into existing research exploring the utilization of MT tools in language learning. Specifically, it outlines studies examining learners’ and instructors’ perceptions of MT use and prominent empirical research on the impact of MT use in EFL writing, bringing attention to the lack of empirical research on the use of AI-assisted translation tools in story writing.

2.1 Story writing and its challenges for EFL students

Language education aims to equip learners with effective communication skills, making story writing an invaluable tool for self-expression and cultural understanding (Graham and Perin 2007). However, this complex task presents challenges for EFL students, who may struggle with language proficiency, organizational skills, and cultural knowledge. As such, there is a pressing need for assistive tools to support students in their writing endeavors.

Story writing unfolds through various stages: planning, drafting, revising, and editing. Each stage requires careful attention to details and a focus on clarity, coherence, and correctness (Graham and Perin 2007). Despite its importance, story writing often receives inadequate attention in foreign language education, even though evidence suggests its benefits in enhancing language acquisition, communicative competence, and motivation (Albert and Kormos 2004; Smith 2013).

EFL students encounter several challenges in their writing process. They may struggle with effective dictionary use (Rahmat et al. 2021), limited teacher assistance, and cognitive overload due to managing multiple aspects simultaneously (Lee 2019). Addressing these challenges is crucial for fostering language development and improving writing proficiency.

2.2 Translanguaging approach

The translanguaging approach proposes that multilingual individuals have the capacity to leverage various language resources to express themselves effectively (Canagarajah 2011). Previous research suggests that the translanguaging approach can be advantageous for developing language skills (Canagarajah 2011; García and Lin 2017). García and Lin (2017) argued that bilingual education should provide students with opportunities to utilize their entire linguistic repertoire to the fullest extent without being constrained by the socially and politically constructed boundaries of specific languages and the associated ideologies of linguistic purity. Only then can educators effectively empower students to make informed choices about when to embrace their full linguistic capabilities, thus freeing their minds and imagination.

EFL students usually do not possess the same level of composition abilities as native writers. Nonetheless, there is a natural connection between the first language (L1) and the second language (L2) in the minds of EFL writers, and this cognitive process should be utilized rather than disregarded (Tsai 2020). When composing in L2, writers draw upon their L1 to differing degrees, leveraging their linguistic resources to overcome challenges and express themselves effectively. Previous research also indicates that using L1 during L2 writing helps to alleviate cognitive overload and facilitates the generation of ideas (Lee 2019).

2.3 Using machine translation as AI-assisted translanguaging tools

Machine translation (MT) is a critical component of modern AI technology, serving as a valuable tool to alleviate cognitive overload and enhance students’ expression through accurate and concise semantic translation. Early MT systems, such as rule-based machine translation (RBMT) and statistical machine translation (SMT), were followed by neural machine translation (NMT), which utilizes artificial neural networks to achieve improved accuracy in text translation (Stasimioti et al. 2020; Volkart et al. 2018). DeepL Translator, an NMT launched in 2017, has gained popularity for its high translation accuracy, outperforming other machine translators in various studies (Stasimioti et al. 2020; Volkart et al. 2018; Yulianto and Supriatnaningsih 2021). Despite its capabilities, there is limited research on using DeepL Translator in language education, particularly for the Chinese-English language pair. This study utilized DeepL Translator as an AI-assisted tool to aid Chinese students in their English story writing, aiming to provide insights and contribute to existing research.

MT in language education has been extensively studied, with research exploring students’ perceptions and the impact on their language learning. Many students view MT positively, believing it helps them with language learning and increases their confidence (An et al. 2023; Niño 2009; Song and Song 2023). However, students also recognize its limitations, such as inaccurate grammar and difficulty handling longer sentences (Jolley and Maimone 2015; Tsai 2019). Previous studies have examined students’ perceptions of MT, but there is a gap in understanding how these perceptions relate to students’ actual performance and proficiency. This study aimed to bridge that gap by linking students’ attitudes toward AI-assisted tools with their performance and proficiency, contributing valuable data to the field.

Early MT systems faced criticism for their poor performance, prompting concerns about their use in language learning. Despite these flaws, Anderson (2013) argued that MT could serve as a pedagogical tool: its flawed output gives students a bad model in which to identify and correct errors. With technological advancements, MT has become more accurate and effective. Instructors remain divided: some view MT use as plagiarism and prohibit it (Clifford et al. 2013), while others advocate for its potential in facilitating language learning (Tsai 2020). Lee (2019) categorized the beneficial effects of MT from cognitive, linguistic, and affective perspectives, highlighting its role in reducing cognitive load, supporting lexico-grammatical knowledge, and increasing motivation and confidence in learners.

Despite its advantages, the impact of MT on writing performance has yielded inconsistent findings in previous research. While some studies have shown improvements in fluency and accuracy (Garcia and Pena 2011; Tsai 2020), others found mixed results, particularly in syntactic and lexical complexity (Chung and Ahn 2021). To address these inconsistencies, this study investigated the impact of DeepL use on Chinese EFL students’ story writing, focusing on both form and content aspects. It also explored students’ attitudes toward using DeepL for story writing, aiming to provide comprehensive insights into the potential benefits and challenges of AI-assisted tools in language education.

AI-assisted tools have shown promise in facilitating language learning, yet further research is needed to fully understand their impact on students’ language skills and explore their applications beyond traditional academic writing.

3 Research design

This section provides a comprehensive overview of the research design, presenting the methodologies applied for both data collection and analysis. To address the research questions effectively, it outlines the approach for assessing the impact of DeepL Translator, an AI-assisted tool, on students’ story writing regarding both form and content. This evaluation encompasses the use of computational analysis and human raters. Additionally, this section introduces the methodology for evaluating students’ attitudes, which involves the utilization of surveys and interviews.

3.1 Research questions

The following research questions (RQs) were addressed in this study:

RQ1:

What influence does DeepL Translator have on Chinese EFL students’ story writing in terms of form?

RQ2:

What influence does DeepL Translator have on Chinese EFL students’ story writing in terms of content?

RQ3:

What are Chinese EFL students’ attitudes toward using DeepL Translator for story writing?

3.2 Participants

Thirty non-English major university students (16 female, 14 male; aged between 19 and 22) participated in the study. All participants were native Chinese speakers.

The CET-6 is an English proficiency test administered by the Chinese Ministry of Education, widely recognized by universities and employers. Participants’ English proficiency levels were determined by their CET-6 scores. The mean score (M) of all participants was 446.60; the standard deviation (SD) was 86.77. The participants were divided into two proficiency groups based on their CET-6 test scores: the higher proficiency group and the lower proficiency group. Fifteen students who scored higher than 425, which is over 60 % of the total score (710), were placed in the higher proficiency group (M = 516.33, SD = 61.06); fifteen students who scored lower than 425 were placed in the lower proficiency group (M = 376.87, SD = 38.04).

In addition to their proficiency levels, it is important to consider the participants’ difficulties in writing, particularly in storytelling. At Chinese universities, there are generally no specific courses dedicated to English story writing, which means that students interested in this area often lack sufficient guidance and instruction. This gap can significantly impact their writing skills and overall performance.

3.3 Data collection

To address the research questions, data were collected from multiple sources, including students’ writing products, surveys, and interviews.

Before starting the main writing task, students were briefed on the research objectives and how to use DeepL Translator. They were informed that their stories would be evaluated anonymously: the researcher would assess form using automated computational tools, and a panel of three expert writers would score content. Participants were assured that their work would be used solely for research purposes and would be reported in a way that maintains their anonymity.

Participants were given ample time to choose their story topics and engage in preliminary preparations. They were encouraged to outline their stories beforehand to facilitate a smoother writing process. This deliberate approach aimed to ensure that their performance was not compromised due to lack of preparation.

Various writing prompts were provided to students from the “self-publishing school” website, which offers challenging materials designed to enhance writing skills. This approach aimed to prevent participants from feeling underestimated by assigning tasks that were too simple for their age and intelligence. Students were instructed to use these prompts freely to inspire their stories. Topic familiarity significantly influences students’ writing performance, so keeping familiarity consistent among participants helped ensure score reliability and study validity. Students were encouraged to select prompts that interested them the most from a variety of genres, including fantasy, sci-fi, dystopian, contemporary, romance, horror/thriller, and mystery. Participants were also allowed to manipulate prompts – combining different ones, changing genres, or even ignoring prompts altogether to write any story they wished. This flexibility was designed to encourage creativity and minimize the potential impact of unfamiliar prompts on the study’s outcomes.

Based on the translanguaging approach and inspired by Lee (2019) and Tsai (2020), in the main procedure, each student was asked to complete the following tasks:

Write their stories in Chinese (Chinese version, CH). The word count was advised to be approximately 800–1,000 Chinese characters.
Write the same story in English by themselves (self-written version, SW) without using any resources. The word count was advised to be approximately 500–800 words.
Translate their Chinese version into English only using DeepL Translator (DeepL version, DL).
Revise the self-written version by comparing it with the DeepL version (revised self-written version, RSW), using DeepL Translator as the only resource.
A follow-up online survey was handed out to students to learn about their attitudes toward using DeepL as an AI-assisted tool for story writing.
Individual interviews were conducted with eight willing participants to elicit more in-depth thoughts about their writing process and their attitudes toward AI-assisted tools.

Students were provided with blank online documents for each step. The main steps of the research took approximately 3 h for each student, with the actual time logged by students and verified by timestamps on the shared documents. The researcher monitored students’ progress throughout. All English versions of students’ work (except for the DeepL version) were reviewed by the researcher for English proficiency and research performance. To ensure integrity, an AI Content Detector (https://crossplag.com/ai-content-detector/) was used to detect AI-related cheating in both the Chinese and self-written versions. This tool can detect over 100 languages and has access to 300 million documents and 70 billion web articles for comparison, making it a leading AI detection tool for businesses, marketers, and educators who need to verify content originality. Extensive testing against established datasets has validated the tool’s accuracy in detecting AI-generated content. Furthermore, user feedback confirms its reliability and effectiveness in various applications (Elkhatat et al. 2023).

The time advised for each step was approximately 1 h, allowing flexibility based on individual writing needs and preferences. This approach aimed to reduce stress and optimize performance. Fluency was measured as words per minute.

The survey utilized a 5-point Likert scale immediately after the writing tasks, with responses ranging from (1) strongly disagree to (5) strongly agree (see Appendix A for survey questions). The questions were inspired by the survey questions from Tsai (2020). The interviews followed specific guidelines with room for improvisation and were conducted in Chinese to efficiently explore participants’ attitudes toward DeepL (see Appendix B for interview guidelines). Each interview lasted approximately 20 min. Each student is labeled using a unique identifier (e.g., L-S01, H-S09) in reporting the interview data. The letter “L” indicates lower proficiency students, while “H” indicates higher proficiency students.

3.4 Data analysis

This research used a mixed-methods approach, combining quantitative analysis of students’ writing products and the returned surveys with qualitative analysis of students’ writing samples and the interviews.

3.4.1 Analysis of students’ story writing products

This research divided the analysis of students’ writing products into two parts: analyzing the form and analyzing the content.

A statistical analysis was conducted using SPSS version 26.0 to investigate the impact of DeepL use on each linguistic measure examined in the study. The mean and standard deviation were calculated for all participants together and for the two proficiency groups (higher and lower) separately. Additionally, independent-samples t-tests were conducted to determine whether there were significant differences in scores between the higher proficiency group and the lower proficiency group, and paired-samples t-tests were conducted to determine the significance of the changes before and after DeepL use. A 95 % confidence level was adopted, and a p-value of less than 0.05 was considered significant.
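The two tests described above can be sketched as follows. This is a minimal illustration rather than the SPSS procedure used in the study: the function names and the sample numbers are hypothetical, and the pooled-variance form of the independent-samples test is an assumption, since the text does not say whether equal variances were assumed. The sketch returns only the t statistic and degrees of freedom; the p-value would then be read from a t distribution.

```python
import math
from statistics import mean, stdev, variance


def paired_t(before, after):
    """Return (t, df) for a paired-samples t-test on matched scores."""
    if len(before) != len(after):
        raise ValueError("paired samples must have equal length")
    # Work on the per-participant change (after minus before).
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def independent_t(group_a, group_b):
    """Return (t, df) for a pooled-variance independent-samples t-test."""
    na, nb = len(group_a), len(group_b)
    # Pooled variance: assumes both groups share one population variance.
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    t = (mean(group_a) - mean(group_b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical scores, not data from the study.
sw_scores = [10.0, 12.0, 14.0, 16.0]   # self-written version
rsw_scores = [12.0, 13.0, 17.0, 18.0]  # revised self-written version
t_paired, df_paired = paired_t(sw_scores, rsw_scores)

t_ind, df_ind = independent_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

The paired test suits the SW versus RSW comparison because each student contributes both versions, while the independent test suits the comparison between the two proficiency groups.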

3.4.1.1 Analysis of form

This research utilized the CALF framework to assess students’ writing based on syntactic complexity, lexical complexity, fluency, and accuracy. The effectiveness of the CALF framework in evaluating linguistic performance has been well-documented in numerous studies (Chung and Ahn 2021; Ellis and Yuan 2004; Yang and Kim 2018).

Syntactic Complexity: measured using the L2 Syntactic Complexity Analyzer (L2SCA, https://sites.psu.edu/xxl13/l2sca/), created by Xiaofei Lu at Pennsylvania State University. L2SCA generates syntactic complexity indices for written English texts and is widely used in corpus linguistics (Lu 2010, 2011). It offers a range of measures to assess various syntactic features, as outlined in Table 1.
Lexical Complexity: measured using the Lexical Complexity Analyzer (https://sites.psu.edu/xxl13/lca/) developed by Xiaofei Lu (Lu 2012). This widely-used tool evaluates the lexical complexity of written texts and offers a range of measures for assessment, as outlined in Table 2.
Fluency: measured based on the number of words written per minute (W/M), calculated by dividing the total number of words by the total number of minutes spent on writing.
Accuracy: measured by error analysis using Grammarly, an online grammar-checking tool. Grammarly employs advanced algorithms and natural language processing to provide feedback, identifying grammatical and mechanical errors like subject-verb agreement, punctuation, and spelling (Dodigovic and Tovmasyan 2021). While Grammarly may have limitations in addressing nuanced issues, it is well-suited for identifying grammar mistakes and evaluating grammatical accuracy.

Table 1:

Measures of syntactic complexity.

Category	Measure	Formula
Length of production	Mean length of sentence (MLS)	# of words/# of sentences
	Mean length of T-unit (MLT)	# of words/# of T-units
	Mean length of clause (MLC)	# of words/# of clauses
Subordination	Clause per T-unit (C/T)	# of clauses/# of T-units
	Dependent clause per clause (DC/C)	# of dependent clauses/# of clauses
	Dependent clause per T-unit (DC/T)	# of dependent clauses/# of T-units
Coordination	T-unit per sentence (T/S)	# of T-units/# of sentences
	Coordinate phrase per clause (CP/C)	# of coordinate phrases/# of clauses
	Coordinate phrase per T-unit (CP/T)	# of coordinate phrases/# of T-units
Particular structures	Complex nominal per T-unit (CN/T)	# of complex nominals/# of T-units
	Complex nominal per clause (CN/C)	# of complex nominals/# of clauses
	Verb phrase per T-unit (VP/T)	# of verb phrases/# of T-units
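The formulas in Table 1 are simple ratios of unit counts, which the following sketch makes explicit. It assumes the counts have already been produced by a parser-based tool such as L2SCA; the function name, parameter names, and example counts are hypothetical and are not taken from the study.

```python
def syntactic_ratios(words, sentences, t_units, clauses,
                     dependent_clauses, coordinate_phrases,
                     complex_nominals, verb_phrases):
    """Compute the twelve Table 1 ratios from raw unit counts."""
    return {
        # Length of production
        "MLS": words / sentences,
        "MLT": words / t_units,
        "MLC": words / clauses,
        # Subordination
        "C/T": clauses / t_units,
        "DC/C": dependent_clauses / clauses,
        "DC/T": dependent_clauses / t_units,
        # Coordination
        "T/S": t_units / sentences,
        "CP/C": coordinate_phrases / clauses,
        "CP/T": coordinate_phrases / t_units,
        # Particular structures
        "CN/T": complex_nominals / t_units,
        "CN/C": complex_nominals / clauses,
        "VP/T": verb_phrases / t_units,
    }


# Hypothetical counts for one story.
ratios = syntactic_ratios(
    words=600, sentences=40, t_units=50, clauses=75,
    dependent_clauses=20, coordinate_phrases=15,
    complex_nominals=60, verb_phrases=100,
)
```

Identifying T-units, clauses, and complex nominals is the hard part and is left to the analyzer; only the final division is shown here.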

Table 2:

Measures of lexical complexity.

Category	Measure	Formula
Lexical density	Lexical density (LD)	# of lexical words/# of words
Lexical sophistication	Lexical sophistication-I (LS1)	# of sophisticated lexical words/# of lexical words
	Lexical sophistication-II (LS2)	# of sophisticated word types/# of word types
	Verb sophistication-I (VS1)	# of sophisticated verb types/# of verbs
	Verb sophistication-II (VS2)	(# of sophisticated verb types)^2/# of verbs
	Corrected VS1 (CVS1)	(# of sophisticated verb types)/√(2 * # of verbs)
Lexical variation	Number of different words (NDW)	# of word types
	Corrected type/token ratio (CTTR)	# of word types/√(2 * # of words)
	Root type/token ratio (RTTR)	# of word types/√(# of words)
	Lexical word variation (LV)	# of lexical word types/# of lexical words
	Verb variation-I (VV1)	# of verb types/# of verbs
	Squared VV1 (SVV1)	(# of verb types)^2/# of verbs
	Corrected VV1 (CVV1)	# of verb types/√(2 * # of verbs)
	Verb variation-II (VV2)	# of verb types/# of lexical words
	Noun variation (NV)	# of noun types/# of lexical words
	Modifier variation (MODV)	(# of adjective types + # of adverb types)/# of lexical words
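A few of the Table 2 formulas can be worked through in code to show how the squared and square-root corrections behave. The sketch below assumes that token and type counts are already available (for example, from the Lexical Complexity Analyzer); the function name, argument names, and example counts are hypothetical. The square-root variants are commonly used to reduce the sensitivity of a raw type/token ratio to text length.

```python
import math


def lexical_measures(words, word_types, lexical_words,
                     verbs, verb_types, sophisticated_verb_types):
    """Compute a subset of the Table 2 measures from raw counts."""
    return {
        "LD": lexical_words / words,
        "CTTR": word_types / math.sqrt(2 * words),
        "RTTR": word_types / math.sqrt(words),
        "VV1": verb_types / verbs,
        "SVV1": verb_types ** 2 / verbs,
        "CVV1": verb_types / math.sqrt(2 * verbs),
        "VS1": sophisticated_verb_types / verbs,
        "VS2": sophisticated_verb_types ** 2 / verbs,
        "CVS1": sophisticated_verb_types / math.sqrt(2 * verbs),
    }


# Hypothetical counts for one story.
lex = lexical_measures(
    words=200, word_types=100, lexical_words=100,
    verbs=50, verb_types=30, sophisticated_verb_types=5,
)
```

Deciding which words count as lexical or sophisticated depends on the tagger and the reference word list used by the tool, so those steps are deliberately outside this sketch.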

Students’ works were imported into Grammarly, and identified errors were reviewed by the researcher. Errors were categorized into several main types (Wu and Garza 2014). The number of errors in each type and total errors were counted, and error density was calculated according to Table 3.

Table 3:

Measures of error density.

Measure	Formula
Conjunction use error density	# of conjunction use errors/# of words
Determiner use error density	# of determiner use errors/# of words
Noun-related error density	# of noun-related errors/# of words
Preposition use error density	# of preposition use errors/# of words
Punctuation error density	# of punctuation errors/# of words
Spelling error density	# of spelling errors/# of words
Verb-related error density	# of verb-related errors/# of words
Total error density	# of total errors/# of words

Higher error densities indicate more errors and lower accuracy, while lower error densities indicate fewer errors and higher accuracy.
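The error density measures in Table 3 can be illustrated with a short sketch. It assumes that the error counts per category come from the researcher's review of the Grammarly output and that the total is the sum of the listed categories, which the text does not state explicitly; the function name and example counts are hypothetical.

```python
def error_densities(error_counts, word_count):
    """Return errors per word for each category, plus a total density."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    densities = {etype: n / word_count for etype, n in error_counts.items()}
    # Assumption: total errors are the sum of the categorized errors.
    densities["total"] = sum(error_counts.values()) / word_count
    return densities


# Hypothetical counts for one 500-word story.
counts = {
    "conjunction": 1, "determiner": 4, "noun": 3, "preposition": 2,
    "punctuation": 5, "spelling": 3, "verb": 2,
}
densities = error_densities(counts, word_count=500)
```

Dividing by word count is what makes stories of different lengths comparable: twenty errors in 500 words and forty errors in 1,000 words give the same total density.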

3.4.1.2 Analysis of content

The Consensual Assessment Technique (CAT) (D’Souza 2021) was used to evaluate the content of students’ writing. CAT is renowned for assessing creative performance, making it ideal for evaluating story writing.

In this technique, expert raters independently and subjectively score the writing products. Each rater provides an individual assessment, and then a consensus among the raters is sought to ensure the validity of the evaluation. This consensus is achieved through measuring inter-rater reliability, which assesses the degree of agreement among the raters.

The judges were instructed to score the students’ writing pieces based solely on content, rather than on grammatical features. To facilitate this process, a guideline outlining the aspects the judges should consider when scoring was synthesized by the researcher based on the judges’ feedback, and subsequently approved by them. This approach ensures a structured and consistent evaluation of the students’ creative outputs.

Three experts, referred to here by the pseudonyms Sandy, Fiona, and Elliot, were invited to carry out the assessment. Sandy, a part-time writer and ESL teacher in Southeast Asia, brings insights from her experience in ESL writing. Fiona, a bilingual individual who grew up in China and currently resides in the US, offers a cross-cultural perspective. Elliot, an American writer specializing in speculative fiction, provides expertise in storytelling techniques.

A guideline for assessment was established and agreed upon by the experts, covering readability, plot, characters, setting, theme, and tone. Readability focuses on clarity and flow; plot involves the sequence of events; characters assess depth and development; setting evaluates the story’s environment; theme looks at the underlying message; and tone examines the overall mood. These factors are widely recognized as critical elements in narrative construction and evaluation, and they are consistently cited in both academic and practical discussions about what constitutes effective and engaging storytelling (Brooks 2011; Truby 2008).

Due to recent objections from writers about AI-generated content, the experts were only asked to evaluate the original self-written versions and their revised counterparts. Each writing product was scored independently on a scale of 1–5 across the agreed aspects without knowing which version was original or revised. This ensured unbiased evaluations based solely on their expertise in story writing (Cseh and Jeffries 2019). Inter-rater reliability was calculated to ensure consensus among judges, enhancing the reliability and credibility of the findings.

In addition, a detailed analysis of several writing samples was conducted. This analysis focused on how the integration of DeepL facilitated improvements in various aspects of students’ narratives. Each sample consists of the original Chinese version (CH), the original self-written version (SW), the DeepL translation (DL), and the revised self-written version (RSW). Comparing these versions highlights the specific enhancements achieved through the use of DeepL.

3.4.2 Analysis of the survey and interview

The survey and interview analysis involved using statistical software to analyze the survey responses, along with manual transcription and coding of the interview data to identify key themes.

The survey responses were subjected to descriptive analysis, in which the percentage of each answer to each question was calculated. Independent-samples t-tests were then conducted to determine whether attitudes differed significantly between the two proficiency groups. Cronbach’s alpha was calculated to check the internal consistency and reliability of students’ responses regarding their perceptions of AI-assisted tools.
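For readers unfamiliar with Cronbach’s alpha, the standard formula can be sketched as below. This is an illustration only: the function name and the Likert responses are hypothetical, and the study itself computed the statistic in statistical software rather than with this code.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])                      # number of survey items
    items = list(zip(*responses))              # regroup scores item by item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(r) for r in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical 5-point Likert responses: five respondents, three items.
survey = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 3],
]
alpha = cronbach_alpha(survey)

# Items that move in perfect lockstep give the maximum value.
alpha_identical = cronbach_alpha([[1, 1], [2, 2], [3, 3]])
```

Alpha approaches 1 when respondents who score high on one item also score high on the others, which is the sense in which the survey items are internally consistent.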

Individual interviews were conducted with a focus on understanding the participants’ attitudes. Each interview lasted approximately 20 min, resulting in transcriptions of about 2,000–2,500 Chinese characters per interview (excluding filler words). The interviews were carried out in Chinese to ensure participants could express themselves comfortably and naturally. To translate the interviews into English while maintaining the accuracy of meaning, a combination of AI-assisted translation tools and manual revisions by the researcher was employed. This approach ensured that the translations were both accurate and reflective of the participants’ original intent. For the interviews, a thematic analysis approach was employed. The researcher manually identified, categorized, and labeled key themes that not only emerged frequently but also held significant importance (Creswell 2009).

4 Results

This section presents key findings obtained through both qualitative and quantitative research methods.

4.1 Form and content of students’ story writing products

Overall, the results indicate that the revised version exhibited significant improvements in various aspects compared to the students’ self-written stories.

4.1.1 Form

The improvements in form were notably prominent in the significant enhancement of length of production, coordination, particular structures, use of sophisticated verbs, lexical variation (in terms of verbs, adverbs, and adjectives), fluency, and overall accuracy. Additionally, variations were observed among different proficiency groups.

4.1.1.1 Syntactic complexity

In terms of syntactic complexity, the use of DeepL led to significant improvements in length of production (MLS: p < 0.001, MLT: p < 0.001, MLC: p < 0.001), coordination (T/S: p = 0.001, CP/T: p = 0.009, CP/C: p = 0.007), and particular structures (CN/T: p < 0.001, CN/C: p < 0.001, VP/T: p = 0.004). However, the improvements in subordination (C/T: p = 0.086, DC/C: p = 0.439, DC/T: p = 0.215) were not statistically significant (see Table 4).

Table 4:

Results of the changes in syntactic complexity (all participants).

Version		Length of production			Subordination			Coordination			Particular structures
Version		MLS	MLT	MLC	C/T	DC/C	DC/T	T/S	CP/T	CP/C	CN/T	CN/C	VP/T
SW	Mean	14.694	12.061	7.954	1.512	0.283	0.447	1.216	0.257	0.168	1.087	0.711	1.930
	N	30	30	30	30	30	30	30	30	30	30	30	30
	SD	4.929	3.806	1.861	0.243	0.096	0.217	0.159	0.200	0.116	0.481	0.258	0.425
RSW	Mean	20.304	14.856	9.347	1.580	0.294	0.482	1.366	0.328	0.206	1.303	0.816	2.108
	N	30	30	30	30	30	30	30	30	30	30	30	30
	SD	7.732	4.629	2.021	0.282	0.083	0.217	0.324	0.195	0.115	0.507	0.252	0.451
Sig. (2-tailed)		0.000	0.000	0.000	0.086	0.439	0.215	0.001	0.009	0.007	0.000	0.000	0.004

4.1.1.2 Lexical complexity

In terms of lexical complexity, the use of DeepL Translator led to a slight, statistically non-significant decline in lexical density (p = 0.869). In lexical sophistication, there were significant improvements in VS1 (p = 0.014), VS2 (p < 0.001), and CVS1 (p < 0.001), while the improvements in LS1 (p = 0.111) and LS2 (p = 0.095) were not significant. There were significant improvements in lexical variation in terms of NDW (p < 0.001), CTTR (p < 0.001), RTTR (p < 0.001), verb diversity (VV1: p = 0.005, SVV1: p < 0.001, CVV1: p < 0.001), LV (p = 0.004), and MODV (p = 0.011), while the improvements in VV2 (p = 0.361) and NV (p = 0.342) were not significant (see Table 5).

Table 5:

Results of the changes in lexical complexity (all participants).

Version		LD	Lexical sophistication					Lexical variation
Version		LD	LS1	LS2	VS1	VS2	CVS1	NDW	CTTR	RTTR	VV1	SVV1	CVV1	LV	VV2	NV	MODV
SW	M	0.522	0.259	0.242	0.116	1.620	0.787	231.933	6.476	9.159	0.614	33.296	4.009	0.569	0.171	0.548	0.172
	N	30	30	30	30	30	30	30	30	30	30	30	30	30	30	30	30
	SD	0.036	0.078	0.070	0.054	1.940	0.445	88.748	0.954	1.350	0.115	13.627	0.773	0.091	0.028	0.113	0.034
RSW	M	0.521	0.271	0.255	0.134	2.393	0.982	277.133	7.130	10.084	0.648	42.388	4.544	0.588	0.173	0.557	0.183
	N	30	30	30	30	30	30	30	30	30	30	30	30	30	30	30	30
	SD	0.026	0.074	0.063	0.057	2.361	0.490	83.926	0.777	1.098	0.107	14.233	0.752	0.084	0.033	0.111	0.030
Sig. (2-tailed)		0.869	0.111	0.095	0.014	0.000	0.000	0.000	0.000	0.000	0.005	0.000	0.000	0.004	0.361	0.342	0.011
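
The variation indices NDW, CTTR, and RTTR all derive from the number of word types T and word tokens N. The sketch below uses the standard definitions, CTTR = T / sqrt(2N) and RTTR = T / sqrt(N), which agree with Table 5, where RTTR is about sqrt(2) times CTTR in both versions. The whitespace tokenizer and the function name `lexical_variation` are simplifying assumptions; the analyzer used in the study may tokenize and lemmatize differently.

```python
import math


def lexical_variation(text):
    """Compute NDW, CTTR, and RTTR from a naive lowercase whitespace split."""
    tokens = text.lower().split()
    if not tokens:
        raise ValueError("text contains no tokens")
    n_tokens = len(tokens)
    n_types = len(set(tokens))  # NDW: number of different words
    return {
        "NDW": n_types,
        "CTTR": n_types / math.sqrt(2 * n_tokens),
        "RTTR": n_types / math.sqrt(n_tokens),
    }


# Hypothetical sentence: 5 tokens, 4 types ("the" repeats).
print(lexical_variation("the cat saw the dog"))
```

Because both indices divide by a function of text length, they are less sensitive to story length than a raw type-token ratio, which matters here since the RSW stories are longer than the SW stories.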

4.1.1.3 Fluency

In terms of fluency, the use of DeepL Translator resulted in a significant increase (p < 0.001) in the number of words students produced per minute (W/M), which indicates a significant improvement in fluency (see Table 6).

Table 6:

Results of the changes in fluency (all participants).

Version		Fluency
SW	Mean	7.527
	N	30
	SD	3.010
RSW	Mean	21.871
	N	30
	SD	11.042
Sig. (2-tailed)		0.000

4.1.1.4 Accuracy

In terms of accuracy, the use of DeepL Translator led to a significant decline in overall error density (p < 0.001), which indicates a significant improvement in accuracy. Among specific error types, there were significant declines in determiner use error density (p = 0.001), noun-related error density (p < 0.001), preposition use error density (p < 0.001), spelling error density (p < 0.001), and verb-related error density (p < 0.001), whereas the declines in conjunction use error density (p = 0.221) and punctuation error density (p = 0.419) were not significant (see Table 7).

Table 7:

Results of the changes in accuracy (all participants).

Version		Conjunction use error density	Determiner use error density	Noun-related error density	Preposition use error density	Punctuation error density	Spelling error density	Verb-related error density	Total error density
SW	Mean	0.004	0.014	0.003	0.008	0.005	0.035	0.021	0.094
	N	30	30	30	30	30	30	30	30
	SD	0.004	0.014	0.003	0.006	0.004	0.026	0.020	0.051
RSW	Mean	0.003	0.005	0.001	0.003	0.004	0.012	0.005	0.035
	N	30	30	30	30	30	30	30	30
	SD	0.004	0.006	0.002	0.003	0.003	0.015	0.005	0.027
Sig. (2-tailed)		0.221	0.001	0.000	0.000	0.419	0.000	0.000	0.000

4.1.1.5 Regarding different proficiency groups

The two proficiency groups (higher vs. lower) showed different results across the measures.

In lexical variation, the higher proficiency group performed significantly better than the lower proficiency group on NDW (p = 0.010), CTTR (p = 0.004), RTTR (p = 0.004), SVV1 (p = 0.005), and CVV1 (p = 0.004) in the SW version, but no significant difference was found between the two groups on these measures in the RSW version. In fluency, the higher proficiency group performed significantly better than the lower proficiency group in the SW version (p = 0.043), but no significant difference was found between the two groups in the RSW version (p = 0.469). Interestingly, the mean fluency score of the lower proficiency group even surpassed that of the higher proficiency group in the RSW version (lower = 23.361; higher = 20.381). In accuracy, the higher proficiency group had significantly lower error density than the lower proficiency group in preposition use errors (p = 0.032), spelling errors (p < 0.001), verb-related errors (p = 0.015), and total errors (p < 0.001) in the SW version. In the RSW version, the difference in preposition use error density was no longer significant (p = 0.261), whereas the differences in spelling error density (p = 0.033), verb-related error density (p = 0.017), and total error density (p = 0.013) remained significant (see Table 8).

Table 8:

Results comparing CALF between two proficiency groups.

Version	Proficiency group	Measure	Lexical complexity					Fluency	Accuracy
			Lexical variation					Fluency	Error density
			NDW	CTTR	RTTR	SVV1	CVV1	W/M	Preposition use error density	Spelling error density	Verb-related error density	Total error density
SW	Higher	Mean	273.800	6.955	9.837	40.171	4.409	8.649	0.006	0.016	0.012	0.061
	Lower	Mean	190.067	5.997	8.481	26.421	3.609	6.404	0.011	0.054	0.030	0.128
	Higher vs. lower	Sig. (2-tailed)	0.010	0.004	0.004	0.005	0.004	0.043	0.032	0.000	0.015	0.000
RSW	Higher	Mean	298.067	7.227	10.219	45.873	4.706	20.381	0.003	0.006	0.003	0.023
	Lower	Mean	256.200	7.034	9.948	38.903	4.382	23.361	0.004	0.017	0.007	0.047
	Higher vs. lower	Sig. (2-tailed)	0.176	0.507	0.508	0.185	0.245	0.469	0.261	0.033	0.017	0.013

Note. Only measures with notable effects or interactions are reported due to space constraints.

Regarding the changes after DeepL use within each group, in terms of syntactic complexity, the improvements in all three measures of particular structures were significant in the lower proficiency group (CN/T: p = 0.001; CN/C: p < 0.001; VP/T: p = 0.022). However, in the higher proficiency group, the improvements in CN/C (p = 0.141) and VP/T (p = 0.080) were not significant. Interestingly, the mean scores of the lower proficiency group after DeepL use even surpassed those of the higher proficiency group in all three measures of particular structures. In terms of lexical variation, there was a non-significant decrease in MODV in the higher proficiency group (SW = 0.181; RSW = 0.179; p = 0.638), while there was a significant improvement in MODV in the lower proficiency group (SW = 0.163; RSW = 0.186; p < 0.001). In terms of accuracy, the decrease in determiner use error density was not significant in the higher proficiency group (p = 0.076), while it was significant in the lower proficiency group (p < 0.001) (see Table 9).

Table 9:

Results of the changes in CALF (comparing two proficiency groups).

Proficiency group	Version	Measure	Syntactic complexity			Lexical complexity	Accuracy
			Particular structures			Lexical variation	Error density
			CN/T	CN/C	VP/T	MODV	Determiner use error density
Higher	SW	Mean	1.150	0.744	1.977	0.181	0.013
	RSW	Mean	1.242	0.781	2.076	0.179	0.003
	SW vs. RSW	Sig. (2-tailed)	0.011	0.141	0.080	0.638	0.076
Lower	SW	Mean	1.023	0.678	1.882	0.163	0.015
	RSW	Mean	1.364	0.850	2.141	0.186	0.006
	SW vs. RSW	Sig. (2-tailed)	0.001	0.000	0.022	0.000	0.000

Note. Only measures with notable effects or interactions are reported due to space constraints.

4.1.2 Content

The results of the changes in content indicate that DeepL positively affected various aspects of story content, including readability, plot, characters, setting, theme, and tone. Differences were also noted between the two proficiency groups.

The inter-rater reliability among judges was 0.821 (95 % confidence interval: lower bound = 0.792, upper bound = 0.848), which exceeds the 0.7 threshold and indicates consensus among the judges.

Overall, the use of DeepL Translator resulted in significant improvements in all six aspects – readability (p < 0.001), plot (p = 0.007), characters (p = 0.004), setting (p = 0.019), theme (p < 0.001) and tone (p < 0.001) (see Table 10).

Table 10:

Results of the changes in content (all participants).

Version		Readability	Plot	Characters	Setting	Theme	Tone
SW	Mean	2.533	2.578	2.378	2.167	2.600	2.744
	N	30	30	30	30	30	30
	SD	0.617	0.600	0.665	0.552	1.019	0.725
RSW	Mean	3.233	2.900	2.689	2.456	3.122	3.356
	N	30	30	30	30	30	30
	SD	0.632	0.489	0.643	0.597	0.780	0.593
Sig. (2-tailed)		0.000	0.007	0.004	0.019	0.000	0.000

A comparison of content scores between the two proficiency groups (higher vs. lower) showed that the higher proficiency group performed significantly better than the lower proficiency group in terms of readability (p = 0.008), plot (p = 0.004), and characters (p = 0.047) in the SW version, but no significant difference was found between the two groups in these measures in the RSW version: readability (p = 0.152), plot (p = 0.543), and characters (p = 0.854). Furthermore, in the RSW version, the mean scores of the lower proficiency group even surpassed those of the higher proficiency group in terms of setting (higher = 2.333; lower = 2.578) and tone (higher = 3.333; lower = 3.378) (see Table 11).

Table 11:

Results comparing the content between two proficiency groups.

Version	Proficiency group	Measure	Readability	Plot	Characters	Setting	Theme	Tone
SW	Higher	Mean	2.822	2.889	2.622	2.311	2.844	2.978
	Lower	Mean	2.244	2.267	2.133	2.022	2.356	2.511
	Higher vs. lower	Sig. (2-tailed)	0.008	0.004	0.047	0.155	0.194	0.078
RSW	Higher	Mean	3.400	2.956	2.711	2.333	3.156	3.333
	Lower	Mean	3.067	2.844	2.667	2.578	3.089	3.378
	Higher vs. lower	Sig. (2-tailed)	0.152	0.543	0.854	0.269	0.820	0.842

According to the results of the changes after DeepL use, in the higher proficiency group only readability (p = 0.001) and tone (p = 0.023) improved significantly, while the improvements in plot (p = 0.715), characters (p = 0.573), setting (p = 0.872), and theme (p = 0.058) were not significant. In the lower proficiency group, the improvements in all six aspects were significant: readability (p < 0.001), plot (p < 0.001), characters (p < 0.001), setting (p = 0.005), theme (p = 0.001), and tone (p < 0.001) (see Table 12).

Table 12:

Results of the changes in content (comparing two proficiency groups).

Proficiency group	Version	Measure	Readability	Plot	Characters	Setting	Theme	Tone
Higher	SW	Mean	2.822	2.889	2.622	2.311	2.844	2.978
	RSW	Mean	3.400	2.956	2.711	2.333	3.156	3.333
	SW vs. RSW	Sig. (2-tailed)	0.001	0.715	0.573	0.872	0.058	0.023
Lower	SW	Mean	2.244	2.267	2.133	2.022	2.356	2.511
	RSW	Mean	3.067	2.844	2.667	2.578	3.089	3.378
	SW vs. RSW	Sig. (2-tailed)	0.000	0.000	0.000	0.005	0.001	0.000

The following writing samples show in detail how students’ writing improved with the help of the DeepL translator.

Sample 1:

CH: 父亲皱眉, 用筷子大力戳在我手上。

SW: My father frowns, hits my hand by chopistiks.

DL: My father frowned and poked my hand vigorously with his chopsticks.

RSW: My father frowned and poked my hand vigorously with his chopsticks.

The revised self-written version (RSW) matches the DeepL version (DL), showing improved grammar and vocabulary. The original self-written version (SW) contained errors such as “chopistiks” instead of “chopsticks” and awkward phrasing. The revised version corrected these issues, resulting in a more accurate and natural sentence.

In terms of readability, the revised self-written version (RSW) is clearer and grammatically correct, making it more readable. Regarding characterization, the revised version more accurately depicts the father’s action and emotion, enhancing the character’s portrayal. Additionally, the tone is more precise in the revised version, reflecting a more vivid and intense interaction.

Sample 2:

CH: 包的拉链卡在中间, 像一张嘴露出别扭的微笑。

SW: The bag have some error.

DL: The bag’s zipper was stuck in the middle like a mouth with a twisted smile.

RSW: The bag’s zipper was stuck in the middle like a mouth with a twisted smile.

The revised self-written version (RSW) also matches the DeepL version (DL). The original self-written version (SW) was vague and incorrect. The revised version, influenced by DeepL, provides a vivid and accurate description, showing a clear improvement in descriptive language and accuracy.

In terms of readability, the RSW is significantly more descriptive and clearer than the original SW. Regarding the setting, the imagery of the zipper as a twisted smile effectively conveys a sense of drama and awkwardness. Regarding the theme, this depiction subtly highlights the owner’s poverty, making the narrative richer and more poignant. Furthermore, the tone is dramatic with a hint of awkwardness, reflecting the owner’s struggles and adding depth to the scene.

Sample 3:

CH: 睁开眼一看, 一张巨大的手抓住我的身躯将我举起, 除了我的枝丫, 现在我是一个根光秃秃的树枝了。

SW: when i open my eyes, a lange hand catch my body, take me far away from the ground and remove the leaf from my body, i am a blank branch.

DL: I opened my eyes to see a huge hand grabbed my body and picked me up, removed my branches, and now I was a bare branch.

RSW: when i open my eyes, a huge hand grabbed my body and picked me up, removed my leafs, now, i was a bare branch.

The revised self-written version (RSW) shows some improvements but still contains errors such as “leafs” instead of “leaves” and some issues with capitalization and punctuation. It incorporates better phrasing from the DeepL version (DL), but the student could further refine their writing for full accuracy and naturalness.

In terms of readability, while the RSW is improved, it still contains minor grammatical errors. However, it is more coherent than the SW. Regarding the plot, the revised version offers a clearer sequence of events, enhancing the plot’s coherence. Additionally, the revised version maintains a surreal and mystical tone, enhancing the story’s overall atmosphere.

Sample 4:

CH: 在小孩子眼里, 仿佛这一切在平凡不过的日常都变得有趣的, 甚至天上偶然飘过的塑料袋都会逗得她咯咯大笑。

SW: In her eyes, it was great for many simple thing. Even, she could laugh by a simple thing.

DL: In the eyes of a small child, as if all the ordinary daily routine become interesting, even the occasional floating plastic bags in the sky will make her giggle.

RSW: In the eyes of a small child, as if all the ordinary daily routine become interesting, even the occasional floating plastic bags in the sky will make her giggle.

The revised self-written version (RSW) is identical to the DeepL version (DL), showing significant improvement from the original self-written version (SW). The initial version was unclear and grammatically incorrect, while the revised version is coherent and grammatically correct, demonstrating better sentence structure and descriptive language.

In terms of readability, the RSW is much clearer and more engaging than the SW. Regarding the setting, the revised version vividly depicts the child’s perspective, making the setting more relatable and immersive. In terms of characterization, the RSW better captures the innocence and wonder of the child. The theme of finding joy in everyday life is also conveyed more effectively. Additionally, the revised version maintains a light-hearted tone, enhancing the narrative’s appeal.

Overall, these examples give a glimpse of how the use of DeepL has helped students improve the quality of their content. The revised self-written versions (RSW) demonstrate enhanced readability, clearer plots, more vivid characters and settings, well-conveyed themes, and consistent tones. These improvements make the narratives more engaging and compelling for readers.

4.2 Students’ attitudes

The follow-up surveys and interviews suggest that participants expressed satisfaction with the use of DeepL for their English story writing, although certain challenges remained. After analyzing key interview themes from eight participants, this study categorized students’ attitudes toward using DeepL into two main categories: benefits and challenges. The interviews were conducted in Chinese for efficiency and precision in gathering students’ thoughts. The excerpts below have been translated by the researcher.

4.2.1 Positive attitudes

The survey demonstrated Cronbach’s alpha reliability of 0.861 (all participants), 0.751 (higher proficiency group), and 0.889 (lower proficiency group), indicating a reliable level as they all exceeded 0.7. The completion rate of the survey was 100 %.
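
For readers unfamiliar with the statistic, Cronbach's alpha can be sketched from its standard formula, alpha = k / (k - 1) × (1 - sum of item variances / variance of total scores), where k is the number of items. The function name `cronbach_alpha` and the response matrix below are hypothetical illustrations and do not reproduce the study's survey data.

```python
import statistics


def cronbach_alpha(items):
    """items: one list of scores per survey item, aligned by respondent."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance of each item across respondents.
    item_variances = [statistics.variance(scores) for scores in items]
    # Each respondent's total score across all items.
    totals = [sum(responses) for responses in zip(*items)]
    total_variance = statistics.variance(totals)
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical 5-point responses from four respondents to three items.
responses = [
    [4, 5, 3, 4],
    [4, 4, 3, 5],
    [5, 5, 2, 4],
]
print(round(cronbach_alpha(responses), 3))
```

Alpha approaches 1 when respondents answer all items consistently and falls toward 0 when item scores are unrelated, which is why values above 0.7 are conventionally read as acceptable reliability.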

When the survey responses were analyzed, 80 % of the participants were satisfied with the English story translated by DeepL Translator (Q1); 80 % of the participants agreed that DeepL Translator was helpful for content improvement (Q2); 90 % of the participants agreed that DeepL Translator was helpful for vocabulary use (Q3); 86.67 % agreed that DeepL was helpful for sentence pattern (Q4); and 90 % agreed that DeepL was helpful for the completion of the story (Q6). 90 % of the participants also stated their willingness to continue using DeepL Translator (Q8). Additionally, 73.33 % of the participants agreed that DeepL Translator was helpful for expression (Q5) and 66.67 % agreed that DeepL was helpful for grammar accuracy (Q7) (see Table 13).

Table 13:

Survey results (all participants).

	Q1	Q2	Q3	Q4	Q5	Q6	Q7	Q8
(1) Strongly disagree	0.00 %	0.00 %	0.00 %	0.00 %	0.00 %	0.00 %	0.00 %	0.00 %
(2) Disagree	3.33 %	6.67 %	3.33 %	3.33 %	3.33 %	3.33 %	6.67 %	0.00 %
(3) Neutral	16.67 %	13.33 %	6.67 %	10.00 %	23.33 %	6.67 %	26.67 %	10.00 %
(4) Agree	63.33 %	46.67 %	56.67 %	46.67 %	53.33 %	56.67 %	56.67 %	66.67 %
(5) Strongly agree	16.67 %	33.33 %	33.33 %	40.00 %	20.00 %	33.33 %	10.00 %	23.33 %
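
The agreement percentages quoted above are the sum of the "Agree" and "Strongly agree" rows of Table 13. The short sketch below reproduces that arithmetic for two items and also derives the mean rating implied by each distribution; the names `TABLE_13`, `agreement_rate`, and `mean_rating` are illustrative, not part of the study's analysis.

```python
# Response distributions (%) for two survey items, copied from Table 13.
# Keys are the scale points: 1 = strongly disagree ... 5 = strongly agree.
TABLE_13 = {
    "Q1": {1: 0.00, 2: 3.33, 3: 16.67, 4: 63.33, 5: 16.67},
    "Q4": {1: 0.00, 2: 3.33, 3: 10.00, 4: 46.67, 5: 40.00},
}


def agreement_rate(distribution):
    """Percentage choosing 'agree' (4) or 'strongly agree' (5)."""
    return round(distribution[4] + distribution[5], 2)


def mean_rating(distribution):
    """Mean scale score implied by a percentage distribution."""
    total = sum(distribution.values())
    return sum(point * pct for point, pct in distribution.items()) / total


for item, dist in TABLE_13.items():
    print(item, agreement_rate(dist), round(mean_rating(dist), 3))
```

The implied Q1 mean of about 3.93 lies midway between the two group means reported in Table 14 (3.667 and 4.200), as would be expected if the two proficiency groups are equal in size.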

A comparison of attitudes toward DeepL between the two groups showed that the lower proficiency group was significantly more satisfied with the English story translated by DeepL Translator than the higher proficiency group (Q1; lower = 4.200; higher = 3.667; p = 0.032). The lower proficiency group also found DeepL Translator significantly more beneficial for content improvement in English story writing (Q2; lower = 4.467; higher = 3.667; p = 0.010) and significantly more helpful for completing the English story (Q6; lower = 4.467; higher = 3.933; p = 0.039). By contrast, there were no significant differences between the two groups in terms of DeepL’s contribution to vocabulary (Q3; p = 0.315), sentence patterns (Q4; p = 0.818), expression (Q5; p = 0.235), or grammar accuracy (Q7; p = 0.230), or in students’ willingness to continue using DeepL (Q8; p = 0.207) (see Table 14).

Table 14:

Survey results comparing the attitude between two proficiency groups.

Proficiency group	Measure	Q1	Q2	Q3	Q4	Q5	Q6	Q7	Q8
Higher	Mean	3.667	3.667	4.067	4.200	3.733	3.933	3.533	4.000
Lower	Mean	4.200	4.467	4.333	4.267	4.067	4.467	3.867	4.267
Sig. (2-tailed)		0.032	0.010	0.315	0.818	0.235	0.039	0.230	0.207

In the interviews, participants highlighted several benefits of using DeepL Translator, which can be grouped into distinct themes: user-friendliness, confidence and motivation, language learning and fluency, and content and quality improvement.

4.2.1.1 User-friendliness

Participants found DeepL user-friendly and accessible, particularly appreciating its feature of providing interchangeable words. One participant highlighted the ease of access and simplicity of using DeepL:

Using DeepL is super simple. If you’re connected to the internet, then you’re good to go! (L-S01)

This statement underscores the tool’s straightforward interface, which lowers the barrier to entry for users who may not be tech-savvy. Another participant emphasized the usefulness of DeepL’s feature that offers synonyms or interchangeable words:

When you click on a word in DeepL, it gives you other words you could use instead. It’s like having an instant upgrade for my story! (H-S09)

This feature is particularly beneficial for writers seeking to enhance their vocabulary and find the most appropriate words to convey their ideas effectively. These responses highlight a significant technical advantage of DeepL: its ability to simplify the writing process for users of varying technical proficiencies. The straightforward interface makes the tool accessible, reducing the learning curve associated with more complex software. This accessibility is crucial for students who may lack advanced technical skills, allowing them to focus more on their writing rather than navigating the tool itself. Furthermore, the feature of providing interchangeable words directly impacts the quality of writing by expanding the users’ vocabulary. This function not only aids in finding suitable synonyms but also encourages writers to experiment with different word choices, thereby enriching their language skills. The ability to quickly and easily explore alternatives can lead to more nuanced and precise writing, which is especially beneficial in an educational context where vocabulary development is a key learning objective.

4.2.1.2 Confidence and motivation

Participants expressed that DeepL increased their confidence and interest in writing. One participant expressed newfound confidence after using DeepL:

Seeing that I can complete an English story with nearly 1,000 words has made me full of confidence! (L-S01)

This sentiment highlights the motivational boost that DeepL provides to users by enabling them to accomplish substantial writing tasks in English. The increase in confidence and motivation is a crucial benefit of using DeepL. By enabling participants to successfully complete extensive writing tasks, DeepL fosters a sense of achievement and encourages further engagement with English writing. This psychological boost can be pivotal in maintaining long-term interest and continuous improvement in language skills. Confidence and motivation are essential components in language learning. When students feel confident in their abilities, they are more likely to take on challenging tasks and persist in their efforts. The sense of accomplishment from completing a substantial writing task, such as a 1,000-word story, can significantly enhance a student’s self-efficacy. This, in turn, can lead to increased motivation to continue writing and improving their language skills.

4.2.1.3 Language learning and fluency

Participants learned new vocabulary and sentence structures, and noted enhanced fluency. One participant remarked on the educational value of comparing translations:

I’ve actually picked up some new words and ways to say things by comparing my original English version with the one from DeepL. (H-S06)

This practice of comparing translations helps users learn new vocabulary and more natural ways of expressing ideas. Another participant appreciated the time-saving aspect and the quality of sentence structures provided by DeepL:

Using DeepL saved me so much time since it gave me these well-structured English sentences. (L-S06)

This suggests that DeepL not only aids in learning but also enhances writing fluency by providing well-formed sentences that users can adopt and learn from. These responses highlight the significant role that DeepL plays in language learning and fluency. The ability to compare one’s own translations with those generated by DeepL offers a unique learning opportunity, allowing students to see practical examples of improved vocabulary usage and sentence structuring. This comparison can lead to a deeper understanding of language mechanics and more natural expression. Moreover, the time-saving aspect of using DeepL cannot be overstated. By providing well-formed sentences, DeepL allows students to focus more on content creation and creativity, rather than being bogged down by grammatical concerns. This efficiency can enhance the overall learning experience, making language learning more enjoyable and less stressful. AI-assisted tools like DeepL serve as practical learning aids that complement traditional language learning methods. By exposing learners to a variety of vocabulary and syntactical structures, these tools help bridge the gap between classroom learning and practical application. The time saved by using these tools means that learners can devote more energy to refining their ideas and improving the quality of their writing.

4.2.1.4 Content and quality improvement

DeepL’s translations helped improve the content, accuracy, and overall quality of the stories. One participant noted how DeepL allowed them to focus more on the creative aspects of their writing:

DeepL helped me concentrate more on shaping the story and making it interesting. (L-S10)

This indicates that DeepL’s assistance with language allows writers to devote more attention to the narrative and creative elements of their work. Another participant observed improvements in the flow and vibrancy of their stories:

My story flowed better and became more polished and vibrant after using DeepL. (L-S11)

This suggests that DeepL’s translations contribute to a more coherent and engaging writing style. These responses highlight the significant impact of DeepL on the quality of students’ writing. The tool’s ability to handle grammatical accuracy allows students to invest more effort into the creative process, which is crucial for story writing. By alleviating the burden of language mechanics, DeepL enables students to focus on narrative elements such as plot development, character creation, and thematic depth. Additionally, the improvement in story flow and vibrancy indicates that DeepL aids in creating more polished and engaging content. This enhancement is particularly valuable for non-native English writers, who might struggle with both linguistic accuracy and creative expression. DeepL’s assistance ensures that their stories are not only correct in form but also rich and dynamic in content.

4.2.2 Negative attitudes

The challenges include technical problems, problems due to language differences, cultural and contextual challenges, source text dependency, and self-control and dependence.

4.2.2.1 Technical problems

Participants encountered several technical issues with DeepL. One participant compared DeepL with Baidu Translator, noting,

The way DeepL Translator is designed isn’t as user-friendly as Baidu Translator. When you look up an English word in Baidu Translator, it provides example sentences that help you understand how to use the word in different contexts. It’s like it gives you a peek into how that word fits naturally into a sentence. But DeepL doesn’t do that; it only gives you the word’s meaning without any context. Learning words like this, without any real-life examples, is pretty tough. (H-S02)

Another participant observed that translations seemed off when applied to entire passages at once. However, breaking down the text and translating sentence by sentence improved accuracy. They stated,

… It’s a bit strange. When I put the whole story into the translator, the output feels a bit off. So I tried putting the parts that seemed wrong into DeepL separately, and surprisingly, it turns out much better. (H-S13)

Furthermore, while DeepL offers multiple translation options for individual words, participants desired more flexibility in choosing suitable translations for complete sentences. One participant mentioned,

One thing I noticed is that DeepL offers a bunch of options for single words, which is cool. But when it comes to whole sentences, it’s a bit different. It only gives you one choice. Personally, I think it’d be awesome if they could give us a few different ways to translate a sentence, with different word orders. That way, I could pick the one that fits my style the best. Just having that extra flexibility would make it even more handy. (L-S01)

Overall, these insights indicate that DeepL would benefit from more user-friendly features, such as providing contextual example sentences and offering flexible translation options for entire sentences. Implementing these improvements could significantly enhance learners’ understanding and use of translated content, ultimately fostering greater language acquisition.

4.2.2.2 Problems due to language differences

Participants identified challenges stemming from the differences between Chinese and English. One common issue was related to tense. A participant noted,

DeepL struggled with tenses because Chinese doesn’t convey them the way English does. It ended up jumping around different tenses, making the whole story a bit of a mess. (H-S02)

This observation underscores the critical role tense plays in narrative structure and how AI tools may not adequately address language-specific nuances. The frequent use of commas in Chinese also led to punctuation errors in translations. A participant explained,

In Chinese, we tend to use commas frequently. But when I translated my story, it returned with incorrect punctuation in English. (H-S09)

This example illustrates how direct translation can lead to grammatical inaccuracies, highlighting the need for caution when using AI tools. Pronouns also posed a challenge. A participant shared,

Like, in my story: “薛郎, 老朽有事想与你相商。” (Xue Lang, I have something to discuss with you.) “老者所为何事, 我们是否见过面?” (What is your business? Have we met before?) DeepL Translator turned “老朽” and “老者” into “the old man”, but actually, it should be “I” and “you” in this context. (L-S01)

In Chinese, it is common for individuals to refer to themselves or others in the third person rather than using the pronouns “I” or “you”. This practice differs from English conventions. These issues highlight the inherent challenges of translating between languages with different grammatical structures and conventions. Students should be advised to use these tools critically and thoughtfully.

4.2.2.3 Cultural and contextual challenges

Participants encountered difficulties due to errors in translating terms specific to certain cultural contexts. One participant shared a notable example of this issue, explaining how DeepL misinterpreted a culturally significant event in their story:

So, my whole story revolved around this girl’s big moment at her “coming-of-age ceremony (成人礼)”. But guess what DeepL did? It translated that into “bar mitzvah”. This word just didn’t feel right to me, and there was no way I was sticking with that in my final version … I was right because it turns out, “bar mitzvah” is this Jewish ceremony for boys when they hit 13, and there I was, talking about an 18-year-old Chinese girl’s milestone. The whole vibe and meaning were just completely off. It kind of got me a bit riled up too. I mean, I want my story to reflect my culture and heritage, not some mix-up with another tradition. (L-S06)

This example highlights the inadequacies of AI translation tools in handling culturally specific references. The participant’s frustration underscores the importance of accurately conveying cultural nuances in storytelling. The incorrect translation not only misrepresented the significance of the event but also detracted from the authenticity of the narrative. This misalignment illustrates the broader challenge of ensuring that AI tools can effectively capture the essence of culturally specific terms. It is essential for students to critically evaluate AI-generated translations and to be aware of the potential for misinterpretations that can arise from cultural differences. By recognizing these limitations, learners can use AI tools more judiciously, ensuring their stories authentically represent their cultural backgrounds.

Idiomatic expressions posed a similar problem. For example, one participant shared their experience with the idiom “刀子嘴豆腐心”, which describes someone who is tough on the outside but soft on the inside. Within the full story, DeepL translated it literally as “knife mouth and bean curd heart”, which is not meaningful in English. However, when the same idiom was translated on its own, DeepL provided the correct translation: “have a sharp tongue but a soft heart”. This discrepancy highlighted the tool’s limitation in handling idiomatic expressions within the context of a longer narrative. The participant noted,

Maybe DeepL got overloaded with my 1,000-word story somehow? Or maybe it’s just not cut out for handling a whole bunch of context. I was kinda disappointed. I thought it would ace it with the full picture. Turns out, not so much. (H-S06)

This indicates that while DeepL can accurately translate idioms in isolation, it struggles with maintaining accuracy in longer texts, especially when cultural and contextual nuances are involved. This underscores the need for educators to caution learners to engage in careful revision when using AI-assisted translation tools.

4.2.2.4 Source text dependency

Participants discussed DeepL’s reliance on the accuracy of the source Chinese text. One participant noted how a simple omission affected the translation:

Yeah, I actually forgot to throw in the punctuation here in the original Chinese version, and it messed up the translation on DeepL … The way the sentence was split, DeepL just couldn’t figure it out. (H-S02)

This comment illustrates how even minor errors in the source text can lead to substantial misunderstandings in the translated output. The participant’s experience underscores the necessity for meticulous attention to detail when interacting with AI tools, as any ambiguity can compromise their effectiveness. Another participant noted,

DeepL seems really picky about grammar details. Like, here I wrote: “长的水灵”. The right term should be “长得水灵”, but DeepL thought it was “long water spirit”. But then, when I fixed the Chinese version to “长得水灵”, suddenly DeepL got it right and spat out “lovely-looking”. (H-S13)

These observations indicate that the accuracy of DeepL’s translations depends heavily on the precision of the input text. The participant’s correction demonstrates how a small change in the input can lead to a vastly improved translation. Learners should therefore be advised to review the AI output and revise accordingly, a habit that is central to AI literacy.

4.2.2.5 Self-control and dependence

Participants expressed concerns about overreliance on MT. One participant described MT as a potential shortcut, cautioning that it might lead to complacency in developing their own English writing skills:

I kinda see MT as a shortcut sometimes. Like, I might start slacking off on practicing my own English writing skills. If I’m not careful and I give in to the convenience of MT, it could really mess me up in the long run, especially if my English isn’t that good. It’s a real temptation to just let the machine do all the work. (L-S01)

Another participant echoed this sentiment, emphasizing the importance of independent thinking in language learning:

When we hit a tough spot, it’s probably a good idea to give our brains a workout before just checking the answer, right? If we’re trying to put together an English sentence, it’s better to give it a shot ourselves before leaning on MT. (H-S13)

These reflections underscore the potential drawbacks of excessive reliance on AI tools. While they can enhance the learning experience, overdependence may impede the development of essential language skills. Therefore, it is crucial to encourage learners to strike a balance, using AI tools to supplement their efforts without allowing them to replace active learning and practice.

5 Discussion

To help Chinese EFL students with the complex and challenging task of story writing, this study examined the effects of AI-assisted tools on their story writing in terms of both form and content, as well as their attitudes toward using these tools. The findings shed light on the interplay between AI technology, language proficiency, and creative expression. They contribute fresh perspectives to the CALL field and underscore the need for a balanced integration of technology in language education.

5.1 Enhancing story writing quality by using AI-assisted tools

The results of this study suggest that AI-assisted tools have a notably positive influence on many aspects of students’ story writing, in both form and content, underscoring their capacity to enhance narrative writing. However, their impact on certain measures remains uncertain, as explained in this section.

5.1.1 Enhancing linguistic form

The significant improvements in length, coordination, and specific structures align with Cancino and Panes (2021), likely due to AI tools’ ability to generate complex sentences and suggest fluent structures, which are crucial for storytelling. The lack of improvement in subordination, as noted by Chung and Ahn (2021), may be due to AI tools’ limitations or students’ preference for simpler sentences.

The minor decline in lexical density suggests that AI tools do not significantly alter the balance between content and function words. The informal nature of storytelling might explain this. In addition, when AI tools correct grammar mistakes, they may add function words, which would slightly lower the proportion of content words.
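
The relationship between grammar correction and lexical density can be illustrated with a minimal Python sketch. It assumes the common definition of lexical density as the proportion of content words among all words, and it uses a small hand-made function-word list (FUNCTION_WORDS, a hypothetical stand-in; an actual analyzer would typically rely on part-of-speech tagging). The two sentences are invented for illustration and are not taken from the participants’ stories.

```python
import re

# Hypothetical, deliberately tiny function-word list for illustration only.
# A real lexical analyzer would typically identify content words by POS tagging.
FUNCTION_WORDS = {
    "a", "an", "the", "to", "of", "in", "on", "at", "and", "but", "or",
    "is", "was", "were", "be", "been", "has", "had", "have",
    "he", "she", "it", "they", "his", "her", "i", "my", "we", "you",
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_density(text):
    """Return the share of tokens that are not in FUNCTION_WORDS."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    content_words = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content_words) / len(tokens)


# Invented example: the revision restores two missing articles.
draft = "Girl walked home and saw old man"
revised = "The girl walked home and saw an old man"

print(round(lexical_density(draft), 3))
print(round(lexical_density(revised), 3))
```

In this invented pair, restoring the two articles leaves the six content words unchanged but raises the token count from seven to nine, so the ratio falls from about 0.86 to about 0.67.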

Significant gains in verb sophistication (VS1, VS2, CVS1) indicate AI tools help students use more varied verbs, enhancing narratives. However, no significant improvement in sophisticated lexical words (LS1, LS2) suggests storytelling prioritizes vivid verbs over other complex words.

Increased lexical variety (NDW, CTTR, RTTR, VV1, SVV1, CVV1, LV, MODV) shows that AI tools enhance lexical richness. However, no significant change in noun variation (NV) suggests storytelling focuses more on verbs and adjectives.
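
For reference, three of the type-token measures listed above can be sketched in Python under their standard definitions: NDW is the number of different words, CTTR is T / sqrt(2N), and RTTR is T / sqrt(N), where T is the number of word types and N the number of word tokens. The sketch counts surface word forms and is a simplification: a full analyzer would typically lemmatize the text first, and the verb-based measures (VV1, SVV1, CVV1) are omitted because they require part-of-speech tagging. The function name lexical_variation and the sample sentence are hypothetical.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_variation(text):
    """Compute NDW, TTR, CTTR, and RTTR over surface word forms."""
    tokens = tokenize(text)
    n_tokens = len(tokens)
    n_types = len(set(tokens))
    if n_tokens == 0:
        return {"NDW": 0, "TTR": 0.0, "CTTR": 0.0, "RTTR": 0.0}
    return {
        "NDW": n_types,                               # number of different words
        "TTR": n_types / n_tokens,                    # raw type-token ratio
        "CTTR": n_types / math.sqrt(2 * n_tokens),    # corrected TTR
        "RTTR": n_types / math.sqrt(n_tokens),        # root TTR
    }


# Invented sample sentence: 11 tokens, 5 distinct word forms.
sample = "the cat saw the dog and the dog saw the cat"
scores = lexical_variation(sample)
for name, value in scores.items():
    print(name, round(value, 3))
```

Because CTTR and RTTR divide by a square root of text length rather than by length itself, they are intended to be less sensitive to text length than the raw ratio, which matters when revised drafts are longer than first drafts.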

Improved fluency, consistent with Garcia and Pena (2011) and Tsai (2020), is attributed to AI tools aiding revisions, which boost students’ confidence and proficiency. Reduced error density aligns with previous studies (Cancino and Panes 2021; Chung and Ahn 2021; Lee 2019), indicating that AI tools help correct mistakes, though punctuation and conjunction errors may persist due to language differences.

This research also reveals that AI tools benefit lower proficiency students more, narrowing gaps in lexical variation, fluency, and accuracy compared to higher proficiency peers. Additionally, AI tools particularly aid lower proficiency students in syntactic complexity, lexical variation of adverbs and adjectives, and determiner use accuracy.

5.1.2 Enhancing story content

This study fills a gap in previous research by evaluating both the form and content of students’ stories, offering insights for language learning and creative expression. The results show that AI-assisted tools significantly improve the overall content quality of students’ stories, including readability, plot, characters, setting, theme, and tone.

AI tools enhance readability by simplifying complex sentences and improving clarity. They help construct more coherent plots and richer character development. Settings become more immersive with accurate descriptions, and themes are expressed more clearly. Tone is conveyed more precisely, maintaining emotional consistency.

While AI tools have limitations and might not capture all creative nuances, their positive impact on content quality is clear. The study also considers students’ proficiency levels, revealing that lower proficiency students benefit more significantly. Initially, higher proficiency students outperformed lower ones, but after using AI tools, lower proficiency students improved to similar levels, even surpassing their higher proficiency peers in setting and tone.

Further analysis shows that AI tools significantly enhance content for lower proficiency students across all six aspects, while higher proficiency students see less pronounced improvements. This suggests AI tools are particularly beneficial for students still developing their language skills, helping them create more engaging stories.

5.2 Students’ attitudes toward using AI-assisted tools

The survey and interview findings reveal that participants generally expressed satisfaction with the use of AI-assisted tools for their English story writing. However, it is worth noting that certain challenges persisted. While AI-assisted tools demonstrate their utility in EFL story writing, their integration requires careful consideration and discretion.

5.2.1 Students’ satisfaction and perceived benefits of using AI-assisted tools

Survey results indicate that most participants were satisfied with DeepL Translator and found it helpful for various aspects of story writing, including content, vocabulary, sentence patterns, expression, and grammar. They were also willing to continue using AI-assisted tools. This contrasts with the findings of White and Heidrich (2013), who found that students struggled to express their voices despite trusting MT tools. The discrepancy may be due to advancements in AI technology and increased user familiarity.

This study also analyzed responses from two proficiency groups, finding that lower proficiency students showed higher satisfaction and perceived more benefits from AI-assisted tools. This aligns with content measures indicating greater benefits for lower proficiency students, highlighting AI tools’ significant impact on story writing, especially for those with lower proficiency.

The study’s innovative approach, incorporating interviews alongside surveys, provided a deeper understanding of students’ attitudes. Interviews revealed multifaceted benefits of AI tools, emphasizing their potential to enhance language learning and creative writing. Participants appreciated user-friendly interfaces and alternative word suggestions, which could encourage reluctant learners and enrich their vocabulary.

AI tools also boosted participants’ confidence and interest, serving as motivational tools and scaffolding for vocabulary learning and sentence structure. This supports findings by Tsai (2019) and Clifford et al. (2013). Participants observed enhanced translation of longer sentences, a contrast to the findings of Jolley and Maimone (2015), which could be attributed to recent advancements in AI technology.

Fluency and accuracy emerged as significant advantages, freeing students from concerns about syntax and structure so that they could focus on narrative content. This aligns with the translanguaging approach, which advocates integrating the first language into second language learning, and supports the role of AI tools in cognitive offloading.

The integration of AI-assisted tools into language education holds considerable potential not only for students but also for teachers, educators, and policymakers. For students, AI tools can serve as personal tutors, providing instant feedback and helping them improve their writing skills in real time. Teachers can leverage AI to personalize learning experiences, addressing the diverse needs of students and enabling more efficient classroom management. Educators have an opportunity to redesign curricula to incorporate AI, promoting a more interactive and engaging learning environment. Policymakers, for their part, can facilitate this integration by supporting the development and deployment of AI technologies in educational institutions, ensuring equitable access to these resources, and fostering digital literacy. By embracing AI-assisted tools, stakeholders at all levels can contribute to a more inclusive and effective educational ecosystem, where technology and human expertise work together to enhance learning outcomes and prepare students for the demands of the future (Weng and Chiu 2023).

5.2.2 Perceived challenges of using AI-assisted tools

Participants’ observations about DeepL highlight potential issues with AI-assisted tools. Learners who rely too heavily on these tools may fail to notice inaccuracies in the output. Language educators should therefore emphasize critical evaluation, caution against blind acceptance of AI outputs, and help learners develop their discernment skills.

Discrepancies in punctuation between Chinese and English align with CALF measures showing unclear improvements in conjunction and punctuation accuracy. AI translations can introduce such issues due to differing language rules. Educators should teach effective AI tool use, stressing that while helpful, these tools are not infallible. Understanding linguistic nuances and cultural contexts is crucial, and AI tools should be aids, not substitutes, for language learning.

Challenges related to language differences and cultural nuances highlight the complexity of language. Educators can use these as opportunities to discuss language variation and cultural sensitivity, akin to Anderson’s (2013) method of using MT as a model for error correction. This can develop learners’ appreciation for linguistic subtleties and encourage culturally appropriate communication.

Concerns about overreliance on AI tools underscore the need for promoting self-control among learners (Jolley and Maimone 2022). AI tools should assist, not replace, active learning efforts. Encouraging independent attempts before seeking AI help fosters active learning, which is essential for language development (Darvishi et al. 2024; Vargas-Murillo et al. 2023).

These challenges reveal pedagogical gaps AI tools cannot bridge. Language educators should leverage these challenges as opportunities to emphasize the importance of language learning as a holistic process that involves grasping grammatical nuances, cultural context, and communicative strategies. Integrating technology with traditional language teaching methods could help learners develop a well-rounded understanding of language use.

6 Conclusion

Motivated by the aim of easing the difficulties EFL students face in story writing, this study explored the impact of using AI-assisted tools on Chinese EFL students’ story writing. It enriches our understanding of the interactions between AI technology and language learning, offering insights into how AI-assisted tools like DeepL can help language learners refine their writing and enhance their creative expression in EFL story writing. AI-assisted tools have shown promise in supporting language learners by providing lexical-grammatical support and enhancing narrative quality, as evidenced by the improvements observed in the students’ writing products and by the students’ attitudes.

This study’s implications extend to pedagogical practices, suggesting the integration of AI-assisted tools to enhance language learning and creative writing. Educators can strategically incorporate these tools to scaffold students’ story writing tasks, allowing them to focus more on creativity and narrative coherence. However, a balanced approach is essential. Learners must be encouraged to exercise caution and critically evaluate AI-generated content to ensure accuracy and maintain their own voice in story writing. Educators can guide students in using AI-assisted tools as aids rather than substitutes for language learning efforts, promoting active learning and deeper engagement with language use.

The present study is not without limitations. First, the sample was small, and future research could involve a larger number of participants. Second, alternative methodologies, such as think-aloud protocols or stimulated recall, could be used to reconstruct learners’ cognitive processes during the use of AI-assisted tools. These approaches may provide deeper insights into how learners’ cognitive processes contribute to the final writing product, offering a more comprehensive understanding of the effects of using AI-assisted tools on language learning. Finally, this study focused primarily on the immediate effects of AI tool use on story writing, without investigating potential long-term impacts on language acquisition or writing skill development. Future research could explore the longitudinal effects of using AI-assisted tools.

Corresponding author: Lijin Liang, Wuchan Zhongda Ecommerce Co., Ltd., Hangzhou, China, E-mail: lijin_liang@foxmail.com


Appendix A:

Survey questions

This survey intends to collect your feedback regarding DeepL’s performance when you used it to assist with your story writing. Please answer the questions truthfully. All your answers will be kept anonymous and used for research purposes only. Thank you for your cooperation.

Questions	(1) Strongly disagree	(2) Disagree	(3) Neutral	(4) Agree	(5) Strongly agree
Q1. I’m satisfied with the English story translated by DeepL Translator.
Q2. DeepL Translator is helpful for content improvement in English story writing.
Q3. DeepL Translator is helpful for vocabulary use in English story writing.
Q4. DeepL Translator is helpful for the use of sentence patterns in English story writing.
Q5. DeepL Translator is helpful for expression in English story writing.
Q6. DeepL Translator enhances the completion of English story writing.
Q7. DeepL Translator is accurate in the grammar of English story writing.
Q8. I will continue using DeepL Translator.
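
As an illustration of how responses to a five-point instrument like the one above might be summarized by proficiency group, the following minimal Python sketch computes the mean rating for each question. The responses list, the group labels, and the function name mean_by_question are hypothetical; the ratings are invented placeholders and are not data from this study.

```python
from statistics import mean

# Invented placeholder responses (NOT data from the study): one record per
# participant, with a proficiency group label and a 1-5 rating for Q1 to Q8.
responses = [
    {"group": "lower", "ratings": [5, 4, 5, 4, 4, 5, 4, 5]},
    {"group": "lower", "ratings": [4, 4, 4, 5, 4, 4, 3, 5]},
    {"group": "higher", "ratings": [4, 3, 4, 3, 4, 4, 3, 4]},
    {"group": "higher", "ratings": [3, 4, 4, 4, 3, 4, 4, 4]},
]

NUM_QUESTIONS = 8


def mean_by_question(rows, group=None):
    """Return the mean rating per question, optionally for one group only."""
    selected = [r["ratings"] for r in rows if group is None or r["group"] == group]
    if not selected:
        return []
    for ratings in selected:
        # Reject incomplete records and ratings outside the 1-5 scale.
        if len(ratings) != NUM_QUESTIONS or any(not 1 <= x <= 5 for x in ratings):
            raise ValueError("each record needs eight ratings between 1 and 5")
    # zip(*selected) regroups the ratings column by column, one column per question.
    return [mean(column) for column in zip(*selected)]


for label in ("lower", "higher"):
    means = mean_by_question(responses, label)
    print(label, [round(m, 2) for m in means])
```

Whether group differences in such means are meaningful would still need to be checked with an appropriate statistical test, which this sketch does not attempt.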

Appendix B:

Interview guidelines

These interview guidelines were designed to elicit more in-depth insights into participants’ attitudes toward using DeepL for their story writing. The interviews were conducted with eight willing participants, and each interview lasted approximately 20 minutes. The researcher was free to improvise follow-up questions and to ask questions specific to each student’s writing product. Students were informed that the interviews were solely for research purposes.

请问您之前用过翻译软件吗? (Have you used machine translation before?)
如果用过的话, 请告知翻译软件的名称(比如有道、百度、谷歌等)。 (If so, which MT have you used?)
您是因为什么原因使用翻译软件的呢? 日常生活需要, 还是学术需要? (Did you use MT in your daily life or in academic-related scenarios?)
您使用翻译软件的频率是? (How often have you used MT?)
您认为翻译软件在写作中起到的帮助和阻碍分别有哪些? (What do you think are the advantages and disadvantages of the use of MT in writing?)
您认为DeepL翻译器有帮助到您写的这个故事吗? (Do you think you have gained improvement in writing your story with the help of DeepL Translator?)
如果有帮助, 请详细说明一下在哪些方面有帮助。可以展开举例说一下吗? (If so, what are the improvements? I’d appreciate it if you could give some examples. Any examples from your story?)
您在写作过程中, 在使用DeepL翻译器时有没有遇到什么困难? (Have you encountered any difficulties when using DeepL Translator as a tool to write your story?)
如果遇到了困难, 请详细说明。可以展开举例说一下吗? (If so, please elaborate on the difficulties. I’d appreciate it if you could give some examples. Any examples from your story?)
如果您愿意的话, 谈谈您的故事的创作心得。构思、想表达的东西, 等等，什么都可以谈。 (Please share something about the story you just created if you’d like! Anything would be fine!)

References

Albert, Ágnes & Judit Kormos. 2004. Creativity and narrative task performance: An exploratory study. Language Learning 54(2). 277–310. https://doi.org/10.1111/j.1467-9922.2004.00256.x.

An, Xin, Ching Sing Chai, Yushun Li, Ying Zhou & Bingyu Yang. 2023. Modeling students’ perceptions of artificial intelligence assisted language learning. Computer Assisted Language Learning. 1–22. https://doi.org/10.1080/09588221.2023.2246519.

Anderson, Don D. 2013. Machine translation as a tool in second language learning. CALICO Journal 13(1). 68–97. https://doi.org/10.1558/cj.v13i1.68-97.

Brooks, Larry. 2011. Story engineering: Mastering the 6 core competencies of successful writing. Ohio: Writer’s Digest Books.

Canagarajah, Suresh. 2011. Codemeshing in academic writing: Identifying teachable strategies of translanguaging. The Modern Language Journal 95(3). 401–417. https://doi.org/10.1111/j.1540-4781.2011.01207.x.

Cancino, Marco & Jaime Panes. 2021. The impact of Google Translate on L2 writing quality measures: Evidence from Chilean EFL high school learners. System 98. 102464. https://doi.org/10.1016/j.system.2021.102464.

Chung, Eun Seon & Soojin Ahn. 2021. The effect of using machine translation on linguistic features in L2 writing across proficiency levels and text genres. Computer Assisted Language Learning. 1–26. https://doi.org/10.1080/09588221.2020.1871029.

Clifford, Joan, Lisa Merschel & Joan Munné. 2013. Surveying the landscape: What is the role of machine translation in language learning. Research in Education and Learning Innovation Archives 10. 108–121.

Creswell, John. 2009. Research design: Qualitative, quantitative and mixed methods approaches, 3rd edn. Thousand Oaks, California: SAGE Publications.

Cseh, Genevieve M. & Karl K. Jeffries. 2019. A scattered CAT: A critical evaluation of the consensual assessment technique for creativity research. Psychology of Aesthetics, Creativity, and the Arts 13(2). 159–166. https://doi.org/10.1037/aca0000220.

Darvishi, Ali, Hassan Khosravi, Shazia Sadiq, Dragan Gašević & George Siemens. 2024. Impact of AI assistance on student agency. Computers & Education 210. 104967. https://doi.org/10.1016/j.compedu.2023.104967.

Dodigovic, Marina & Artak Tovmasyan. 2021. Automated writing evaluation: The accuracy of Grammarly’s feedback on form. International Journal of TESOL Studies 3(2). 71–87.

D’Souza, Richard. 2021. What characterises creativity in narrative writing, and how do we assess it? Research findings from a systematic literature search. Thinking Skills and Creativity 42. 100949. https://doi.org/10.1016/j.tsc.2021.100949.

Elkhatat, Ahmed M., Khaled Elsaid & Saeed Almeer. 2023. Evaluating the efficacy of AI content detection tools in differentiating between human and AI-generated text. International Journal for Educational Integrity 19(1). https://doi.org/10.1007/s40979-023-00140-5.

Ellis, Rod & Fangyuan Yuan. 2004. The effects of planning on fluency, complexity, and accuracy in second language narrative writing. Studies in Second Language Acquisition 26(1). 59–84. https://doi.org/10.1017/s0272263104261034.

García, Ofelia & Angel M. Y. Lin. 2017. Translanguaging in bilingual education. In Springer eBooks, 117–130. New York: Springer. https://doi.org/10.1007/978-3-319-02258-1_9.

Garcia, Ignacio & María Isabel Pena. 2011. Machine translation-assisted language learning: Writing for beginners. Computer Assisted Language Learning 24(5). 471–487. https://doi.org/10.1080/09588221.2011.582687.

Graham, Steve & Dolores Perin. 2007. Writing next: Effective strategies to improve writing of adolescents in middle and high schools – A report to Carnegie Corporation of New York. Washington, DC: Alliance for Excellent Education.

Jolley, Jason R. & Luciane Maimone. 2015. Free online machine translation: Use and perceptions by Spanish students and instructors. In Aleidine J. Moeller (ed.), Learn languages, explore cultures, transform lives, 181–200. Egg Harbor, WI: Central States Conference on the Teaching of Foreign Languages.

Jolley, Jason R. & Luciane Maimone. 2022. Thirty years of machine translation in language teaching and learning: A review of the literature. L2 Journal 14(1). 26–44. https://doi.org/10.5070/l214151760.

Lee, Sangmin-Michelle. 2019. The impact of using machine translation on EFL students’ writing. Computer Assisted Language Learning 33(3). 157–175. https://doi.org/10.1080/09588221.2018.1553186.

Lu, Xiaofei. 2010. Automatic analysis of syntactic complexity in second language writing. International Journal of Corpus Linguistics 15(4). 474–496. https://doi.org/10.1075/ijcl.15.4.02lu.

Lu, Xiaofei. 2011. A corpus-based evaluation of syntactic complexity measures as indices of college-level ESL writers’ language development. TESOL Quarterly 45(1). 36–62. https://doi.org/10.5054/tq.2011.240859.

Lu, Xiaofei. 2012. The relationship of lexical richness to the quality of ESL learners’ oral narratives. The Modern Language Journal 96(2). 190–208. https://doi.org/10.1111/j.1540-4781.2011.01232_1.x.

Niño, Ana. 2009. Machine translation in foreign language learning: Language learners’ and tutors’ perceptions of its advantages and disadvantages. ReCALL 21(2). 241–258. https://doi.org/10.1017/s0958344009000172.

Rahmat, Yurike Nadiya, Andri Saputra, M. Arif Rahman Hakim, Eko Saputra & Reko Serasi. 2021. Learning L2 by utilizing dictionary strategies: Learner autonomy and learning strategies. Lingua Cultura 15(2). 175–181. https://doi.org/10.21512/lc.v15i2.7339.

Smith, Cameron. 2013. Creative writing as an important tool in second language acquisition and practice. The Journal of Literature in Language Teaching 2(1). 11–18.

Song, Cuiping & Yanping Song. 2023. Enhancing academic writing skills and motivation: Assessing the efficacy of ChatGPT in AI-assisted language learning for EFL students. Frontiers in Psychology 14. https://doi.org/10.3389/fpsyg.2023.1260843.

Stasimioti, Maria, Vilelmini Sosoni, Katia Lida Kermanidis & Despoina Mouratidis. 2020. Machine translation quality: A comparative evaluation of SMT, NMT and tailored-NMT outputs. ACL Anthology 11. 441–450.

Truby, John. 2008. The anatomy of story. New York: Farrar, Straus and Giroux.

Tsai, Shu-Chiao. 2019. Using Google Translate in EFL drafts: A preliminary investigation. Computer Assisted Language Learning 32(5–6). 510–526. https://doi.org/10.1080/09588221.2018.1527361.

Tsai, Shu-Chiao. 2020. Chinese students’ perceptions of using Google Translate as a translingual CALL tool in EFL writing. Computer Assisted Language Learning 35(5–6). 1250–1272. https://doi.org/10.1080/09588221.2020.1799412.

Vargas-Murillo, Alfonso Renato, Ilda Nadia Monica De La Asuncion Pari-Bedoya & Francisco De Jesús Guevara-Soto. 2023. Challenges and opportunities of AI-assisted learning: A systematic literature review on the impact of ChatGPT usage in higher education. International Journal of Learning, Teaching and Educational Research 22(7). 122–135. https://doi.org/10.26803/ijlter.22.7.7.

Volkart, Lise, Pierrette Bouillon & Sabrina Girletti. 2018. Statistical versus neural machine translation: A comparison of MTH and DeepL at Swiss Post’s language service. In Proceedings of the 40th conference translating and the computer, 145–150. Geneva: Archive ouverte UNIGE.

Weng, Xiaojing & Thomas K. F. Chiu. 2023. Instructional design and learning outcomes of intelligent computer assisted language learning: Systematic review in the field. Computers and Education: Artificial Intelligence 4. 100117. https://doi.org/10.1016/j.caeai.2022.100117.

White, Kelsey & Emily Heidrich. 2013. Our policies, their text: German language students’ strategies with and beliefs about web-based machine translation. Die Unterrichtspraxis/Teaching German 46(2). 230–250. https://doi.org/10.1111/tger.10143.

Wu, Hsiao-Ping & Esther V. Garza. 2014. Types and attributes of English writing errors in the EFL context – A study of error analysis. Journal of Language Teaching and Research 5(6). https://doi.org/10.4304/jltr.5.6.1256-1262.

Yang, Weiwei & YouJin Kim. 2018. The effect of topic familiarity on the complexity, accuracy, and fluency of second language writing. Applied Linguistics Review 11(1). 79–108. https://doi.org/10.1515/applirev-2017-0017.

Yulianto, Ahmad & Rina Supriatnaningsih. 2021. Google Translate versus DeepL: A quantitative evaluation of close-language pair translation (French to English). AJELP: Asian Journal of English Language and Pedagogy 9(2). 109–127. https://doi.org/10.37134/ajelp.vol9.2.9.2021.

Received: 2024-06-17

Accepted: 2024-08-28

Published Online: 2024-11-01

This work is licensed under the Creative Commons Attribution 4.0 International License.


Keywords for this article

computer-assisted language learning; AI-assisted tools; story writing; second language acquisition; DeepL Translator
