Abstract
The development of generative AI presents both opportunities and challenges for language teaching. Understanding the linguistic features of AI-generated texts is essential, as it supports users in engaging with AI critically and appropriately for writing tasks. While existing studies have predominantly focused on English writing, the present study examines German argumentative essays produced by ChatGPT, DeepSeek, L1 speakers, and L2 learners, with a focus on linguistic complexity. The results reveal that AI-generated essays generally exhibit higher linguistic complexity. Specifically, DeepSeek essays demonstrate greater lexical complexity, whereas ChatGPT essays are characterized by more complex syntax. In comparison to human-authored essays, AI-generated essays tend to be more formal, marked by frequent nominalizations and a more extensive use of conjunctions. The findings are further interpreted in light of prior research and the underlying mechanisms of generative AI. Based on these results, pedagogical implications for foreign language writing instruction are proposed.
© 2025 Walter de Gruyter GmbH, Berlin/Boston
Articles in the same Issue
- Frontmatter
- Articles
- AI-generated, L2 learner, and native German writing: a comparative analysis of linguistic complexity
- More similar than needed: Czech exam texts from the perspective of quantitative linguistics
- A study of metadiscourse markers in formal writings
- Sprachliche Resonanzen des Krieges: Transferneologismen im Deutschen, Ukrainischen und Russischen
- Das deutsch-ukrainische Online-Fachwörterbücherportal: Entwicklung und Datenmodellierung
- Book Reviews
- Andreea S. Calude: The Linguistics of Social Media: An Introduction
- von Csaba Földes: Auslandsdeutsche Pressesprache in Europa, Asien und Nordamerika