Enhancing automated essay scoring with GCNs and multi-level features for robust multidimensional assessments

Xiaoyi Tang; Daoyu Lin; Kexin Li

doi:10.1515/lingvan-2024-0184

Published/Copyright: December 23, 2024

From the journal Linguistics Vanguard, Volume 10, Issue 1

Abstract

The advancement of automated essay scoring (AES) is pivotal in alleviating the burdens on educators and ensuring fair and reliable writing assessments. While deep neural networks have enhanced AES accuracy, they frequently encounter challenges in capturing comprehensive contextual features and achieving generalizability across multiple dimensions of writing quality. To address these limitations, we propose GCNs-MTL+Multi-Level Features, a novel integration of graph convolutional networks (GCNs) with the pretrained BERT model and multi-task learning (MTL). Our proposed approach significantly enriches essay representation fidelity by incorporating both word-level and sentence-level features, thereby enhancing transparency and improving the robustness and accuracy of writing evaluations across holistic and analytic rating scales. By sharing representations across AES tasks, GCNs-MTL+Multi-Level Features streamlines the evaluation process, establishing a new benchmark in multidimensional writing assessments.
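The general idea of sharing one graph-based representation across several scoring tasks can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy graph, the random vectors standing in for BERT embeddings, the mean pooling step, the linear heads, and the head names `trait_a` and `trait_b` are all hypothetical. The graph convolution uses the common symmetric normalization D^{-1/2}(A + I)D^{-1/2}.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(adj):
    """Add self-loops and apply symmetric degree normalization."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(a_norm, feats, weight):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    hidden = matmul(matmul(a_norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in hidden]


def mean_pool(node_feats):
    """Average node vectors into a single essay-level vector (an assumed pooling choice)."""
    n = len(node_feats)
    return [sum(col) / n for col in zip(*node_feats)]


def score_essay(adj, feats, shared_weight, heads):
    """Compute one shared representation, then one linear score per task head."""
    a_norm = normalize_adjacency(adj)
    essay_vec = mean_pool(gcn_layer(a_norm, feats, shared_weight))
    return {name: sum(w * x for w, x in zip(head, essay_vec)) for name, head in heads.items()}


random.seed(0)

# Hypothetical toy graph: three nodes (for example, sentences) linked in a chain.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]

# Random 4-dimensional vectors standing in for pretrained embeddings (assumption).
feats = [[random.random() for _ in range(4)] for _ in range(3)]

# Shared GCN weights (4 -> 2) and one linear head per rating scale (names are hypothetical).
shared_weight = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(4)]
heads = {
    "holistic": [random.uniform(-1, 1) for _ in range(2)],
    "trait_a": [random.uniform(-1, 1) for _ in range(2)],
    "trait_b": [random.uniform(-1, 1) for _ in range(2)],
}

scores = score_essay(adj, feats, shared_weight, heads)
print(scores)
```

Because every head reads the same pooled vector, the graph convolution is computed once per essay regardless of how many rating scales are predicted, which is the usual motivation for sharing representations in multi-task setups.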

Keywords: automated essay scoring; graph convolutional networks; multi-task learning; the pretrained language model

Corresponding author: Xiaoyi Tang, School of Foreign Studies, University of Science and Technology Beijing, Beijing, China, E-mail: tangxiaoyi@ustb.edu.cn

Funding source: the MOE (Ministry of Education in China) Project of Humanities and Social Sciences

Award Identifier / Grant number: 23YJCZH197

Acknowledgement

This research was funded by the MOE (Ministry of Education in China) Project of Humanities and Social Sciences (No. 23YJCZH197).

Received: 2024-07-22

Accepted: 2024-11-19

Published Online: 2024-12-23
