Abstract
Automated essay scoring (AES) can ease the burden on educators and support fair, reliable writing assessment. Although deep neural networks have improved AES accuracy, they often struggle to capture comprehensive contextual features and to generalize across multiple dimensions of writing quality. To address these limitations, we propose GCNs-MTL+Multi-Level Features, a novel integration of graph convolutional networks (GCNs) with the pretrained BERT model and multi-task learning (MTL). The approach significantly enriches essay representations by incorporating both word-level and sentence-level features, thereby enhancing transparency and improving the robustness and accuracy of writing evaluations across holistic and analytic rating scales. By sharing representations across AES tasks, GCNs-MTL+Multi-Level Features streamlines the evaluation process, establishing a new benchmark in multidimensional writing assessment.
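To make the general idea of a shared graph-based representation feeding several scoring tasks more concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes random vectors in place of BERT embeddings, a simple chain graph linking adjacent text units, the commonly used symmetric adjacency normalization, and untrained weights. The function names and the trait names "content" and "organization" are hypothetical.

```python
import numpy as np

def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, a common GCN normalization."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_hat.sum(axis=1)                 # node degrees (always >= 1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_norm, x, w):
    """One graph convolution: aggregate neighbours, project, apply ReLU."""
    return np.maximum(a_norm @ x @ w, 0.0)

def score_essay(node_feats, adj, w_gcn, heads):
    """Shared GCN encoder followed by one linear head per scoring task."""
    h = gcn_layer(normalize_adjacency(adj), node_feats, w_gcn)
    essay_vec = h.mean(axis=0)              # shared essay representation
    return {name: float(essay_vec @ w + b) for name, (w, b) in heads.items()}

rng = np.random.default_rng(0)
n_nodes, in_dim, hid_dim = 5, 8, 4

# Assumption: random features stand in for BERT word/sentence embeddings.
node_feats = rng.normal(size=(n_nodes, in_dim))

# Assumption: a chain graph connecting each text unit to the next one.
adj = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0

w_gcn = rng.normal(size=(in_dim, hid_dim))

# Hypothetical tasks: one holistic score plus two analytic traits.
heads = {
    name: (rng.normal(size=hid_dim), 0.0)
    for name in ("holistic", "content", "organization")
}

scores = score_essay(node_feats, adj, w_gcn, heads)
print(scores)
```

All three heads read the same pooled vector, which is the sense in which multi-task learning shares representations across tasks. In a trained model the encoder and head weights would be learned jointly from the holistic and analytic labels.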
Funding source: the MOE (Ministry of Education in China) Project of Humanities and Social Sciences
Award Identifier / Grant number: 23YJCZH197
Acknowledgement
This research was funded by the MOE (Ministry of Education in China) Project of Humanities and Social Sciences (No. 23YJCZH197).
© 2024 Walter de Gruyter GmbH, Berlin/Boston