Combined Effects of Task Sequencing and Corrective Feedback on EFL Learners’ Writing: A Comparison Between Human Raters and ChatGPT

Document Type: Research Paper

Authors

1 PhD, English Department, University of Isfahan, Isfahan, Iran.

2 Professor, English Department, University of Isfahan, Isfahan, Iran.

Abstract

This study, derived from a larger project, examined how effectively ChatGPT, compared with human raters, scores writing tasks when the tasks are sequenced from simple to complex or from complex to simple. A correlational design was employed. The participants were 113 EFL learners. Two sets of writing tasks were designed based on the SSARC (simplify, stabilize, automatize, restructure, complexify) model. The participants were divided into two groups; they took a pretest and then completed the tasks in two different orders. The researcher corrected the tasks and returned them to the participants, who revised their texts based on the comments. After that, they took a posttest. Human raters and ChatGPT scored the pretests and posttests. A Pearson correlation test was run to obtain the correlation between the human raters' scores and ChatGPT's scores. The results indicated a strong positive correlation between the scores assigned by human raters and those assigned by ChatGPT when tasks were sequenced from simple to complex (r = .968, p > .05) or from complex to simple (r = .860, p > .05). These findings suggest that ChatGPT can be an effective tool for writing assessment. Suggestions for further research are discussed.
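As an illustration of the Pearson correlation analysis mentioned in the abstract, the following is a minimal sketch of how agreement between two sets of scores could be computed. The function name `pearson_r` and the score lists are assumptions introduced here for illustration; the scores are invented and are not data from the study, and the study's own analysis procedure or software is not specified in the abstract.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from each rater's mean.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each rater.
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one rater's scores are constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores, invented for illustration only (not data from the study).
human_scores = [5.0, 6.5, 7.0, 4.5, 8.0, 6.0]
chatgpt_scores = [5.5, 6.0, 7.5, 4.0, 8.0, 6.5]

r = pearson_r(human_scores, chatgpt_scores)
print(f"r = {r:.3f}")
```

A value of r close to 1 would indicate that the two raters rank the essays in nearly the same way; it does not by itself show that they assign the same absolute scores.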

