
Developing CEFR-related language proficiency tests: A focus on the role of piloting

  • Caroline Shackleton

Published/Copyright: 20 September 2018

Abstract

Most language proficiency exams in Europe are presently developed so that reported scores can be related to the Common European Framework of Reference for Languages (CEFR; Council of Europe 2001). Before any CEFR linking process can take place, such tests should be shown to be both valid and reliable, as "if an exam is not valid or reliable, it is meaningless to link it to the CEFR [and] a test that is not reliable cannot, by definition, be valid" (Alderson 2012). In the test development process, tasks developed from test specifications must therefore be piloted in order to check that test items perform as predicted. The present article focuses on the statistical analysis of test trial data provided by the piloting of three B1 listening tasks carried out at the University of Granada's Modern Language Centre (CLM). Results from a detailed Rasch analysis of the data showed the test to be consistently measuring a unidimensional construct of listening ability. In order to confirm that the test contains items at the correct difficulty level, teacher judgements of candidates' listening proficiency were also collected. The test was found to separate A2 and B1 candidates well; used in conjunction with the establishment of appropriate cut scores, the reported score can be considered an accurate representation of CEFR B1 listening proficiency. The study demonstrates how Rasch measurement can be used as part of the test development process in order to improve test tasks and hence create more reliable tests.
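The abstract does not detail the analysis itself, which was carried out with dedicated Rasch software. Purely as an illustration of the underlying model, the sketch below (in Python with NumPy; the function names, estimation settings, and simulated data are invented for this example, not taken from the study) fits a dichotomous Rasch model to an item-response matrix and computes an infit mean-square fit statistic of the kind used to check that items measure a single underlying ability.

```python
import numpy as np

def rasch_jmle(x, n_iter=100):
    """Damped joint maximum-likelihood estimation of a dichotomous Rasch model.

    x: (persons x items) matrix of 0/1 responses.
    Returns (theta, b): person ability and item difficulty estimates in logits,
    with the scale identified by centring item difficulties on zero.
    """
    n_persons, n_items = x.shape
    theta = np.zeros(n_persons)
    b = np.zeros(n_items)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))  # P(correct)
        info = p * (1.0 - p)                  # Fisher information per response
        # Damped Newton-Raphson step for person abilities.
        t_step = (x - p).sum(1) / np.maximum(info.sum(1), 1e-9)
        theta += np.clip(t_step, -1.0, 1.0)
        # Recompute probabilities, then step the item difficulties.
        p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
        info = p * (1.0 - p)
        b_step = -(x - p).sum(0) / np.maximum(info.sum(0), 1e-9)
        b += np.clip(b_step, -1.0, 1.0)
        b -= b.mean()                         # identify the latent scale
    return theta, b

def infit_meansquare(x, theta, b):
    """Information-weighted (infit) mean-square statistic per item.
    Values near 1.0 indicate responses consistent with the Rasch model."""
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    info = p * (1.0 - p)
    return ((x - p) ** 2).sum(0) / info.sum(0)

# Demo on simulated pilot data with known item difficulties.
rng = np.random.default_rng(42)
true_b = np.linspace(-2, 2, 20)               # 20 items, easy to hard
true_theta = rng.normal(0, 1, 300)            # 300 simulated candidates
prob = 1.0 / (1.0 + np.exp(-(true_theta[:, None] - true_b[None, :])))
x = (rng.random((300, 20)) < prob).astype(float)
theta_est, b_est = rasch_jmle(x)
```

In a piloting workflow of the kind the article describes, the recovered item difficulties would be inspected against the intended B1 level and misfitting items (infit far from 1.0) flagged for revision; operational analyses typically use specialized software rather than a hand-rolled estimator like this one.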

About the author

Caroline Shackleton

Caroline Shackleton is a teacher and test developer at the University of Granada’s Modern Language Centre (Centro de Lenguas Modernas). She holds an MA in Language Testing from the University of Lancaster and a PhD in Applied Linguistics (Language Testing) from the University of Granada. Currently, she is an expert member of ACLES (Association of Higher Education Language Centers in Spain) and regularly provides training in language testing.

References

AERA (American Educational Research Association), APA (American Psychological Association) & NCME (National Council on Measurement in Education). 2014. Standards for educational and psychological testing. Washington, DC: AERA.

Alderson, Charles J. 2012. Principles and practice in language testing: Compliance or conflict? Presentation at TEA SIG Conference, Innsbruck. http://tea.iatefl.org/inns.html (accessed May 2017).

Alderson, Charles J. & Jayanti Banerjee. 2002. Language testing and assessment (part 2). Language Teaching 35(2). 79–113. doi:10.1017/S0261444802001751

Alderson, Charles J., Caroline Clapham & Dianne Wall. 1995. Language test construction and evaluation. Cambridge: Cambridge University Press.

ALTE/Council of Europe. 2011. Manual for language test development and examining: For use with the CEFR. http://www.coe.int/t/dg4/linguistic/ManualtLangageTest-Alte2011_EN.pdf (accessed January 2017).

Bachman, Lyle F. 1990. Fundamental considerations in language testing. Oxford: Oxford University Press.

Bachman, Lyle F. 2004. Statistical analysis for language assessment. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511667350

Bachman, Lyle F. 2005. Building and supporting a case for test use. Language Assessment Quarterly 2(1). 1–34. doi:10.1207/s15434311laq0201_1

Bachman, Lyle F. 2007. What is the construct? The dialectic of abilities and contexts in defining constructs in language assessment. In Janna Fox, Mari Wesche, Doreen Bayliss, Liying Cheng, Carolyn E. Turner & Christine Doe (eds.), Language testing reconsidered, 41–71. Ottawa: University of Ottawa Press. doi:10.2307/j.ctt1ckpccf.9

Bond, Trevor G. & Christine M. Fox. 2015. Applying the Rasch model: Fundamental measurement in the human sciences, 3rd edn. New York: Routledge. doi:10.4324/9781315814698

Buck, Gary. 2001. Assessing listening. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511732959

Cohen, Jacob. 1988. Statistical power analysis for the behavioral sciences, 2nd edn. Hillsdale, NJ: Lawrence Erlbaum Associates.

Council of Europe. 2001. Common European framework of reference for languages: Learning, teaching, assessment. Cambridge: Cambridge University Press.

Council of Europe. 2008. Recommendation CM/Rec(2008)7 of the Committee of Ministers to member states on the use of the Council of Europe's Common European Framework of Reference for Languages (CEFR) and the promotion of plurilingualism. Strasbourg: Council of Europe. http://www.coe.int/t/dg4/linguistic/Conventions_EN.asp (accessed January 2017).

Council of Europe. 2009. Relating language examinations to the Common European Framework of Reference for Languages: Learning, teaching, assessment (CEFR): A manual. Strasbourg: Council of Europe. http://www.coe.int/T/DG4/Linguistic/Manuel1_EN.asp (accessed January 2017).

Council of Europe. 2017. Common European framework of reference for languages: Learning, teaching, assessment. Companion volume with new descriptors. Strasbourg: Council of Europe.

Davies, Alan & Catherine Elder. 2005. Validity and validation in language testing. In Eli Hinkel (ed.), Handbook of research in second language teaching and learning, vol. 1, 795–813. Mahwah, NJ: Lawrence Erlbaum.

Field, John. 2008. Listening in the language classroom. Cambridge: Cambridge University Press.

Field, John. 2013. Cognitive validity. In Ardeshir Geranpayeh & Lynda Taylor (eds.), Examining listening: Research and practice in assessing second language listening. Cambridge: Cambridge University Press.

Green, Rita. 2013. Statistical analyses for language test developers. London: Palgrave Macmillan. doi:10.1057/9781137018298

Green, Rita. 2017. Designing listening tests: A practical approach. London: Palgrave Macmillan. doi:10.1057/978-1-349-68771-8

Kane, Michael. 2012. Validating score interpretations and uses. Language Testing 29(1). 3–17. doi:10.1177/0265532211417210

Kane, Michael. 2013. Validating the interpretations and uses of test scores. Journal of Educational Measurement 50(1). 1–73. doi:10.1111/jedm.12000

Kecker, Gabriele & Thomas Eckes. 2010. Putting the Manual to the test: The TestDaF–CEFR linking project. In Waldemar Martyniuk (ed.), Aligning tests with the CEFR: Reflections on using the Council of Europe's draft Manual, 50–79. Cambridge: Cambridge University Press.

Kolen, Michael J. & Robert L. Brennan. 2014. Test equating, scaling, and linking: Methods and practices, 3rd edn. New York: Springer-Verlag. doi:10.1007/978-1-4939-0317-7

Linacre, John Michael. 2017. Winsteps® Rasch measurement computer program user's guide. Beaverton, OR: Winsteps.com (accessed January 2017).

McNamara, Timothy Francis. 1996. Measuring second language performance. Harlow: Addison Wesley Longman.

Messick, Samuel. 1989. Validity. In Robert L. Linn (ed.), Educational measurement, 3rd edn., 13–103. New York, NY: Macmillan.

North, Brian & Neil Jones. 2009. Further material on maintaining standards across languages, contexts and administrations by exploiting teacher judgment and IRT scaling. Strasbourg: Council of Europe.

Rasch, Georg. 1960. Probabilistic models for some intelligence and attainment tests. Copenhagen: Danish Institute for Educational Research.

Reckase, Mark D. 2010. NCME 2009 presidential address: What I think I know. Educational Measurement: Issues and Practice 29(3). 3–7. doi:10.1111/j.1745-3992.2010.00178.x

Shackleton, Caroline. 2018. Linking the University of Granada CertAcles listening test to the CEFR. Revista de Educación 381. 37–65.

Sick, James. 2008. Rasch measurement in language education part 2: Measurement scales and invariance. Shiken: JALT Testing & Evaluation SIG Newsletter 12(2). 26–31.

Sick, James. 2010. Rasch measurement in language education part 5: Assumptions and requirements of Rasch measurement. Shiken: JALT Testing & Evaluation SIG Newsletter 14(2). 23–29.

Vandergrift, Larry & Christine C. M. Goh. 2012. Teaching and learning second language listening: Metacognition in action. New York: Routledge. doi:10.4324/9780203843376

Wright, Benjamin D. & Mark H. Stone. 1979. Best test design. Chicago: MESA Press.

Wu, Margaret & Ray Adams. 2007. Applying the Rasch model to psycho-social measurement: A practical approach. Melbourne: Educational Measurement Solutions.

Xi, Xiaoming. 2008. Methods of test validation. In Elana Shohamy (ed.), Language testing and assessment (Encyclopedia of language and education, vol. 7), 177–196. New York: Springer.

Published Online: 2018-09-20
Published in Print: 2018-09-25

© 2018 Walter de Gruyter GmbH, Berlin/Boston
