Applying a two-parameter item response model to explore the psychometric properties: The case of the Ministry of Science, Research and Technology (MSRT) high-stakes English Language Proficiency test

Ghahraki, Shahram; Tavakoli, Manssor; Ketabi, Saeed

doi:10.22034/elt.2021.46325.2396


Journal of English Language Teaching and Learning
Volume 14, Issue 29, 2022, Pages 1-26
Article type: Research Paper
DOI: 10.22034/elt.2021.46325.2396
Authors
Shahram Ghahraki¹; Manssor Tavakoli²*; Saeed Ketabi²
¹English Language and Literature Department, University of Isfahan, Isfahan, Iran
²Applied Linguistics Department, University of Isfahan, Isfahan, Iran
Abstract
The degree of test difficulty is perhaps one of the most significant characteristics of a test. However, no empirical research on the difficulty of the MSRT test has been carried out. The current study attempts to fill this gap by using a two-parameter item response model to investigate the psychometric properties (item difficulty and item discrimination) of the MSRT test. The Test Information Function (TIF) was also computed to estimate how well, and over what range of ability, the test distinguishes among respondents. To this end, 328 graduate students (39.9% men and 60.1% women) were selected randomly from three universities in Isfahan, and a version of the MSRT English proficiency test was administered to them. The results supported the unidimensionality of the components of the MSRT test. Analysis of the difficulty and discrimination indices of the total test revealed that 14% of the test items were easy or very easy, 38% were medium, and 48% were difficult or very difficult. In addition, 14% of the total items were classified as nonfunctioning: they discriminated negatively or did not discriminate at all. A further 7% of the total items discriminated poorly, 17% discriminated moderately, and 62% discriminated highly or perfectly; however, they differentiated between high-ability and higher-ability test takers. Thus, 38% of the items displayed satisfactory difficulty. Items that are too easy (14%) or too difficult (48%) could be one potential reason why some items have low discriminating power. A further inspection of the items by the MSRT test developers is indispensable.
Keywords
IRT; MSRT; High-stakes; Item analysis; Item difficulty; Item discrimination; Accountability
