Computerized Adaptive Testing: The Capitalization on Chance Problem

  1. Olea Díaz, Julio (1)
  2. Barrada González, Juan Ramón (2)
  3. Abad García, Francisco José (1)
  4. Ponsoda Gil, Vicente (1)
  5. Cuevas, Lara (3)

  (1) Universidad Autónoma de Madrid, Madrid, Spain. ROR: https://ror.org/01cby8j38
  (2) Universidad de Zaragoza, Zaragoza, Spain. ROR: https://ror.org/012a91z28
  (3) Universidad Complutense de Madrid, Madrid, Spain. ROR: https://ror.org/02p0gd045

Journal:
The Spanish Journal of Psychology

ISSN: 1138-7416

Year of publication: 2012

Volume: 15

Issue: 1

Pages: 424-441

Type: Article

DOI: 10.5209/REV_SJOP.2012.V15.N1.37348 (open access)


Abstract

Several simulation studies are described that examine the effects of capitalization on chance in item selection and trait estimation in Computerized Adaptive Testing (CAT), using the 3-parameter logistic model. To generate different levels of item parameter estimation error, the size of the calibration sample was manipulated (N = 500, 1000, and 2000 subjects), as well as the ratio between item bank size and test length (banks of 197 and 788 items, test lengths of 20 and 40 items), both in a CAT and in a random test. The results show that capitalization on chance is especially important in the CAT, where a positive bias was obtained under the small-sample conditions. For wide ranges of θ, the overestimation of precision (asymptotic SE) reaches levels of 40%, something that does not occur with the RMSE(θ) values. The problem grows as the ratio between item bank size and test length increases. Several solutions were tested in a second study, in which two exposure-control methods were incorporated into the item selection algorithms. Some alternative solutions are also discussed.
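The mechanism behind these results can be illustrated in a few lines of code: when item parameters are estimated with error, selecting items by their estimated Fisher information systematically favors items whose information happens to be overestimated, so the apparent precision of the test exceeds its actual precision. The following Python sketch reproduces that effect under the 3-parameter logistic model. It is a minimal illustration, not the authors' simulation code; the error level (sigma), the fixed trait level, and all function and variable names are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    def p3pl(theta, a, b, c):
        # 3PL probability of a correct response (D = 1.7 scaling).
        return c + (1.0 - c) / (1.0 + np.exp(-1.7 * a * (theta - b)))

    def fisher_info(theta, a, b, c):
        # Fisher information of a 3PL item at trait level theta.
        p = p3pl(theta, a, b, c)
        return (1.7 * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

    # "True" bank (788 items, the larger bank in the study) and a noisy
    # "calibrated" copy; sigma stands in for small-calibration-sample error.
    n_items = 788
    a_true = rng.lognormal(0.0, 0.3, n_items)
    b_true = rng.normal(0.0, 1.0, n_items)
    c_true = np.full(n_items, 0.2)
    sigma = 0.15
    a_hat = a_true * np.exp(rng.normal(0.0, sigma, n_items))
    b_hat = b_true + rng.normal(0.0, sigma, n_items)

    # Maximum-information selection of a 20-item test at a fixed trait
    # level, based on the *estimated* parameters (the CAT-like condition).
    theta = 0.0
    info_hat = fisher_info(theta, a_hat, b_hat, c_true)
    picked = np.argsort(info_hat)[-20:]

    # Apparent precision (from estimated parameters) vs. actual precision
    # (from true parameters) on the selected items: the gap is the
    # capitalization-on-chance overestimation the article studies.
    apparent = info_hat[picked].sum()
    actual = fisher_info(theta, a_true[picked], b_true[picked],
                         c_true[picked]).sum()
    print(f"apparent information: {apparent:.1f}")
    print(f"actual information:   {actual:.1f}")
    print(f"overestimation:       {100 * (apparent / actual - 1):.0f}%")

    # A simple exposure-control variant (randomesque selection among the 60
    # most informative items) weakens the link between selection and
    # estimation error; it is a stand-in for, not necessarily one of, the
    # two exposure-control methods tested in the article's second study.
    top_k = np.argsort(info_hat)[-60:]
    picked_rnd = rng.choice(top_k, size=20, replace=False)
    apparent_rnd = info_hat[picked_rnd].sum()
    actual_rnd = fisher_info(theta, a_true[picked_rnd], b_true[picked_rnd],
                             c_true[picked_rnd]).sum()
    print(f"randomesque overestimation: "
          f"{100 * (apparent_rnd / actual_rnd - 1):.0f}%")

With random selection from the whole bank the two sums essentially agree in expectation, which is the contrast between the CAT and the random test reported in the abstract; the gap also grows with the bank-size-to-test-length ratio, because a larger pool offers more opportunities to pick positively distorted items.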

Funding information

This research was partly supported by two grants from the Spanish Ministerio de Educación y Ciencia (projects PSI2008-01685 and PSI2009-10341) and by the UAM-IIC Chair Psychometric Models and Applications.


References

  • Abad, F. J., Olea, J., Aguado, D., Ponsoda, V., & Barrada, J. R. (2010). Deterioro de parámetros de los ítems en tests adaptativos informatizados: Estudio con eCAT [Item parameter drift in computerized adaptive testing: Study with eCAT]. Psicothema, 22, 340-347.
  • Baker, F. B. (1992). Item response theory: Parameter estimation techniques. New York, NY: Marcel Dekker.
  • Barrada, J. R. (in press). Tests adaptativos informatizados: Una perspectiva general [Computerized adaptive testing: An overview]. Anales de Psicología.
  • Barrada, J. R., Abad, F. J., & Olea, J. (2011). Varying the valuating function and the presentable bank in computerized adaptive testing. The Spanish Journal of Psychology, 14, 500-508. http://dx.doi.org/10.5209/rev_SJOP.2011.v14.n1.45
  • Barrada, J. R., Olea, J., Ponsoda, V., & Abad, F. J. (2008). Incorporating randomness in the Fisher information for improving item-exposure control in CATs. British Journal of Mathematical and Statistical Psychology, 61, 493-513. http://dx.doi.org/10.1348/000711007X230937
  • Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee's ability. In F. M. Lord & M. R. Novick, Statistical theories of mental test scores (pp. 397-479). Reading, MA: Addison-Wesley.
  • Dodd, B. G. (1990). The effect of item selection procedure and stepsize on computerized adaptive attitude measurement using the rating scale model. Applied Psychological Measurement, 14, 355-366. http://dx.doi.org/10.1177/014662169001400403
  • Gao, F., & Chen, L. (2005). Bayesian or non-Bayesian: A comparison study of item parameter estimation in the three-parameter logistic model. Applied Measurement in Education, 18, 351-380. http://dx.doi.org/10.1207/s15324818ame1804_2
  • Georgiadou, E., Triantafillou, E., & Economides, A. (2007). A review of item exposure control strategies for computerized adaptive testing developed from 1983 to 2005. Journal of Technology, Learning, and Assessment, 5. Retrieved from http://escholarship.bc.edu/ojs/index.php/jtla/article/viewFile/1647/1482
  • Glas, C. A. W. (2005). The impact of item parameter estimation on CAT with item cloning. (Computerized Testing Report 02-06). Newtown, PA: Law School Admission Council.
  • Haley, S. M., Ni, P., Hambleton, R. K., Slavin, M. D., & Jette, A. M. (2006). Computer adaptive testing improved accuracy and precision of scores over random item selection in a physical functioning item bank. Journal of Clinical Epidemiology, 59, 1174-1182. http://dx.doi.org/10.1016/j.jclinepi.2006.02.010
  • Hambleton, R. K., & Jones, R. W. (1994). Item parameter estimation errors and their influence on test information functions. Applied Measurement in Education, 7, 171-186. http://dx.doi.org/10.1207/s15324818ame0703_1
  • Hambleton, R. K., Jones, R. W., & Rogers, H. J. (1993). Influence of item parameter estimation errors in test development. Journal of Educational Measurement, 30, 143-155. http://dx.doi.org/10.1111/j.1745-3984.1993.tb01071.x
  • Hambleton, R. K., Zaal, J. N., & Pieters, J. P. M. (1991). Computerized adaptive testing: Theory, applications, and standards. In R. K. Hambleton & J. N. Zaal (Eds.), Advances in educational and psychological testing (pp. 341-366). Boston, MA: Kluwer.
  • Hulin, C. L., Drasgow, F., & Parsons, C. K. (1983). Item response theory: Application to psychological measurement. Homewood, IL: Dow Jones-Irwin.
  • Hulin, C. L., Lissak, R. I., & Drasgow, F. (1982). Recovery of two- and three-parameter logistic item characteristic curves: A Monte Carlo study. Applied Psychological Measurement, 6, 249-260. http://dx.doi.org/10.1177/014662168200600301
  • Leung, C. K., Chang, H. H., & Hau, K. T. (2005). Computerized adaptive testing: A mixture item selection approach for constrained situations. British Journal of Mathematical and Statistical Psychology, 58, 239-257. http://dx.doi.org/10.1348/000711005X62945
  • Li, Y. H., & Lissitz, R. W. (2004). Applications of the analytically derived asymptotic standard errors of item response theory item parameter estimates. Journal of Educational Measurement, 41, 85-117. http://dx.doi.org/10.1111/j.1745-3984.2004.tb01109.x
  • Li, Y. H., & Schafer, W. D. (2003, April). The effect of item selection methods on the variability of CAT's ability estimates when item parameters are contaminated with measurement errors. Paper presented at the National Council on Measurement in Education Convention, Chicago, IL.
  • Li, Y. H., & Schafer, W. D. (2005). Increasing the homogeneity of CAT's item-exposure rates by minimizing or maximizing varied target functions while assembling shadow tests. Journal of Educational Measurement, 42, 245-269. http://dx.doi.org/10.1111/j.1745-3984.2005.00013.x
  • Lord, F. M. (1977). A broad-range test of verbal ability. Applied Psychological Measurement, 1, 95-100. http://dx.doi.org/10.1177/014662167700100115
  • Lord, F. M. (1980). Applications of Item Response Theory to practical testing problems. Hillsdale, NJ: LEA.
  • Luecht, R. M., De Champlain, A., & Nungester, R. J. (1998). Maintaining content validity in computerized adaptive testing. Advances in Health Sciences Education, 3, 29-41. http://dx.doi.org/10.1023/A:1009789314011
  • Mislevy, R. J. (1986). Bayes modal estimation in item response models. Psychometrika, 51, 177-196. http://dx.doi.org/10.1007/BF02293979
  • Mislevy, R. J., & Bock, R. D. (1990). PC-BILOG 3: Item analysis and test scoring with binary logistic models (Computer Program). Mooresville, IN: Scientific Software.
  • Mislevy, R. J., Wingersky, M. S., & Sheehan, K. M. (1994). Dealing with uncertainty about item parameters: Expected response functions (Research Report 94-28-ONR). Princeton, NJ: Educational Testing Service.
  • Nicewander, W. A., & Thomasson, G. L. (1999). Some reliability estimates for computerized adaptive tests. Applied Psychological Measurement, 29, 239-247. http://dx.doi.org/10.1177/01466219922031356
  • Olea, J., Abad, F. J., Ponsoda, V., & Ximénez, M. C. (2004). Un test adaptativo informatizado para evaluar el conocimiento de inglés escrito: Diseño y comprobaciones psicométricas [A CAT for the assessment of written English: Design and psychometric properties]. Psicothema, 16, 519-525.
  • Owen, R. J. (1975). A Bayesian sequential procedure for quantal response in the context of adaptive mental testing. Journal of the American Statistical Association, 70, 351-356. http://dx.doi.org/10.2307/2285821
  • Ponsoda, V., & Olea, J. (2003). Adaptive and Tailored testing (including IRT and non-IRT application). In R. Fernandez-Ballesteros (Ed.), Encyclopaedia of Psychological Assessment (pp. 9-13). London, England: SAGE.
  • Revuelta, J., & Ponsoda, V. (1998). A comparison of item exposure control methods in computerized adaptive testing. Journal of Educational Measurement, 35, 311-327. http://dx.doi.org/10.1111/j.1745-3984.1998.tb00541.x
  • Swaminathan, H., Hambleton, R. K., Sireci, S. G., Xing, D., & Rizavi, S. M. (2003). Small sample estimation in dichotomous item response models: Effect of priors based on judgmental information on the accuracy of item parameter estimates. Applied Psychological Measurement, 27, 27-51. http://dx.doi.org/10.1177/0146621602239475
  • Sympson, J. B., & Hetter, R. D. (1985, October). Controlling item-exposure rates in computerized adaptive testing. In Proceedings of the 27th annual meeting of the Military Testing Association (pp. 973-977). San Diego, CA: Navy Personnel Research and Development Center.
  • Tsutakawa, R. K., & Johnson, J. C. (1990). The effect of uncertainty of item parameter estimation on ability estimates. Psychometrika, 55, 371-390. http://dx.doi.org/10.1007/BF02295293
  • van der Linden, W. J., & Glas, C. A. W. (2000). Capitalization on item calibration error in adaptive testing. Applied Measurement in Education, 13, 35-53. http://dx.doi.org/10.1207/s15324818ame1301_2
  • van der Linden, W. J., & Glas, C. A. W. (2001). Cross-validating item parameter estimation in computerized adaptive testing. In A. Boomsma, M. A. J. van Duijn, & T. A. M. Snijders (Eds.), Essays on Item Response Theory (pp. 205-219). New York, NY: Springer.
  • Warm, T. A. (1989). Weighted likelihood estimation of ability in Item Response Theory. Psychometrika, 54, 427-450. http://dx.doi.org/10.1007/BF02294627
  • Willse, J. T. (2002). Controlling computer adaptive testing's capitalization on chance errors in item parameter estimates (Unpublished doctoral dissertation). James Madison University, Harrisonburg, VA.
  • Wise, S. L., & Kingsbury, G. G. (2000). Practical issues in developing and maintaining a computerized adaptive testing program. Psicológica, 21, 135-156.
  • Zimowski, M. F., Muraki, E., Mislevy, R. J., & Bock, R. D. (2003). BILOG-MG: Multiple-group IRT analysis and test maintenance for binary items. Chicago, IL: Scientific Software International.