Goodness of fit for polytomous items: Type I error rates and power of three fit indices

Sueiro Abad, Manuel J.; Abad García, Francisco José

Journal: Psicothema

ISSN: 0214-9915

Year of publication: 2009

Volume: 21

Issue: 4

Pages: 639-645

Type: Article

Abstract

When applying an Item Response Theory model, it is essential to have a procedure for determining whether the model fits the data. This article uses a simulation study to compare the Type I error rates and power of three types of fit indices generalized to polytomous items: the traditional index, based on grouping examinees by their estimated trait level; a second index based on computing posterior probabilities; and a third that groups examinees by their total test score. The conditions studied were test length (10, 20, and 40 items), number of response options per item (3, 4, and 5), and sample size (500, 1,000, and 2,000 examinees). The results showed that the index based on posterior probabilities had error rates closer to the nominal level, as well as greater power, especially when the sample was large or the test was short.
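The simulation design described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the fully crossed design, the 0.05 nominal level, and the names `conditions` and `rejection_rate` are assumptions made for illustration, and the p-values are uniform placeholders rather than output from any of the three fit indices.

```python
from itertools import product
import random

# Factor levels as listed in the abstract.
TEST_LENGTHS = (10, 20, 40)
N_OPTIONS = (3, 4, 5)
SAMPLE_SIZES = (500, 1000, 2000)

# Assumption: the factors are fully crossed (the abstract does not say so).
conditions = list(product(TEST_LENGTHS, N_OPTIONS, SAMPLE_SIZES))


def rejection_rate(p_values, alpha=0.05):
    """Proportion of p-values below alpha.

    With data generated from the fitted model this estimates the Type I
    error rate; with data generated from a misfitting model it estimates
    power. alpha=0.05 is an assumed nominal level.
    """
    p_values = list(p_values)
    if not p_values:
        raise ValueError("p_values must not be empty")
    return sum(p < alpha for p in p_values) / len(p_values)


# Placeholder p-values: uniform draws stand in for a perfectly calibrated
# fit index under the null hypothesis. They are not results from the study.
rng = random.Random(0)
placeholder = [rng.random() for _ in range(10_000)]

print(len(conditions))              # number of design cells
print(rejection_rate(placeholder))  # should be close to alpha
```

In an actual study, each design cell would supply its own replicated p-values per item and per index, and the rejection rates would then be compared against the nominal level.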

Data source: Dialnet