An analysis of (dis)ordered categories, thresholds, and crossings in difference and divide-by-total IRT models for ordered responses

Miguel García-Pérez

doi:10.1017/SJP.2017.11

Universidad Complutense de Madrid

Madrid, Spain

ROR 02p0gd045

Journal: The Spanish Journal of Psychology

ISSN: 1138-7416

Year of publication: 2017

Volume: 20

Type: Article

DOI: 10.1017/SJP.2017.11 (open access via the publisher)

Abstract

Threshold parameters have distinct referents across models for ordered responses. In difference models, thresholds are trait levels at which responding beyond category k is as likely as responding at or below it; in divide-by-total models, thresholds are trait levels at which responding in category k is as likely as responding in category k – 1. Thus, thresholds in divide-by-total models (but not in difference models) are the crossings of the option response functions for consecutive categories. Thresholds in difference models are always ordered but they may inconsequentially yield ordered or disordered crossings. In contrast, assimilation of thresholds and crossings in divide-by-total models questions category order when crossings are disordered. We analyze these aspects of difference and divide-by-total models, their relation to the order of response categories, and the consequences of collapsing categories to instate ordered crossings under divide-by-total models. We also show that item parameters in models for ordered responses can never contradict the pre-assumed order of categories and that the empirical order can only be established using a polytomous model that does not assume ordered categories, although this often gives rise to spurious outcomes. Practical implications for scale development are discussed.
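The two meanings of "threshold" described in the abstract can be illustrated numerically. The sketch below is only an assumption-laden illustration, not the article's own code: it uses a logistic graded-response form for the difference model and a generalized partial-credit form for the divide-by-total model, and the function names, the discrimination parameter `a`, and all parameter values are hypothetical choices made for this example.

```python
import math


def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def grm_probs(theta, a, thresholds):
    """Difference model (graded-response form, assumed logistic).

    Category probabilities are differences between adjacent cumulative
    curves; thresholds must be in increasing order.
    """
    cum = [1.0] + [logistic(a * (theta - b)) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


def gpcm_probs(theta, a, thresholds):
    """Divide-by-total model (generalized partial-credit form).

    Each category gets an exponential term; probabilities are those
    terms divided by their total. Thresholds may be in any order.
    """
    exponents = [0.0]
    for b in thresholds:
        exponents.append(exponents[-1] + a * (theta - b))
    shift = max(exponents)  # subtract the maximum for numerical stability
    terms = [math.exp(e - shift) for e in exponents]
    total = sum(terms)
    return [t / total for t in terms]


# Difference model: at the first threshold, responding above the lowest
# category is as likely as responding in it, yet the option response
# functions of categories 0 and 1 do not cross there.
grm_at_b1 = grm_probs(-0.5, 1.0, [-0.5, 0.0, 0.5])
print("GRM  P(X >= 1) at b1:", sum(grm_at_b1[1:]))
print("GRM  P(X = 0), P(X = 1) at b1:", grm_at_b1[0], grm_at_b1[1])

# Divide-by-total model with deliberately disordered thresholds: each
# threshold is still the crossing of two consecutive category curves.
disordered = [0.5, -0.5]
pcm_at_b1 = gpcm_probs(disordered[0], 1.0, disordered)
pcm_at_b2 = gpcm_probs(disordered[1], 1.0, disordered)
print("GPCM P(X = 0), P(X = 1) at b1:", pcm_at_b1[0], pcm_at_b1[1])
print("GPCM P(X = 1), P(X = 2) at b2:", pcm_at_b2[1], pcm_at_b2[2])
```

In the difference model the threshold marks where a cumulative probability equals one half, so it need not coincide with any crossing of category curves. In the divide-by-total model the threshold and the crossing coincide by construction, which is why disordered thresholds there translate directly into disordered crossings.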

