An analysis of (dis)ordered categories, thresholds, and crossings in difference and divide-by-total IRT models for ordered responses

  1. Miguel García-Pérez 1
  1. 1 Universidad Complutense de Madrid
    Madrid, Spain

    ROR 02p0gd045

Journal:
The Spanish Journal of Psychology

ISSN: 1138-7416

Year of publication: 2017

Volume: 20

Type: Article

DOI: 10.1017/SJP.2017.11 (open access, publisher)


Abstract

Threshold parameters have distinct referents across models for ordered responses. In difference models, thresholds are trait levels at which responding beyond category k is as likely as responding at or below it; in divide-by-total models, thresholds are trait levels at which responding in category k is as likely as responding in category k – 1. Thus, thresholds in divide-by-total models (but not in difference models) are the crossings of the option response functions for consecutive categories. Thresholds in difference models are always ordered but they may inconsequentially yield ordered or disordered crossings. In contrast, assimilation of thresholds and crossings in divide-by-total models questions category order when crossings are disordered. We analyze these aspects of difference and divide-by-total models, their relation to the order of response categories, and the consequences of collapsing categories to instate ordered crossings under divide-by-total models. We also show that item parameters in models for ordered responses can never contradict the pre-assumed order of categories and that the empirical order can only be established using a polytomous model that does not assume ordered categories, although this often gives rise to spurious outcomes. Practical implications for scale development are discussed.
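The two threshold definitions in the abstract can be checked numerically. The sketch below is illustrative only and is not taken from the article: it assumes a logistic graded response form as a representative difference model and a partial credit form as a representative divide-by-total model, with a single discrimination parameter `a`. The function names (`grm_probs`, `gpcm_probs`) and all parameter values are hypothetical.

```python
import math


def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def grm_probs(theta, a, b):
    """Category probabilities for a difference model (graded response form).

    b holds the ordered thresholds; categories are indexed 0..len(b).
    """
    # P(X >= k) for k = 0..len(b)+1: 1 for the lowest category,
    # a logistic curve per threshold, and 0 beyond the highest category.
    at_or_above = [1.0] + [logistic(a * (theta - bk)) for bk in b] + [0.0]
    return [at_or_above[k] - at_or_above[k + 1] for k in range(len(b) + 1)]


def gpcm_probs(theta, a, b):
    """Category probabilities for a divide-by-total model (partial credit form).

    b holds the thresholds, which need not be ordered.
    """
    # Exponents are cumulative sums of a * (theta - b_j); the first is 0.
    exponents = [0.0]
    for bk in b:
        exponents.append(exponents[-1] + a * (theta - bk))
    top = max(exponents)  # subtract the maximum for numerical stability
    numerators = [math.exp(e - top) for e in exponents]
    total = sum(numerators)
    return [n / total for n in numerators]


if __name__ == "__main__":
    a = 1.2  # hypothetical discrimination
    for k, bk in enumerate([-1.0, 0.0, 1.5]):  # ordered thresholds
        p = grm_probs(bk, a, [-1.0, 0.0, 1.5])
        print("difference model, threshold", bk, "P(beyond):", sum(p[k + 1:]))
    for k, bk in enumerate([0.5, -0.5, 1.0]):  # deliberately disordered
        p = gpcm_probs(bk, a, [0.5, -0.5, 1.0])
        print("divide-by-total, threshold", bk, "P(k), P(k-1):", p[k + 1], p[k])
```

Under these assumptions, evaluating the difference model at a threshold splits the probability mass evenly between responding beyond the category and responding at or below it, while the adjacent category curves generally do not cross there. Evaluating the divide-by-total model at a threshold makes the two consecutive category probabilities equal, whether or not the thresholds are ordered.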

References

  • Adams R. J., Wu M. L., & Wilson M. (2012). The Rasch rating model and the disordered threshold controversy. Educational and Psychological Measurement, 72, 547–573. https://doi.org/10.1177/0013164411432166
  • Alexandrowicz R. W., Friedrich F., Jahn R., & Soulier N. (2015). Using Rasch-models to compare the 30-, 20-, and 12-items version of the general health questionnaire taking four recoding schemes into account. Neuropsychiatrie, 29, 179–191. https://doi.org/10.1007/s40211-015-0160-z
  • Andersen E. B. (1997). The rating scale model. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 68–84). New York, NY: Springer.
  • Andrich D. (1978a). A rating formulation for ordered response categories. Psychometrika, 43, 561–573. https://doi.org/10.1007/BF02293814
  • Andrich D. (1978b). Application of a psychometric rating model to ordered categories which are scored with successive integers. Applied Psychological Measurement, 2, 581–594. https://doi.org/10.1177/014662167800200413
  • Andrich D. (1995). Models for measurement, precision, and the nondichotomization of graded responses. Psychometrika, 60, 7–26. https://doi.org/10.1007/BF02294426
  • Andrich D. (2004). Understanding resistance to the data-model relationship in Rasch’s paradigm: A reflection for the next generation. In E. V. Smith Jr, & R. M. Smith (Eds.), Introduction to Rasch measurement (pp. 167–200). Maple Grove, MN: JAM.
  • Andrich D. (2010). Sufficiency and conditional estimation of person parameters in the polytomous Rasch model. Psychometrika, 75, 292–308. https://doi.org/10.1007/S11336-010-9154-8
  • Andrich D. (2013a). The legacies of R. A. Fisher and K. Pearson in the application of the polytomous Rasch model for assessing the empirical order of categories. Educational and Psychological Measurement, 73, 553–580. https://doi.org/10.1177/0013164413477107
  • Andrich D. (2013b). An expanded derivation of the threshold structure of the polytomous Rasch model that dispels any “threshold disorder controversy”. Educational and Psychological Measurement, 73, 78–124. https://doi.org/10.1177/0013164412450877
  • Andrich D., de Jong J. H. A. L., & Sheridan B. E. (1997). Diagnostic opportunities with the Rasch model for ordered response categories. In J. Rost & R. Langeheine (Eds.), Applications of latent trait and latent class models in the social sciences (pp. 59–70). New York, NY: Waxmann.
  • Annoni P., Weziak-Bialowolska D., & Farhan H. (2013). Measuring the impact of the Web: Rasch modelling for survey evaluation. Journal of Applied Statistics, 40, 1831–1851. https://doi.org/10.1080/02664763.2013.796351
  • Ashley L., Smith A. B., Keding A., Jones H., Velikova G., & Wright P. (2013). Psychometric evaluation of the Revised Illness Perception Questionnaire (IPQ-R) in cancer patients: Confirmatory factor analysis and Rasch analysis. Journal of Psychosomatic Research, 75, 556–562. https://doi.org/10.1016/j.jpsychores.2013.08.005
  • Baker F. B. (1997a). Estimation of graded response model parameters using MULTILOG. Applied Psychological Measurement, 21, 89–90. https://doi.org/10.1177/0146621697211007
  • Baker F. B. (1997b). Empirical sampling distributions of equating coefficients for graded and nominal response instruments. Applied Psychological Measurement, 21, 157–172. https://doi.org/10.1177/01466216970212005
  • Baker J. G., Rounds J. B., & Zevon M. A. (2000). A comparison of graded response and Rasch partial credit models with subjective well-being. Journal of Educational and Behavioral Statistics, 25, 253–270. http://www.jstor.org/stable/1165205
  • Bee P., Gibbons C., Callaghan P., Fraser C., & Lovell K. (2016). Evaluating and quantifying user and carer involvement in mental health care planning (EQUIP): Co-development of a new patient-reported outcome measure. PLoS One, 11, e0149973. https://doi.org/10.1371/journal.pone.0149973
  • Bell R. C., Low L. H., Jackson H. J., Dudgeon P. L., Copolov D. L., & Singh B. S. (1994). Latent trait modelling of symptoms of schizophrenia. Psychological Medicine, 24, 335–345. https://doi.org/10.1017/S0033291700027318
  • Bock R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37, 29–51. https://doi.org/10.1007/BF02291411
  • Bock R. D. (1997). The nominal categories model. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 33–49). New York, NY: Springer.
  • Bokhary K. A., Suttle C., Alotaibi A. G., Stapleton F., & Boon M. Y. (2013). Development and validation of the 21-item Children’s Vision for Living Scale (CVLS) by Rasch analysis. Clinical and Experimental Optometry, 96, 566–576. https://doi.org/10.1111/cxo.12055
  • Bourke M., Wallace L., Greskamp M., & Tormoehlen L. (2015). Improving objective measurement in nursing research: Rasch model analysis and diagnostics of the Nursing Students’ Clinical Stress Scale. Journal of Nursing Measurement, 23, E1–E15. https://doi.org/10.1891/1061-3749.23.1.E1
  • Brogårdh C., Lexell J., & Lundgren-Nilsson Å. (2013). Construct validity of a new rating scale for the self-reported impairments in persons with late effects of polio. Physical Medicine & Rehabilitation, 5, 176–181. https://doi.org/10.1016/j.pmrj.2012.07.007
  • Choi S. W., Reise S. P., Pilkonis P. A., Hays R. D., & Cella D. (2010). Efficiency of static and computer adaptive short forms compared to full-length measures of depressive symptoms. Quality of Life Research, 19, 125–136. https://doi.org/10.1007/s11136-009-9560-5
  • Clinton M., Alayan N., & El-Alti L. (2014). Rasch analysis of Lebanese nurses’ responses to the EIS questionnaire. SAGE Open, 1–10. https://doi.org/10.1177/2158244014547182
  • Das Nair R., Moreton B. J., & Lincoln N. B. (2011). Rasch analysis of the Nottingham extended activities of Daily Living Scale. Journal of Rehabilitation Medicine, 43, 944–950. https://doi.org/10.2340/16501977-0858
  • De Ayala R. J., Dodd B. G., & Koch W. R. (1992). A comparison of the partial credit and graded response models in computerized adaptive testing. Applied Measurement in Education, 5, 17–34. https://doi.org/10.1207/s15324818ame0501_2
  • De Ayala R. J., & Sava-Bolesta M. (1999). Item parameter recovery for the nominal response model. Applied Psychological Measurement, 23, 3–19. https://doi.org/10.1177/01466219922031130
  • DeMars C. E. (2003). Sample size and the recovery of nominal response model item parameters. Applied Psychological Measurement, 27, 275–288. https://doi.org/10.1177/0146621603027004003
  • Dougherty B. E., Nichols J. J., & Nichols K. K. (2011). Rasch analysis of the Ocular Surface Disease Index (OSDI). Investigative Ophthalmology and Visual Science, 52, 8630–8635. https://doi.org/10.1167/iovs.11-8027
  • Du Toit M. (2003). IRT from SSI: BILOG-MG, MULTILOG, PARSCALE, TESTFACT. Lincolnwood, IL: Scientific Software International.
  • Forrest C. B., Bevans K. B., Pratiwadi R., Moon J., Teneralli R. E., Minton J. M., & Tucker C. A. (2014). Development of the PROMIS® pediatric global health (PGH-7) measure. Quality of Life Research, 23, 1221–1231. https://doi.org/10.1007/s11136-013-0581-8
  • García-Pérez M. A. (2014). Multiple-choice tests: Polytomous IRT models misestimate item information. The Spanish Journal of Psychology, 17, 1–18. https://doi.org/10.1017/sjp.2014.95
  • García-Pérez M. A., Alcalá-Quintana R., & García-Cueto E. (2010). A comparison of anchor-item designs for the concurrent calibration of large banks of Likert-type items. Applied Psychological Measurement, 34, 580–599. https://doi.org/10.1177/0146621609351259
  • González-Romá V., & Espejo B. (2003). Testing the middle response categories «Not sure», «In between» and «?» in polytomous items. Psicothema, 15, 278–284. Retrieved from http://www.psicothema.com/pdf/1058.pdf
  • Gordon R. A., Fujimoto K., Kaestner R., Korenman S., & Abner K. (2013). An assessment of the validity of the ECERS-R with implications for measures of child care quality and relations to child development. Developmental Psychology, 49, 146–160. https://doi.org/10.1037/a0027899
  • Gothwal V. K., Wright T. A., Lamoureux E. L., & Pesudovs K. (2011). Multiplicative rating scales do not enable measurement of vision-related quality of life. Clinical and Experimental Optometry, 94, 52–62. https://doi.org/10.1111/j.1444-0938.2010.00554.x
  • Grimbeek P., & Nisbet S. (2006). Surveying primary teachers about compulsory numeracy testing: Combining factor analysis with Rasch analysis. Mathematics Education Research Journal, 18, 27–39. https://doi.org/10.1007/BF03217434
  • Hahn E. A., DeVellis R. F., Bode R. K., Garcia S. F., Castel L. D., Eisen S. V., ... on behalf of the PROMIS Cooperative Group (2010). Measuring social health in the patient-reported outcomes measurement information system (PROMIS): Item bank development and testing. Quality of Life Research, 19, 1035–1044. https://doi.org/10.1007/s11136-010-9654-0
  • Hernández A., Espejo B., & González-Romá V. (2006). The functioning of central categories Middle Level and Sometimes in graded response scales: Does the label matter? Psicothema, 18, 300–306. Retrieved from http://www.psicothema.com/pdf/3214.pdf
  • Jansen P. G. W., & Roskam E. E. (1984). The polychotomous Rasch model and dichotomization of graded responses. In E. Degreef & J. van Buggenhaut (Eds.), Trends in Mathematical Psychology (pp. 413–430). Amsterdam, the Netherlands: North-Holland.
  • Jansen P. G. W., & Roskam E. E. (1986). Latent trait models and dichotomization of graded responses. Psychometrika, 51, 69–91. https://doi.org/10.1007/BF02294001
  • Kieftenbeld V., & Natesan P. (2012). Recovery of graded response model parameters: A comparison of marginal maximum likelihood and Markov chain Monte Carlo estimation. Applied Psychological Measurement, 36, 399–419. https://doi.org/10.1177/0146621612446170
  • Linacre J. M. (1999). Category disordering (disordered categories) vs. threshold disordering (disordered thresholds). Rasch Measurement Transactions, 13, 675.
  • Linacre J. M. (2004). Rasch model estimation: Further topics. In E. V. Smith, Jr. & R. M. Smith (Eds.), Introduction to Rasch measurement (pp. 48–72). Maple Grove, MN: JAM.
  • Lundgren-Nilsson Å., Tennant A., Grimby G., & Sunnerhagen K. S. (2006). Cross-diagnostic validity in a generic instrument: An example from the Functional Independence Measure in Scandinavia. Health and Quality of Life Outcomes, 4, 55. https://doi.org/10.1186/1477-7525-4-55
  • Masters G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149–174. https://doi.org/10.1007/BF02296272
  • Masters G. N., & Wright B. D. (1997). The partial credit model. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 101–121). New York, NY: Springer.
  • Maydeu-Olivares A. (2005). Further empirical results on parametric versus non-parametric IRT modeling of Likert-type personality data. Multivariate Behavioral Research, 40, 261–279. https://doi.org/10.1207/s15327906mbr4002_5
  • Maydeu-Olivares A., Drasgow F., & Mead A. D. (1994). Distinguishing among parametric item response models for polychotomous ordered data. Applied Psychological Measurement, 18, 245–256. https://doi.org/10.1177/014662169401800305
  • Muraki E. (1990). Fitting a polytomous item response model to Likert-type data. Applied Psychological Measurement, 14, 59–71. https://doi.org/10.1177/014662169001400106
  • Muraki E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16, 159–176. https://doi.org/10.1177/014662169201600206
  • Muraki E. (1993). Information functions of the generalized partial credit model. Applied Psychological Measurement, 17, 351–363. https://doi.org/10.1177/014662169301700403
  • Muraki E. (1997). A generalized partial credit model. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 153–164). New York, NY: Springer.
  • Murray A. L., Booth T., & Molenaar D. (2016). When middle really means “top” or “bottom”: An analysis of the 16PF5 using Bock’s nominal response model. Journal of Personality Assessment, 98, 319–331. https://doi.org/10.1080/00223891.2015.1095197
  • Nilsson Å. L., Sunnerhagen K. S., & Grimby G. (2005). Scoring alternatives for FIM in neurological disorders applying Rasch analysis. Acta Neurologica Scandinavica, 111, 264–273. https://doi.org/10.1111/j.1600-0404.2005.00404.x
  • Oluboyede Y., & Smith A. B. (2013). Evidence of a unidimensional 15-item version of the CASP-19 using a Rasch model approach. Quality of Life Research, 22, 2429–2433. https://doi.org/10.1007/s11136-013-0367-z
  • Osborne R. H., Batterham R. W., Elsworth G. R., Hawkins M., & Buchbinder R. (2013). The grounded psychometric development and initial validation of the Health Literacy Questionnaire (HLQ). BMC Public Health, 13, 658. https://doi.org/10.1186/1471-2458-13-658
  • Preston K., Reise S., Cai L., & Hays R. D. (2011). Using the nominal response model to evaluate response category discrimination in the PROMIS Emotional Distress item pools. Educational and Psychological Measurement, 71, 523–550. https://doi.org/10.1177/0013164410382250
  • Reckase M. D. (2009). Multidimensional item response theory. New York, NY: Springer.
  • Reise S. P., & Yu J. (1990). Parameter recovery in the graded response model using MULTILOG. Journal of Educational Measurement, 27, 133–144. https://doi.org/10.1111/j.1745-3984.1990.tb00738.x
  • Roskam E. E. (1995). Graded responses and joining categories: A rejoinder to Andrich’ “Models for measurement, precision, and nondichotomization of graded responses”. Psychometrika, 60, 27–35. https://doi.org/10.1007/BF02294427
  • Roskam E. E., & Jansen P. G. W. (1989). Conditions for Rasch-dichotomizability of the unidimensional polytomous Rasch model. Psychometrika, 54, 317–332. https://doi.org/10.1007/BF02294523
  • Rubio V. J., Aguado D., Hontangas P. M., & Hernández J. M. (2015). Psychometric properties of an emotional adjustment measure: An application of the graded response model. European Journal of Psychological Assessment, 23, 39–46. https://doi.org/10.1027/1015-5759.23.1.39
  • Salzberger T. (2015). The validity of polytomous items in the Rasch model – The role of statistical evidence of the threshold order. Psychological Test and Assessment Modeling, 57, 377–395.
  • Samejima F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph No. 17. Richmond, VA: Psychometric Society. Retrieved from https://www.psychometricsociety.org/sites/default/files/pdf/MN17.pdf
  • Samejima F. (1972). A general model for free-response data. Psychometrika Monograph No. 18. Richmond, VA: Psychometric Society. Retrieved from https://www.psychometricsociety.org/sites/default/files/pdf/MN18.pdf
  • Samejima F. (1996). Evaluation of mathematical models for ordered polychotomous responses. Behaviormetrika, 23, 17–35. https://doi.org/10.2333/bhmk.23.17
  • Samejima F. (1997). Graded response model. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 85–100). New York, NY: Springer.
  • Smith H. J., Richardson J. B., & Tennant A. (2009). Modification and validation of the Lysholm Knee Scale to assess articular cartilage damage. Osteoarthritis and Cartilage, 17, 53–58. https://doi.org/10.1016/j.joca.2008.05.002
  • Smith E. V. Jr., Wakely M. B., de Kruif R. E. L., & Swartz C. W. (2003). Optimizing rating scales for self-efficacy (and other) research. Educational and Psychological Measurement, 63, 369–391. https://doi.org/10.1177/0013164403063003002
  • Thissen D., & Steinberg L. (1986). A taxonomy of item response models. Psychometrika, 51, 567–577. https://doi.org/10.1007/BF02295596
  • Thissen D., Steinberg L., & Fitzpatrick A. R. (1989). Multiple-choice models: The distractors are also part of the item. Journal of Educational Measurement, 26, 161–176. https://doi.org/10.1111/j.1745-3984.1989.tb00326.x
  • Van der Wal M. B. A., Tuinebreijer W. E., Bloemen M. C. T., Verhaegen P. D. H. M., Middelkoop E., & van Zuijlen P. P. M. (2012). Rasch analysis of the Patient and Observer Scar Assessment Scale (POSAS) in burn scars. Quality of Life Research, 21, 13–23. https://doi.org/10.1007/s11136-011-9924-5
  • Wang Y.-C., Deutscher D., Yen S.-C., Werneke M. W., & Mioduski J. E. (2014). The self-report Fecal Incontinence and Constipation Questionnaire in patients with pelvic-floor dysfunction seeking outpatient rehabilitation. Physical Therapy, 94, 273–288.
  • Wang Z., Zhou J., Luo X., Xu Y., She X., Chen L., … Wang X. (2015). Rasch analysis of the Adult Strabismus Quality of Life Questionnaire (AS-20) among Chinese adult patients with strabismus. PLoS ONE, 10, e0142188. https://doi.org/10.1371/journal.pone.0142188
  • Wetzel E., & Carstensen C. H. (2014). Reversed thresholds in partial credit models: A reason for collapsing categories? Assessment, 21, 765–774. https://doi.org/10.1177/1073191114530775
  • Wetzel E., Hell B., & Pässler K. (2012). Comparison of different test construction strategies in the development of a gender fair interest inventory using verbs. Journal of Career Assessment, 20, 88–104. https://doi.org/10.1177/1069072711417166
  • Wollack J. A., Bolt D. M., Cohen A. S., & Lee Y.-S. (2002). Recovery of item parameters in the nominal response model: A comparison of marginal maximum likelihood estimation and Markov chain Monte Carlo estimation. Applied Psychological Measurement, 26, 339–352. https://doi.org/10.1177/0146621602026003007
  • Zhong Q., Gelaye B., Fann J. R., Sanchez S. E., & Williams M. A. (2014). Cross-cultural validity of the Spanish version of the PHQ-9 among pregnant Peruvian women: A Rasch item response theory analysis. Journal of Affective Disorders, 158, 148–153. https://doi.org/10.1016/j.jad.2014.02.012