Testing equivalence with repeated measures: Tests of the difference model of two-alternative forced-choice performance
Universidad Complutense de Madrid
ISSN: 1138-7416
Year of publication: 2011
Volume: 14
Issue: 2
Pages: 1023-1049
Type: Article
Other publications in: The Spanish Journal of Psychology
Abstract
Solving theoretical or empirical issues sometimes involves establishing the equality of two variables with repeated measures. This defies the logic of null hypothesis significance testing, which aims at assessing evidence against the null hypothesis of equality, not for it. In some contexts, equivalence is assessed through regression analysis by testing for zero intercept and unit slope (or simply for unit slope when the regression is forced through the origin). This paper shows that this approach yields highly inflated Type I error rates under the most common sampling models implied in studies of equivalence. We propose an alternative approach based on omnibus tests of equality of means and variances and on subject-by-subject analyses (where applicable), and we show that these tests have adequate Type I error rates and power. The approach is illustrated with a re-analysis of published data from a signal detection theory experiment in which several hypotheses of equivalence had been tested using only regression analysis. Some further errors and inadequacies of the original analyses are described, and further scrutiny of the data contradicts the conclusions reached through inadequate application of regression analyses.
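To make the idea of an omnibus test of equal means and variances concrete, here is a minimal Python sketch of one well-known test of this kind for paired data, the Bradley-Blackwood simultaneous test. Whether this is the exact test used in the paper is an assumption, and the function name `bradley_blackwood` is hypothetical. The test regresses the differences D = x - y on the sums S = x + y and jointly tests for zero intercept and zero slope. The resulting statistic is referred to an F distribution with 2 and n - 2 degrees of freedom, whose upper tail has a closed form when the numerator has 2 degrees of freedom.

```python
def bradley_blackwood(x, y):
    """Simultaneous test of equal means and equal variances for paired data.

    Illustrative sketch only: regress D = x - y on S = x + y and test
    intercept = 0 and slope = 0 jointly. Returns (F, p), where F is
    referred to an F(2, n - 2) distribution.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two paired samples of equal length, n >= 3")

    d = [a - b for a, b in zip(x, y)]  # paired differences
    s = [a + b for a, b in zip(x, y)]  # paired sums

    d_mean = sum(d) / n
    s_mean = sum(s) / n
    sxx = sum((v - s_mean) ** 2 for v in s)
    sxy = sum((v - s_mean) * (w - d_mean) for v, w in zip(s, d))

    # Ordinary least squares fit of D on S (fails if all sums are equal).
    slope = sxy / sxx
    intercept = d_mean - slope * s_mean
    sse = sum((w - intercept - slope * v) ** 2 for v, w in zip(s, d))

    # Joint F statistic (fails if the fit is perfect, i.e. sse == 0).
    f_stat = ((sum(w * w for w in d) - sse) / 2) / (sse / (n - 2))

    # Closed-form upper tail of F(2, n - 2).
    p_value = (1 + 2 * f_stat / (n - 2)) ** (-(n - 2) / 2)
    return f_stat, p_value


# Hypothetical paired scores with a constant shift of about 5 units.
x = [1, 2, 3, 4, 5, 6]
y = [6.1, 6.9, 8.2, 8.8, 10.1, 10.9]
f_stat, p_value = bradley_blackwood(x, y)
print(f_stat, p_value)
```

A small p-value is evidence against joint equality of means and variances. As the abstract stresses, a non-significant result is not by itself evidence for equivalence, so the Type I error rate and power of whichever test is used matter.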
Funding information
Supported by grant PSI2009-08800 from Ministerio de Ciencia e Innovación (Spain). We thank Marisa Carrasco for sharing the data from their study.