Abstract
Summary: Item parameters for several hundred items were estimated from empirical data on several thousand subjects. Estimates under the logistic one-parameter (1PL) and two-parameter (2PL) models were evaluated. Model-fit analyses showed that only a subset of items fit sufficiently well, and these items were assembled into well-fitting item banks. In several simulation studies, 5000 simulated response patterns, together with the corresponding person parameters, were generated according to a computerized adaptive testing (CAT) procedure. A reliability of .80, or equivalently a standard error of measurement of .44, served as the stopping rule for ending a CAT session. We also recorded how often each item was administered across all simulees. CAT-based person-parameter estimates correlated above .90 with the simulated true values. For all item banks fitting the 1PL model, most simulees needed more than 20 but fewer than 30 items to reach the preset level of measurement error. By contrast, with item banks fitting the 2PL model, an average of only 10 items sufficed to end testing at the same level of measurement error. Both results demonstrate the precision and economy of computerized adaptive testing. Empirical evaluations from operational use will show whether these trends hold up in practice. If they do, CAT becomes feasible and worthwhile with about 150 well-calibrated 2PL items.
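The simulation design summarized above can be sketched in Python. This is a minimal, hypothetical illustration and not the authors' procedure: the randomly generated 150-item bank, maximum-information item selection, EAP scoring with a standard normal prior on a grid, and the use of the posterior SD as the standard error are all assumptions. With the latent trait scaled to unit variance, reliability is roughly 1 - SE², and 1 - .44² ≈ .81, which is why the stopping rule can be stated either way. The 1PL case is simply the 2PL with every discrimination fixed at 1.

```python
import math
import random

# Assumed quadrature grid and standard normal prior for EAP scoring.
GRID = [-4.0 + 0.1 * k for k in range(81)]
_raw = [math.exp(-0.5 * t * t) for t in GRID]
PRIOR = [w / sum(_raw) for w in _raw]


def p_correct(theta, a, b):
    """2PL probability of a correct response; the 1PL is the case a = 1."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def information(theta, a, b):
    """Fisher information of one 2PL item at theta: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def update(post, a, b, u):
    """Multiply the grid posterior by one response's likelihood, renormalise."""
    new = []
    for w, t in zip(post, GRID):
        p = p_correct(t, a, b)
        new.append(w * (p if u else 1.0 - p))
    s = sum(new)
    return [w / s for w in new]


def eap(post):
    """Posterior mean (EAP estimate) and posterior SD, used here as the SE."""
    mean = sum(t * w for t, w in zip(GRID, post))
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, post))
    return mean, math.sqrt(var)


def administer(theta_true, bank, usage, rng, se_stop=0.44, max_items=60):
    """Run one simulated CAT session; returns (estimate, se, test length)."""
    post, theta_hat, se = PRIOR, 0.0, 1.0
    available = set(range(len(bank)))
    n = 0
    while available and n < max_items and se > se_stop:
        # Assumed rule: pick the unused item most informative at theta_hat.
        j = max(available, key=lambda i: information(theta_hat, *bank[i]))
        available.remove(j)
        usage[j] += 1  # item exposure count across all simulees
        a, b = bank[j]
        u = rng.random() < p_correct(theta_true, a, b)  # simulated response
        post = update(post, a, b, u)
        theta_hat, se = eap(post)
        n += 1
    return theta_hat, se, n


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_simulees=500, n_items=150, one_pl=False, seed=1):
    """CAT study on a hypothetical random bank; returns (r, lengths, usage)."""
    rng = random.Random(seed)
    bank = [(1.0 if one_pl else rng.uniform(0.8, 2.0), rng.gauss(0.0, 1.0))
            for _ in range(n_items)]
    usage = [0] * n_items
    thetas, estimates, lengths = [], [], []
    for _ in range(n_simulees):
        theta = rng.gauss(0.0, 1.0)  # simulated true person parameter
        est, _, n = administer(theta, bank, usage, rng)
        thetas.append(theta)
        estimates.append(est)
        lengths.append(n)
    return pearson(thetas, estimates), lengths, usage


if __name__ == "__main__":
    for label, one_pl in (("1PL", True), ("2PL", False)):
        r, lengths, usage = simulate(one_pl=one_pl)
        print(label, round(r, 3), sum(lengths) / len(lengths), max(usage))
```

Comparing the average test lengths of the two runs gives a rough analogue of the 1PL versus 2PL contrast reported above. The exact numbers depend entirely on the hypothetical bank and are not expected to reproduce the study's figures.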