Abstract
We give an account of Classical Test Theory (CTT) in terms of the more fundamental ideas of Item Response Theory (IRT). This approach views classical test theory as a very general version of IRT, and the commonly used IRT models as detailed elaborations of CTT for special purposes. We then use this approach to CTT to derive some general results regarding the prediction of the true score of a test from an observed score on that test, as well as from an observed score on a different test. This leads us to a new view of linking tests that were not developed to be linked to each other. In addition, we propose true-score prediction analogues of the Dorans and Holland measures of the population sensitivity of test linking functions. We illustrate the accuracy of the first-order theory using simulated data from the Rasch model, and illustrate the effect of population differences using a set of real data.
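A minimal sketch, in Python, of the classical baseline that true-score prediction starts from: Kelley's (1923) regressed estimate of the true score, T̂ = μ_X + ρ(X − μ_X), checked against data simulated from the Rasch model, where the IRT true score is the expected number-correct score Σ P_i(θ). This is not the authors' first-order IRT predictor. The ability distribution θ ~ N(0, 1), the 20 evenly spaced item difficulties on [−2, 2], the sample size, the seed, and the helper names (`rasch_prob`, `simulate`, `kelley`) are assumptions chosen only for illustration. Because the simulation knows each examinee's true score, reliability is computed directly as Var(T)/Var(X), which is not possible with real data.

```python
import math
import random
import statistics


def rasch_prob(theta, b):
    """Rasch model probability of a correct response: 1 / (1 + exp(-(theta - b)))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def simulate(n_persons=5000, n_items=20, seed=1):
    """Simulate Rasch data and return (true scores T, observed number-correct scores X).

    Illustrative assumptions only: theta ~ N(0, 1) and item difficulties
    spaced evenly on [-2, 2].
    """
    rng = random.Random(seed)
    bs = [-2.0 + 4.0 * i / (n_items - 1) for i in range(n_items)]
    true_scores, observed = [], []
    for _ in range(n_persons):
        theta = rng.gauss(0.0, 1.0)
        probs = [rasch_prob(theta, b) for b in bs]
        # IRT true score: expected number-correct score given theta
        true_scores.append(sum(probs))
        # Observed score: number of items answered correctly
        observed.append(sum(1 for p in probs if rng.random() < p))
    return true_scores, observed


def kelley(x, mean_x, rho):
    """Kelley's estimate: shrink the deviation x - mean_x by the factor rho."""
    return mean_x + rho * (x - mean_x)


T, X = simulate()
mean_x = statistics.fmean(X)
# CTT reliability: true-score variance over observed-score variance
rho = statistics.pvariance(T) / statistics.pvariance(X)
# Mean squared error of each predictor of the true score
mse_raw = statistics.fmean((x - t) ** 2 for x, t in zip(X, T))
mse_kelley = statistics.fmean((kelley(x, mean_x, rho) - t) ** 2 for x, t in zip(X, T))
print(f"reliability={rho:.3f}  MSE(raw X)={mse_raw:.3f}  MSE(Kelley)={mse_kelley:.3f}")
```

Under the classical assumptions (X = T + E with E uncorrelated with T), the mean squared error of Kelley's estimate is ρ times the error variance, while that of the raw score is the full error variance, so the ratio of the two printed MSEs should come out close to the printed reliability. The exact figures depend on the seed and the assumed item parameters.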
References
Bock, R.D., & Mislevy, R.J. (1982). Adaptive EAP estimation in a microcomputer environment. Applied Psychological Measurement, 6, 431–444.
Dorans, N., & Holland, P.W. (2000). Population invariance and the equatability of tests: Basic theory and the linear case. Journal of Educational Measurement, 37, 281–306.
Feuer, M.J., Holland, P.W., Green, B.F., Bertenthal, M.W., & Hemphill, F.C. (1999). Uncommon measures. Washington, DC: National Academy Press.
Gelman, A., Carlin, J.B., Stern, H.S., & Rubin, D.B. (1995). Bayesian data analysis. London: Chapman and Hall.
Holland, P.W. (1990). On the sampling theory foundations of item response theory models. Psychometrika, 55, 577–601.
Kelley, T.L. (1923). Statistical methods. New York, NY: Macmillan.
Lord, F.M., & Novick, M.R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Mislevy, R.J., Beaton, A.E., Kaplan, B., & Sheehan, K.M. (1992). Estimating population characteristics from sparse matrix samples of item responses. Journal of Educational Measurement, 29, 133–161.
Pashley, P.J., & Phillips, G.W. (1993). Toward world-class standards: A research study linking national and international assessments. Center for Educational Progress. Princeton, NJ: Educational Testing Service.
Wainer, H., et al. (2001). Augmented scores—“Borrowing strength” to compute scores based on small numbers of items. In D. Thissen & H. Wainer (Eds.), Test Scoring (pp. 343–387). Mahwah, NJ: Erlbaum.
Williams, V., et al. (1995). Projecting to the NAEP scale: Results from the North Carolina End-of-Grade testing program (Tech. Rep. #34). Chapel Hill, NC: National Institute of Statistical Science, University of North Carolina, Chapel Hill.
Wu, M., Adams, R., & Wilson, M. (1997). ConQuest [Computer program]. Melbourne, Australia: Australian Council for Educational Research.
Additional information
This research is collaborative in every respect and the order of authorship is alphabetical. It was begun when both authors were on the faculty of the Graduate School of Education at the University of California, Berkeley.
We would like to thank Neil Dorans, Skip Livingston, and two anonymous referees for many suggestions that have greatly improved this paper.
About this article
Cite this article
Holland, P.W., Hoskens, M. Classical Test Theory as a first-order Item Response Theory: Application to true-score prediction from a possibly nonparallel test. Psychometrika 68, 123–149 (2003). https://doi.org/10.1007/BF02296657
DOI: https://doi.org/10.1007/BF02296657