Abstract
Three methods for estimating reliability are studied within the context of nonparametric item response theory. Two were proposed originally by Mokken (1971) and a third is developed in this paper. Using a Monte Carlo strategy, these three estimation methods are compared with four “classical” lower bounds to reliability. Finally, recommendations are given concerning the use of these estimation methods.
References
Birnbaum, A. (1968). Part V. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores. Reading: Addison-Wesley.
Boomsma, A. (1983). On the robustness of LISREL (maximum likelihood estimation) against small sample size and non-normality. Unpublished doctoral dissertation, University of Groningen.
Cliff, N. (1983). Evaluating Guttman scales: Some old and new thoughts. In H. Wainer & S. Messick (Eds.), Principals of modern psychological measurement. Hillsdale, NJ: Lawrence Erlbaum.
Cliff, N. (1984). An improved internal consistency reliability estimate. Journal of Educational Statistics, 9, 151–161.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.
Feldt, L. S. (1965). The approximate sampling distribution of Kuder-Richardson reliability coefficient twenty. Psychometrika, 30, 357–370.
Fischer, G. H. (1974). Einführung in die Theorie psychologischer Tests [Introduction to psychological test theory]. Bern: Huber.
Gustafsson, J. E. (1977). The Rasch model for dichotomous items: Theory, applications and a computer program (Internal Rep. No. 63). Institute of Education, University of Goteborg.
Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10, 255–282.
Guttman, L. (1950). The basis for scalogram analysis. In S. A. Stouffer, L. Guttman, E. A. Suchman, P. F. Lazarsfeld, S. A. Star, & J. A. Clausen (Eds.), Measurement and prediction. Princeton: Princeton University Press.
Henning, H. J. (1976). Die Technik der Mokken-Skalenanalyse [The technique of Mokken scale analysis]. Psychologische Beiträge, 18, 410–430.
Horn, J. (1971). Integration of concepts of reliability and standard error of measurement. Educational and Psychological Measurement, 31, 57–74.
Horst, P. (1953). Correcting the Kuder-Richardson reliability for dispersion of item difficulties. Psychological Bulletin, 50, 371–374.
Jackson, P. H., & Agunwamba, C. C. (1977). Lower bounds for the reliability of the total score on a test composed of non-homogeneous items: I: Algebraic lower bounds. Psychometrika, 42, 567–578.
Jansen, P. G. W. (1982a). Homogenitätsmessung mit Hilfe des Koeffizienten H von Loevinger: Eine kritische Diskussion [Measuring homogeneity by means of Loevinger's coefficient H: A critical discussion]. Psychologische Beiträge, 24, 96–105.
Jansen, P. G. W. (1982b). De onbruikbaarheid van Mokkenschaalanalyse [On the uselessness of Mokken scale analysis]. Tijdschrift voor Onderwijsresearch, 7, 11–24.
Jansen, P. G. W. (1983). Rasch analysis of attitudinal data. Unpublished doctoral dissertation. Den Haag: Rijks Psychologische Dienst.
Jansen, P. G. W., Roskam, E.E.Ch.I., & Wollenberg, A. L. van den (1982). De Mokkenschaal gewogen [Weighing the Mokken scale]. Tijdschrift voor Onderwijsresearch, 7, 31–42.
Kristof, W. (1963). The statistical theory of stepped-up reliability coefficients when a test has been divided into several equivalent parts. Psychometrika, 28, 221–238.
Lewis, C. (1983). Bayesian inference for latent abilities. In S. B. Anderson & J. S. Helmick (Eds.), On educational testing. San Francisco: Jossey-Bass.
Loevinger, J. (1948). The technique of homogeneous tests compared with some aspects of “scale analysis” and factor analysis. Psychological Bulletin, 45, 507–530.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum.
Lord, F. M. (1983). Unbiased estimators of ability parameters, of their variance, and of their parallel-forms reliability. Psychometrika, 48, 233–245.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading: Addison-Wesley.
Lumsden, J. (1976). Test theory. Annual Review of Psychology, 27, 251–280.
Mokken, R. J. (1971). A theory and procedure of scale analysis. The Hague: Mouton.
Mokken, R. J., & Lewis, C. (1982). A nonparametric approach to the analysis of dichotomous item responses. Applied Psychological Measurement, 6, 417–430.
Molenaar, I. W. (1982a). Mokken scaling revisited. Kwantitatieve Methoden, 8, 145–164.
Molenaar, I. W. (1982b). Een tweede weging van de Mokkenschaal [A second weighing of the Mokken scale]. Tijdschrift voor Onderwijsresearch, 7, 172–181.
Molenaar, I. W. (1982c). De beperkte bruikbaarheid van Jansen's kritiek [On the limited usefulness of Jansen's criticisms]. Tijdschrift voor Onderwijsresearch, 7, 25–30.
Molenaar, I. W., & Sijtsma, K. (1984). Internal consistency and reliability in Mokken's nonparametric item response model. Tijdschrift voor Onderwijsresearch, 9, 257–268.
Oosterloo, S. (1984). Confidence intervals for test information and relative efficiency. Statistica Neerlandica, 38, 37–53.
Samejima, F. (1977). Weakly parallel tests in latent trait theory with some criticisms of classical test theory. Psychometrika, 42, 193–198.
Schulman, R. S., & Haden, R. L. (1975). A test theory model for ordinal measurements. Psychometrika, 40, 455–472.
Sedere, M. U., & Feldt, L. S. (1977). The sampling distributions of the Kristof reliability coefficient, the Feldt coefficient, and Guttman's lambda-2. Journal of Educational Measurement, 14, 53–62.
Sijtsma, K. (1984). Useful nonparametric scaling: A reply to Jansen. Psychologische Beiträge, 26, 423–437.
Sijtsma, K., & Prins, P. M. (1986). Itemselectie in het Mokken model [Item selection in the Mokken model]. Tijdschrift voor Onderwijsresearch, 11, 121–129.
Stokman, F. N., & Schuur, W. H. van (1980). Basic scaling. Quality and Quantity, 14, 5–30.
ten Berge, J. M. F., & Zegers, F. E. (1978). A series of lower bounds to the reliability of a test. Psychometrika, 43, 575–579.
ten Berge, J. M. F., Snijders, T. A. B., & Zegers, F. E. (1981). Computational aspects of the greatest lower bound to the reliability and constrained minimum trace factor analysis. Psychometrika, 46, 201–213.
Weiss, D. J., & Davison, M. L. (1981). Test theory and methods. Annual Review of Psychology, 32, 629–658.
Wollenberg, A. L. van den (1982). Two new test statistics for the Rasch model. Psychometrika, 47, 123–140.
Wood, R. (1978). Fitting the Rasch model—A heady tale. British Journal of Mathematical and Statistical Psychology, 31, 27–32.
Additional information
The authors are grateful for constructive comments from the reviewers and from Charles Lewis.
Cite this article
Sijtsma, K., Molenaar, I.W. Reliability of test scores in nonparametric item response theory. Psychometrika 52, 79–97 (1987). https://doi.org/10.1007/BF02293957