Reliability of test scores in nonparametric item response theory

Psychometrika

Abstract

Three methods for estimating reliability are studied within the context of nonparametric item response theory. Two were proposed originally by Mokken (1971) and a third is developed in this paper. Using a Monte Carlo strategy, these three estimation methods are compared with four “classical” lower bounds to reliability. Finally, recommendations are given concerning the use of these estimation methods.

References

  • Birnbaum, A. (1968). Part V. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores. Reading: Addison-Wesley.

  • Boomsma, A. (1983). On the robustness of LISREL (maximum likelihood estimation) against small sample size and non-normality. Unpublished doctoral dissertation, University of Groningen.

  • Cliff, N. (1983). Evaluating Guttman scales: Some old and new thoughts. In H. Wainer & S. Messick (Eds.), Principals of modern psychological measurement. Hillsdale, NJ: Lawrence Erlbaum.

  • Cliff, N. (1984). An improved internal consistency reliability estimate. Journal of Educational Statistics, 9, 151–161.

  • Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.

  • Feldt, L. S. (1965). The approximate sampling distribution of Kuder-Richardson reliability coefficient twenty. Psychometrika, 30, 357–370.

  • Fischer, G. H. (1974). Einführung in die Theorie psychologischer Tests [Introduction to psychological test theory]. Bern: Huber.

  • Gustafsson, J. E. (1977). The Rasch model for dichotomous items: Theory, applications and a computer program (Internal Rep. No. 63). Institute of Education, University of Göteborg.

  • Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10, 255–282.

  • Guttman, L. (1950). The basis for scalogram analysis. In S. A. Stouffer, L. Guttman, E. A. Suchman, P. F. Lazarsfeld, S. A. Star, & J. A. Clausen (Eds.), Measurement and prediction. Princeton: Princeton University Press.

  • Henning, H. J. (1976). Die Technik der Mokken-Skalenanalyse [The technique of Mokken scale analysis]. Psychologische Beiträge, 18, 410–430.

  • Horn, J. (1971). Integration of concepts of reliability and standard error of measurement. Educational and Psychological Measurement, 31, 57–74.

  • Horst, P. (1953). Correcting the Kuder-Richardson reliability for dispersion of item difficulties. Psychological Bulletin, 50, 371–374.

  • Jackson, P. H., & Agunwamba, C. C. (1977). Lower bounds for the reliability of the total score on a test composed of non-homogeneous items: I: Algebraic lower bounds. Psychometrika, 42, 567–578.

  • Jansen, P. G. W. (1982a). Homogenitätsmessung mit Hilfe des Koeffizienten H von Loevinger: Eine kritische Diskussion [Measuring homogeneity by means of Loevinger's coefficient H: A critical discussion]. Psychologische Beiträge, 24, 96–105.

  • Jansen, P. G. W. (1982b). De onbruikbaarheid van Mokkenschaalanalyse [On the uselessness of Mokken scale analysis]. Tijdschrift voor Onderwijsresearch, 7, 11–24.

  • Jansen, P. G. W. (1983). Rasch analysis of attitudinal data. Unpublished doctoral dissertation. Den Haag: Rijks Psychologische Dienst.

  • Jansen, P. G. W., Roskam, E.E.Ch.I., & Wollenberg, A. L. van den (1982). De Mokkenschaal gewogen [Weighing the Mokken scale]. Tijdschrift voor Onderwijsresearch, 7, 31–42.

  • Kristof, W. (1963). The statistical theory of stepped-up reliability coefficients when a test has been divided into several equivalent parts. Psychometrika, 28, 221–238.

  • Lewis, C. (1983). Bayesian inference for latent abilities. In S. B. Anderson & J. S. Helmick (Eds.), On educational testing. San Francisco: Jossey-Bass.

  • Loevinger, J. (1948). The technique of homogeneous tests compared with some aspects of “scale analysis” and factor analysis. Psychological Bulletin, 45, 507–530.

  • Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum.

  • Lord, F. M. (1983). Unbiased estimators of ability parameters, of their variance, and of their parallel-forms reliability. Psychometrika, 48, 233–245.

  • Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading: Addison-Wesley.

  • Lumsden, J. (1976). Test theory. Annual Review of Psychology, 27, 251–280.

  • Mokken, R. J. (1971). A theory and procedure of scale analysis. The Hague: Mouton.

  • Mokken, R. J., & Lewis, C. (1982). A nonparametric approach to the analysis of dichotomous item responses. Applied Psychological Measurement, 6, 417–430.

  • Molenaar, I. W. (1982a). Mokken scaling revisited. Kwantitatieve Methoden, 8, 145–164.

  • Molenaar, I. W. (1982b). Een tweede weging van de Mokkenschaal [A second weighing of the Mokken scale]. Tijdschrift voor Onderwijsresearch, 7, 172–181.

  • Molenaar, I. W. (1982c). De beperkte bruikbaarheid van Jansen's kritiek [On the limited usefulness of Jansen's criticisms]. Tijdschrift voor Onderwijsresearch, 7, 25–30.

  • Molenaar, I. W., & Sijtsma, K. (1984). Internal consistency and reliability in Mokken's nonparametric item response model. Tijdschrift voor Onderwijsresearch, 9, 257–268.

  • Oosterloo, S. (1984). Confidence intervals for test information and relative efficiency. Statistica Neerlandica, 38, 37–53.

  • Samejima, F. (1977). Weakly parallel tests in latent trait theory with some criticisms of classical test theory. Psychometrika, 42, 193–198.

  • Schulman, R. S., & Haden, R. L. (1975). A test theory model for ordinal measurements. Psychometrika, 40, 455–472.

  • Sedere, M. U., & Feldt, L. S. (1977). The sampling distributions of the Kristof reliability coefficient, the Feldt coefficient, and Guttman's lambda-2. Journal of Educational Measurement, 14, 53–62.

  • Sijtsma, K. (1984). Useful nonparametric scaling: A reply to Jansen. Psychologische Beiträge, 26, 423–437.

  • Sijtsma, K., & Prins, P. M. (1986). Itemselectie in het Mokken model [Item selection in the Mokken model]. Tijdschrift voor Onderwijsresearch, 11, 121–129.

  • Stokman, F. N., & Schuur, W. H. van (1980). Basic scaling. Quality and Quantity, 14, 5–30.

  • ten Berge, J. M. F., & Zegers, F. E. (1978). A series of lower bounds to the reliability of a test. Psychometrika, 43, 575–579.

  • ten Berge, J. M. F., Snijders, T. A. B., & Zegers, F. E. (1981). Computational aspects of the greatest lower bound to the reliability and constrained minimum trace factor analysis. Psychometrika, 46, 201–213.

  • Weiss, D. J., & Davison, M. L. (1981). Test theory and methods. Annual Review of Psychology, 32, 629–658.

  • Wollenberg, A. L. van den (1982). Two new test statistics for the Rasch model. Psychometrika, 47, 123–140.

  • Wood, R. (1978). Fitting the Rasch model—A heady tale. British Journal of Mathematical and Statistical Psychology, 31, 27–32.

Additional information

The authors are grateful for constructive comments from the reviewers and from Charles Lewis.

About this article

Cite this article

Sijtsma, K., Molenaar, I.W. Reliability of test scores in nonparametric item response theory. Psychometrika 52, 79–97 (1987). https://doi.org/10.1007/BF02293957

  • DOI: https://doi.org/10.1007/BF02293957
