Reliability of test scores in nonparametric item response theory

Sijtsma, Klaas; Molenaar, Ivo W.

doi:10.1007/BF02293957

Published: March 1987

Psychometrika, Volume 52, pages 79–97 (1987)

Klaas Sijtsma¹ & Ivo W. Molenaar²

Abstract

Three methods for estimating reliability are studied within the context of nonparametric item response theory. Two were proposed originally by Mokken (1971) and a third is developed in this paper. Using a Monte Carlo strategy, these three estimation methods are compared with four “classical” lower bounds to reliability. Finally, recommendations are given concerning the use of these estimation methods.

References

Birnbaum, A. (1968). Part V. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores. Reading: Addison-Wesley.
Boomsma, A. (1983). On the robustness of LISREL (maximum likelihood estimation) against small sample size and non-normality. Unpublished doctoral dissertation, University of Groningen.
Cliff, N. (1983). Evaluating Guttman scales: Some old and new thoughts. In H. Wainer & S. Messick (Eds.), Principals of modern psychological measurement. Hillsdale, NJ: Lawrence Erlbaum.
Cliff, N. (1984). An improved internal consistency reliability estimate. Journal of Educational Statistics, 9, 151–161.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.
Feldt, L. S. (1965). The approximate sampling distribution of Kuder-Richardson reliability coefficient twenty. Psychometrika, 30, 357–370.
Fischer, G. H. (1974). Einführung in die theorie psychologischer tests [Introduction to psychological test theory]. Bern: Huber.
Gustafsson, J. E. (1977). The Rasch model for dichotomous items: Theory, applications and a computer program (Internal Rep. No. 63). Institute of Education, University of Goteborg.
Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10, 255–282.
Guttman, L. (1950). The basis for scalogram analysis. In S. A. Stouffer, L. Guttman, E. A. Suchman, P. F. Lazarsfeld, S. A. Star, & J. A. Clausen (Eds.), Measurement and prediction. Princeton: Princeton University Press.
Henning, H. J. (1976). Die Technik der Mokken-Skalenanalyse [The technique of Mokken scale analysis]. Psychologische Beiträge, 18, 410–430.
Horn, J. (1971). Integration of concepts of reliability and standard error of measurement. Educational and Psychological Measurement, 31, 57–74.
Horst, P. (1953). Correcting the Kuder-Richardson reliability for dispersion of item difficulties. Psychological Bulletin, 50, 371–374.
Jackson, P. H., & Agunwamba, C. C. (1977). Lower bounds for the reliability of the total score on a test composed of non-homogeneous items: I: Algebraic lower bounds. Psychometrika, 42, 567–578.
Jansen, P. G. W. (1982a). Homogenitätsmessung mit Hilfe des Koeffizienten H von Loevinger: Eine kritische Diskussion [Measuring homogeneity by means of Loevinger's coefficient H: A critical discussion]. Psychologische Beiträge, 24, 96–105.
Jansen, P. G. W. (1982b). De onbruikbaarheid van Mokkenschaalanalyse [On the uselessness of Mokken scale analysis]. Tijdschrift voor Onderwijsresearch, 7, 11–24.
Jansen, P. G. W. (1983). Rasch analysis of attitudinal data. Unpublished doctoral dissertation. Den Haag: Rijks Psychologische Dienst.
Jansen, P. G. W., Roskam, E. E. Ch. I., & Wollenberg, A. L. van den (1982). De Mokkenschaal gewogen [Weighing the Mokken scale]. Tijdschrift voor Onderwijsresearch, 7, 31–42.
Kristof, W. (1963). The statistical theory of stepped-up reliability coefficients when a test has been divided into several equivalent parts. Psychometrika, 28, 221–238.
Lewis, C. (1983). Bayesian inference for latent abilities. In S. B. Anderson & J. S. Helmick (Eds.), On educational testing. San Francisco: Jossey-Bass.
Loevinger, J. (1948). The technique of homogeneous tests compared with some aspects of “scale analysis” and factor analysis. Psychological Bulletin, 45, 507–530.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum.
Lord, F. M. (1983). Unbiased estimators of ability parameters, of their variance, and of their parallel-forms reliability. Psychometrika, 48, 233–245.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading: Addison-Wesley.
Lumsden, J. (1976). Test theory. Annual Review of Psychology, 27, 251–280.
Mokken, R. J. (1971). A theory and procedure of scale analysis. The Hague: Mouton.
Mokken, R. J., & Lewis, C. (1982). A nonparametric approach to the analysis of dichotomous item responses. Applied Psychological Measurement, 6, 417–430.
Molenaar, I. W. (1982a). Mokken scaling revisited. Kwantitatieve Methoden, 8, 145–164.
Molenaar, I. W. (1982b). Een tweede weging van de Mokkenschaal [A second weighing of the Mokken scale]. Tijdschrift voor Onderwijsresearch, 7, 172–181.
Molenaar, I. W. (1982c). De beperkte bruikbaarheid van Jansen's kritiek [On the limited usefulness of Jansen's criticisms]. Tijdschrift voor Onderwijsresearch, 7, 25–30.
Molenaar, I. W., & Sijtsma, K. (1984). Internal consistency and reliability in Mokken's nonparametric item response model. Tijdschrift voor Onderwijsresearch, 9, 257–268.
Oosterloo, S. (1984). Confidence intervals for test information and relative efficiency. Statistica Neerlandica, 38, 37–53.
Samejima, F. (1977). Weakly parallel tests in latent trait theory with some criticisms of classical test theory. Psychometrika, 42, 193–198.
Schulman, R. S., & Haden, R. L. (1975). A test theory model for ordinal measurements. Psychometrika, 40, 455–472.
Sedere, M. U., & Feldt, L. S. (1977). The sampling distributions of the Kristof reliability coefficient, the Feldt coefficient, and Guttman's lambda-2. Journal of Educational Measurement, 14, 53–62.
Sijtsma, K. (1984). Useful nonparametric scaling: A reply to Jansen. Psychologische Beiträge, 26, 423–437.
Sijtsma, K., & Prins, P. M. (1986). Itemselectie in het Mokken model [Item selection in the Mokken model]. Tijdschrift voor Onderwijsresearch, 11, 121–129.
Stokman, F. N., & Schuur, W. H. van (1980). Basic scaling. Quality and Quantity, 14, 5–30.
ten Berge, J. M. F., & Zegers, F. E. (1978). A series of lower bounds to the reliability of a test. Psychometrika, 43, 575–579.
ten Berge, J. M. F., Snijders, T. A. B., & Zegers, F. E. (1981). Computational aspects of the greatest lower bound to the reliability and constrained minimum trace factor analysis. Psychometrika, 46, 201–213.
Weiss, D. J., & Davison, M. L. (1981). Test theory and methods. Annual Review of Psychology, 32, 629–658.
Wollenberg, A. L. van den (1982). Two new test statistics for the Rasch model. Psychometrika, 47, 123–140.
Wood, R. (1978). Fitting the Rasch model—A heady tale. British Journal of Mathematical and Statistical Psychology, 31, 27–32.

Author information

Authors and Affiliations

Vakgroep Arbeids- en Organisatiepsychologie, Free University, De Boelelaan 1081, 1081 HV, Amsterdam, The Netherlands
Klaas Sijtsma
University of Groningen, The Netherlands
Ivo W. Molenaar

Additional information

The authors are grateful for constructive comments from the reviewers and from Charles Lewis.

About this article

Cite this article

Sijtsma, K., Molenaar, I.W. Reliability of test scores in nonparametric item response theory. Psychometrika 52, 79–97 (1987). https://doi.org/10.1007/BF02293957

Received: 12 April 1985
Revised: 09 April 1986
Issue Date: March 1987
DOI: https://doi.org/10.1007/BF02293957
