A Note on the Reliability Coefficients for Item Response Model-Based Ability Estimates

Psychometrika

Abstract

Assuming item parameters on a test are known constants, the reliability coefficient for item response theory (IRT) ability estimates is defined for a population of examinees in two different ways: as (a) the product-moment correlation between ability estimates on two parallel forms of a test and (b) the squared correlation between the true abilities and estimates. Due to the bias of IRT ability estimates, the parallel-forms reliability coefficient is not generally equal to the squared-correlation reliability coefficient. It is shown algebraically that the parallel-forms reliability coefficient is expected to be greater than the squared-correlation reliability coefficient, but the difference would be negligible in a practical sense.
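
The direction of the inequality stated in the abstract can be sketched with a short argument. This is a minimal sketch under assumptions that do not appear in this preview, and it is not necessarily the derivation used in the article: θ denotes true ability, θ̂ and θ̂′ denote the estimates from two parallel forms (assumed conditionally independent and identically distributed given θ), μ(θ) is the conditional mean of the estimate, and b(θ) is its bias. The symbols are illustrative choices.

```latex
\begin{align*}
\mu(\theta) &= E(\hat\theta \mid \theta) = \theta + b(\theta), \\
\operatorname{Cov}(\hat\theta, \hat\theta') &= \operatorname{Var}[\mu(\theta)],
\qquad
\operatorname{Cov}(\hat\theta, \theta) = \operatorname{Cov}[\mu(\theta), \theta], \\
\rho_{\hat\theta\hat\theta'} &= \frac{\operatorname{Var}[\mu(\theta)]}{\operatorname{Var}(\hat\theta)}, \\
\rho^{2}_{\hat\theta\theta}
  &= \frac{\operatorname{Cov}[\mu(\theta), \theta]^{2}}{\operatorname{Var}(\hat\theta)\,\operatorname{Var}(\theta)}
  \le \frac{\operatorname{Var}[\mu(\theta)]\,\operatorname{Var}(\theta)}{\operatorname{Var}(\hat\theta)\,\operatorname{Var}(\theta)}
  = \rho_{\hat\theta\hat\theta'}.
\end{align*}
```

The inequality in the last line is the Cauchy-Schwarz inequality applied to μ(θ) and θ. Under these assumptions the two coefficients coincide only when μ(θ) is a linear function of θ, for example when b(θ) is zero or linear in θ; otherwise the parallel-forms coefficient is the larger of the two.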

Author information

Corresponding author

Correspondence to Seonghoon Kim.

About this article

Cite this article

Kim, S. A Note on the Reliability Coefficients for Item Response Model-Based Ability Estimates. Psychometrika 77, 153–162 (2012). https://doi.org/10.1007/s11336-011-9238-0

  • DOI: https://doi.org/10.1007/s11336-011-9238-0
