An IRT Model for Multiple Raters

Verhelst, Norman D.; Verstralen, Huub H. F. M.

doi:10.1007/978-1-4613-0169-1_5

Part of the book series: Lecture Notes in Statistics (LNS, volume 157)

Abstract

An IRT model for multiple ratings is presented. If it is assumed that the quality of a student performance has a stochastic relationship with the latent variable of interest, it is shown that the ratings of several raters are not conditionally independent given the latent variable. The model gives a full account of this dependence. Several relationships with other models appear to exist. The proposed model is a special case of a nonlinear multilevel model with three levels, but it can also be seen as a linear logistic model with relaxed assumptions (LLRA). Moreover, a linearized version of the model turns out to be a special case of a generalizability model with two crossed measurement facets (items and raters) with a single first-order interaction term (persons and items). Using this linearized model, it is shown how the estimated standard errors of the parameters are affected if the dependence between the ratings is ignored.
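The dependence claim in the abstract can be illustrated with a small simulation. This is only a sketch under assumptions that do not come from the chapter: the quality of a performance is taken to be the latent variable plus normal noise, and each rater gives a dichotomous rating through a logistic function of quality minus a rater severity. The function name, the parameter values, and the logistic form are all hypothetical. With the latent variable held fixed, the covariance between two raters' ratings is clearly positive when quality is stochastic and close to zero when it is not.

```python
import math
import random


def rating_covariance(theta, sigma_quality, severities, n_reps=100_000, seed=1):
    """Covariance of two raters' 0/1 ratings at a FIXED latent value theta.

    Assumed (hypothetical) setup, not the chapter's model:
      quality = theta + N(0, sigma_quality^2)
      P(rater r gives 1 | quality) = logistic(quality - severity_r)
    Raters are independent given quality, but not necessarily given theta.
    """
    rng = random.Random(seed)
    b1, b2 = severities
    sum1 = sum2 = sum12 = 0
    for _ in range(n_reps):
        # One performance: its quality varies around theta.
        quality = theta + rng.gauss(0.0, sigma_quality)
        # Both raters judge the same performance.
        x1 = int(rng.random() < 1.0 / (1.0 + math.exp(-(quality - b1))))
        x2 = int(rng.random() < 1.0 / (1.0 + math.exp(-(quality - b2))))
        sum1 += x1
        sum2 += x2
        sum12 += x1 * x2
    mean1 = sum1 / n_reps
    mean2 = sum2 / n_reps
    return sum12 / n_reps - mean1 * mean2


# Stochastic quality: ratings share the quality noise, so they covary given theta.
cov_dependent = rating_covariance(theta=0.0, sigma_quality=1.5, severities=(-0.5, 0.5))
# Deterministic quality: ratings are conditionally independent given theta.
cov_independent = rating_covariance(theta=0.0, sigma_quality=0.0, severities=(-0.5, 0.5))

print(f"covariance given theta, stochastic quality:    {cov_dependent:.4f}")
print(f"covariance given theta, deterministic quality: {cov_independent:.4f}")
```

In this toy setup the shared quality term plays the role of an unobserved person-by-item effect: ignoring it treats the two ratings as independent pieces of information about the latent variable, which is the situation in which, according to the abstract, the estimated standard errors are affected.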

Author information

Authors and Affiliations

National Institute for Educational Measurement (CITO), P.O. Box 1034, 6801 MG Arnhem, The Netherlands
Norman D. Verhelst & Huub H. F. M. Verstralen
Faculty of Educational Science and Technology, University of Twente, The Netherlands
Norman D. Verhelst

Editor information

Editors and Affiliations

Department of Statistics and Measurement Theory, University of Groningen, Grote Kruisstraat 2/1, 9712 TS, Groningen, The Netherlands
Anne Boomsma, Marijtje A. J. van Duijn & Tom A. B. Snijders

About this chapter

Cite this chapter

Verhelst, N.D., Verstralen, H.H.F.M. (2001). An IRT Model for Multiple Raters. In: Boomsma, A., van Duijn, M.A.J., Snijders, T.A.B. (eds) Essays on Item Response Theory. Lecture Notes in Statistics, vol 157. Springer, New York, NY. https://doi.org/10.1007/978-1-4613-0169-1_5

DOI: https://doi.org/10.1007/978-1-4613-0169-1_5
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-95147-8
Online ISBN: 978-1-4613-0169-1
eBook Packages: Springer Book Archive
