
Item Response Theory

Modern Psychometrics with R

Part of the book series: Use R!


Abstract

Item response theory (IRT) is a psychometric modeling framework for analyzing categorical data from questionnaires, tests, and other instruments designed to measure underlying latent traits. Put simply, these models estimate a parameter for each item as well as a parameter for each person. Depending on how many latent traits are involved, a core distinction is drawn between unidimensional and multidimensional IRT models. Dimensionality assessment is therefore important before fitting an IRT model, as elaborated in the first section. Subsequently, the focus is on various classical unidimensional models for dichotomous as well as polytomous input data. Three sections then cover special topics in IRT: item/test information, sample size determination, and differential item functioning, where differences in item parameters are examined across subgroups of persons. Some modern IRT flavors are presented in the final three sections on multidimensional IRT, longitudinal IRT, and Bayesian IRT.
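As a minimal illustration of the item and person parameters mentioned above, the following sketch (in Python with hypothetical parameter values; the chapter itself works with R packages) evaluates the item characteristic curve of the two-parameter logistic (2-PL) model, which reduces to the Rasch (1-PL) model when the discrimination equals 1:

```python
import math

def icc_2pl(theta, difficulty, discrimination=1.0):
    """Item characteristic curve of the 2-PL model:
    P(correct | theta) = 1 / (1 + exp(-a * (theta - b))),
    where theta is the person parameter, b the item difficulty,
    and a the item discrimination (a = 1 gives the Rasch model)."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

# A person whose trait level equals the item difficulty
# succeeds with probability 0.5:
print(icc_2pl(theta=0.8, difficulty=0.8))   # 0.5
# Higher trait levels yield higher success probabilities:
print(icc_2pl(theta=2.0, difficulty=0.8))
```

Note that the curve is stated here in the difficulty parameterization (θ − b); the easiness parameterization used elsewhere in the chapter simply flips the sign of the item parameter.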


Notes

  1. Detailed elaborations on differences between IRT, CTT, and factor analysis can be found in Rusch et al. (2017).

  2. We use the easiness parameterization in order to be consistent with the implementation in the eRm package.

  3. In the Rasch model, it does not matter which particular items are solved correctly, as long as the sum scores are the same. This property is called sufficiency.

  4. Note that in order to be consistent with the ltm package output, we state the model in terms of difficulty parameters, i.e., \((\theta_v - \beta_i)\).

  5. Due to space restrictions, we do not show the dimensionality assessment for this example. In practice, this should be done prior to fitting the 2-PL.

  6. Again, due to space restrictions, we show the first six parameters only.

  7. We continue to use ICC as an abbreviation.

  8. For connections between the GRM and the GPCM, see Ostini and Nering (2005).

  9. In the book by de Ayala (2009), almost every chapter gives guidelines on how large a calibration sample should be for a particular IRT model.

  10. \(\chi^2_{13}\) is an additional LR-test for \(M_1\) vs. \(M_3\).

  11. The "MathExam14W" dataset from the package is already prepared that way. Here we bring it back to the standard data frame form and illustrate how to get the data in shape for the tree function call.

  12. Note that we fitted these models already in Sect. 4.1.2 on dimensionality assessment.

  13. There is some ambiguity in the AIC/BIC in relation to the LR-test result. We could further explore the fit of the 1D solution via M2(zar1d), which suggests a slight misfit.

  14. Note that there is also the option to specify the model in lavaan and convert it into mirt syntax using the lavaan2mirt function in the sirt package. This can be especially useful for more complicated models with constraints.

  15. See Luo and Jiao (2017) for how to specify IRT models in Stan.

  16. Thanks to Peter Franz for sharing this dataset.
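The sufficiency property from note 3 can be checked numerically. The sketch below (Python rather than the chapter's R, with hypothetical easiness values) shows that under the Rasch model the probability of a response pattern, conditional on its sum score, does not depend on the person parameter θ:

```python
import math
from itertools import product

def rasch_prob(pattern, theta, easiness):
    """Likelihood of a 0/1 response pattern under the Rasch model,
    easiness parameterization: logit P(X_i = 1) = theta + beta_i."""
    p = 1.0
    for x, beta in zip(pattern, easiness):
        p_correct = math.exp(theta + beta) / (1.0 + math.exp(theta + beta))
        p *= p_correct if x == 1 else (1.0 - p_correct)
    return p

def pattern_given_sumscore(pattern, theta, easiness):
    """P(pattern | sum score, theta); sufficiency implies this is theta-free."""
    r = sum(pattern)
    # total probability of all patterns sharing the same sum score r
    denom = sum(rasch_prob(q, theta, easiness)
                for q in product([0, 1], repeat=len(easiness))
                if sum(q) == r)
    return rasch_prob(pattern, theta, easiness) / denom

easiness = [0.5, -0.3, 1.2, -1.0]   # hypothetical item easiness parameters
pattern = (1, 0, 1, 0)              # one pattern with sum score r = 2
vals = [pattern_given_sumscore(pattern, th, easiness)
        for th in (-2.0, 0.0, 2.0)]
print(vals)  # identical (up to rounding) for every theta
```

Which items were solved only matters through the item parameters; the person parameter cancels once the sum score is fixed, which is exactly what makes conditional maximum likelihood estimation (as in eRm) possible.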

References

  • Andersen, E. B. (1973). A goodness of fit test for the Rasch model. Psychometrika, 38, 123–140.

  • Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43, 561–573.

  • Beaujean, A. A. (2014). Latent variable modeling using R: A step-by-step guide. New York: Routledge.

  • Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee's ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 395–479). Reading: Addison-Wesley.

  • Bock, R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37, 29–51.

  • Bond, T. G., & Fox, C. M. (2015). Applying the Rasch model: Fundamental measurement in the human sciences (3rd ed.). New York: Routledge.

  • Cai, L. (2010). A two-tier full-information item factor analysis model with applications. Psychometrika, 75, 581–612.

  • Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29. http://www.jstatsoft.org/v48/i06/

  • Chalmers, R. P. (2017). SimDesign: Structure for organizing Monte Carlo simulation designs. R package version 1.6. https://CRAN.R-project.org/package=SimDesign

  • Choi, S., Gibbons, L., & Crane, P. (2011). lordif: An R package for detecting differential item functioning using iterative hybrid ordinal logistic regression/item response theory and Monte Carlo simulations. Journal of Statistical Software, 39(8), 1–30. https://www.jstatsoft.org/index.php/jss/article/view/v039i08

  • de Ayala, R. J. (2009). The theory and practice of item response theory. New York: Guilford Press.

  • De Boeck, P., Bakker, M., Zwitser, R., Nivard, M., Hofman, A., Tuerlinckx, F., & Partchev, I. (2011). The estimation of item response models with the lmer function from the lme4 package in R. Journal of Statistical Software, 39(12), 1–28. https://www.jstatsoft.org/index.php/jss/article/view/v039i12

  • Finch, W. H., Jr., & French, B. F. (2015). Latent variable modeling with R. New York: Routledge.

  • Fischer, G. H. (1995). Linear logistic models for change. In G. Fischer & I. Molenaar (Eds.), Rasch models: Foundations, recent developments, and applications (pp. 157–180). New York: Springer.

  • Fischer, G. H., & Molenaar, I. W. (1995). Rasch models: Foundations, recent developments, and applications. New York: Springer.

  • Fox, J. P. (2010). Bayesian item response modeling. New York: Springer.

  • Funk, J. B., Fox, C. M., Chang, M., & Curtiss, K. (2008). The development of the children's empathic attitudes questionnaire using classical and Rasch analyses. Journal of Applied Developmental Psychology, 29, 187–196.

  • Glück, J., & Spiel, C. (1997). Item response models for repeated measures designs: Application and limitations of four different approaches. Methods of Psychological Research, 2(6). http://www.dgps.de/fachgruppen/methoden/mpr-online/issue2/art6/article.html

  • Hatzinger, R., & Rusch, T. (2009). IRT models with relaxed assumptions in eRm: A manual-like instruction. Psychology Science Quarterly, 51, 87–120.

  • James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning with applications in R. New York: Springer.

  • Jiang, S., Wang, C., & Weiss, D. J. (2016). Sample size requirements for estimation of item parameters in the multidimensional graded response model. Frontiers in Psychology, 7(109), 1–10.

  • Koller, I., & Alexandrowicz, R. W. (2010). Eine psychometrische Analyse der ZAREKI-R mittels Rasch-Modellen [A psychometric analysis of the ZAREKI-R using Rasch models]. Diagnostica, 56, 57–67.

  • Koller, I., Levenson, M. R., & Glück, J. (2017). What do you think you are measuring? A mixed-methods procedure for assessing the content validity of test items and theory-based scaling. Frontiers in Psychology, 8(126), 1–20.

  • Komboz, B., Zeileis, A., & Strobl, C. (2018). Tree-based global model tests for polytomous Rasch models. Educational and Psychological Measurement, 78, 128–166.

  • Levenson, M. R., Jennings, P. A., Aldwin, C. M., & Shiraishi, R. W. (2005). Self-transcendence: Conceptualization and measurement. The International Journal of Aging and Human Development, 60, 127–143.

  • Luo, Y., & Jiao, H. (2017). Using the Stan program for Bayesian item response theory. Educational and Psychological Measurement, 77, 1–25.

  • Magis, D., Beland, S., Tuerlinckx, F., & De Boeck, P. (2010). A general framework and an R package for the detection of dichotomous differential item functioning. Behavior Research Methods, 42, 847–862.

  • Mair, P., & De Leeuw, J. (2017). Gifi: Multivariate analysis with optimal scaling. R package version 0.3-2. https://R-Forge.R-project.org/projects/psychor/

  • Mair, P., & Hatzinger, R. (2007a). CML based estimation of extended Rasch models with the eRm package in R. Psychology Science Quarterly, 49, 26–43.

  • Mair, P., & Hatzinger, R. (2007b). Extended Rasch modeling: The eRm package for the application of IRT models in R. Journal of Statistical Software, 20(9), 1–20.

  • Mair, P., Hofmann, E., Gruber, K., Hatzinger, R., Zeileis, A., & Hornik, K. (2015). Motivation, values, and work design as drivers of participation in the R open source project for statistical computing. Proceedings of the National Academy of Sciences of the United States of America, 112(48), 14788–14792.

  • Martin, A. D., Quinn, K. M., & Park, J. H. (2011). MCMCpack: Markov Chain Monte Carlo in R. Journal of Statistical Software, 42(9), 1–22. http://www.jstatsoft.org/v42/i09/

  • Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149–174.

  • Maydeu-Olivares, A. (2015). Evaluating the fit of IRT models. In S. P. Reise & D. A. Revicki (Eds.), Handbook of item response theory modeling: Applications to typical performance assessment (pp. 111–127). New York: Routledge.

  • Maydeu-Olivares, A., & Joe, H. (2005). Limited- and full-information estimation and goodness-of-fit testing in 2^n contingency tables: A unified framework. Journal of the American Statistical Association, 100, 1009–1020.

  • Morgeson, F. P., & Humphrey, S. E. (2006). The work design questionnaire (WDQ): Developing and validating a comprehensive measure for assessing job design and the nature of work. Journal of Applied Psychology, 91, 1321–1339.

  • Muraki, E. (1990). Fitting a polytomous item response model to Likert-type data. Applied Psychological Measurement, 14, 59–71.

  • Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16, 159–176.

  • Natesan, P., Nandakumar, R., Minka, T., & Rubright, J. D. (2016). Bayesian prior choice in IRT estimation using MCMC and variational Bayes. Frontiers in Psychology, 7(1422), 1–11.

  • Osterlind, S. J., & Everson, H. T. (2009). Differential item functioning (2nd ed.). Thousand Oaks: Sage.

  • Ostini, R., & Nering, M. L. (2005). Polytomous item response theory models. Thousand Oaks: Sage.

  • Ponocny, I. (2001). Nonparametric goodness-of-fit tests for the Rasch model. Psychometrika, 66, 437–459.

  • Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Danish Institute for Educational Research.

  • Rasch, G. (1961). On general laws and the meaning of measurement in psychology. In Proceedings of the IV Berkeley Symposium on Mathematical Statistics and Probability (Vol. IV, pp. 321–333). Berkeley: University of California Press.

  • Reckase, M. D. (2009). Multidimensional item response theory. New York: Springer.

  • Reise, S. P. (2012). The rediscovery of bifactor measurement models. Multivariate Behavioral Research, 47, 667–696.

  • Revelle, W. (2017). psych: Procedures for psychological, psychometric, and personality research. R package version 1.7.8. http://CRAN.R-project.org/package=psych

  • Rizopoulos, D. (2006). ltm: An R package for latent variable modelling and item response theory analyses. Journal of Statistical Software, 17(5), 1–25. http://www.jstatsoft.org/v17/i05/

  • Robitzsch, A. (2017). sirt: Supplementary item response theory models. R package version 1.15-41. https://CRAN.R-project.org/package=sirt

  • Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36. http://www.jstatsoft.org/v48/i02/

  • Rusch, T., Lowry, P. B., Mair, P., & Treiblmaier, H. (2017). Breaking free from the limitations of classical test theory: Developing and measuring information systems scales using item response theory. Information & Management, 54, 189–203.

  • Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores (Psychometrika monograph supplement, Vol. 17). Chicago: Psychometric Society.

  • Sidanius, J., & Pratto, F. (2001). Social dominance: An intergroup theory of social hierarchy and oppression. Cambridge: Cambridge University Press.

  • Strobl, C., Kopf, J., & Zeileis, A. (2015). Rasch trees: A new method for detecting differential item functioning in the Rasch model. Psychometrika, 80, 289–316.

  • Suárez-Falcón, J. C., & Glas, C. A. W. (2003). Evaluation of global testing procedures for item fit to the Rasch model. British Journal of Mathematical and Statistical Psychology, 56, 127–143.

  • Takane, Y., & De Leeuw, J. (1987). On the relationship between item response theory and factor analysis of discretized variables. Psychometrika, 52, 393–408.

  • Tutz, G., & Schauberger, G. (2015). A penalty approach to differential item functioning in Rasch models. Psychometrika, 80, 21–43.

  • van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45(3), 1–67. http://www.jstatsoft.org/v45/i03/

  • Vaughn-Coaxum, R., Mair, P., & Weisz, J. R. (2016). Racial/ethnic differences in youth depression indicators: An item response theory analysis of symptoms reported by White, Black, Asian, and Latino youths. Clinical Psychological Science, 4, 239–253.

  • Verhelst, N. D., Hatzinger, R., & Mair, P. (2007). The Rasch sampler. Journal of Statistical Software, 20(4), 1–14. https://www.jstatsoft.org/article/view/v020i04/

  • von Aster, M., Weinhold Zulauf, M., & Horn, R. (2006). Neuropsychologische Testbatterie für Zahlenverarbeitung und Rechnen bei Kindern (ZAREKI-R) [Neuropsychological test battery for number processing and calculation in children]. Frankfurt: Harcourt Test Services.

  • Wang, X., Berger, J. O., & Burdick, D. S. (2013). Bayesian analysis of dynamic item response models in educational testing. The Annals of Applied Statistics, 7, 126–153.

  • Wilmer, J. B., Germine, L., Chabris, C. F., Chatterjee, G., Gerbasi, M., & Nakayama, K. (2012). Capturing specific abilities as a window into human individuality: The example of face recognition. Cognitive Neuropsychology, 29, 360–392.

  • Wilson, G. D., & Patterson, J. R. (1968). A new measure of conservatism. British Journal of Social and Clinical Psychology, 7, 264–269.

  • Wirth, R. J., & Edwards, M. C. (2007). Item factor analysis: Current approaches and future directions. Psychological Methods, 12, 58–79.

  • Woolley, A. W., Gerbasi, M. E., Chabris, C. F., Kosslyn, S. M., & Hackman, J. R. (2008). Bringing in the experts: How team ability composition and collaborative planning jointly shape analytic effectiveness. Small Group Research, 39, 352–371.

  • Yen, W. (1981). Using simulation results to choose a latent trait model. Applied Psychological Measurement, 5, 245–262.

  • Zeileis, A., Hothorn, T., & Hornik, K. (2008). Model-based recursive partitioning. Journal of Computational and Graphical Statistics, 17, 492–514.

  • Zeileis, A., Strobl, C., Wickelmaier, F., Komboz, B., & Kopf, J. (2016). psychotools: Infrastructure for psychometric modeling. R package version 0.4-2. https://CRAN.R-project.org/package=psychotools

  • Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa: Directorate of Human Resources Research and Evaluation, Department of National Defense.


Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this chapter


Cite this chapter

Mair, P. (2018). Item Response Theory. In: Modern Psychometrics with R. Use R!. Springer, Cham. https://doi.org/10.1007/978-3-319-93177-7_4
