Abstract
Item response theory (IRT) is a psychometric modeling framework for analyzing categorical data from questionnaires, tests, and other instruments that aim to measure underlying latent traits. Simply put, these models estimate a parameter for each item as well as a parameter for each person. A core distinction in IRT, depending on how many latent traits are involved, is between unidimensional and multidimensional models. Hence, dimensionality should be assessed before an IRT model is fitted, as elaborated in the first section. Subsequently, the focus is on various classical unidimensional models for dichotomous as well as polytomous input data. Afterward, three sections cover special topics in IRT: item/test information, sample size determination, and differential item functioning, where differences in the item parameters are examined across person subgroups. Some modern IRT flavors are presented in the final three sections on multidimensional IRT, longitudinal IRT, and Bayesian IRT.
Notes
- 1.
Detailed elaborations on differences between IRT, CTT, and factor analysis can be found in Rusch et al. (2017).
- 2.
We use the easiness parameterization in order to be consistent with the implementation in the eRm package.
- 3.
In the Rasch model, it does not matter which particular items are solved correctly, as long as the sum scores are the same. This property is called sufficiency: the sum score is a sufficient statistic for the person parameter.
- 4.
Note that in order to be consistent with the ltm package output, we state the model in terms of difficulty parameters, i.e., \((\theta_v - \beta_i)\).
- 5.
Due to space restrictions, we do not show the dimensionality assessment for this example. In practice, this should be done prior to fitting the 2-PL.
- 6.
Again, due to space restrictions, we show the first six parameters only.
- 7.
We continue to use ICC as the abbreviation.
- 8.
For connections between the GRM and the GPCM, see Ostini and Nering (2005).
- 9.
In the book by de Ayala (2009), almost every chapter gives some guidelines on how large a calibration sample should be for a particular IRT model.
- 10.
\(\chi^2_{13}\) is an additional LR test of \(M_1\) vs. \(M_3\).
- 11.
The "MathExam14W" dataset from the package is already prepared that way. Here we bring it back to the standard data frame form and illustrate how to get it into shape for the tree function call.
- 12.
Note that we fitted these models already in Sect. 4.1.2 on dimensionality assessment.
- 13.
There is some ambiguity between the AIC/BIC values and the LR-test result. We could further explore the fit of the 1D solution via M2(zar1d), which suggests a slight misfit.
- 14.
Note that there is also the option to specify it in lavaan and convert it into mirt syntax using the lavaan2mirt function in the sirt package. This can be especially useful for more complicated models with constraints.
- 15.
See Luo and Jiao (2017) for how to specify IRT models in Stan.
- 16.
Thanks to Peter Franz for sharing this dataset.
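Notes 3, 10, and 13 state small technical facts that can be checked numerically. The Python sketch below (not the chapter's R code) assumes the Rasch model in easiness form and shows that the probability of a response pattern, given its sum score, is the same for every \(\theta\), which is the sufficiency property of note 3. It also computes a likelihood-ratio statistic for nested models with a standard-library chi-square tail probability (note 10) and AIC/BIC values (note 13). All function names, item parameters, and log-likelihoods are hypothetical.

```python
import itertools
import math

# Note 3: sufficiency of the sum score in the Rasch model.
def rasch_p(theta, beta):
    """Rasch model in easiness form: P(X = 1) = logistic(theta + beta)."""
    return 1.0 / (1.0 + math.exp(-(theta + beta)))

def pattern_prob(x, theta, betas):
    """Joint probability of a 0/1 response pattern under local independence."""
    prob = 1.0
    for xi, b in zip(x, betas):
        p = rasch_p(theta, b)
        prob *= p if xi == 1 else 1.0 - p
    return prob

def conditional_prob(x, theta, betas):
    """P(pattern | sum score r, theta): normalize over all patterns with the same r."""
    r = sum(x)
    same_r = [y for y in itertools.product((0, 1), repeat=len(betas)) if sum(y) == r]
    return pattern_prob(x, theta, betas) / sum(pattern_prob(y, theta, betas) for y in same_r)

# Note 10: likelihood-ratio test of nested models.
def chi2_sf(x, df):
    """Upper tail of the chi-square distribution for integer df, via the recursion
    Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1) with y = x / 2."""
    y = x / 2.0
    if df % 2 == 0:
        q, a = math.exp(-y), 1.0             # Q(1, y)
    else:
        q, a = math.erfc(math.sqrt(y)), 0.5  # Q(1/2, y)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

def lr_test(loglik_restricted, loglik_general, npar_restricted, npar_general):
    """LR statistic -2 (logL_restricted - logL_general), its df, and p-value."""
    stat = -2.0 * (loglik_restricted - loglik_general)
    df = npar_general - npar_restricted
    return stat, df, chi2_sf(stat, df)

# Note 13: information criteria (smaller is better).
def aic(loglik, npar):
    return -2.0 * loglik + 2.0 * npar

def bic(loglik, npar, n):
    return -2.0 * loglik + npar * math.log(n)

if __name__ == "__main__":
    betas = [0.5, -0.2, -1.0, 1.3]  # hypothetical easiness parameters
    x = (1, 0, 1, 0)                # a pattern with sum score r = 2
    for theta in (-2.0, 0.0, 1.5):
        # the printed value does not change with theta
        print(theta, round(conditional_prob(x, theta, betas), 6))

    # Hypothetical log-likelihoods and parameter counts, n = 500 persons.
    print(lr_test(-2500.0, -2488.0, 20, 30))
    print(aic(-2500.0, 20), aic(-2488.0, 30))
    print(bic(-2500.0, 20, 500), bic(-2488.0, 30, 500))
```

With these made-up numbers, the LR test (\(\chi^2 = 24\), df = 10) and AIC favor the general model, while BIC, whose penalty \(p \log n\) is heavier at n = 500, favors the restricted one; this is one way the ambiguity mentioned in note 13 can arise.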
References
Andersen, E. B. (1973). A goodness of fit test for the Rasch model. Psychometrika, 38, 123–140.
Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43, 561–573.
Beaujean, A. A. (2014). Latent variable modeling using R: A step-by-step guide. New York: Routledge.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In: F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 395–479). Reading: Addison-Wesley.
Bock, R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37, 29–51.
Bond, T. G., & Fox, C. M. (2015). Applying the Rasch model: Fundamental measurement in the human sciences (3rd ed.). New York: Routledge.
Cai, L. (2010). A two-tier full-information item factor analysis model with applications. Psychometrika, 75, 581–612.
Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29. http://www.jstatsoft.org/v48/i06/
Chalmers, R. P. (2017). SimDesign: Structure for organizing Monte Carlo simulation designs. R package version 1.6. https://CRAN.R-project.org/package=SimDesign
Choi, S., Gibbons, L., & Crane, P. (2011). lordif: An R package for detecting differential item functioning using iterative hybrid ordinal logistic regression/item response theory and Monte Carlo simulations. Journal of Statistical Software, 39(8), 1–30. https://www.jstatsoft.org/index.php/jss/article/view/v039i08
de Ayala, R. J. (2009). The theory and practice of item response theory. New York: Guilford Press.
De Boeck, P., Bakker, M., Zwitser, R., Nivard, M., Hofman, A., Tuerlinckx, F., & Partchev, I. (2011). The estimation of item response models with the lmer function from the lme4 package in R. Journal of Statistical Software, 39(12), 1–28. https://www.jstatsoft.org/index.php/jss/article/view/v039i12
Finch, W. H., Jr., & French, B. F. (2015). Latent variable modeling with R. New York: Routledge.
Fischer, G. H. (1995). Linear logistic models for change. In: G. H. Fischer & I. W. Molenaar (Eds.), Rasch models: Foundations, recent developments, and applications (pp. 157–180). New York: Springer.
Fischer, G. H., & Molenaar, I. W. (1995). Rasch models: Foundations, recent developments, and applications. New York: Springer.
Fox, J. P. (2010). Bayesian item response modeling. New York: Springer.
Funk, J. B., Fox, C. M., Chang, M., & Curtiss, K. (2008). The development of the children’s empathic attitudes questionnaire using classical and Rasch analyses. Journal of Applied Developmental Psychology, 29, 187–196.
Glück, J., & Spiel, C. (1997). Item response models for repeated measures designs: Application and limitations of four different approaches. Methods of Psychological Research, 2(6). http://www.dgps.de/fachgruppen/methoden/mpr-online/issue2/art6/article.html
Hatzinger, R., & Rusch, T. (2009). IRT models with relaxed assumptions in eRm: A manual-like instruction. Psychology Science Quarterly, 51, 87–120.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning with applications in R. New York: Springer.
Jiang, S., Wang, C., & Weiss, D. J. (2016). Sample size requirements for estimation of item parameters in the multidimensional graded response model. Frontiers in Psychology, 7(109), 1–10.
Koller, I., & Alexandrowicz, R. W. (2010). Eine psychometrische Analyse der ZAREKI-R mittels Rasch-Modellen [A psychometric analysis of the ZAREKI-R using Rasch-models]. Diagnostica, 56, 57–67.
Koller, I., Levenson, M. R., & Glück, J. (2017). What do you think you are measuring? A mixed-methods procedure for assessing the content validity of test items and theory-based scaling. Frontiers in Psychology, 8(126), 1–20.
Komboz, B., Zeileis, A., & Strobl, C. (2018). Tree-based global model tests for polytomous Rasch models. Educational and Psychological Measurement, 78, 128–166.
Levenson, M. R., Jennings, P. A., Aldwin, C. M., & Shiraishi, R. W. (2005). Self-transcendence: Conceptualization and measurement. The International Journal of Aging and Human Development, 60, 127–143.
Luo, Y., & Jiao, H. (2017). Using the Stan program for Bayesian item response theory. Educational and Psychological Measurement, 77, 1–25.
Magis, D., Beland, S., Tuerlinckx, F., & De Boeck, P. (2010). A general framework and an R package for the detection of dichotomous differential item functioning. Behavior Research Methods, 42, 847–862.
Mair, P., & De Leeuw, J. (2017). Gifi: Multivariate analysis with optimal scaling. R package version 0.3-2. https://R-Forge.R-project.org/projects/psychor/
Mair, P., & Hatzinger, R. (2007a). CML based estimation of extended Rasch models with the eRm package in R. Psychology Science Quarterly, 49, 26–43.
Mair, P., & Hatzinger, R. (2007b). Extended Rasch modeling: The eRm package for the application of IRT models in R. Journal of Statistical Software, 20(9), 1–20.
Mair, P., Hofmann, E., Gruber, K., Hatzinger, R., Zeileis, A., & Hornik, K. (2015). Motivation, values, and work design as drivers of participation in the R open source project for statistical computing. Proceedings of the National Academy of Sciences of the United States of America, 112(48), 14788–14792.
Martin, A. D., Quinn, K. M., & Park, J. H. (2011). MCMCpack: Markov Chain Monte Carlo in R. Journal of Statistical Software, 42(9), 1–22. http://www.jstatsoft.org/v42/i09/
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149–174.
Maydeu-Olivares, A. (2015). Evaluating the fit of IRT models. In: S. P. Reise & D. A. Revicki (Eds.), Handbook of item response theory modeling: Applications to typical performance assessment (pp. 111–127). New York: Routledge.
Maydeu-Olivares, A., & Joe, H. (2005). Limited- and full-information estimation and goodness-of-fit testing in \(2^n\) contingency tables: A unified framework. Journal of the American Statistical Association, 100, 1009–1020.
Morgeson, F. P., & Humphrey, S. E. (2006). The work design questionnaire (WDQ): Developing and validating a comprehensive measure for assessing job design and the nature of work. Journal of Applied Psychology, 91, 1321–1339.
Muraki, E. (1990). Fitting a polytomous item response model to Likert-type data. Applied Psychological Measurement, 14, 59–71.
Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16, 159–176.
Natesan, P., Nandakumar, R., Minka, T., & Rubright, J. D. (2016). Bayesian prior choice in IRT estimation using MCMC and variational Bayes. Frontiers in Psychology, 7(1422), 1–11.
Osterlind, S. J., & Everson, H. T. (2009). Differential item functioning (2nd ed.). Thousand Oaks: Sage.
Ostini, R., & Nering, M. L. (2005). Polytomous item response theory models. Thousand Oaks: Sage.
Ponocny, I. (2001). Nonparametric goodness-of-fit tests for the Rasch model. Psychometrika, 66, 437–459.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Danish Institute for Educational Research.
Rasch, G. (1961). On general laws and the meaning of measurement in psychology. In Proceedings of the IV. Berkeley Symposium on Mathematical Statistics and Probability (Vol. IV, pp. 321–333). Berkeley: University of California Press.
Reckase, M. D. (2009). Multidimensional item response theory. New York: Springer.
Reise, S. P. (2012). The rediscovery of bifactor measurement models. Multivariate Behavioral Research, 47, 667–696.
Revelle, W. (2017). psych: Procedures for psychological, psychometric, and personality research. R package version 1.7.8. http://CRAN.R-project.org/package=psych
Rizopoulos, D. (2006). ltm: An R package for latent variable modelling and item response theory analyses. Journal of Statistical Software, 17(5), 1–25. http://www.jstatsoft.org/v17/i05/
Robitzsch, A. (2017). sirt: Supplementary item response theory models. R package version 1.15-41. https://CRAN.R-project.org/package=sirt
Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36. http://www.jstatsoft.org/v48/i02/
Rusch, T., Lowry, P. B., Mair, P., & Treiblmaier, H. (2017). Breaking free from the limitations of classical test theory: Developing and measuring information systems scales using item response theory. Information & Management, 54, 189–203.
Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores (Psychometrika monograph supplement, Vol. 17). Chicago: Psychometric Society.
Sidanius, J., & Pratto, F. (2001). Social dominance: An intergroup theory of social hierarchy and oppression. Cambridge: Cambridge University Press.
Strobl, C., Kopf, J., & Zeileis, A. (2015). Rasch trees: A new method for detecting differential item functioning in the Rasch model. Psychometrika, 80, 289–316.
Suárez-Falcón, J. C., & Glas, C. A. W. (2003). Evaluation of global testing procedures for item fit to the Rasch model. British Journal of Mathematical and Statistical Psychology, 56, 127–143.
Takane, Y., & De Leeuw, J. (1986). On the relationship between item response theory and factor analysis of discretized variables. Psychometrika, 52, 393–408.
Tutz, G., & Schauberger, G. (2015). A penalty approach to differential item functioning in Rasch models. Psychometrika, 80, 21–43.
van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45(3), 1–67. http://www.jstatsoft.org/v45/i03/
Vaughn-Coaxum, R., Mair, P., & Weisz, J. R. (2016). Racial/ethnic differences in youth depression indicators: An item response theory analysis of symptoms reported by White, Black, Asian, and Latino youths. Clinical Psychological Science, 4, 239–253.
Verhelst, N. D., Hatzinger, R., & Mair, P. (2007). The Rasch sampler. Journal of Statistical Software, 20(4), 1–14. https://www.jstatsoft.org/article/view/v020i04/
von Aster, M., Weinhold Zulauf, M., & Horn, R. (2006). Neuropsychologische Testbatterie für Zahlenverarbeitung und Rechnen bei Kindern (ZAREKI-R) [Neuropsychological Test Battery for Number Processing and Calculation in Children]. Frankfurt: Harcourt Test Services.
Wang, X., Berger, J. O., & Burdick, D. S. (2013). Bayesian analysis of dynamic item response models in educational testing. The Annals of Applied Statistics, 7, 126–153.
Wilmer, J. B., Germine, L., Chabris, C. F., Chatterjee, G., Gerbasi, M., & Nakayama, K. (2012). Capturing specific abilities as a window into human individuality: The example of face recognition. Cognitive Neuropsychology, 29, 360–392.
Wilson, G. D., & Patterson, J. R. (1968). A new measure of conservatism. British Journal of Social and Clinical Psychology, 7, 264–269.
Wirth, R. J., & Edwards, M. C. (2007). Item factor analysis: Current approaches and future directions. Psychological Methods, 12, 58–79.
Woolley, A. W., Gerbasi, M. E., Chabris, C. F., Kosslyn, S. M., & Hackman, J. R. (2008). Bringing in the experts: How team ability composition and collaborative planning jointly shape analytic effectiveness. Small Group Research, 39, 352–371.
Yen, W. (1981). Using simulation results to choose a latent trait model. Applied Psychological Measurement, 5, 245–262.
Zeileis, A., Hothorn, T., & Hornik, K. (2008). Model-based recursive partitioning. Journal of Computational and Graphical Statistics, 17, 492–514.
Zeileis, A., Strobl, C., Wickelmaier, F., Komboz, B., & Kopf, J. (2016). psychotools: Infrastructure for psychometric modeling. R package version 0.4-2. https://CRAN.R-project.org/package=psychotools
Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa: Directorate of Human Resources Research and Evaluation, Department of National Defense.
© 2018 Springer International Publishing AG, part of Springer Nature
About this chapter
Cite this chapter
Mair, P. (2018). Item Response Theory. In: Modern Psychometrics with R. Use R!. Springer, Cham. https://doi.org/10.1007/978-3-319-93177-7_4
DOI: https://doi.org/10.1007/978-3-319-93177-7_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93175-3
Online ISBN: 978-3-319-93177-7
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)