Item Response Theory

Mair, Patrick

doi:10.1007/978-3-319-93177-7_4

Part of the book series: Use R! (USE R)

Abstract

Item response theory (IRT) is a psychometric modeling framework for analyzing categorical data from questionnaires, tests, and other instruments that aim to measure underlying latent traits. Simply speaking, these models estimate a parameter for each item, as well as a parameter for each person. Depending on how many latent traits are involved, a core distinction in IRT is between unidimensional and multidimensional models. Hence, dimensionality assessment is important before fitting an IRT model, as elaborated in the first section. Subsequently, the focus is on various classical unidimensional models for dichotomous as well as polytomous input data. Afterward, three sections cover special topics in IRT: item/test information, sample size determination, and differential item functioning, where differences in the item parameters are examined across person subgroups. Some modern IRT flavors are presented in the final three sections on multidimensional IRT, longitudinal IRT, and Bayesian IRT.

Notes

1. Detailed elaborations on the differences between IRT, CTT, and factor analysis can be found in Rusch et al. (2017).
2. We use the easiness parameterization in order to be consistent with the implementation in the eRm package.
3. In the Rasch model, it does not matter which particular items are solved correctly, as long as the sum scores are the same. This property is called sufficiency.
4. Note that in order to be consistent with the ltm package output, we state the model in terms of difficulty parameters, i.e., \((\theta_v - \beta_i)\).
5. Due to space restrictions, we do not show the dimensionality assessment for this example. In practice, this should be done prior to fitting the 2-PL.
6. Again, due to space restrictions, we show the first six parameters only.
7. We continue to use ICC as the abbreviation.
8. For connections between the GRM and the GPCM, see Ostini and Nering (2005).
9. In the book by de Ayala (2009), almost every chapter gives some guidelines on how large a calibration sample should be for a particular IRT model.
10. \(\chi^2_{13}\) is an additional LR-test for \(M_1\) vs. \(M_3\).
11. The "MathExam14W" dataset from the package is already prepared that way. Here we bring it back to the standard data frame form and illustrate how to get it into shape for the tree function call.
12. Note that we fitted these models already in Sect. 4.1.2 on dimensionality assessment.
13. There is some ambiguity in AIC/BIC in relation to the LR-test result. We could further explore the fit of the 1D solution via M2(zar1d), which suggests a slight misfit.
14. Note that there is also the option to specify it in lavaan and convert it into mirt syntax using the lavaan2mirt function in the sirt package. This can be especially useful for more complicated models with constraints.
15. See Luo and Jiao (2017) for how to specify IRT models in Stan.
16. Thanks to Peter Franz for sharing this dataset.
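
Notes 2 to 4 mention two parameterizations and the sufficiency property without restating the formulas, which are not part of this preview. As a sketch in standard textbook notation (the symbols \(X_{vi}\), \(\alpha_i\), \(r_v\), and \(k\) are assumptions made here, not taken from the chapter text), the Rasch model with easiness parameters \(\beta_i\) can be written as

\[
P(X_{vi} = 1 \mid \theta_v, \beta_i) = \frac{\exp(\theta_v + \beta_i)}{1 + \exp(\theta_v + \beta_i)},
\]

and a 2-PL stated with difficulty parameters \(\beta_i\) and discrimination parameters \(\alpha_i\), in the spirit of Note 4, as

\[
P(X_{vi} = 1 \mid \theta_v, \alpha_i, \beta_i) = \frac{\exp\bigl(\alpha_i(\theta_v - \beta_i)\bigr)}{1 + \exp\bigl(\alpha_i(\theta_v - \beta_i)\bigr)}.
\]

For the sufficiency property in Note 3, assume local independence and let \(r_v = \sum_{i=1}^{k} x_{vi}\) be the sum score of person \(v\) on \(k\) items. Under the easiness form, the probability of a response pattern then factorizes as

\[
P(x_{v1}, \ldots, x_{vk} \mid \theta_v, \beta_1, \ldots, \beta_k)
= \prod_{i=1}^{k} \frac{\exp\bigl(x_{vi}(\theta_v + \beta_i)\bigr)}{1 + \exp(\theta_v + \beta_i)}
= \frac{\exp(r_v \theta_v)\, \exp\bigl(\sum_{i=1}^{k} x_{vi} \beta_i\bigr)}{\prod_{i=1}^{k} \bigl(1 + \exp(\theta_v + \beta_i)\bigr)}.
\]

The responses enter the part involving \(\theta_v\) only through \(r_v\), so two response patterns with the same sum score carry the same information about the person parameter.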

References

Andersen, E. B. (1973). A goodness of fit test for the Rasch model. Psychometrika, 38, 123–140.
Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43, 561–573.
Beaujean, A. A. (2014). Latent variable modeling using R: A step-by-step guide. New York: Routledge.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In: F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 395–479). Reading: Addison-Wesley.
Bock, R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37, 29–51.
Bond, T. G., & Fox, C. M. (2015). Applying the Rasch model: Fundamental measurement in the human sciences (3rd ed.). New York: Routledge.
Cai, L. (2010). A two-tier full-information item factor analysis model with applications. Psychometrika, 75, 581–612.
Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29. http://www.jstatsoft.org/v48/i06/
Chalmers, R. P. (2017). SimDesign: Structure for organizing Monte Carlo simulation designs. R package version 1.6. https://CRAN.R-project.org/package=SimDesign
Choi, S., Gibbons, L., & Crane, P. (2011). lordif: An R package for detecting differential item functioning using iterative hybrid ordinal logistic regression/item response theory and Monte Carlo simulations. Journal of Statistical Software, 39(8), 1–30. https://www.jstatsoft.org/index.php/jss/article/view/v039i08
de Ayala, R. J. (2009). The theory and practice of item response theory. New York: Guilford Press.
De Boeck, P., Bakker, M., Zwitser, R., Nivard, M., Hofman, A., Tuerlinckx, F., & Partchev, I. (2011). The estimation of item response models with the lmer function from the lme4 package in R. Journal of Statistical Software, 39(12), 1–28. https://www.jstatsoft.org/index.php/jss/article/view/v039i12
Finch, W. H., Jr., & French, B. F. (2015). Latent variable modeling with R. New York: Routledge.
Fischer, G. H. (1995). Linear logistic models for change. In: G. Fischer & I. Molenaar (Eds.), Rasch models: Foundations, recent developments, and applications (pp. 157–180). New York: Springer.
Fischer, G. H., & Molenaar, I. W. (1995). Rasch models: Foundations, recent developments, and applications. New York: Springer.
Fox, J. P. (2010). Bayesian item response modeling. New York: Springer.
Funk, J. B., Fox, C. M., Chang, M., & Curtiss, K. (2008). The development of the children’s empathic attitudes questionnaire using classical and Rasch analyses. Journal of Applied Developmental Psychology, 29, 187–196.
Glück, J., & Spiel, C. (1997). Item response models for repeated measures designs: Application and limitations of four different approaches. Methods of Psychological Research, 2(6). http://www.dgps.de/fachgruppen/methoden/mpr-online/issue2/art6/article.html
Hatzinger, R., & Rusch, T. (2009). IRT models with relaxed assumptions in eRm: A manual-like instruction. Psychology Science Quarterly, 51, 87–120.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning with applications in R. New York: Springer.
Jiang, S., Wang, C., & Weiss, D. J. (2016). Sample size requirements for estimation of item parameters in the multidimensional graded response model. Frontiers in Psychology, 7(109), 1–10.
Koller, I., & Alexandrowicz, R. W. (2010). Eine psychometrische Analyse der ZAREKI-R mittels Rasch-Modellen [A psychometric analysis of the ZAREKI-R using Rasch-models]. Diagnostica, 56, 57–67.
Koller, I., Levenson, M. R., & Glück, J. (2017). What do you think you are measuring? A mixed-methods procedure for assessing the content validity of test items and theory-based scaling. Frontiers in Psychology, 8(126), 1–20.
Komboz, B., Zeileis, A., & Strobl, C. (2018). Tree-based global model tests for polytomous Rasch models. Educational and Psychological Measurement, 78, 128–166.
Levenson, M. R., Jennings, P. A., Aldwin, C. M., & Shiraishi, R. W. (2005). Self-transcendence: Conceptualization and measurement. The International Journal of Aging and Human Development, 60, 127–143.
Luo, Y., & Jiao, H. (2017). Using the Stan program for Bayesian item response theory. Educational and Psychological Measurement, 77, 1–25.
Magis, D., Beland, S., Tuerlinckx, F., & De Boeck, P. (2010). A general framework and an R package for the detection of dichotomous differential item functioning. Behavior Research Methods, 42, 847–862.
Mair, P., & De Leeuw, J. (2017). Gifi: Multivariate analysis with optimal scaling. R package version 0.3-2. https://R-Forge.R-project.org/projects/psychor/
Mair, P., & Hatzinger, R. (2007a). CML based estimation of extended Rasch models with the eRm package in R. Psychology Science Quarterly, 49, 26–43.
Mair, P., & Hatzinger, R. (2007b). Extended Rasch modeling: The eRm package for the application of IRT models in R. Journal of Statistical Software, 20(9), 1–20.
Mair, P., Hofmann, E., Gruber, K., Hatzinger, R., Zeileis, A., & Hornik, K. (2015). Motivation, values, and work design as drivers of participation in the R open source project for statistical computing. Proceedings of the National Academy of Sciences of the United States of America, 112(48), 14788–14792.
Martin, A. D., Quinn, K. M., & Park, J. H. (2011). MCMCpack: Markov Chain Monte Carlo in R. Journal of Statistical Software, 42(9), 1–22. http://www.jstatsoft.org/v42/i09/
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149–174.
Maydeu-Olivares, A. (2015). Evaluating the fit of IRT models. In: S. P. Reise & D. A. Revicki (Eds.), Handbook of item response theory modeling: Applications to typical performance assessment (pp. 111–127). New York: Routledge.
Maydeu-Olivares, A., & Joe, H. (2005). Limited- and full-information estimation and goodness-of-fit testing in 2ⁿ contingency tables: A unified framework. Journal of the American Statistical Association, 100, 1009–1020.
Morgeson, F. P., & Humphrey, S. E. (2006). The work design questionnaire (WDQ): Developing and validating a comprehensive measure for assessing job design and the nature of work. Journal of Applied Psychology, 91, 1321–1339.
Muraki, E. (1990). Fitting a polytomous item response model to Likert-type data. Applied Psychological Measurement, 14, 59–71.
Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16, 159–176.
Natesan, P., Nandakumar, R., Minka, T., & Rubright, J. D. (2016). Bayesian prior choice in IRT estimation using MCMC and variational Bayes. Frontiers in Psychology, 7(1422), 1–11.
Osterlind, S. J., & Everson, H. T. (2009). Differential item functioning (2nd ed.). Thousand Oaks: Sage.
Ostini, R., & Nering, M. L. (2005). Polytomous item response theory models. Thousand Oaks: Sage.
Ponocny, I. (2001). Nonparametric goodness-of-fit tests for the Rasch model. Psychometrika, 66, 437–459.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Danish Institute for Educational Research.
Rasch, G. (1961). On general laws and the meaning of measurement in psychology. In Proceedings of the IV. Berkeley Symposium on Mathematical Statistics and Probability (Vol. IV, pp. 321–333). Berkeley: University of California Press.
Reckase, M. D. (2009). Multidimensional item response theory. New York: Springer.
Reise, S. P. (2012). The rediscovery of bifactor measurement models. Multivariate Behavioral Research, 47, 667–696.
Revelle, W. (2017). psych: Procedures for psychological, psychometric, and personality research. R package version 1.7.8. http://CRAN.R-project.org/package=psych
Rizopoulos, D. (2006). ltm: An R package for latent variable modelling and item response theory analyses. Journal of Statistical Software, 17(5), 1–25. http://www.jstatsoft.org/v17/i05/
Robitzsch, A. (2017). sirt: Supplementary item response theory models. R package version 1.15-41. https://CRAN.R-project.org/package=sirt
Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36. http://www.jstatsoft.org/v48/i02/
Rusch, T., Lowry, P. B., Mair, P., & Treiblmaier, H. (2017). Breaking free from the limitations of classical test theory: Developing and measuring information systems scales using item response theory. Information & Management, 54, 189–203.
Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores (Psychometrika monograph supplement, Vol. 17). Chicago: Psychometric Society.
Sidanius, J., & Pratto, F. (2001). Social dominance: An intergroup theory of social hierarchy and oppression. Cambridge: Cambridge University Press.
Strobl, C., Kopf, J., & Zeileis, A. (2015). Rasch trees: A new method for detecting differential item functioning in the Rasch model. Psychometrika, 80, 289–316.
Suárez-Falcón, J. C., & Glas, C. A. W. (2003). Evaluation of global testing procedures for item fit to the Rasch model. British Journal of Mathematical and Statistical Psychology, 56, 127–143.
Takane, Y., & De Leeuw, J. (1986). On the relationship between item response theory and factor analysis of discretized variables. Psychometrika, 52, 393–408.
Tutz, G., & Schauberger, G. (2015). A penalty approach to differential item functioning in Rasch models. Psychometrika, 80, 21–43.
van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45(3), 1–67. http://www.jstatsoft.org/v45/i03/
Vaughn-Coaxum, R., Mair, P., & Weisz, J. R. (2016). Racial/ethnic differences in youth depression indicators: An item response theory analysis of symptoms reported by White, Black, Asian, and Latino youths. Clinical Psychological Science, 4, 239–253.
Verhelst, N. D., Hatzinger, R., & Mair, P. (2007). The Rasch sampler. Journal of Statistical Software, 20(4), 1–14. https://www.jstatsoft.org/article/view/v020i04/
von Aster, M., Weinhold Zulauf, M., & Horn, R. (2006). Neuropsychologische Testbatterie für Zahlenverarbeitung und Rechnen bei Kindern (ZAREKI-R) [Neuropsychological Test Battery for Number Processing and Calculation in Children]. Frankfurt: Harcourt Test Services.
Wang, X., Berger, J. O., & Burdick, D. S. (2013). Bayesian analysis of dynamic item response models in educational testing. The Annals of Applied Statistics, 7, 126–153.
Wilmer, J. B., Germine, L., Chabris, C. F., Chatterjee, G., Gerbasi, M., & Nakayama, K. (2012). Capturing specific abilities as a window into human individuality: The example of face recognition. Cognitive Neuropsychology, 29, 360–392.
Wilson, G. D., & Patterson, J. R. (1968). A new measure of conservatism. British Journal of Social and Clinical Psychology, 7, 264–269.
Wirth, R. J., & Edwards, M. C. (2007). Item factor analysis: Current approaches and future directions. Psychological Methods, 12, 58–79.
Woolley, A. W., Gerbasi, M. E., Chabris, C. F., Kosslyn, S. M., & Hackman, J. R. (2008). Bringing in the experts: How team ability composition and collaborative planning jointly shape analytic effectiveness. Small Group Research, 39, 352–371.
Yen, W. (1981). Using simulation results to choose a latent trait model. Applied Psychological Measurement, 5, 245–262.
Zeileis, A., Hothorn, T., & Hornik, K. (2008). Model-based recursive partitioning. Journal of Computational and Graphical Statistics, 17, 492–514.
Zeileis, A., Strobl, C., Wickelmaier, F., Komboz, B., & Kopf, J. (2016). psychotools: Infrastructure for psychometric modeling. R package version 0.4-2. https://CRAN.R-project.org/package=psychotools
Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa: Directorate of Human Resources Research and Evaluation, Department of National Defense.

Author information

Authors and Affiliations

Department of Psychology, Harvard University, Cambridge, MA, USA
Patrick Mair

About this chapter

Cite this chapter

Mair, P. (2018). Item Response Theory. In: Modern Psychometrics with R. Use R!. Springer, Cham. https://doi.org/10.1007/978-3-319-93177-7_4

DOI: https://doi.org/10.1007/978-3-319-93177-7_4
Published: 21 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93175-3
Online ISBN: 978-3-319-93177-7
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
