MCMC estimation and some model-fit analysis of multidimensional IRT models

Psychometrika

Abstract

A Bayesian procedure to estimate the three-parameter normal ogive model and a generalization of the procedure to a model with multidimensional ability parameters are presented. The procedure is a generalization of a procedure by Albert (1992) for estimating the two-parameter normal ogive model. The procedure supports analyzing data from multiple populations and incomplete designs. It is shown that restrictions can be imposed on the factor matrix for testing specific hypotheses about the ability structure. The technique is illustrated using simulated and real data.
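
As a point of reference, the three-parameter normal ogive model is usually written as below. The notation is an assumption, since this preview gives no formulas: $\theta_i$ is the ability of person $i$; $\alpha_j$, $\beta_j$, and $\gamma_j$ are the discrimination, difficulty, and guessing parameters of item $j$; and $\Phi$ is the standard normal distribution function. The second line sketches the multidimensional generalization with $Q$ ability dimensions, where the $\alpha_{jq}$ play the role of factor loadings.

```latex
\begin{align}
P(Y_{ij} = 1 \mid \theta_i) &= \gamma_j + (1 - \gamma_j)\,\Phi(\alpha_j \theta_i - \beta_j), \\
P(Y_{ij} = 1 \mid \boldsymbol{\theta}_i) &= \gamma_j + (1 - \gamma_j)\,\Phi\Bigl(\sum_{q=1}^{Q} \alpha_{jq}\theta_{iq} - \beta_j\Bigr).
\end{align}
```

Setting $\gamma_j = 0$ in the first line gives the two-parameter normal ogive model treated by Albert (1992).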

References

  • Ackerman, T.A. (1996a). Developments in multidimensional item response theory. Applied Psychological Measurement, 20, 309–310.

  • Ackerman, T.A. (1996b). Graphical representation of multidimensional item response theory analyses. Applied Psychological Measurement, 20, 311–329.

  • ACT. (1997). ACT Assessment Technical Manual. Iowa City, IA: Author.

  • Albert, J.H. (1992). Bayesian estimation of normal ogive item response functions using Gibbs sampling. Journal of Educational Statistics, 17, 251–269.

  • Andersen, E.B. (1973). A goodness of fit test for the Rasch model. Psychometrika, 38, 123–140.

  • Baker, F.B. (1998). An investigation of item parameter recovery characteristics of a Gibbs sampling procedure. Applied Psychological Measurement, 22, 153–169.

  • Bock, R.D., Gibbons, R.D., & Muraki, E. (1988). Full-information factor analysis. Applied Psychological Measurement, 12, 261–280.

  • Bock, R.D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: An application of an EM-algorithm. Psychometrika, 46, 443–459.

  • Bock, R.D., & Schilling, S.G. (1997). High dimensional full-information item factor analysis. In M. Berkane (Ed.), Latent variable modeling and applications of causality (pp. 163–176). New York, NY: Springer.

  • Bock, R.D., & Zimowski, M.F. (1997). Multiple group IRT. In W.J. van der Linden & R.K. Hambleton (Eds.), Handbook of modern item response theory (pp. 433–448). New York, NY: Springer.

  • Box, G., & Tiao, G. (1973). Bayesian inference in statistical analysis. Reading, MA: Addison-Wesley.

  • Bradlow, E.T., Wainer, H., & Wang, X. (1999). A Bayesian random effects model for testlets. Psychometrika, 64, 153–168.

  • Cressie, N., & Holland, P.W. (1983). Characterizing the manifest probabilities of latent trait models. Psychometrika, 48, 129–141.

  • Fischer, G.H. (1995). Derivations of the Rasch model. In G.H. Fischer & I.W. Molenaar (Eds.), Rasch models: Foundations, recent developments and applications (pp. 15–38). New York, NY: Springer.

  • Fox, J.P., & Glas, C.A.W. (2001). Bayesian estimation of a multilevel IRT model using Gibbs sampling. Psychometrika, 66, 271–288.

  • Fraser, C. (1988). NOHARM: A computer program for fitting both unidimensional and multidimensional normal ogive models of latent trait theory. Armidale, Australia: University of New England.

  • Gelfand, A.E., & Smith, A.F.M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398–409.

  • Gelman, A., Carlin, J.B., Stern, H.S., & Rubin, D.B. (1995). Bayesian data analysis. London: Chapman and Hall.

  • Glas, C.A.W. (1988). The derivation of some tests for the Rasch model from the multinomial distribution. Psychometrika, 53, 525–546.

  • Glas, C.A.W. (1998). Detection of differential item functioning using Lagrange multiplier tests. Statistica Sinica, 8(1), 647–667.

  • Glas, C.A.W. (1999). Modification indices for the 2-pl and the nominal response model. Psychometrika, 64, 273–294.

  • Glas, C.A.W., & Ellis, J.L. (1993). RSP, Rasch scaling program, computer program and user's manual. Groningen: ProGAMMA.

  • Glas, C.A.W., & Verhelst, N.D. (1989). Extensions of the partial credit model. Psychometrika, 54, 635–659.

  • Glas, C.A.W., & Verhelst, N.D. (1995). Tests of fit for polytomous Rasch models. In G.H. Fischer & I.W. Molenaar (Eds.), Rasch models: Foundations, recent developments and applications (pp. 325–352). New York, NY: Springer.

  • Glas, C.A.W., Wainer, H., & Bradlow, E.T. (2000). MML and EAP estimates for the testlet response model. In W.J. van der Linden & C.A.W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 271–287). Boston, MA: Kluwer-Nijhoff Publishing.

  • Hoijtink, H., & Molenaar, I.W. (1997). A multidimensional item response model: Constrained latent class analysis using the Gibbs sampler and posterior predictive checks. Psychometrika, 62, 171–189.

  • Holland, P.W., & Rosenbaum, P.R. (1986). Conditional association and unidimensionality in monotone latent variable models. Annals of Statistics, 14, 1523–1543.

  • Junker, B. (1991). Essential independence and likelihood-based ability estimation for polytomous items. Psychometrika, 56, 255–278.

  • Kelderman, H. (1984). Loglinear RM tests. Psychometrika, 49, 223–245.

  • Kelderman, H. (1989). Item bias detection using loglinear IRT. Psychometrika, 54, 681–697.

  • Lawley, D.N. (1943). On problems connected with item selection and test construction. Proceedings of the Royal Society of Edinburgh, 61, 273–287.

  • Lawley, D.N. (1944). The factorial analysis of multiple test items. Proceedings of the Royal Society of Edinburgh, Series A, 62, 74–82.

  • Lord, F.M. (1952). A theory of test scores. Psychometric Monograph No. 7.

  • Lord, F.M. (1953a). An application of confidence intervals and of maximum likelihood to the estimation of an examinee's ability. Psychometrika, 18, 57–75.

  • Lord, F.M. (1953b). The relation of test score to the trait underlying the test. Educational and Psychological Measurement, 13, 517–548.

  • Lord, F.M., & Novick, M.R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.

  • Lord, F.M., & Wingersky, M.S. (1984). Comparison of IRT true-score and equipercentile observed-score “equatings”. Applied Psychological Measurement, 8, 453–461.

  • Martin-Löf, P. (1973). Statistika Modeller [Statistical models] (Anteckningar från seminarier Lasåret 1969–1970, utardeltade av Rolf Sunberg. Obetydligt ändrat nytryck, oktober 1973). Stockholm: Institutet för Försäkringsmatematik och Matematisk Statistik vid Stockholms Universitet.

  • Martin-Löf, P. (1974). The notion of redundancy and its use as a quantitative measure of the discrepancy between a statistical hypothesis and a set of observational data. Scandinavian Journal of Statistics, 1, 3–18.

  • McDonald, R.P. (1967). Nonlinear factor analysis. Psychometric Monograph No. 15.

  • McDonald, R.P. (1982). Linear versus nonlinear models in item response theory. Applied Psychological Measurement, 6, 379–396.

  • McDonald, R.P. (1997). Normal-ogive multidimensional model. In W.J. van der Linden & R.K. Hambleton (Eds.), Handbook of modern item response theory (pp. 257–269). New York, NY: Springer.

  • Mellenbergh, G.J. (1994). Generalized linear item response theory. Psychological Bulletin, 115, 300–307.

  • Meng, X.L., & Schilling, S.G. (1996). Fitting full-information item factor models and an empirical investigation of bridge sampling. Journal of the American Statistical Association, 91, 1254–1267.

  • Mislevy, R.J. (1986). Bayes modal estimation in item response models. Psychometrika, 51, 177–195.

  • Mislevy, R.J., & Bock, R.D. (1990). PC-BILOG. Item analysis and test scoring with binary logistic models. Chicago, IL: Scientific Software International.

  • Mislevy, R.J., & Wu, P.K. (1996). Missing responses and IRT ability estimation: Omits, choice, time limits and adaptive testing (ETS Research Reports RR-96-30-ONR). Princeton, NJ: Educational Testing Service.

  • Molenaar, I.W. (1995). Estimation of item parameters. In G.H. Fischer & I.W. Molenaar (Eds.), Rasch models: Foundations, recent developments and applications (pp. 39–51). New York, NY: Springer.

  • Patz, R.J., & Junker, B.W. (1999a). A straightforward approach to Markov chain Monte Carlo methods for item response models. Journal of Educational and Behavioral Statistics, 24, 146–178.

  • Patz, R.J., & Junker, B.W. (1999b). Applications and extensions of MCMC in IRT: Multiple item types, missing data, and rated responses. Journal of Educational and Behavioral Statistics, 24, 342–366.

  • Reckase, M.D. (1985). The difficulty of test items that measure more than one ability. Applied Psychological Measurement, 9, 401–412.

  • Reckase, M.D. (1997). A linear logistic multidimensional model for dichotomous item response data. In W.J. van der Linden & R.K. Hambleton (Eds.), Handbook of modern item response theory (pp. 271–286). New York, NY: Springer.

  • Rasch, G. (1977). On specific objectivity: An attempt at formalizing the request for generality and validity of scientific statements. In M. Blegvad (Ed.), The Danish yearbook of philosophy (pp. 58–94). Copenhagen: Munksgaard.

  • Reiser, M. (1996). Analysis of residuals for the multinomial item response model. Psychometrika, 61, 509–528.

  • Rosenbaum, P.R. (1984). Testing the conditional independence and monotonicity assumptions of item response theory. Psychometrika, 49, 425–436.

  • Rubin, D.B. (1976). Inference and missing data. Biometrika, 63, 581–592.

  • Shi, J.Q., & Lee, S.Y. (1998). Bayesian sampling based approach for factor analysis models with continuous and polytomous data. British Journal of Mathematical and Statistical Psychology, 51, 233–252.

  • Sijtsma, K. (1998). Methodology review: Nonparametric IRT approaches to the analysis of dichotomous item scores. Applied Psychological Measurement, 22, 3–32.

  • Stout, W.F. (1987). A nonparametric approach for assessing latent trait dimensionality. Psychometrika, 52, 589–617.

  • Stout, W.F. (1990). A new item response theory modeling approach with applications to unidimensional assessment and ability estimation. Psychometrika, 55, 293–326.

  • Thurstone, L.L. (1947). Multiple factor analysis. Chicago, IL: University of Chicago Press.

  • Wainer, H., Bradlow, E.T., & Du, Z. (2000). Testlet response theory: An analog for the 3pl model useful in testlet-based adaptive testing. In W.J. van der Linden & C.A.W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 245–269). Boston, MA: Kluwer Academic Publishers.

  • Wilson, D.T., Wood, R., & Gibbons, R. (1991). TESTFACT: Test scoring, item statistics, and item factor analysis [Computer program]. Chicago, IL: Scientific Software International.

  • Yen, W.M. (1981). Using simulation results to choose a latent trait model. Applied Psychological Measurement, 5, 245–262.

  • Yen, W.M. (1984). Effects of local item dependence on the fit and equating performance of the three-parameter logistic model. Applied Psychological Measurement, 8, 125–145.

  • Zimowski, M.F., Muraki, E., Mislevy, R.J., & Bock, R.D. (1996). Bilog MG: Multiple-group IRT analysis and test maintenance for binary items. Chicago, IL: Scientific Software International.

Additional information

The authors would like to thank Norman Verhelst for his valuable comments and ACT, CITO group and SweSAT for the use of their data.

About this article

Cite this article

Béguin, A.A., Glas, C.A.W. MCMC estimation and some model-fit analysis of multidimensional IRT models. Psychometrika 66, 541–561 (2001). https://doi.org/10.1007/BF02296195

  • DOI: https://doi.org/10.1007/BF02296195
