An Empirical Power Analysis of Quasi-Exact Tests for the Rasch Model: Measurement Invariance in Small Samples
Abstract
Measurement invariance is not only an important requirement of tests but also a central point in the examination of the Rasch model. Ponocny (2001) suggested quasi-exact tests for small samples, which allow test statistics to be formulated on the basis of matrices obtained using Monte Carlo methods. The purpose of the present study was to analyze the Type I error rates and the empirical power of two test statistics for the assumption of measurement invariance in comparison with Andersen’s (1973) likelihood ratio test. Each simulation was based on 10,000 replications and varied sample size (n = 30, 50, 100, 200), test length (k = 5, 9, 17), the number of items exhibiting model violation, the magnitude of the violation, and the ability distribution. The results indicate that large model violations at the item level can be detected with samples of n = 50 or n = 100, and even weak violations with n = 200. Additionally, the results showed that very small samples, for which a parametric approach is not possible, can still be investigated, which is one of the most important advantages of quasi-exact tests.
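The general idea of a quasi-exact test can be illustrated with a minimal Python sketch. It is not the procedure used in the study: the simple checkerboard swap chain, the one-sided statistic (number of correct responses of a focal group on one item), and all function and parameter names are assumptions chosen for illustration. The sketch samples 0/1 matrices that share the row and column sums of the observed data and reports how often the sampled statistic is at least as extreme as the observed one.

```python
import math
import random


def simulate_rasch(abilities, difficulties, rng):
    """Draw a 0/1 response matrix (persons x items) under the Rasch model."""
    return [
        [1 if rng.random() < 1.0 / (1.0 + math.exp(beta - theta)) else 0
         for beta in difficulties]
        for theta in abilities
    ]


def swap_step(matrix, rng):
    """Try one checkerboard swap; row and column sums are left unchanged."""
    r1, r2 = rng.sample(range(len(matrix)), 2)
    c1, c2 = rng.sample(range(len(matrix[0])), 2)
    a, b = matrix[r1][c1], matrix[r1][c2]
    c, d = matrix[r2][c1], matrix[r2][c2]
    # Only the patterns [[1,0],[0,1]] and [[0,1],[1,0]] can be flipped.
    if a == d and b == c and a != b:
        matrix[r1][c1], matrix[r1][c2] = b, a
        matrix[r2][c1], matrix[r2][c2] = a, b


def focal_item_score(matrix, focal_rows, item):
    """Illustrative statistic: correct responses of the focal group on one item."""
    return sum(matrix[r][item] for r in focal_rows)


def quasi_exact_p_value(matrix, focal_rows, item,
                        n_samples=500, burn_in=2000, thin=50, rng=None):
    """Monte Carlo p-value for 'the item is harder for the focal group'."""
    rng = rng or random.Random()
    observed = focal_item_score(matrix, focal_rows, item)
    current = [row[:] for row in matrix]  # work on a copy of the data
    for _ in range(burn_in):
        swap_step(current, rng)
    as_extreme = 0
    for _ in range(n_samples):
        for _ in range(thin):
            swap_step(current, rng)
        if focal_item_score(current, focal_rows, item) <= observed:
            as_extreme += 1
    # Count the observed matrix itself so the p-value is never exactly zero.
    return (as_extreme + 1) / (n_samples + 1)


if __name__ == "__main__":
    rng = random.Random(2014)
    n, k, shift = 50, 9, 1.0  # hypothetical design cell: item 0 is harder for the focal group
    difficulties = [-2.0 + 0.5 * i for i in range(k)]
    abilities = [rng.gauss(0.0, 1.0) for _ in range(n)]
    focal_rows = list(range(n // 2))
    shifted = [difficulties[0] + shift] + difficulties[1:]
    data = (simulate_rasch(abilities[:n // 2], shifted, rng)
            + simulate_rasch(abilities[n // 2:], difficulties, rng))
    print(quasi_exact_p_value(data, focal_rows, item=0, rng=rng))
```

Repeating the block under `__main__` many times per design cell and recording the share of p-values below the chosen significance level would give an empirical rejection rate: the Type I error rate when `shift` is 0, the empirical power otherwise. Because the margins are held fixed, no item or person parameters need to be estimated, which is why such tests remain applicable in very small samples.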
References
(2002). Die Teststärke des Likelihood-Quotienten-Tests nach Andersen bei der Überprüfung der Modellgültigkeit des dichotomen logistischen Modells nach Rasch [The power of Andersen’s likelihood ratio test for the examination of the dichotomous logistic model according to Rasch] (Unpublished doctoral thesis). University of Vienna, Austria.
(1973). A goodness of fit test for the Rasch model. Psychometrika, 38, 123–140.
(1978). Robustness? British Journal of Mathematical and Statistical Psychology, 31, 144–152.
(2005). Exact tests for the Rasch model via sequential importance sampling. Psychometrika, 70, 11–30.
(2009). The theory and practice of item response theory. New York, NY: The Guilford Press.
(2000). Item response theory for psychologists. Mahwah, NJ: Erlbaum.
(2007). A synthesis of 15 years of research on DIF in language testing: Methodological advances, challenges, and recommendations. Language Assessment Quarterly, 4, 113–148.
(1974). Einführung in die Theorie psychologischer Tests: Grundlagen und Anwendungen [Introduction to the theory of psychological tests: Basic principles and applications]. Bern, Switzerland: Huber.
(1981). On the existence and uniqueness of maximum-likelihood estimates in the Rasch model. Psychometrika, 46, 59–77.
(1995a). Derivations of the Rasch Model. In Rasch Models: Foundations, recent developments, and applications (pp. 15–38). New York, NY: Springer.
(1995b). Some neglected problems in IRT. Psychometrika, 60, 459–487.
(1995). Rasch Models: Foundations, recent developments, and applications. New York, NY: Springer.
(1998). Detection of differential item functioning using Lagrange multiplier tests. Statistica Sinica, 8, 647–667.
(1995). Testing the Rasch model. In Rasch Models: Foundations, recent developments, and applications (pp. 4–14). New York, NY: Springer.
(2012). The effects of purification and the evaluation of differential item functioning with the likelihood ratio test. Methodology, 8, 134–145.
(1989). Detecting potentially biased test items: Comparison of IRT area and Mantel-Haenszel methods. Applied Measurement in Education, 2, 313–334.
(1988). Differential item functioning and the Mantel-Haenszel procedure. In Test validity (pp. 129–145). Hillsdale, NJ: Erlbaum.
(1993). Differential item functioning. Hillsdale, NJ: Erlbaum.
(1989). Item bias detection using loglinear IRT. Psychometrika, 54, 681–697.
(2012). Testing measurement invariance using MIMIC: Likelihood ratio test with a critical value adjustment. Educational and Psychological Measurement, 72, 469–492.
(2012). Das Rasch Modell in der Praxis: Eine Einführung mit eRm [The Rasch model in practical applications: An introduction with eRm]. Vienna, Austria: facultas.wuv, UTB.
(2013). Nonparametric tests for the Rasch model: Explanation, development, and application of quasi-exact tests for small samples. Interstat, 11, 1–16.
(2007). Probleme bei der Testkonstruktion nach dem Rasch-Modell [Some problems in calibrating an item pool according to the Rasch model]. Diagnostica, 53, 131–143.
(1986). Testing statistical hypotheses. New York, NY: Springer.
(2010). A general framework and an R package for the detection of dichotomous differential item functioning. Behavior Research Methods, 42, 847–862.
(2014). Supplement to Koller, Maier, & Hatzinger: “An Empirical Power Analysis of Quasi-Exact Tests for the Rasch Model: Measurement Invariance in Small Samples”. Research Report Series/Department of Statistics and Mathematics, 127. WU Vienna University of Economics and Business, Vienna. Retrieved from: epub.wu.ac.at/4340/
(2007a). CML based estimation of extended Rasch models with the eRm package in R. Psychology Science, 49, 26–43.
(2007b). Extended Rasch modeling. The eRm package for the application of IRT models in R. Journal of Statistical Software, 20, 1–20. Retrieved from www.jstatsoft.org
(2013). eRm: Extended Rasch Modeling [Computer software]. R package version 0.15-3. Vienna, Austria: R Foundation. Retrieved from CRAN.R-project.org/package=eRm
(1982). Contingency table models for assessing item bias. Journal of Educational Statistics, 7, 105–118.
(1995). Some background for item response theory and the Rasch model. In Rasch models: Foundations, recent developments, and applications (pp. 3–14). New York, NY: Springer.
(2010). Modeling DIF effects using distractor-level invariance effects: Implications for understanding the causes of DIF. Applied Psychological Measurement, 34, 151–165.
(2008). Methods for assessing item, step, and threshold invariance in polytomous items following the Partial Credit Model. Educational and Psychological Measurement, 68, 717–733.
(1996). Kombinatorische Modelltests für das Rasch-Modell [Combinatorial goodness-of-fit tests for the Rasch model] (Unpublished doctoral thesis). University of Vienna, Austria.
(2001). Nonparametric goodness-of-fit tests for the Rasch model. Psychometrika, 66, 437–460.
(2013). R: A language and environment for statistical computing [Computer software]. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from www.R-project.org/
(1960). Probabilistic models for some intelligence and attainment tests. Copenhagen, Denmark: Danish Institute for Educational Research.
(1995). IRT-based internal measures of differential functioning of items and tests. Applied Psychological Measurement, 19, 353–368.
(1991). Enumeration and simulation for 0–1 matrices with given marginal. Psychometrika, 56, 397–417.
(2006). Detecting differential item functioning with confirmatory factor analysis and item response theory: Toward a unified strategy. Journal of Applied Psychology, 91, 1292–1306.
(2008). An efficient MCMC algorithm to sample binary matrices with fixed marginals. Psychometrika, 73, 705–728.
(2007). The Rasch sampler. Journal of Statistical Software, 20. Retrieved from www.jstatsoft.org