A general framework and an R package for the detection of dichotomous differential item functioning

Magis, David; Béland, Sébastien; Tuerlinckx, Francis; De Boeck, Paul

doi:10.3758/BRM.42.3.847

Published: August 2010

Volume 42, pages 847–862 (2010)

Behavior Research Methods

David Magis^1,4, Sébastien Béland^2, Francis Tuerlinckx^1 & Paul De Boeck^1,3

Abstract

Differential item functioning (DIF) is an important issue in psychometrics and educational measurement. Several methods have been proposed in recent decades for identifying items that function differently between two or more groups of examinees. Starting from a framework for classifying DIF detection methods and from a comparative overview of the most traditional methods, this article presents difR, an R package that implements nine methods. The commands and options are briefly described, and the package is illustrated through the analysis of a data set on verbal aggression.

Author information

Authors and Affiliations

Katholieke Universiteit Leuven, Leuven, Belgium
David Magis, Francis Tuerlinckx & Paul De Boeck
University of Quebec, Montreal, Quebec, Canada
Sébastien Béland
University of Amsterdam, Amsterdam, The Netherlands
Paul De Boeck
Department of Mathematics, University of Liège, Grande Traverse 12, B-4000, Liège, Belgium
David Magis

Corresponding author

Correspondence to David Magis.

Additional information

This research was financially supported by the Belgian Federal Science Policy (Funds IAP/P6/03), the Research Fund GOA/2005/04 of the K.U. Leuven, Belgium, a doctoral grant “Bourse à la mobilité (hors Québec) pour l’intégration à la communauté scientifique en éducation” of the UQAM, Canada, and a postdoctoral grant “Chargé de recherches” of the National Funds for Scientific Research (FNRS), Belgium.

About this article

Cite this article

Magis, D., Béland, S., Tuerlinckx, F. et al. A general framework and an R package for the detection of dichotomous differential item functioning. Behavior Research Methods 42, 847–862 (2010). https://doi.org/10.3758/BRM.42.3.847

Received: 25 September 2009
Accepted: 14 March 2010
Issue Date: August 2010
DOI: https://doi.org/10.3758/BRM.42.3.847
