Published: 01-12-2016

Impact of IRT item misfit on score estimates and severity classifications: an examination of PROMIS depression and pain interference item banks

Author: Yue Zhao

Published in: Quality of Life Research | Issue 3/2017

Abstract

Purpose

In patient-reported outcome research that uses item response theory (IRT), evaluations of model-data fit usually focus on statistical significance tests to detect misfit. However, such evaluations rarely address the consequences of using misfitting items for the intended clinical applications. This study was designed to evaluate the impact of IRT item misfit on score estimates and severity classifications and to demonstrate a recommended process of model-fit evaluation.

Methods

Using secondary data collected during the Patient-Reported Outcomes Measurement Information System (PROMIS) wave 1 testing phase, analyses were conducted on the PROMIS depression (28 items; 782 cases) and pain interference (41 items; 845 cases) item banks. Misfitting items were identified using Orlando and Thissen’s summed-score item-fit statistics and graphical displays. The impact of misfit was evaluated by comparing the IRT-derived T-scores and severity classifications obtained with and without the misfitting items and assessing their agreement.
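The agreement check described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the study's actual analysis: the function names, the example theta values, and the severity cutoffs (55, 60, 70) are hypothetical. The only fixed convention used is that PROMIS T-scores are scaled to a mean of 50 and a standard deviation of 10.

```python
from typing import Sequence, Tuple


def to_t_score(theta: float) -> float:
    """Convert an IRT theta estimate to the T-score metric (mean 50, SD 10)."""
    return 50.0 + 10.0 * theta


def classify(t_score: float, cutoffs: Tuple[float, ...] = (55.0, 60.0, 70.0)) -> str:
    """Assign a severity label; the cutoffs are hypothetical placeholders."""
    labels = ("normal", "mild", "moderate", "severe")
    # Count how many cutoffs the score reaches; that count indexes the label.
    return labels[sum(t_score >= c for c in cutoffs)]


def agreement(theta_full: Sequence[float],
              theta_reduced: Sequence[float]) -> Tuple[float, float]:
    """Compare scores estimated with all items vs. without misfitting items.

    Returns (mean absolute T-score difference, proportion of cases that
    receive the same severity classification under both scorings).
    """
    if len(theta_full) != len(theta_reduced) or not theta_full:
        raise ValueError("need two non-empty sequences of equal length")
    t_full = [to_t_score(t) for t in theta_full]
    t_reduced = [to_t_score(t) for t in theta_reduced]
    n = len(t_full)
    mean_abs_diff = sum(abs(a - b) for a, b in zip(t_full, t_reduced)) / n
    same_class = sum(classify(a) == classify(b)
                     for a, b in zip(t_full, t_reduced)) / n
    return mean_abs_diff, same_class


# Made-up theta estimates for five respondents (illustration only).
full = [-1.0, 0.0, 0.6, 1.2, 2.1]
reduced = [-0.9, 0.1, 0.4, 1.25, 2.0]
mad, agree = agreement(full, reduced)
print(f"mean |T difference| = {mad:.2f}, classification agreement = {agree:.0%}")
```

A small mean difference together with high classification agreement would correspond to the "negligible impact" conclusion; where the classification changes, it is for cases whose T-scores sit close to a cutoff.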

Results

Examination of the presence and impact of misfit showed that item misfit had a negligible impact on the T-score estimates and severity classifications for the general population sample in both the PROMIS depression and pain interference item banks; that is, the misfit was not practically significant.

Conclusions

Findings support the T-score estimates in the two item banks as robust against item misfit at both the group and individual levels and add confidence to the use of T-scores for severity diagnosis in the studied sample. Recommendations on approaches for identifying item misfit (statistical significance) and assessing the misfit impact (practical significance) are given.
Footnotes
1
The overall alpha level of .05 was adjusted with the total number of items in the respective PROMIS item bank. The adjusted alpha values from the smallest to largest ranged from .0018 (.05/28) to .05 for the PROMIS-DEP and ranged from .0012 (.05/41) to .05 for the PROMIS-PI.
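The range of adjusted alpha values in this footnote (from .05 divided by the number of items up to .05) matches the rank-based thresholds of a step-up procedure such as Benjamini and Hochberg's, which appears in the reference list. The sketch below assumes that procedure; the footnote itself does not name it, and the function names and example p-values are hypothetical.

```python
from typing import List, Sequence


def step_up_alphas(n_items: int, alpha: float = 0.05) -> List[float]:
    """Rank-based thresholds alpha * i / m for i = 1..m (smallest to largest)."""
    return [alpha * i / n_items for i in range(1, n_items + 1)]


def flag_misfit(p_values: Sequence[float], alpha: float = 0.05) -> List[int]:
    """Return indices of items flagged as misfitting by the step-up rule.

    Items are ordered by p-value; the largest rank whose p-value is at or
    below its threshold is found, and every item up to that rank is flagged.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            largest_rank = rank
    return sorted(order[:largest_rank])


dep = step_up_alphas(28)   # depression bank, 28 items
pi = step_up_alphas(41)    # pain interference bank, 41 items
print(f"PROMIS-DEP: {dep[0]:.4f} to {dep[-1]:.2f}")
print(f"PROMIS-PI:  {pi[0]:.4f} to {pi[-1]:.2f}")

# Made-up item-fit p-values for five items (illustration only).
print(flag_misfit([0.001, 0.2, 0.012, 0.04, 0.5]))
```

With 28 and 41 items, the smallest thresholds round to .0018 and .0012, which reproduces the values quoted in the footnote.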
 
References
1. Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage.
2. Cella, D., Riley, W., Stone, A., Rothrock, N., Reeve, B., Yount, S., Hays, R. D., on behalf of the PROMIS Cooperative Group. (2010). Initial item banks and first wave testing of the Patient-Reported Outcomes Measurement Information System (PROMIS) network: 2005–2008. Journal of Clinical Epidemiology, 63, 1179–1194. doi:10.1016/j.jclinepi.2010.04.011.
3. Swaminathan, H., Hambleton, R. K., & Rogers, H. J. (2007). Assessing fit in item response models. In C. R. Rao & S. Sinharay (Eds.), Handbook of statistics: Psychometrics. London: Elsevier.
4. Reeve, B. B., Hays, R. D., Bjorner, J. B., Cook, K. F., Crane, P. K., Teresi, J. A., et al. (2007). Psychometric evaluation and calibration of health-related quality of life items banks: Plans for the patient-reported outcome measurement information system (PROMIS). Medical Care, 45(5), S22–S31. doi:10.1097/01.mlr.0000250483.85507.04.
5. Hambleton, R. K., & Han, N. (2005). Assessing the fit of IRT models to educational and psychological test data: A five step plan and several graphical displays. In W. R. Lenderking & D. Revicki (Eds.), Advances in health outcomes research methods, measurement, statistical analysis, and clinical applications (pp. 57–78). Washington: Degnon Associates.
6. Box, G. E. P., & Draper, N. R. (1987). Empirical model building and response surfaces. New York, NY: Wiley.
7. Sinharay, S., & Haberman, S. J. (2014). How often is the misfit of item response theory models practically significant? Educational Measurement: Issues and Practice, 33(1), 23–35. doi:10.1111/emip.12024.
8. Zhao, Y. (2008). Approaches for addressing the fit of item response theory models to educational test data. Dissertation Abstract International, 69, 12A. (UMI No. 3337019).
10. Pilkonis, P. A., Choi, S. W., Reise, S. P., Stover, A. M., Riley, W. T., & Cella, D. (2011). Item banks for measuring emotional distress from the Patient-Reported Outcomes Measurement Information System (PROMIS®): Depression, anxiety, and anger. Assessment, 18(3), 263–283. doi:10.1177/1073191111411667.
15. Cleeland, C. S., Gonin, R., Hatfield, A. K., Edmonson, J. H., Blum, R. H., Stewart, J. A., et al. (1994). Pain and its treatment in outpatients with metastatic cancer. New England Journal of Medicine, 330(9), 592–596. doi:10.1056/NEJM199403033300902.
17. Muthén, L. K., & Muthén, B. O. (2006). Mplus [Computer software]. Los Angeles, CA: Muthén & Muthén.
20. Hu, L. T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1–55. doi:10.1080/10705519909540118.
23. Cai, L., Thissen, D., & du Toit, S. (2015). IRTPRO [Computer software]. Lincolnwood, IL: Scientific Software International.
24. Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B, 57, 289–300. doi:10.2307/2346101.
26. Thissen, D., Chen, W.-H., & Bock, R. D. (2003). Multilog 7.03 [Computer software]. Lincolnwood, IL: Scientific Software International.
28. Kim, S., & Kolen, M. J. (2004). STUIRT: A computer program for scale transformation under unidimensional item response theory models (Version 1.0). Iowa Testing Programs, University of Iowa.
29. Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
31. Orlando, M., & Thissen, D. (2003). Further investigation of the performance of S-X²: An item fit index for use with dichotomous item response theory models. Applied Psychological Measurement, 27(4), 289–298. doi:10.1177/0146621603027004004.
35. Choi, S. W., Reise, S. P., Pilkonis, P. A., Hays, R. D., & Cella, D. (2010). Efficiency of static and computer adaptive short forms compared to full-length measures of depressive symptoms. Quality of Life Research: An International Journal of Quality of Life Aspects of Treatment, Care and Rehabilitation, 19(1), 125–136. doi:10.1007/s11136-009-9560-5.
Metadata
Publisher: Springer International Publishing
Print ISSN: 0962-9343
Electronic ISSN: 1573-2649
DOI: https://doi.org/10.1007/s11136-016-1467-3