
Quality of Life Research

Published: 01-12-2016

Impact of IRT item misfit on score estimates and severity classifications: an examination of PROMIS depression and pain interference item banks

Author: Yue Zhao

Published in: Quality of Life Research | Issue 3/2017

Abstract

Purpose

In patient-reported outcome research that uses item response theory (IRT), evaluations of model-data fit usually focus on statistical significance tests for detecting misfit. However, such evaluations rarely address the impact, that is, the practical consequences, of using misfitting items in the intended clinical applications. This study was designed to evaluate the impact of IRT item misfit on score estimates and severity classifications and to demonstrate a recommended process for model-fit evaluation.

Methods

Using secondary data collected during the wave 1 testing phase of the Patient-Reported Outcomes Measurement Information System (PROMIS), analyses were conducted on the PROMIS depression (28 items; 782 cases) and pain interference (41 items; 845 cases) item banks. Misfitting items were identified using Orlando and Thissen's summed-score item-fit statistics and graphical displays. The impact of misfit was evaluated by the agreement of IRT-derived T-scores, and of the severity classifications based on them, between scoring with and without the misfitting items.
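Orlando and Thissen's S-X² compares, within each summed-score group k, the observed proportion O_k of respondents scoring 1 on an item with the proportion E_k predicted by the IRT model for that group: S-X² = Σ_k N_k (O_k − E_k)² / [E_k (1 − E_k)], with degrees of freedom equal to the number of score groups minus the number of item parameters. The model-based E_k comes from summed-score likelihoods computed with the Lord-Wingersky recursion. The following is a minimal sketch of the dichotomous 2PL case only, assuming item parameters are already estimated and a standard-normal latent distribution; it does not collapse sparse score groups as the published procedure does. The PROMIS items are polytomous, and the reference list points to the generalized S-X² for the graded response model (Kang and Chen), which this sketch does not implement. All function names and parameter values are hypothetical.

```python
import math
import random


def icc_2pl(theta, a, b):
    """2PL probability of a keyed (score 1) response at latent trait theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def quadrature(n_points=41, lo=-4.0, hi=4.0):
    """Equally spaced nodes with normalized standard-normal weights."""
    nodes = [lo + (hi - lo) * q / (n_points - 1) for q in range(n_points)]
    dens = [math.exp(-0.5 * t * t) for t in nodes]
    total = sum(dens)
    return nodes, [d / total for d in dens]


def lord_wingersky(probs):
    """Lord-Wingersky recursion: Pr(summed score = s | theta) for s = 0..n,
    given each item's probability of a score of 1 at that theta."""
    dist = [1.0]
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for s, w in enumerate(dist):
            new[s] += w * (1.0 - p)  # item scored 0: sum unchanged
            new[s + 1] += w * p      # item scored 1: sum increases by one
        dist = new
    return dist


def s_x2(responses, params, item, n_quad=41):
    """Simplified Orlando-Thissen S-X2 for one dichotomous item (2PL).

    responses: list of 0/1 response vectors, one per respondent
    params: list of (a, b) item parameters, assumed already estimated
    Returns (statistic, degrees of freedom); sparse groups are not collapsed.
    """
    n = len(params)
    nodes, weights = quadrature(n_quad)
    numer = [0.0] * (n + 1)
    denom = [0.0] * (n + 1)
    for t, w in zip(nodes, weights):
        probs = [icc_2pl(t, a, b) for a, b in params]
        full = lord_wingersky(probs)
        rest = lord_wingersky(probs[:item] + probs[item + 1:])
        for k in range(1, n):
            denom[k] += full[k] * w                    # Pr(S = k)
            numer[k] += probs[item] * rest[k - 1] * w  # Pr(S = k, item = 1)
    stat = 0.0
    for k in range(1, n):  # scores 0 and n carry no information about fit
        group = [r for r in responses if sum(r) == k]
        if not group:
            continue
        n_k = len(group)
        observed = sum(r[item] for r in group) / n_k
        expected = numer[k] / denom[k]
        stat += n_k * (observed - expected) ** 2 / (expected * (1.0 - expected))
    df = (n - 1) - 2  # score groups minus the two 2PL parameters of this item
    return stat, df


if __name__ == "__main__":
    random.seed(1)
    # Hypothetical 10-item bank with made-up parameters (not PROMIS values).
    params = [(1.0 + 0.1 * j, -1.0 + 0.2 * j) for j in range(10)]
    responses = []
    for _ in range(500):
        theta = random.gauss(0.0, 1.0)
        responses.append([int(random.random() < icc_2pl(theta, a, b)) for a, b in params])
    print(s_x2(responses, params, item=0))
```

In practice the statistic would be referred to a chi-square distribution with the returned degrees of freedom (for example with `scipy.stats.chi2.sf`) to obtain the item's p-value.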

Results

Examination of the presence and impact of misfit showed that item misfit had a negligible impact on T-score estimates and severity classifications for the general population sample in both the PROMIS depression and pain interference item banks; that is, the detected misfit was not of practical significance.
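The impact analysis compares two sets of scores for the same respondents: T-scores from all items and T-scores with the misfitting items excluded. The following is a minimal sketch of such agreement indices, assuming theta estimates from both scorings are already available on the same metric and converted with the standard PROMIS T-score metric (T = 50 + 10θ). It reports mean and root-mean-square T-score differences, the Pearson correlation, and percent agreement and Cohen's kappa for severity bands. The cut points (55, 60, 70), band labels, and choice of indices are assumptions for illustration, not the study's thresholds or exact criteria, and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical severity cut points on the T-score metric (an assumption for
# illustration, not the thresholds used in the study).
CUTS = (55.0, 60.0, 70.0)
LABELS = ("none", "mild", "moderate", "severe")


def to_t(theta):
    """Map a theta estimate (mean 0, SD 1 metric) to a T-score (mean 50, SD 10)."""
    return 50.0 + 10.0 * theta


def classify(t):
    """Assign a severity label; each cut point is the lower bound of the next band."""
    for cut, label in zip(CUTS, LABELS):
        if t < cut:
            return label
    return LABELS[-1]


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cohen_kappa(x, y):
    """Chance-corrected agreement between two lists of category labels."""
    n = len(x)
    observed = sum(a == b for a, b in zip(x, y)) / n
    cx, cy = Counter(x), Counter(y)
    expected = sum(cx[c] * cy[c] for c in set(x) | set(y)) / (n * n)
    return 1.0 if expected == 1.0 else (observed - expected) / (1.0 - expected)


def misfit_impact(theta_all, theta_fit):
    """Compare scores from all items with scores after removing misfitting items."""
    t_all = [to_t(t) for t in theta_all]
    t_fit = [to_t(t) for t in theta_fit]
    diffs = [a - b for a, b in zip(t_all, t_fit)]
    c_all = [classify(t) for t in t_all]
    c_fit = [classify(t) for t in t_fit]
    n = len(diffs)
    return {
        "mean_diff": sum(diffs) / n,
        "rmsd": math.sqrt(sum(d * d for d in diffs) / n),
        "r": pearson(t_all, t_fit),
        "pct_agree": sum(a == b for a, b in zip(c_all, c_fit)) / n,
        "kappa": cohen_kappa(c_all, c_fit),
    }
```

Calling `misfit_impact(theta_all, theta_fit)` with two equal-length lists returns a dictionary of indices; differences near zero and values of r, percent agreement, and kappa near one would point to negligible practical impact of the misfit.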

Conclusions

Findings support the T-score estimates in the two item banks as robust against item misfit at both the group and individual levels and add confidence to the use of T-scores for severity diagnosis in the studied sample. Recommendations on approaches for identifying item misfit (statistical significance) and assessing the misfit impact (practical significance) are given.


The overall alpha level of .05 was adjusted for the total number of items in each PROMIS item bank. The adjusted alpha values ranged from .0018 (.05/28) to .05 for PROMIS-DEP (depression) and from .0012 (.05/41) to .05 for PROMIS-PI (pain interference).
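The footnote gives only the range of adjusted alpha values. A range running from α/m up to α is what the Benjamini-Hochberg step-up procedure (Benjamini and Hochberg, 1995, in the reference list) produces, so the sketch below assumes that procedure: the k-th smallest of the m item-fit p-values is compared with kα/m, and every item up to the largest k that passes is flagged. This is an interpretation of the footnote, not a confirmed description of the study's method, and the function names are hypothetical.

```python
def bh_thresholds(m, alpha=0.05):
    """Step-up critical values k * alpha / m for ranks k = 1..m."""
    return [k * alpha / m for k in range(1, m + 1)]


def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per item: True where the item is flagged as misfitting."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p-values
    thresholds = bh_thresholds(m, alpha)
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= thresholds[rank - 1]:
            k_max = rank  # largest rank whose p-value clears its threshold
    flagged = [False] * m
    for rank, idx in enumerate(order, start=1):
        flagged[idx] = rank <= k_max  # step-up: flag all items up to k_max
    return flagged
```

With m = 28 the smallest threshold is .05/28 ≈ .0018 and with m = 41 it is .05/41 ≈ .0012, matching the footnote; the largest threshold is .05 in both cases. Because of the step-up rule, an item whose p-value misses its own threshold can still be flagged when a larger p-value further down the ranking passes.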

References

1. Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage.

2. Cella, D., Riley, W., Stone, A., Rothrock, N., Reeve, B., Yount, S., Hays, R. D., on behalf of the PROMIS Cooperative Group. (2010). Initial item banks and first wave testing of the Patient-Reported Outcomes Measurement Information System (PROMIS) network: 2005–2008. Journal of Clinical Epidemiology, 63, 1179–1194. doi:10.1016/j.jclinepi.2010.04.011.

3. Swaminathan, H., Hambleton, R. K., & Rogers, H. J. (2007). Assessing fit in item response models. In C. R. Rao & S. Sinharay (Eds.), Handbook of statistics: Psychometrics. London: Elsevier.

4. Reeve, B. B., Hays, R. D., Bjorner, J. B., Cook, K. F., Crane, P. K., Teresi, J. A., et al. (2007). Psychometric evaluation and calibration of health-related quality of life item banks: Plans for the Patient-Reported Outcomes Measurement Information System (PROMIS). Medical Care, 45(5), S22–S31. doi:10.1097/01.mlr.0000250483.85507.04.

5. Hambleton, R. K., & Han, N. (2005). Assessing the fit of IRT models to educational and psychological test data: A five step plan and several graphical displays. In W. R. Lenderking & D. Revicki (Eds.), Advances in health outcomes research methods, measurement, statistical analysis, and clinical applications (pp. 57–78). Washington: Degnon Associates.

6. Box, G. E. P., & Draper, N. R. (1987). Empirical model building and response surfaces. New York, NY: Wiley.

7. Sinharay, S., & Haberman, S. J. (2014). How often is the misfit of item response theory models practically significant? Educational Measurement: Issues and Practice, 33(1), 23–35. doi:10.1111/emip.12024.

8. Zhao, Y. (2008). Approaches for addressing the fit of item response theory models to educational test data. Dissertation Abstracts International, 69, 12A. (UMI No. 3337019).

9. Cella, D., Yount, S., Rothrock, N., Gershon, R., Cook, K., Reeve, B., et al. (2007). The Patient-Reported Outcomes Measurement Information System (PROMIS): Progress of an NIH roadmap cooperative group during its first two years. Medical Care, 45(5 Suppl 1), S3–S11. doi:10.1097/01.mlr.0000258615.42478.55.

10. Pilkonis, P. A., Choi, S. W., Reise, S. P., Stover, A. M., Riley, W. T., & Cella, D. (2011). Item banks for measuring emotional distress from the Patient-Reported Outcomes Measurement Information System (PROMIS®): Depression, anxiety, and anger. Assessment, 18(3), 263–283. doi:10.1177/1073191111411667.

11. Amtmann, D. A., Cook, K. F., Jensen, M. P., Chen, W.-H., Choi, S. W., Revicki, D., et al. (2010). Development of a PROMIS item bank to measure pain interference. Pain, 150(1), 173–182. doi:10.1016/j.pain.2010.04.025.

12. Liu, H., Cella, D., Gershon, R., Shen, J., Morales, L. S., Riley, W., et al. (2010). Representativeness of the Patient-Reported Outcomes Measurement Information System internet panel. Journal of Clinical Epidemiology, 63(11), 1169–1178. doi:10.1016/j.jclinepi.2009.11.021.

13. DeWalt, D. A., Rothrock, N., Yount, S., & Stone, A. A. (2007). Evaluation of item candidates: The PROMIS qualitative item review. Medical Care, 45(5 Suppl 1), S12–S21. doi:10.1097/01.mlr.0000254567.79743.e2.

14. Radloff, L. S. (1977). The CES-D scale: A self-report depression scale for research in the general population. Applied Psychological Measurement, 1(3), 385–401. doi:10.1177/014662167700100306.

15. Cleeland, C. S., Gonin, R., Hatfield, A. K., Edmonson, J. H., Blum, R. H., Stewart, J. A., et al. (1994). Pain and its treatment in outpatients with metastatic cancer. New England Journal of Medicine, 330(9), 592–596. doi:10.1056/NEJM199403033300902.

16. Cella, D., Choi, S., Garcia, S., Cook, K. F., Rosenbloom, S., Lai, J.-S., et al. (2014). Setting standards for severity of common symptoms in oncology using the PROMIS item banks and expert judgment. Quality of Life Research, 23(10), 2651–2661. doi:10.1007/s11136-014-0732-6.

17. Muthén, L. K., & Muthén, B. O. (2006). Mplus [Computer software]. Los Angeles, CA: Muthén & Muthén.

18. Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin, 88(3), 588. doi:10.1037/0033-2909.88.3.588.

19. Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. Sage Focus Editions, 154, 136. doi:10.1177/0049124192021002005.

20. Hu, L. T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1–55. doi:10.1080/10705519909540118.

21. Lance, C. E., Butts, M. M., & Michels, L. C. (2006). The sources of four commonly reported cutoff criteria: What did they really say? Organizational Research Methods, 9(2), 202–220. doi:10.1177/1094428105284919.

22. Orlando, M., & Thissen, D. (2000). Likelihood-based item-fit indices for dichotomous item response theory models. Applied Psychological Measurement, 24(1), 50–64. doi:10.1177/01466216000241003.

23. Cai, L., Thissen, D., & du Toit, S. (2015). IRTPRO [Computer software]. Lincolnwood, IL: Scientific Software International.

24. Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B, 57, 289–300. doi:10.2307/2346101.

25. Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Chicago: Psychometric Society. doi:10.1002/j.2333-8504.1968.tb00153.x.

26. Thissen, D., Chen, W.-H., & Bock, R. D. (2003). Multilog 7.03 [Computer software]. Lincolnwood, IL: Scientific Software International.

27. Stocking, M. L., & Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7(2), 201–210. doi:10.1177/014662168300700208.

28. Kim, S., & Kolen, M. J. (2004). STUIRT: A computer program for scale transformation under unidimensional item response theory models (Version 1.0). Iowa Testing Programs, University of Iowa.

29. Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.

30. Yost, K. J., Eton, D. T., Garcia, S. F., & Cella, D. (2011). Minimally important differences were estimated for six PROMIS-Cancer scales in advanced-stage cancer patients. Journal of Clinical Epidemiology, 64(5), 507–516. doi:10.1016/j.jclinepi.2010.11.018.

31. Orlando, M., & Thissen, D. (2003). Further investigation of the performance of S-X²: An item fit index for use with dichotomous item response theory models. Applied Psychological Measurement, 27(4), 289–298. doi:10.1177/0146621603027004004.

32. Kang, T., & Chen, T. (2011). Performance of the generalized S-X² item fit index for the graded response model. Asia Pacific Education Review, 12(1), 89–96. doi:10.1007/s12564-010-9082-4.

33. Kang, T., & Chen, T. (2008). Performance of the generalized S-X² item fit index for polytomous IRT models. Journal of Educational Measurement, 45(4), 391–406. doi:10.1111/j.1745-3984.2008.00071.x.

34. Smits, N. (2016). On the effect of adding clinical samples to validation studies of patient-reported outcome item banks: A simulation study. Quality of Life Research, 25(7), 1635–1644. doi:10.1007/s11136-015-1199-9.

35. Choi, S. W., Reise, S. P., Pilkonis, P. A., Hays, R. D., & Cella, D. (2010). Efficiency of static and computer adaptive short forms compared to full-length measures of depressive symptoms. Quality of Life Research, 19(1), 125–136. doi:10.1007/s11136-009-9560-5.

Publisher: Springer International Publishing
Print ISSN: 0962-9343
Electronic ISSN: 1573-2649
DOI: https://doi.org/10.1007/s11136-016-1467-3


Other articles in Issue 3/2017

Functioning in patients with schizophrenia: a systematic review of the literature using the International Classification of Functioning, Disability and Health (ICF) as a reference

Are the EQ-5D-3L and the ICECAP-O responsive among older adults with impaired mobility? Evidence from the Vancouver Falls Prevention Cohort Study

Appraisal assessment in patient-reported outcome research: methods for uncovering the personal context and meaning of quality of life

Quality of life with rivaroxaban in patients with non-valvular atrial fibrilation by therapeutic compliance

Art therapy based on appreciation of famous paintings and its effect on distress among cancer patients

A comparison of children and adolescent’s self-report and parental report of the PedsQL among those with and without autism spectrum disorder