Published in: Quality of Life Research 1/2007

01-08-2007 | Original Paper

IRT health outcomes data analysis project: an overview and summary

Authors: Karon F. Cook, Cayla R. Teal, Jakob B. Bjorner, David Cella, Chih-Hung Chang, Paul K. Crane, Laura E. Gibbons, Ron D. Hays, Colleen A. McHorney, Katja Ocepek-Welikson, Anastasia E. Raczek, Jeanne A. Teresi, Bryce B. Reeve

Abstract

Background

In June 2004, the National Cancer Institute and the Drug Information Association co-sponsored the conference “Improving the Measurement of Health Outcomes through the Applications of Item Response Theory (IRT) Modeling: Exploration of Item Banks and Computer-Adaptive Assessment.” One component of the conference was the presentation of a psychometric and content analysis of a secondary dataset.

Objectives

A thorough psychometric and content analysis was conducted of two primary domains within a cancer health-related quality of life (HRQOL) dataset.

Research design

HRQOL scales were evaluated using factor analysis for categorical data, IRT modeling, and differential item functioning analyses. In addition, computerized adaptive administration of HRQOL item banks was simulated, and various IRT models were applied and compared.
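
As an illustration of the kind of simulation described above, the following is a minimal sketch of one adaptive item-selection step. It assumes Samejima's graded response model and maximum-information item selection; the abstract does not say which IRT models or selection rules the project used, so these choices, the function names, and the item parameters in `BANK` are all hypothetical.

```python
import math

def cumulative_probs(theta, a, thresholds):
    """P(response >= k) for k = 0..K under a graded response model.

    theta: latent trait level; a: item discrimination;
    thresholds: increasing category thresholds b_1 < ... < b_{K-1}.
    """
    inner = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    # Responding in category 0 or higher is certain; above the top category is impossible.
    return [1.0] + inner + [0.0]

def category_probs(theta, a, thresholds):
    """Probability of each response category (adjacent cumulative differences)."""
    cum = cumulative_probs(theta, a, thresholds)
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

def item_information(theta, a, thresholds):
    """Fisher information of one item at theta: sum over categories of (dP_k/dtheta)^2 / P_k."""
    cum = cumulative_probs(theta, a, thresholds)
    slopes = [a * p * (1.0 - p) for p in cum]  # derivative of each cumulative curve
    info = 0.0
    for k in range(len(cum) - 1):
        p_k = cum[k] - cum[k + 1]
        if p_k > 0.0:
            info += (slopes[k] - slopes[k + 1]) ** 2 / p_k
    return info

def select_next_item(theta, bank, administered):
    """Pick the not-yet-administered item that is most informative at theta."""
    candidates = [name for name in bank if name not in administered]
    return max(candidates, key=lambda name: item_information(theta, *bank[name]))

# Hypothetical item bank: name -> (discrimination, thresholds). Not taken from the study.
BANK = {
    "easy": (1.0, [-2.0]),
    "mid": (1.0, [0.0]),
    "hard": (1.0, [2.0]),
}
next_item = select_next_item(0.0, BANK, administered=set())
```

A full simulation would repeat this step, re-estimating the trait level after each simulated response until a stopping rule is met; those parts are omitted here.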

Subjects

The original data were collected as part of the NCI-funded Quality of Life Evaluation in Oncology (Q-Score) Project. A total of 1,714 patients with cancer or HIV/AIDS were recruited from 5 clinical sites.

Measures

Items from 4 HRQOL instruments were evaluated: the Cancer Rehabilitation Evaluation System–Short Form, the European Organization for Research and Treatment of Cancer Quality of Life Questionnaire, the Functional Assessment of Cancer Therapy, and the Medical Outcomes Study Short-Form Health Survey.

Results and conclusions

Four lessons learned from the project are discussed: the importance of good developmental item banks, the ambiguity of model fit results, the limits of our knowledge regarding the practical implications of model misfit, and the importance of construct definition in the measurement of HRQOL. With respect to these lessons, areas for future research are suggested. The feasibility of developing item banks for broad definitions of health is discussed.
Metadata
Publication date
01-08-2007
Publisher
Springer Netherlands
Published in
Quality of Life Research / Issue Supplement 1/2007
Print ISSN: 0962-9343
Electronic ISSN: 1573-2649
DOI
https://doi.org/10.1007/s11136-007-9177-5
