Most computerized adaptive testing (CAT) applications in patient-reported outcomes (PRO) measurement to date have been reliability-centric, with the primary objective of maximizing measurement efficiency. A key concern, and a potential threat to validity, is that unconstrained CAT administrations may differ systematically in item attributes, e.g., in their sub-domain coverage. This paper provides a solution to the problem within an optimal test design framework, using the shadow-test approach to CAT.
Following this approach, a case study was conducted using the PROMIS® (Patient-Reported Outcomes Measurement Information System) fatigue item bank with both empirical and simulated response data. CAT administrations with and without the enforcement of content and item-pool usage constraints were compared.
The unconstrained CAT exhibited a high degree of variation in the items selected from different substrata of the item bank. In contrast, the shadow-test approach delivered CAT administrations conforming to all specifications with minimal loss in measurement efficiency.
The optimal test design and shadow-test approach to CAT provide a flexible framework for solving complex test-assembly problems, with better control of domain coverage than the conventional use of CAT in PRO measurement affords. Applications across a wide array of PRO domains are expected to lead to more controlled and balanced use of CAT in the field.
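The core idea of the shadow-test approach can be illustrated with a small sketch: at each step, a full-length "shadow test" is reassembled around the items already administered so that it satisfies all content constraints while maximizing information at the current ability estimate; the item actually administered is then the most informative free item in that shadow test. The sketch below is a minimal, hypothetical illustration in Python — the item bank, the `min_per_subdomain` constraint format, and the greedy assembly are all assumptions; operational shadow-test CAT solves a mixed-integer program rather than a greedy heuristic.

```python
# Minimal sketch of one shadow-test CAT step. All names (Item, info,
# subdomain, min_per_subdomain) are illustrative assumptions, not the
# PROMIS or shadow-test software API. Real implementations assemble the
# shadow test by mixed-integer programming; a greedy fill stands in here.
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    item_id: int
    info: float        # Fisher information at the current theta estimate
    subdomain: str     # content sub-domain label

def assemble_shadow_test(bank, administered, test_length, min_per_subdomain):
    """Build a full-length test containing all administered items and
    meeting every sub-domain minimum, greedily maximizing information."""
    shadow = list(administered)
    free = [it for it in bank if it not in administered]
    # First satisfy each sub-domain minimum with its most informative items.
    for sub, need in min_per_subdomain.items():
        have = sum(1 for it in shadow if it.subdomain == sub)
        pool = sorted((it for it in free
                       if it.subdomain == sub and it not in shadow),
                      key=lambda it: it.info, reverse=True)
        shadow.extend(pool[:max(0, need - have)])
    # Fill the remaining slots with the most informative items overall.
    rest = sorted((it for it in free if it not in shadow),
                  key=lambda it: it.info, reverse=True)
    shadow.extend(rest[:test_length - len(shadow)])
    return shadow

def next_item(bank, administered, test_length, min_per_subdomain):
    """Administer the most informative free item from the shadow test."""
    shadow = assemble_shadow_test(bank, administered, test_length,
                                  min_per_subdomain)
    free = [it for it in shadow if it not in administered]
    return max(free, key=lambda it: it.info)
```

In a toy bank where the physical-fatigue items dominate on information, an unconstrained CAT would pick them exclusively; enforcing a minimum of one cognitive-fatigue item makes the shadow test reserve a slot for it, so the final item administered is cognitive despite a more informative physical item remaining — the constraint-conformance behavior the case study reports, at miniature scale.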
Ensuring content validity of patient-reported outcomes: a shadow-test approach to their adaptive measurement
Seung W. Choi · Wim J. van der Linden
Springer International Publishing