Published in: Quality of Life Research 7/2018

14-07-2017 | Special Section: Test Construction (by invitation only)

Ensuring content validity of patient-reported outcomes: a shadow-test approach to their adaptive measurement

Authors: Seung W. Choi, Wim J. van der Linden

Abstract

Purpose

Most computerized adaptive testing (CAT) applications in patient-reported outcomes (PRO) measurement to date are reliability-centric, with a primary objective of maximizing measurement efficiency. A key concern and a potential threat to validity is that, when left unconstrained, individual CAT administrations may draw items with systematically different attributes, e.g., sub-domain coverage. This paper addresses the problem within an optimal test design framework, using the shadow-test approach to CAT.

Methods

Following this approach, a case study was conducted using the PROMIS® (Patient-Reported Outcomes Measurement Information System) fatigue item bank with both empirical and simulated response data. CAT administrations with and without enforcement of content and item-pool usage constraints were compared.

Results

The unconstrained CAT exhibited a high degree of variation in the items selected from different substrata of the item bank. In contrast, the shadow-test approach delivered CAT administrations that conformed to all specifications with a minimal loss in measurement efficiency.

Conclusions

The optimal test design and shadow-test approach to CAT provide a flexible framework for solving complex test-assembly problems, with better control of domain coverage than the conventional use of CAT in PRO measurement offers. Applications in a wide array of PRO domains are expected to lead to more controlled and balanced use of CAT in the field.
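
The idea behind the shadow-test approach can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several loudly stated assumptions: the eight-item pool, the sub-domain labels, and all item parameters are invented for illustration; a dichotomous two-parameter logistic model stands in for the polytomous models used with PROMIS item banks; brute-force enumeration replaces the mixed-integer programming solver a real application would need; and the interim trait estimates are passed in as a fixed sequence instead of being re-estimated from responses. Before each item, the sketch assembles the most informative full-length test that contains every item already administered and meets the content constraints, then administers the most informative unused item from that test.

```python
import math
from itertools import combinations

# Hypothetical toy pool: (item id, sub-domain, discrimination a, location b).
POOL = [
    ("e1", "experience", 2.0, -0.5), ("e2", "experience", 1.8, 0.0),
    ("e3", "experience", 1.7, 0.5), ("e4", "experience", 1.6, 1.0),
    ("i1", "impact", 1.2, -0.5), ("i2", "impact", 1.1, 0.0),
    ("i3", "impact", 1.0, 0.5), ("i4", "impact", 0.9, 1.0),
]
TEST_LENGTH = 4


def information(item, theta):
    """Fisher information of a 2PL item at theta (simplifying assumption)."""
    _, _, a, b = item
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def feasible(test, min_per_domain):
    """Check that the test has at least the required items per sub-domain."""
    counts = {}
    for _, domain, _, _ in test:
        counts[domain] = counts.get(domain, 0) + 1
    return all(counts.get(d, 0) >= n for d, n in min_per_domain.items())


def shadow_test(theta, administered, min_per_domain):
    """Most informative full-length test that keeps all administered items
    and satisfies the content constraints (brute force over the toy pool)."""
    free = [item for item in POOL if item not in administered]
    best, best_info = None, -1.0
    for extra in combinations(free, TEST_LENGTH - len(administered)):
        test = list(administered) + list(extra)
        if not feasible(test, min_per_domain):
            continue
        info = sum(information(item, theta) for item in test)
        if info > best_info:
            best, best_info = test, info
    if best is None:
        raise ValueError("no feasible shadow test for these constraints")
    return best


def run_cat(interim_thetas, min_per_domain):
    """Administer TEST_LENGTH items; interim_thetas stands in for the
    trait estimates a real CAT would update after every response."""
    administered = []
    for theta in interim_thetas:
        shadow = shadow_test(theta, administered, min_per_domain)
        candidates = [item for item in shadow if item not in administered]
        administered.append(max(candidates, key=lambda it: information(it, theta)))
    return administered


constrained = run_cat([0.0] * TEST_LENGTH, {"experience": 2, "impact": 2})
unconstrained = run_cat([0.0] * TEST_LENGTH, {})
print([item[1] for item in constrained])
print([item[1] for item in unconstrained])
```

In this toy pool the "experience" items are the more discriminating ones, so the run without constraints draws all four items from that sub-domain, while the constrained run is forced to cover both. Because every administered item comes from a test that already satisfies the constraints, the completed administration satisfies them as well.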
Metadata
Title
Ensuring content validity of patient-reported outcomes: a shadow-test approach to their adaptive measurement
Authors
Seung W. Choi
Wim J. van der Linden
Publication date
14-07-2017
Publisher
Springer International Publishing
Published in
Quality of Life Research / Issue 7/2018
Print ISSN: 0962-9343
Electronic ISSN: 1573-2649
DOI
https://doi.org/10.1007/s11136-017-1650-1