Most computerized adaptive testing (CAT) applications in patient-reported outcomes (PRO) measurement to date have been reliability-centric, with the primary objective of maximizing measurement efficiency. A key concern, and a potential threat to validity, is that unconstrained CAT administrations may differ systematically in item attributes, e.g., in their sub-domain coverage. This paper provides a solution to the problem within an optimal test design framework, using the shadow-test approach to CAT.
Following this approach, a case study was conducted using the PROMIS® (Patient-Reported Outcomes Measurement Information System) fatigue item bank with both empirical and simulated response data. CAT administrations with and without the enforcement of content and item-pool usage constraints were compared.
The unconstrained CAT exhibited a high degree of variation in the items selected from different substrata of the item bank. In contrast, the shadow-test approach delivered CAT administrations conforming to all specifications with minimal loss in measurement efficiency.
The optimal test design and shadow-test approach to CAT provide a flexible framework for solving complex test-assembly problems, with better control of domain coverage than the conventional use of CAT in PRO measurement affords. Applications across a wide array of PRO domains are expected to lead to more controlled and balanced use of CAT in the field.
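The core idea of the shadow-test approach can be illustrated with a small sketch: at each step, a full-length "shadow test" is reassembled around the items already administered so that it satisfies all content constraints while maximizing information at the current ability estimate; the item actually administered is then the most informative free item in that shadow test. The sketch below is a minimal, hypothetical illustration in Python — the item bank, the `min_per_subdomain` constraint format, and the greedy assembly are all assumptions; operational shadow-test CAT solves a mixed-integer program rather than a greedy heuristic.

```python
# Minimal sketch of one shadow-test CAT step. All names (Item, info,
# subdomain, min_per_subdomain) are illustrative assumptions, not the
# PROMIS or shadow-test software API. Real implementations assemble the
# shadow test by mixed-integer programming; a greedy fill stands in here.
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    item_id: int
    info: float        # Fisher information at the current theta estimate
    subdomain: str     # content sub-domain label

def assemble_shadow_test(bank, administered, test_length, min_per_subdomain):
    """Build a full-length test containing all administered items and
    meeting every sub-domain minimum, greedily maximizing information."""
    shadow = list(administered)
    free = [it for it in bank if it not in administered]
    # First satisfy each sub-domain minimum with its most informative items.
    for sub, need in min_per_subdomain.items():
        have = sum(1 for it in shadow if it.subdomain == sub)
        pool = sorted((it for it in free
                       if it.subdomain == sub and it not in shadow),
                      key=lambda it: it.info, reverse=True)
        shadow.extend(pool[:max(0, need - have)])
    # Fill the remaining slots with the most informative items overall.
    rest = sorted((it for it in free if it not in shadow),
                  key=lambda it: it.info, reverse=True)
    shadow.extend(rest[:test_length - len(shadow)])
    return shadow

def next_item(bank, administered, test_length, min_per_subdomain):
    """Administer the most informative free item from the shadow test."""
    shadow = assemble_shadow_test(bank, administered, test_length,
                                  min_per_subdomain)
    free = [it for it in shadow if it not in administered]
    return max(free, key=lambda it: it.info)
```

In a toy bank where the physical-fatigue items dominate on information, an unconstrained CAT would pick them exclusively; enforcing a minimum of one cognitive-fatigue item makes the shadow test reserve a slot for it, so the final item administered is cognitive despite a more informative physical item remaining — the constraint-conformance behavior the case study reports, at miniature scale.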
Ensuring content validity of patient-reported outcomes: a shadow-test approach to their adaptive measurement
Seung W. Choi · Wim J. van der Linden
Springer International Publishing