Original Article
An evaluation of patient-reported outcomes found computerized adaptive testing was efficient in assessing osteoarthritis impact

https://doi.org/10.1016/j.jclinepi.2005.07.019

Abstract

Background and Objectives

To evaluate a patient-reported outcomes questionnaire that uses computerized adaptive testing (CAT) to measure the impact of osteoarthritis (OA) on functioning and well-being.

Materials and Methods

OA patients completed 37 questions about the impact of OA on physical, social and role functioning, emotional well-being, and vitality. Questionnaire responses were calibrated and scored using item response theory, and two scores were estimated: a Total-OA score based on patients' responses to all 37 questions, and a simulated CAT-OA score where the computer selected and scored the five most informative questions for each patient. Agreement between Total-OA and CAT-OA scores was assessed using correlations. Discriminant validity of Total-OA and CAT-OA scores was assessed with analysis of variance. Criterion measures included OA pain and severity, patient global assessment, and missed work days.

Results

Simulated CAT-OA and Total-OA scores correlated highly (r = 0.96). Both Total-OA and simulated CAT-OA scores discriminated significantly between patients differing on the criterion measures. F-statistics across criterion measures ranged from 39.0 (P < .001) to 225.1 (P < .001) for the Total-OA score, and from 40.5 (P < .001) to 221.5 (P < .001) for the simulated CAT-OA score.

Conclusions

CAT methods produce valid and precise estimates of the impact of OA on functioning and well-being with significant reduction in response burden.

Introduction

Twenty-one million Americans suffer from osteoarthritis (OA) [1], a musculoskeletal disease characterized by joint deterioration and chronic pain. In 1994, the cost of osteoarthritis was estimated at $15.5 billion, with more than half stemming from lost wages due to disability [2]. As the population ages, the burden posed by OA to patients and society increases. Osteoarthritis pain contributes to decreased physical function, disability, and poorer quality of life among those with the disease [3], [4], [5], [6]. The long-term goals of OA management are to alleviate pain and minimize functional impairment, both of which often go underestimated and undertreated in patients with OA [7]. Until recently, clinical trials evaluating OA therapies infrequently included patient-reported quality-of-life measures that would reflect achievement of those goals [8].

Several standardized questionnaires assess the impact of OA on functional status and well-being [9], [10], [11]. Such questionnaires have at least three potential uses: (1) evaluating outcomes in clinical trials; (2) identifying patients in need of treatment and monitoring treatment outcomes in everyday clinical practice; and (3) improving communication between patients and doctors. Available OA-specific questionnaires have proven useful in group-level comparisons, the focus of most clinical studies and trials. Like many widely used health outcomes measures, however, these OA-specific questionnaires lack the reliability and precision necessary for screening and assessing outcomes in individual patients [12]. One solution might be to use long questionnaires in an attempt to increase reliability and measurement precision. This approach, however, does not suit clinical practice settings, where time is limited and long questionnaires place undue burdens on both patients and providers.

Computerized adaptive testing (CAT) methods offer a potential solution to the tradeoff between measurement precision and response burden [13], [14], [15]. CAT surveys offer a way to achieve a high degree of precision with a short-form questionnaire. In contrast to traditional fixed-length surveys that ask the same questions of everyone regardless of their health, CAT surveys individualize each assessment, asking each person only the most informative questions relevant to his or her particular level of health. In this way, each person is administered his or her own short form. The computer scores the responses on a standardized metric that permits comparisons among patients answering different questions from the same pool of items.
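The select-ask-rescore loop described above can be sketched in a few lines of code. This is an illustrative simulation only: it uses a hypothetical item bank under a two-parameter logistic (2PL) IRT model for simplicity, not the generalized partial credit model or the calibrated item parameters used in the study. Items are chosen to maximize Fisher information at the current ability estimate, and the estimate is updated after each response.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical item bank: discrimination (a) and difficulty (b)
# parameters for a 2PL model (illustrative values, not the study's).
n_items = 37
a = rng.uniform(0.8, 2.0, n_items)
b = rng.normal(0.0, 1.0, n_items)

def prob(theta, a, b):
    """Probability of endorsing an item under the 2PL model."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability level theta."""
    p = prob(theta, a, b)
    return a**2 * p * (1.0 - p)

def cat_session(true_theta, n_ask=5):
    """Adaptively administer the n_ask most informative items,
    re-estimating theta after each response by grid-search ML."""
    grid = np.linspace(-4, 4, 161)
    asked, responses = [], []
    theta_hat = 0.0  # start at the population mean
    for _ in range(n_ask):
        info = item_information(theta_hat, a, b)
        info[asked] = -np.inf            # never repeat an item
        item = int(np.argmax(info))
        # Simulate the respondent's answer from the true ability
        resp = rng.random() < prob(true_theta, a[item], b[item])
        asked.append(item)
        responses.append(int(resp))
        # Grid-search maximum likelihood given responses so far
        p = prob(grid[:, None], a[asked], b[asked])
        loglik = np.sum(
            np.where(np.array(responses), np.log(p), np.log(1 - p)),
            axis=1,
        )
        theta_hat = float(grid[np.argmax(loglik)])
    return theta_hat, asked

theta_hat, asked = cat_session(true_theta=1.0)
print(f"estimated theta: {theta_hat:.2f}, items used: {asked}")
```

Because every item is calibrated on the same latent metric, the final theta estimate is comparable across respondents even though each respondent may have answered a different five-item subset.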

Computerized adaptive assessments offer several noteworthy advantages over the more common fixed-form health surveys [15], [16]. First, by selecting the most appropriate items for each person's health level, they optimize relevance and measurement precision for a given response burden. Second, they permit measurement precision to be tailored to the particular application. For example, if the assessment is used for screening, precision could be set to be the highest for scores near the screening cutoff to ensure greater accuracy. Third, because all items in the item pool are calibrated on a common metric, assessments can be compared even when people have answered different items. Fourth, item pools can be expanded by seeding and evaluating new items without sacrificing comparability to scores based on items in the original pool. This feature permits introduction of items from other widely used questionnaires measuring the same concept into the item pool. Once calibrated, the score from the computerized adaptive assessment can be expressed in the metric of the original source instrument [17]. This facilitates comparability to previous research results using other widely used questionnaires. Fifth, the quality of the data can be monitored in real time for each respondent by monitoring aberrant response patterns [16].

The focus of this article is on the empirical evaluation of the computerized adaptive assessment of OA impact. First, we describe the methods used to calibrate and score a pool of 52 items from the OA impact survey on a single metric. Second, we present results of analyses to assess the agreement between scale scores based on all the OA impact items that fitted the measurement model (Total-OA impact scores) and simulated CAT scale scores based on the five most informative items for each person (CAT-OA impact scores). Third, we evaluate the relative performance of each scale in validity tests using criterion measures of OA pain and severity, patient global assessments, and missed work days.
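The agreement analysis in the second step amounts to correlating the two score estimates for each respondent. A minimal sketch on synthetic data (hypothetical scores and noise levels, not the study's data) shows the computation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical scores for 600 respondents: a "Total" score based on
# all items, and a "CAT" score from a 5-item adaptive subset, modeled
# here as the Total score plus estimation noise.
total = rng.normal(50, 10, size=600)
cat = total + rng.normal(0, 2.5, size=600)

# Pearson correlation between the two score estimates
r = np.corrcoef(total, cat)[0, 1]
print(f"Pearson r = {r:.3f}")
```

A correlation near 1 indicates that the short adaptive assessment recovers essentially the same ordering of respondents as the full item pool.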


Sample

We developed the OA Impact survey using data from a general population survey of respondents with osteoarthritis conducted via the Internet using America Online's (AOL) Opinion Place. AOL subscribers who logged into the AOL Opinion Place were randomly assigned to one of many ongoing surveys. Participants eligible for this study: (1) were ages 18 years or older; (2) were not employed by any marketing research or advertising company; and (3) screened positive for osteoarthritis. The OA screener

Sample characteristics

The survey was completed via the Internet by 601 respondents who met the OA screening criteria. Respondents' ages ranged from 20 to 85 years (mean 48.6, SD 13.2). Most respondents were female (69%) and White (91%), and most (87%) had a high school or college education.

Development of OA impact measure

Results of confirmatory factor analyses and tests of the fit of the GPC IRT model were satisfactory for 37 of the 52 OA impact items included in the survey (data available upon request). These 37 items all loaded highly (r > 0.78) on a

Discussion

The results of this study showed that short computerized adaptive assessments produced scores that were in high agreement with scores based on all items in the pool, led to the same statistical conclusions about group differences, and had the same pattern of correlations with disease-specific and generic measures of health-related quality of life. CAT scores were found to be precise at most score levels. For example, among individuals with more than average OA impact (OA Impact scores ≥50) the

References (38)

  • R.F. Meenan et al., Measuring health status in arthritis: the Arthritis Impact Measurement Scales, Arthritis Rheum (1980)
  • A.M. Jette, The Functional Status Index: reliability and validity of a self-report functional disability measure, J Rheumatol (1987)
  • C.A. McHorney et al., Individual-patient monitoring in clinical practice: are available health status surveys adequate?, Qual Life Res (1995)
  • W.J. van der Linden et al., Computerized adaptive testing: theory and practice (2000)
  • H. Wainer et al., Computerized adaptive testing: a primer (1990)
  • H. Wainer et al., Computerized adaptive testing: a primer (2000)
  • J.E. Ware et al., Applications of computerized adaptive testing (CAT) to the assessment of headache impact, Qual Life Res (2003)
  • J.B. Bjorner et al., Using item response theory to calibrate the Headache Impact Test (HIT) to the metric of traditional headache scales, Qual Life Res (2003)
  • J.B. Bjorner et al., Calibration of an item pool for assessing the burden of headaches: an application of item response theory to the Headache Impact Test (HIT), Qual Life Res (2003)