Validation of the depression item bank from the Patient-Reported Outcomes Measurement Information System (PROMIS®) in a three-month observational study
Introduction
The Patient-Reported Outcomes Measurement Information System (PROMIS®) is an NIH Roadmap initiative devoted to developing better measurement tools for assessing constructs relevant to the clinical investigation and treatment of all diseases—constructs such as pain, fatigue, emotional distress, sleep, physical functioning, and social participation (Buysse et al., 2010, Cella et al., 2010, Cella et al., 2007b, Fries et al., 2009, Fries et al., 2014, Pilkonis et al., 2011, Revicki et al., 2009). PROMIS has created and refined a comprehensive methodology for developing item banks of these health-related constructs using both qualitative and quantitative techniques and modern psychometric methods (item response theory, IRT) (Cella et al., 2007a, Cella et al., 2010, Hilton, 2011, Reeve et al., 2007). These item banks encompass physical, mental, and social health, consistent with the World Health Organization's tripartite framework (Cella et al., 2007a, World Health Organization, 2007).
The use of models from IRT to calibrate items not only results in greater precision at the item and test levels but also promotes greater flexibility in test administration. For example, items can be administered as computerized adaptive tests (CATs), or static short forms can be created and tailored for samples with different levels of severity of the construct being assessed. Analyses of potential differential item functioning due to gender, age, and educational attainment were performed during the development of the item banks to ensure that items performed comparably regardless of variations in these background characteristics. In general, experience with CAT suggests that the PROMIS depression item bank provides excellent precision with 4–6 items (Choi et al., 2010). A generic 8-item short form is also available, and this short form was one of the cross-cutting dimensional measures used in the DSM-5 field trials, where its feasibility was established and where it performed well with regard to test-retest reliability (Narrow et al., 2013). Following creation of the item banks, our priority has been to validate them, most often in short-term observational studies. These studies allow us to examine the psychometric properties of the item banks, their responsiveness to change, their relationships to clinically significant benchmarks of improvement, and their similarities and differences when compared with other commonly used instruments.
We report here on a prospective observational study with depressed outpatients in the early stages of a new treatment episode. All participants completed study assessments at three points: baseline (T1, as close to the beginning of treatment as possible but no later than four months after its start), one month following baseline (T2), and three months following baseline (T3). The protocol was designed to compare the psychometric properties of the PROMIS depression item bank (administered as a CAT) with two legacy self-report instruments: the Center for Epidemiologic Studies Depression Scale (CESD; Radloff, 1977) and the Patient Health Questionnaire (PHQ-9; Spitzer et al., 1999).
The study was not intended to evaluate treatment effectiveness. Rather, the main consideration was to conduct a study involving established treatments that would allow us to investigate the operating characteristics of the different measures of depression over a time frame (three months) consistent with the design of clinical trials and comparative effectiveness research. Regardless of their impact in the aggregate, treatments for depression generate considerable variability in individual outcomes, and this variability was desirable for examining psychometric issues. In our setting, the most common form of outpatient treatment for depression is a combination of antidepressant medication and supportive psychotherapy (both individual and group therapies), with smaller proportions of patients receiving medication only or psychotherapy only. No untreated or control group was included.
There have been other attempts to link PROMIS depression to legacy measures for depression. The PROsetta Stone project (Choi et al., 2012) was designed specifically to create “cross-walks” between PROMIS measures in several domains and commonly used measures (most often developed using classical test theory) in those same domains. A PROsetta Stone report provides a conversion table from raw CESD scores to PROMIS depression scores (Choi et al., 2013a). The PROMIS depression equivalent for the CESD threshold of 16 is 56.2; for the CESD threshold of 21, it is 59.1. (Note that PROMIS depression is scored with a T-score metric in which the mean of the general population is 50, with a standard deviation of 10.)
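The T-score metric described above is a linear rescaling of a standardized score. As a minimal sketch (the function and variable names are illustrative and are not part of any PROMIS software), the conversion and the two CESD-linked values quoted from the PROsetta Stone report could be expressed as follows:

```python
def z_to_t(z: float) -> float:
    """Convert a standardized score (mean 0, SD 1) to a T-score (mean 50, SD 10)."""
    return 50.0 + 10.0 * z


def t_to_z(t: float) -> float:
    """Convert a T-score back to standard-deviation units from the population mean."""
    return (t - 50.0) / 10.0


# CESD thresholds and their PROMIS depression equivalents, as quoted in the text.
CESD_TO_PROMIS = {16: 56.2, 21: 59.1}

if __name__ == "__main__":
    for cesd, t in CESD_TO_PROMIS.items():
        print(f"CESD {cesd} -> PROMIS T = {t} ({t_to_z(t):+.2f} SD from the mean)")
```

Expressed this way, both CESD thresholds fall within one standard deviation above the general population mean.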
Another PROsetta Stone report provides a conversion table from raw PHQ-9 scores to PROMIS depression scores (Choi et al., 2013b). The PROMIS depression equivalent for the PHQ-9 threshold of 5 (mild depression) is 52.5; for the threshold of 10 (moderate depression), 59.9; for the threshold of 15 (moderately severe depression), 65.8; and for the threshold of 20 (severe depression), 71.5. Gibbons et al. (2011) also reported analyses linking PROMIS depression and the PHQ-9 in a sample of HIV patients. Their results were generally comparable to the PROsetta Stone linkages. However, there was some discrepancy at the mild end of the PHQ-9 where they found rather low PROMIS depression scores to be equivalent: “Mild depression (PHQ-9 score of 5–9) corresponds to scores of 42–51 on the PROMIS metric, moderate depression [10–14] to 52–63, moderately severe [15–19] to 64–72, and severe [20+] to scores of 73 and higher” (figure caption, p. 1353). In general, thresholds suggesting depression of some clinical significance (CESD = 21, PHQ-9 = 10) have been linked to a PROMIS score of about 60, the usual threshold used clinically with the T-score metric (1 SD above the mean).
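To make the PROsetta Stone linkage concrete, the sketch below maps a PROMIS depression T-score to a PHQ-9 severity label using only the four threshold equivalents quoted above. Treating each linked value as the lower bound of its band is an assumption made for illustration, as are the function name and the "below mild threshold" label; the full conversion table in the report should be used for actual scoring.

```python
from bisect import bisect_right

# (PROMIS T-score equivalent, PHQ-9 severity label), values as quoted in the text.
PROSETTA_PHQ9_BANDS = [
    (52.5, "mild"),               # PHQ-9 threshold of 5
    (59.9, "moderate"),           # PHQ-9 threshold of 10
    (65.8, "moderately severe"),  # PHQ-9 threshold of 15
    (71.5, "severe"),             # PHQ-9 threshold of 20
]


def phq9_band_from_promis(t_score: float) -> str:
    """Return the PHQ-9 severity label whose linked T-score cutoff is at or below t_score."""
    cutoffs = [cutoff for cutoff, _ in PROSETTA_PHQ9_BANDS]
    idx = bisect_right(cutoffs, t_score)  # number of cutoffs <= t_score
    if idx == 0:
        return "below mild threshold"  # illustrative label, not from the source
    return PROSETTA_PHQ9_BANDS[idx - 1][1]


if __name__ == "__main__":
    for t in (50.0, 55.0, 60.0, 68.0, 75.0):
        print(t, phq9_band_from_promis(t))
```

Under this mapping, the usual clinical threshold of 60 on the T-score metric falls just inside the moderate band.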
Finally, in a study using two different IRT linking methods, Olino et al. (2013) compared the Beck Depression Inventory (BDI; Beck et al., 1961), the CESD, and the PROMIS depression item bank in a community sample of adolescents. Among the three measures, PROMIS depression provided information over the widest range of symptom severity while demonstrating the highest level of precision. This result was especially true for the full PROMIS depression item bank of 28 items, but it also applied to the PROMIS depression short form of 8 items, which is considerably briefer than either the BDI or the CESD.
Inclusion criteria
Men and women 18 years and older who were able to read and understand English and able and willing to give informed consent were enrolled in the protocol. They were required to be within the first four months of outpatient treatment for major depressive disorder (MDD) at Western Psychiatric Institute and Clinic (WPIC) and its affiliates. To ensure that participants were not too close to the floor for depression when beginning the protocol (and thus unable to show further change), we required a
Descriptive statistics
Cronbach's alpha was used to compute the reliabilities of the legacy measures at baseline, which were .86 for the CESD and .81 for the PHQ-9. For measures derived from IRT models, test information (and its converse, standard error, SE) varies along the spectrum of severity of the construct being assessed. The reliability of PROMIS depression was .92 when calculated as $1 - SE_{baseline}^{2}$, where $SE_{baseline}$ is the median of the SE of PROMIS depression in a range from −3 to +3
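When the latent trait is scaled to unit variance, reliability is conventionally approximated from an IRT standard error as 1 − SE². The sketch below assumes that convention applies here; the function names and the example SE values are hypothetical and are not study data.

```python
from math import sqrt
from statistics import median


def reliability_from_se(se_theta: float) -> float:
    """Approximate reliability as 1 - SE^2, assuming theta has unit variance."""
    return 1.0 - se_theta ** 2


def se_from_reliability(reliability: float) -> float:
    """Invert the relation: the theta-metric SE implied by a given reliability."""
    return sqrt(1.0 - reliability)


def se_t_to_theta(se_t: float) -> float:
    """Rescale an SE from the T-score metric (SD 10) to the theta metric (SD 1)."""
    return se_t / 10.0


if __name__ == "__main__":
    # Hypothetical per-respondent SEs on the T-score metric (illustrative only).
    example_se_t = [2.5, 2.8, 3.0, 2.7, 2.9]
    median_se_theta = se_t_to_theta(median(example_se_t))
    print("reliability:", round(reliability_from_se(median_se_theta), 3))
    print("SE implied by reliability .92:", round(se_from_reliability(0.92), 3))
```

Under this relation, a reliability of .92 corresponds to a median SE of roughly 0.28 on the theta metric, or about 2.8 points on the T-score metric.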
Discussion
We report here on a prospective observational study with depressed outpatients in the early stages of a new treatment episode which was designed to compare the psychometric properties of the PROMIS depression item bank (administered as a CAT) with two legacy self-report instruments: the CESD and the PHQ-9. The study allowed us to examine the psychometric properties of the measures (frequency distributions, reliabilities), their convergent validity (correlations, linkages to commonly used
Role of funding source
PROMIS® was funded with cooperative agreements from the National Institutes of Health (NIH) Common Fund Initiative (Northwestern University, PI: David Cella, PhD, U54AR057951, U01AR052177; Northwestern University, PI: Richard C. Gershon, PhD, U54AR057943; American Institutes for Research, PI: Susan (San) D. Keller, PhD, U54AR057926; State University of New York, Stony Brook, PIs: Joan E. Broderick, PhD and Arthur A. Stone, PhD, U01AR057948, U01AR052170; University of Washington, Seattle, PIs:
Contributors
Paul A. Pilkonis, PhD, contributed to study conception and design and took responsibility for drafting the manuscript. Lan Yu, PhD, provided data analysis and interpretation. Nathan E. Dodds, BS, Kelly L. Johnston, MPH, Catherine C. Maihoefer, MS, LPC, and Suzanne M. Lawrence, MS, contributed to study implementation (preparation of the protocol in the PROMIS Assessment Center; recruitment, testing, and interviewing of participants) and manuscript preparation (literature reviews, preparation of
Conflict of interest
There are no conflicts of interest for any authors.
Acknowledgments
We acknowledge the contributions of our colleagues in Behavioral Health Services at the DuBois (PA) Regional Medical Center, who assisted in the identification and assessment of patients: Scott Turkin, MD, DFAPA; Michelle L. Hetrick, MA, NCC, LPC; Betsy Lingle, BS; and Sherry L. Murphy, MN, CNS. Angela Stover, MA, a former program coordinator at the University of Pittsburgh, was instrumental in study implementation and data collection activities in the early stages of the project. Ms. Stover is
References (25)
- et al. The Patient-Reported Outcomes Measurement Information System (PROMIS®) developed and tested its first wave of adult self-reported outcome item banks: 2005–2008. J Clin Epidemiol (2010)
- et al. Standardization of the depression screener patient health questionnaire (PHQ-9) in the general population. Gen Hosp Psychiatry (2013)
- et al. Development and psychometric analysis of the PROMIS® pain behavior item bank. Pain (2009)
- et al. An inventory for measuring depression. Arch Gen Psychiatry (1961)
- et al. Development and validation of patient-reported outcome measure for sleep disturbance and sleep-related impairments. Sleep (2010)
- et al. The future of outcomes measurement: item banking, tailored short forms, and computerized adaptive assessment. Qual Life Res (2007)
- et al. The Patient-Reported Outcomes Measurement Information System (PROMIS®): progress of an NIH Roadmap cooperative group during its first two years. Med Care (2007)
- et al. (2012)
- et al. PROsetta stone analysis report: a Rosetta stone for Patient Reported Outcomes: PROMIS depression and CES-D (2013)
- et al. PROsetta stone analysis report: a Rosetta stone for Patient Reported Outcomes: PROMIS depression and PHQ-9 (2013)
- Efficiency of static and computer adaptive short forms compared to full-length measures of depressive symptoms. Qual Life Res
- Percentile norms and accompanying interval estimates from an Australian general adult population sample for self-report mood scales (BAI, BDI, CRSD, CES-D, DASS, DASS-21, STAI-X, STAI-Y, SRDS, and SRAS). Aust Psychol