Validation of the depression item bank from the Patient-Reported Outcomes Measurement Information System (PROMIS®) in a three-month observational study

doi:10.1016/j.jpsychires.2014.05.010

Journal of Psychiatric Research

Volume 56, September 2014, Pages 112-119

Highlights

• PROMIS Depression demonstrates strong convergent validity with the CESD and PHQ-9.
• A PROMIS score of 60 suggests depression of some clinical significance.
• PROMIS scores are more normally distributed than those from the other 2 measures.
• PROMIS Depression and the CESD classify more patients as recovered than the PHQ-9.
• The PROMIS computerized adaptive test for depression requires a median of 4 items.

Abstract

The Patient-Reported Outcomes Measurement Information System (PROMIS®) is an NIH Roadmap initiative devoted to developing better measurement tools for assessing constructs relevant to the clinical investigation and treatment of all diseases (constructs such as pain, fatigue, emotional distress, sleep, physical functioning, and social participation). Following creation of item banks for these constructs, our priority has been to validate them, most often in short-term observational studies. We report here on a three-month prospective observational study with depressed outpatients in the early stages of a new treatment episode (with assessments at intake, one-month follow-up, and three-month follow-up). The protocol was designed to compare the psychometric properties of the PROMIS depression item bank (administered as a computerized adaptive test, CAT) with two legacy self-report instruments: the Center for Epidemiological Studies Depression scale (CESD; Radloff, 1977) and the Patient Health Questionnaire (PHQ-9; Spitzer et al., 1999). PROMIS depression demonstrated strong convergent validity with the CESD and the PHQ-9 (with correlations in a range from .72 to .84 across all time points), as well as responsiveness to change when characterizing symptom severity in a clinical outpatient sample. Identification of patients as “recovered” varied across the measures, with the PHQ-9 being the most conservative. The use of calibrations based on models from item response theory (IRT) provides advantages for PROMIS depression both psychometrically (creating the possibility of adaptive testing, providing a broader effective range of measurement, and generating greater precision) and practically (these psychometric advantages can be achieved with fewer items, a median of 4 items administered by CAT, resulting in less patient burden).

Introduction

The use of models from IRT to calibrate items not only results in greater precision at the item and test levels but also promotes greater flexibility in test administration. For example, items can be administered as computerized adaptive tests (CATs), or static short forms can be created and tailored for samples with different levels of severity of the construct being assessed. Analyses of potential differential item functioning due to gender, age, and educational attainment were performed during the development of the item banks to ensure that items performed comparably regardless of variations in these background characteristics. In general, experience with CAT suggests that the PROMIS depression item bank provides excellent precision with 4–6 items (Choi et al., 2010). A generic 8-item short form is also available, and this short form was one of the cross-cutting dimensional measures used in the DSM-5 field trials, where its feasibility was established and where it performed well with regard to test-retest reliability (Narrow et al., 2013). Following creation of the item banks, our priority has been to validate them, most often in short-term observational studies. These studies allow us to examine the psychometric properties of the item banks, their responsiveness to change, their relationships to clinically significant benchmarks of improvement, and their similarities and differences when compared with other commonly used instruments.
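The item-selection idea behind adaptive administration can be illustrated with a minimal sketch. It assumes a simplified dichotomous two-parameter logistic (2PL) model rather than the polytomous calibrations of an actual depression item bank, and the item parameters, the bank itself, and all function names are hypothetical. At each step the item carrying the most Fisher information at the current severity estimate is chosen, and the standard error is the inverse square root of the accumulated information.

```python
import math

def p_endorse(theta, a, b):
    """2PL probability of endorsing an item at severity level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)

def next_item(theta, bank, administered):
    """Index of the not-yet-administered item with maximum information at theta."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, *bank[i]))

def standard_error(theta, bank, administered):
    """SE at theta from the items administered so far: 1 / sqrt(test information)."""
    info = sum(item_information(theta, *bank[i]) for i in administered)
    return float("inf") if info == 0 else 1.0 / math.sqrt(info)

# Hypothetical bank of (discrimination a, severity b) pairs; not PROMIS parameters.
BANK = [(2.5, -1.0), (3.0, 0.0), (2.8, 0.5), (2.2, 1.0), (3.2, 1.5), (1.8, 2.0)]

first = next_item(0.0, BANK, set())
se_after_first = standard_error(0.0, BANK, {first})
```

A full CAT would also re-estimate theta after each response and stop once the SE falls below a chosen threshold; those steps are omitted here to keep the sketch short.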

We report here on a prospective observational study with depressed outpatients in the early stages of a new treatment episode. For this purpose, all participants completed study assessments at three points: baseline (T1, as close to the beginning of treatment as possible but no later than four months after its start), one month following baseline (T2), and three months following baseline (T3). The protocol was designed to compare the psychometric properties of the PROMIS depression item bank (administered as a CAT) with two legacy self-report instruments: the Center for Epidemiological Studies Depression scale (CESD; Radloff, 1977) and the Patient Health Questionnaire (PHQ-9; Spitzer et al., 1999).

The study was not intended to evaluate treatment effectiveness. Rather, the main consideration was to conduct a study involving established treatments that would allow us to investigate the operating characteristics of the different measures of depression over a time frame (three months) consistent with the design of clinical trials and comparative effectiveness research. Regardless of their impact in the aggregate, treatments for depression generate considerable variability in individual outcomes, and this variability was desirable for examining psychometric issues. In our setting, the most common form of outpatient treatment for depression is a combination of antidepressant medication and supportive psychotherapy (both individual and group therapies), with smaller proportions of patients receiving medication only or psychotherapy only. No untreated or control group was included.

There have been other attempts to link PROMIS depression to legacy measures for depression. The PROsetta Stone project (Choi et al., 2012) was designed specifically to create “cross-walks” between PROMIS measures in several domains and commonly used measures (most often developed using classical test theory) in those same domains. A PROsetta Stone report provides a conversion table from raw CESD scores to PROMIS depression scores (Choi et al., 2013a). The PROMIS depression equivalent for the CESD threshold of 16 is 56.2; for the CESD threshold of 21, it is 59.1. (Note that PROMIS depression is scored with a T-score metric in which the mean of the general population is 50, with a standard deviation of 10.)

Another PROsetta Stone report provides a conversion table from raw PHQ-9 scores to PROMIS depression scores (Choi et al., 2013b). The PROMIS depression equivalent for the PHQ-9 threshold of 5 (mild depression) is 52.5; for the threshold of 10 (moderate depression), 59.9; for the threshold of 15 (moderately severe depression), 65.8; and for the threshold of 20 (severe depression), 71.5. Gibbons et al. (2011) also reported analyses linking PROMIS depression and the PHQ-9 in a sample of HIV patients. Their results were generally comparable to the PROsetta Stone linkages. However, there was some discrepancy at the mild end of the PHQ-9 where they found rather low PROMIS depression scores to be equivalent: “Mild depression (PHQ-9 score of 5–9) corresponds to scores of 42–51 on the PROMIS metric, moderate depression [10–14] to 52–63, moderately severe [15–19] to 64–72, and severe [20+] to scores of 73 and higher” (figure caption, p. 1353). In general, thresholds suggesting depression of some clinical significance (CESD = 21, PHQ-9 = 10) have been linked to a PROMIS score of about 60, the usual threshold used clinically with the T-score metric (1 SD above the mean).
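As an illustration only, the T-score metric and the PHQ-9 linkage points quoted above from Choi et al. (2013b) can be written as a small lookup. The function names and the band labels as data structures are assumptions made for this sketch; only the four threshold values come from the text, and the full conversion table is not reproduced.

```python
# PROMIS depression equivalents of PHQ-9 severity thresholds, as quoted in the text
# (Choi et al., 2013b): (PHQ-9 threshold, severity label, PROMIS T-score equivalent).
PHQ9_LINKAGE = [
    (5, "mild", 52.5),
    (10, "moderate", 59.9),
    (15, "moderately severe", 65.8),
    (20, "severe", 71.5),
]

def t_score_from_z(z):
    """T-score metric: general-population mean of 50 and standard deviation of 10."""
    return 50.0 + 10.0 * z

def phq9_band_for_promis(t_score):
    """Highest PHQ-9 severity band whose linked PROMIS threshold is reached."""
    label = "below mild threshold"
    for _phq9_cut, name, promis_cut in PHQ9_LINKAGE:
        if t_score >= promis_cut:
            label = name
    return label

one_sd_above_mean = t_score_from_z(1.0)           # the usual clinical threshold of 60
band_at_60 = phq9_band_for_promis(one_sd_above_mean)
```

With these linkage points, a T-score of 60 falls just above the PROMIS equivalent of PHQ-9 = 10, which matches the statement that thresholds of some clinical significance correspond to a PROMIS score of about 60.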

Finally, in a study using two different IRT linking methods, Olino et al. (2013) compared the Beck Depression Inventory (BDI; Beck et al., 1961), the CESD, and the PROMIS depression item banks in a community sample of adolescents. Among the three measures, PROMIS depression provided information over the widest range of symptom severity while demonstrating the highest level of precision. This result was especially true for the full PROMIS depression item bank of 28 items, but it also applied to the PROMIS depression short form of 8 items, which is considerably briefer than either the BDI or the CESD.

Section snippets

Inclusion criteria

Men and women 18 years and older who were able to read and understand English and able and willing to give informed consent were enrolled in the protocol. They were required to be within the first four months of outpatient treatment for major depressive disorder (MDD) at Western Psychiatric Institute and Clinic (WPIC) and its affiliates. To ensure that participants were not too close to the floor for depression when beginning the protocol (and thus unable to show further change), we required a

Descriptive statistics

Cronbach's alpha was used to compute the reliabilities of the legacy measures at baseline, which were .86 for the CESD and .81 for the PHQ-9. For measures derived from IRT models, test information (and its converse, standard error, SE) varies along the spectrum of severity of the construct being assessed. The reliability of PROMIS depression was .92 when calculated as $\text{Reliability} = 1 - \frac{SE_{\text{baseline}}^{2}}{SD_{\text{baseline}}^{2}}$, where SE_baseline is the median of the SE of PROMIS depression in a range from −3 to +3
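The reliability formula above can be sketched in a few lines. The function name and the input values are hypothetical and are not the study data; the only assumptions carried over from the text are that SE_baseline is a median of standard errors and that SD_baseline is the baseline standard deviation on the same metric.

```python
import statistics

def irt_reliability(se_values, sd_baseline):
    """Reliability = 1 - SE^2 / SD^2, with SE taken as the median of se_values."""
    se_baseline = statistics.median(se_values)
    return 1.0 - (se_baseline ** 2) / (sd_baseline ** 2)

# Hypothetical standard errors and baseline SD on a standardized (z-score) metric.
example_se = [0.25, 0.30, 0.35]
example_reliability = irt_reliability(example_se, sd_baseline=1.0)
```

Because both terms are squared and divided, the result is unchanged if SE and SD are both expressed on the T-score metric instead (each multiplied by 10).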

Discussion

We report here on a prospective observational study with depressed outpatients in the early stages of a new treatment episode which was designed to compare the psychometric properties of the PROMIS depression item bank (administered as a CAT) with two legacy self-report instruments: the CESD and the PHQ-9. The study allowed us to examine the psychometric properties of the measures (frequency distributions, reliabilities), their convergent validity (correlations, linkages to commonly used

Role of funding source

PROMIS® was funded with cooperative agreements from the National Institutes of Health (NIH) Common Fund Initiative (Northwestern University, PI: David Cella, PhD, U54AR057951, U01AR052177; Northwestern University, PI: Richard C. Gershon, PhD, U54AR057943; American Institutes for Research, PI: Susan (San) D. Keller, PhD, U54AR057926; State University of New York, Stony Brook, PIs: Joan E. Broderick, PhD and Arthur A. Stone, PhD, U01AR057948, U01AR052170; University of Washington, Seattle, PIs:

Contributors

Paul A. Pilkonis, PhD, contributed to study conception and design and took responsibility for drafting the manuscript. Lan Yu, PhD, provided data analysis and interpretation. Nathan E. Dodds, BS, Kelly L. Johnston, MPH, Catherine C. Maihoefer, MS, LPC, and Suzanne M. Lawrence, MS, contributed to study implementation (preparation of the protocol in the PROMIS Assessment Center; recruitment, testing, and interviewing of participants) and manuscript preparation (literature reviews, preparation of

Conflict of interest

There are no conflicts of interest for any authors.

Acknowledgments

We acknowledge the contributions of our colleagues in Behavioral Health Services at the DuBois (PA) Regional Medical Center, who assisted in the identification and assessment of patients: Scott Turkin, MD, DFAPA; Michelle L. Hetrick, MA, NCC, LPC; Betsy Lingle, BS; and Sherry L. Murphy, MN, CNS. Angela Stover, MA, a former program coordinator at the University of Pittsburgh, was instrumental in study implementation and data collection activities in the early stages of the project. Ms. Stover is

References (25)

D. Cella et al. The Patient-Reported Outcomes Measurement Information System (PROMIS®) developed and tested its first wave of adult self-reported outcome item banks: 2005–2008. J Clin Epidemiol (2010)
R.D. Kocalevent et al. Standardization of the depression screener patient health questionnaire (PHQ-9) in the general population. Gen Hosp Psychiatry (2013)
D. Revicki et al. Development and psychometric analysis of the PROMIS® pain behavior item bank. Pain (2009)
A.T. Beck et al. An inventory for measuring depression. Arch Gen Psychiatry (1961)
D.J. Buysse et al. Development and validation of patient-reported outcome measure for sleep disturbance and sleep-related impairments. Sleep (2010)
D. Cella et al. The future of outcomes measurement: item banking, tailored short forms, and computerized adaptive assessment. Qual Life Res (2007)
D. Cella et al. The Patient-Reported Outcomes Measurement Information System (PROMIS®): progress of an NIH Roadmap cooperative group during its first two years. Med Care (2007)
S.W. Choi et al. (2012)
S.W. Choi et al. PROsetta stone analysis report: a Rosetta stone for Patient Reported Outcomes: PROMIS depression and CES-D (2013)
S.W. Choi et al. PROsetta stone analysis report: a Rosetta stone for Patient Reported Outcomes: PROMIS depression and PHQ-9 (2013)
S.W. Choi et al. Efficiency of static and computer adaptive short forms compared to full-length measures of depressive symptoms. Qual Life Res (2010)
J. Crawford et al. Percentile norms and accompanying interval estimates from an Australian general adult population sample for self-report mood scales (BAI, BDI, CRSD, CES-D, DASS, DASS-21, STAI-X, STAI-Y, SRDS, and SRAS). Aust Psychol (2011)

