Establishing a common metric for self-reported anxiety: Linking the MASQ, PANAS, and GAD-7 to PROMIS Anxiety

https://doi.org/10.1016/j.janxdis.2013.11.006

Highlights

  • We produced cross-walk tables linking three popular instruments to PROMIS Anxiety.

  • The scores of our common measure (PROMIS Anxiety) are centered on the 2000 US census.

  • Users can directly compare clinical scores obtained on multiple measures of anxiety.

  • Clinical means or cut-off scores were close to one SD above the population mean.

Abstract

Researchers and clinicians wishing to assess anxiety must choose from among numerous assessment options, many of which purport to measure the same or a similar construct. A common reporting metric would have great value and can be achieved when similar instruments are administered to a single sample and then linked to each other to produce cross-walk score tables. Using item response theory (IRT), we produced cross-walk tables linking three popular “legacy” anxiety instruments – MASQ (N = 743), GAD-7 (N = 748), and PANAS (N = 1120) – to the anxiety metric of the NIH Patient Reported Outcomes Measurement Information System (PROMIS®). The linking relationships were evaluated by resampling small subsets and estimating confidence intervals for the differences between the observed and linked PROMIS scores. Our results allow clinical researchers to retrofit existing data of three commonly used anxiety measures to the PROMIS Anxiety metric and to compare clinical cut-off scores.

Introduction

Researchers and clinicians wishing to assess anxiety in a clinical or community population must choose from among numerous assessment options, many of which purport to measure the same or a similar construct (Harrington and Antony, 2008, Roemer, 2002). A recent investigation found 92 empirically based anxiety questionnaires (McHugh, Rasmussen, & Otto, 2011). In choosing a questionnaire, users need to evaluate a number of issues, including the reported reliability and validity estimates of the instrument, the reading level required, the cost of the instrument, and whether the length of the instrument would unduly burden the patient/participant. Another important consideration is score comparability, i.e., whether a report of the scores will be useful to others in the field. That is, can the results obtained using instrument X in one set of studies be compared to results obtained using instrument Y in other studies? This concern may lead investigators to choose the most “popular” instrument, which may not always be the best choice.

Self-report instruments typically are scored by summing or averaging individual item responses, leading to different score ranges for different instruments. This makes it difficult to compare results across studies that use different measures. Absent some method for aligning scores, one cannot know, for example, whether a mean summed MASQ score of 20 for a group of mental health outpatients falls above or below the case-defining score of 10 on the GAD-7 (Spitzer et al., 2006, Watson and Clark, 1999). One possible solution is to transform results to percentile ranks or standardized scores, but this approach can be problematic because these score types are highly sensitive to sample characteristics such as restricted range (Baguley, 2009). If we convert scores to percent of maximum possible (POMP) scores (Cohen, Cohen, Aiken, & West, 1999), the scores become independent of the particular sample, but not necessarily comparable across instruments, as the instruments may differ in their range of coverage.
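The POMP rescaling mentioned above is a simple linear transformation. A minimal sketch follows; the function name and the GAD-7 example values are ours for illustration (the GAD-7 summed score ranges from 0 to 21), not taken from the paper:

```python
def pomp_score(raw, min_possible, max_possible):
    """Percent of maximum possible (POMP) score (Cohen et al., 1999).

    Rescales a raw summed score to a 0-100 range defined by the
    instrument's minimum and maximum possible scores.
    """
    return 100.0 * (raw - min_possible) / (max_possible - min_possible)

# Example: a summed GAD-7 score of 10 on its 0-21 range
print(round(pomp_score(10, 0, 21), 1))  # 47.6
```

As the text notes, two instruments can yield the same POMP score while covering different severity ranges, so POMP scores alone do not make instruments interchangeable.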

The lack of standardized measurement among patient-reported outcome instruments was an impetus for the National Institutes of Health (NIH) Patient Reported Outcomes Measurement Information System (PROMIS®; Cella et al., 2010). Adapting the World Health Organization's (2007) tripartite framework of physical, mental, and social health, PROMIS researchers developed and calibrated multiple item banks (Buysse et al., 2010, Cella et al., 2007, Cella et al., 2010, Fries et al., 2009, Revicki et al., 2009), including one for measuring anxiety symptoms (Pilkonis et al., 2011). The PROMIS Anxiety bank – comprising 29 items that include fear, anxious misery, and hyperarousal – can be administered as a brief computer adaptive test (CAT), an 8-item short form, or as an alternate subset of bank items that suit the investigator's needs (Cella et al., 2010, Pilkonis et al., 2011).

The Diagnostic and Statistical Manual of Mental Disorders-Fifth Edition (DSM-5) working groups have incorporated PROMIS items into their “review of systems” assessment (Narrow et al., 2013) and have recommended some PROMIS instruments as expanded modules (Kuhl, Kupfer, & Regier, 2011). As a result, it will be of interest to clinicians and researchers to document individual and grouped patient data in terms of PROMIS scores. However, some will continue to use instruments developed before PROMIS, and others will develop new instruments. Thus, there would be great value in having a common metric that associates PROMIS scores with scores from scales that measure the same or highly similar concepts (referred to hereafter as “legacy measures”). To create such a metric, we set out to “link” the scores from legacy measures to the PROMIS metric by establishing the mathematical relationships between legacy and PROMIS scores. Once scores are linked to a common metric, a cross-walk table can be constructed that associates scores on one measure with corresponding scores on another.

The PROMIS metric uses the T-score, which is standardized with respect to mean (50) and standard deviation (10), centered around the US general population, matching the marginal distributions of gender, age, race, marital status, income, and education in the 2000 US census (Liu et al., 2010). Thus, a PROMIS Anxiety T-score of 60 can be interpreted as being one standard deviation higher (worse) than the “average person” in the US.
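The T-score convention above is a fixed linear rescaling of a standardized score. As a minimal sketch (the function name is ours, not from the paper), assuming a score already standardized against the US general population reference:

```python
def to_t_score(z):
    """Convert a population-standardized (z-metric) score to the
    PROMIS T-score metric (mean 50, standard deviation 10)."""
    return 50.0 + 10.0 * z

# One SD above the general-population mean maps to a T-score of 60
print(to_t_score(1.0))  # 60.0
```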

In this report, we present the results of studies linking scores of anxiety instruments under the PROsetta Stone® project. A detailed overview of linking, the PROsetta Stone methodology, and sample descriptions may be found elsewhere (Choi, Schalet, Cook, & Cella, in press; see also Dorans, 2007). We linked scores from three legacy measures of general anxiety to the PROMIS Anxiety metric: the Mood and Anxiety Symptom Questionnaire (MASQ; Watson and Clark, 1991, Watson et al., 1995), the Generalized Anxiety Disorder Scale (GAD-7; Spitzer et al., 2006), and the Positive Affect and Negative Affect Schedule (PANAS; Watson, Clark, & Tellegen, 1988). Briefly, the PROsetta Stone methodology applies multiple linking methods, including those based on item response theory (IRT), as well as more traditional equipercentile methods (Lord, 1982). Such a multi-method approach is recommended by Kolen and Brennan (2004), as it serves to ensure that any violations of assumptions do not distort the results. Using a single-group design (wherein each respondent answers questions on both the legacy and the PROMIS instrument), we test the accuracy of each linking method by comparing the actual PROMIS scores with those obtained by linking. To evaluate bias and standard errors of the different linking methods, we apply a resampling analysis such that small subsets of cases (25, 50, and 75) are randomly drawn with replacement over 10,000 replications. For each replication, the mean difference between the actual and linked PROMIS score can be computed, allowing for an estimate of the confidence interval associated with linking, as a function of sample size (Choi et al., in press).
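The resampling analysis described above can be sketched as follows. This is a simplified illustration of the general idea, not the authors' implementation; the function name, interface, and default replication count are our assumptions:

```python
import random
import statistics

def linking_error_ci(observed, linked, n, reps=10000, alpha=0.05, seed=0):
    """Resampling check of a linking relationship (sketch).

    Repeatedly draws n paired cases with replacement, computes the mean
    difference between observed and cross-walked (linked) scores for each
    replication, and returns the empirical (1 - alpha) confidence interval
    of those mean differences. An interval far from zero would indicate
    bias in the linking at that sample size.
    """
    rng = random.Random(seed)
    pairs = list(zip(observed, linked))
    diffs = []
    for _ in range(reps):
        sample = [rng.choice(pairs) for _ in range(n)]
        diffs.append(statistics.mean(o - l for o, l in sample))
    diffs.sort()
    lo = diffs[int((alpha / 2) * reps)]
    hi = diffs[int((1 - alpha / 2) * reps) - 1]
    return lo, hi
```

In the study, this procedure was run at subset sizes of 25, 50, and 75 to show how the precision of a linked score depends on the number of cases being cross-walked.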

Section snippets

PROMIS Anxiety

The PROMIS Anxiety bank consists of 29 items with a 7-day time frame and a 5-point rating scale that ranges from 1 (“Never”) to 5 (“Always”) (Cella et al., 2010, Pilkonis et al., 2011). The item bank was developed using comprehensive mixed (qualitative and quantitative) methods (DeWalt et al., 2007, Kelly et al., 2011), and focuses on fear (e.g., fearfulness, feelings of panic), anxious misery (e.g., worry, dread), hyperarousal (e.g., tension, nervousness, restlessness), and some somatic

Item content overlap

Inspection of item content revealed substantial overlap between the PROMIS and legacy measures. On the MASQ-GA, seven out of ten items described feelings of fear, unease, and tension that corresponded to the content of PROMIS items. However, three items on the MASQ-GA describe specific somatic symptoms (diarrhea, lump in throat, upset stomach) not included in the PROMIS item banks. For the GAD-7, six items clearly correspond to the content coverage of PROMIS items; one item (irritability),

Discussion

This paper represents the first effort to link multiple measures of anxiety to the PROMIS metric. Although investigators of patient-reported outcomes have linked measures in a number of domains – including depression (Choi et al., in press, Fischer et al., 2011), fatigue (Noonan et al., 2012, Holzner et al., 2006), and pain (Chen, Revicki, Lai, Cook, & Amtmann, 2009) – we are unaware of any study that has linked anxiety instruments. This work has resulted in several useful products: three

Acknowledgements

This research was part of the PROsetta Stone® project, which was funded by the National Institutes of Health/National Cancer Institute grant RC4CA157236 (David Cella, PI). For more information on PROsetta Stone, please see www.prosettastone.org. We would like to thank Joshua Rutsohn and Helena Correia for their help in the preparation of this manuscript.

References (64)

  • R. Brennan

    Linking with Equivalent Group or Single Group Design (LEGS) (Version 2.0) [Computer software]

    (2004)
  • J.A. Buckby et al.

    Clinical utility of the Mood and Anxiety Symptom Questionnaire (MASQ-GA) in a sample of young help-seekers

    BMC Psychiatry

    (2007)
  • D.J. Buysse et al.

    Development and validation of patient-reported outcome measures for sleep disturbance and sleep-related impairments

    Sleep

    (2010)
  • L. Cai et al.

    IRTPRO 2.01

    (2011)
  • D. Cella et al.

    A novel IRT-based case-ranking approach to derive expert standards for symptom severity

  • D. Cella et al.

    The Patient-Reported Outcomes Measurement Information System (PROMIS): Progress of an NIH roadmap cooperative group during its first two years

    Medical Care

    (2007)
  • S.W. Choi

    Firestar: computerized adaptive testing simulation program for polytomous IRT models

    Applied Psychological Measurement

    (2009)
  • S.W. Choi et al.

    Efficiency of static and computer adaptive short forms compared to full length measures of depressive symptoms

    Quality of Life Research

    (2010)
  • Choi, S. W., Schalet, B. D., Cook, K. F., & Cella, D. (in press). Establishing a common metric for depressive symptoms:...
  • P. Cohen et al.

    The problem of units and the circumstance for POMP

    Multivariate Behavioral Research

    (1999)
  • D.A. DeWalt et al.

    Evaluation of item candidates: the PROMIS qualitative item review

    Medical Care

    (2007)
  • N.J. Dorans

    Equating, concordance, and expectation

    Applied Psychological Measurement

    (2004)
  • N.J. Dorans

    Linking scores from multiple health outcome instruments

    Quality of Life Research

    (2007)
  • N.J. Dorans et al.

    Population invariance and the equatability of tests: Basic theory and the linear case

    Journal of Educational Measurement

    (2000)
  • J.F. Fries et al.

    Progress in assessing physical function in arthritis: PROMIS short forms and computerized adaptive testing

    Journal of Rheumatology

    (2009)
  • H. Fischer et al.

    How to compare scores from different depression scales: Equating the Patient Health Questionnaire (PHQ) and the ICD-10-Symptom Rating (ISR) using item response theory

    International Journal of Methods in Psychiatric Research

    (2011)
  • R.C. Gershon et al.

    The use of PROMIS and Assessment Center to deliver patient-reported outcome measures in clinical research

    Journal of Applied Measurement

    (2010)
  • B.F. Grant et al.

    Nicotine dependence and psychiatric disorders in the United States: Results from the national epidemiologic survey on alcohol and related conditions

    Archives of General Psychiatry

    (2004)
  • T. Haebara

    Equating logistic ability scales by a weighted least squares method

    Japanese Psychological Research

    (1980)
  • J.L. Harrington et al.

    Assessment of anxiety disorders

    Oxford Handbook of Anxiety and Related Disorders

    (2008)
  • C.J. Hopwood et al.

    How should the internal structure of personality inventories be evaluated?

    Personality and Social Psychology Review

    (2010)
  • M.A.R. Kelly et al.

    Describing depression: Congruence between patient experiences and clinical assessments

    British Journal of Clinical Psychology

    (2011)