Introduction
Three-quarters of a century ago, the World Health Organization (WHO) proposed that health consists of physical, mental, and social well-being [1]. Consistent with that, health-related quality of life (HRQOL) includes physical, mental, and social functioning and well-being [2, 3]. Generic HRQOL domain scores can be used to compare different diseases or other subgroups, assess interventions, and monitor individual patients [4, 5]. In addition, aggregates such as the Veterans RAND-36 physical and mental health summary scores provide higher-level summary information [6].
The Patient-Reported Outcomes Measurement Information System® (PROMIS®)-29 v2.1 is a state-of-the-science HRQOL profile measure [7]. The PROMIS-29 v2.1 assesses pain intensity using a single 0–10 numeric rating item and 7 health domains (physical function, fatigue, pain interference, depression, anxiety, ability to participate in social roles and activities, and sleep disturbance) using 4 polytomous (5 response categories) items per domain. If a study shows improvement on some scales and decrements in others, it can be difficult to draw an overall conclusion. For example, one treatment might look better than another in physical functioning, but a little worse in pain and anxiety, and not different in the ability to participate in social roles and activities. Is one treatment better than the other? To make concluding statements, it may help to summarize the multiple scale scores. The PROMIS-29 physical and mental health summary scores are weighted combinations of PROMIS-29 scale scores and are more reliable than domain scores and more likely to capture significant individual change [8, 9].
Wilson and Cleary [10] hypothesized a causal path from disease and treatment physiology to symptoms, then to functioning, next to general perceptions of health, and finally overall quality of life. General perceptions of health are assessed in PROMIS by 10 global health items: 5 overall rating items (physical function, fatigue, pain, emotional distress, and social health) and 5 general health perceptions items that cut across domains [11]. Four of the items are used for scoring the PROMIS global physical health scale and 4 other items are used for the PROMIS global mental health scale. In a study of 1102 patients with ischemic and hemorrhagic strokes, the PROMIS global physical health scale correlated most strongly with a computer adaptive test administration of the PROMIS physical function domain (r = 0.77), and the PROMIS global mental health scale correlated most strongly with computer adaptive test administrations of the PROMIS depression and anxiety domains (r = −0.72 and −0.68, respectively) [12].
Schalet et al. [13] linked the PROMIS global health scales and the Veterans RAND-12 physical and mental health summary scores using data from 2025 adults in the Op4g internet panel. However, there are few comparisons of the PROMIS global health scales with the PROMIS-29 summary scores. Because these scores are part of the same measurement system, it might be assumed that they are comparable, but this is an empirical question. Neville et al. [14] found that the PROMIS global physical health and PROMIS-29 physical health T-scores were similar (45 and 47, respectively), but mental health scores differed (50 and 43, respectively) in a study of patients with severe COVID-19 6 months after a hospital intensive care admission.
While the PROMIS-29 physical health and mental health summary scores and the PROMIS global scales both putatively represent physical and mental health, the items and approach to deriving them differ substantially. Additional information about whether the PROMIS-29 physical and mental health summary scores and the PROMIS global health scales yield similar or different information is needed to provide guidance for their use in future research.
Methods
Samples
We analyzed data from Amazon’s Mechanical Turk (MTurk) and Ipsos’s KnowledgePanel (KP). As noted below, three longitudinal waves of data were analyzed from MTurk and one wave of data from KP. The PROMIS-29 v2.1 and PROMIS global health measures were administered to both samples. The analytic sample excluded those in the MTurk and KP samples who reported having one or both of two fake conditions (“Syndomitis” or “Chekalism”) included on the survey [15].
MTurk
Data were collected in 2021–2022 from the MTurk internet sample. Eligible study participants had to complete a minimum of 500 previous human intelligence tasks on MTurk with a successful completion rate of at least 95%. A sample of 5,804 adults completed general health questions on the baseline survey. A subset of the sample who on this survey reported currently having back pain (n = 1972) were asked to complete follow-up surveys: 1077 completed a 3-month survey and 845 a 6-month survey.
KP
The survey was also administered once in 2022 to a sample of 4060 adults from KP, an internet probability-based panel designed to represent the general U.S. population.
Measures
The PROMIS-29 v2.1 and PROMIS global health items were administered. The PROMIS-29 physical health summary score is a combination of (in order of largest to smallest weight) physical function, pain, ability to participate in social roles and activities, fatigue, emotional distress, and sleep disturbance; the PROMIS-29 mental health summary score is a combination of (in order of largest to smallest weight) fatigue, emotional distress, ability to participate in social roles and activities, pain, sleep disturbance, and physical function.
The PROMIS global physical health score is estimated from 4 questions: (1) In general, how would you rate your physical health? (2) To what extent are you able to carry out your everyday physical activities? (3) How would you rate your pain on average? and (4) How would you rate your fatigue on average? The PROMIS global mental health score is estimated from 4 other questions: (1) In general, would you say your quality of life is… (2) In general, how would you rate your mental health? (3) In general, how would you rate your satisfaction with social activities and relationships? and (4) How often have you been bothered by emotional problems?
The physical and mental health scores for the PROMIS-29 and PROMIS global physical and mental health measures are scored on a T-score metric (mean = 50 and SD = 10 in the U.S. general population), with a higher score representing better health.
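As an illustration of the T-score metric described above, the following minimal sketch converts a standardized score (mean 0, SD 1 in the reference population) to a T-score and expresses a T-score difference in SD units. The function names are hypothetical, and the sketch does not reproduce the actual PROMIS scoring procedure, which relies on calibrated item parameters.

```python
def to_t_score(z):
    """Map a standardized score (mean 0, SD 1) onto the T-score metric (mean 50, SD 10)."""
    return 50.0 + 10.0 * z


def t_score_gap_in_sd(t_a, t_b):
    """Express the difference between two T-scores in reference-population SD units."""
    return (t_a - t_b) / 10.0


# Half an SD below the reference mean corresponds to a T-score of 45.
half_sd_below = to_t_score(-0.5)

# A 3-point T-score gap corresponds to 0.3 SD.
gap = t_score_gap_in_sd(47.0, 50.0)
```

Because both sets of scores share this metric, differences between them can be read directly in tenths of a standard deviation.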
Nine retrospective change items were included in the 3-month follow-up of the MTurk sample. All items began with “Compared to three months ago.” Eight of the items followed with: (1) In general, how is your physical functioning now? (2) In general, how is your ability to participate in social roles and activities now? (3) In general, how is your pain now? (4) In general, how is your fatigue now? (5) In general, how is your mood? (6) In general, how is your thinking (also known as cognition)? (7) In general, how is your sleep now? (8) How would you rate your health in general now? These items were administered using 5 response options (Much better now than three months ago; Somewhat better now than three months ago; About the same; Somewhat worse now than three months ago; Much worse now than three months ago). One retrospective change item included different response options: Compared to three months ago, is your back pain problem… (Much worse; A little worse; About the same; A little better; Moderately better; Much better; Completely gone). We scored each of the 9 items so that a higher score represented a more positive change in health.
Human subjects protection
Study participants in both samples provided electronic consent upon starting the survey. All procedures were reviewed and approved by the research team's Institutional Review Board (RAND Human Subjects Research Committee FWA00003425; IRB00000051).
Analysis plan
We estimate 3-month test–retest reliability for the PROMIS-29 physical and mental health summary scores in the MTurk sample. Then, in the MTurk and KP samples, we provide mean PROMIS-29 physical and mental health summary scores and PROMIS global physical and mental health scores for 21 health conditions and for the overall sample at baseline. Based on prior estimates of the minimally important group difference [16, 17], we indicate where important differences exist between corresponding measures (PROMIS-29 versus PROMIS global), that is, differences of 3 T-score points or more.
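The flagging rule described above can be sketched as follows. The 3-point threshold comes from the text; the function name, the condition labels, and the mean values are hypothetical placeholders rather than study results.

```python
MID_THRESHOLD = 3.0  # T-score points, the cut-off stated in the analysis plan


def flag_important_differences(means, threshold=MID_THRESHOLD):
    """Compare corresponding mean scores for each group.

    means maps a group label to (PROMIS-29 summary mean, PROMIS global mean).
    Returns a dict of group -> (difference, True if |difference| >= threshold).
    """
    result = {}
    for group, (promis29_mean, global_mean) in means.items():
        diff = promis29_mean - global_mean
        result[group] = (diff, abs(diff) >= threshold)
    return result


# Hypothetical mean mental health T-scores for two made-up groups.
example_means = {
    "Condition A": (46.0, 42.5),
    "Condition B": (48.0, 47.0),
}
flags = flag_important_differences(example_means)
for group, (diff, important) in flags.items():
    print(f"{group}: difference = {diff:+.1f}, important = {important}")
```

Using the absolute difference flags gaps in either direction, so the rule does not presuppose which measure yields the lower score.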
In addition, we estimate product-moment correlations between the PROMIS-29 v2.1 physical and mental health summary scores and the PROMIS global health physical and mental health scores in MTurk at baseline for the overall sample, and at 3 months later and 6 months later for those with back pain. We report results for the overall KP sample at the single administration. These are presented in multitrait-multimethod (MTMM) product-moment correlation matrices among the PROMIS scales, with two “traits” (physical and mental health) measured by two methods (PROMIS-29 and PROMIS global). The MTMM matrices are analyzed to evaluate the construct validity of the measures [18]. Convergent validity is supported if the correlations in the validity diagonal (“monotrait-heteromethod” correlations), that is, correlations among measures of the same trait (e.g., physical health) assessed using different methods (e.g., PROMIS-29 v2.1 and PROMIS global health), are large. Discriminant validity is supported if correlations in the validity diagonal are larger than coefficients in the “heterotrait-heteromethod” and the “heterotrait-monomethod” triangles. We analyzed MTMM correlation matrices using the MTMM.EXE program [19]. In addition, we estimated correlations among changes in the PROMIS-29 and PROMIS global physical and mental health measures from baseline to 3 months later to see if changes over time in the two traits are similar for each method.
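The MTMM logic described above can be sketched for the two-trait, two-method case. This is an illustration only: it does not reproduce the MTMM.EXE program, it compares magnitudes without significance tests, and the measure names and correlation values are hypothetical.

```python
from itertools import combinations


def mtmm_summary(corr, measures):
    """Classify correlations and compare each validity coefficient with the
    heterotrait correlations that share one of its measures.

    corr maps frozenset({a, b}) -> correlation; measures maps name -> (trait, method).
    Returns (validity diagonal, list of (validity pair, other pair, supported)).
    """
    validity, heterotrait = {}, {}
    for a, b in combinations(measures, 2):
        (trait_a, method_a), (trait_b, method_b) = measures[a], measures[b]
        r = corr[frozenset((a, b))]
        if trait_a == trait_b and method_a != method_b:
            validity[(a, b)] = r      # monotrait-heteromethod
        elif trait_a != trait_b:
            heterotrait[(a, b)] = r   # heteromethod or monomethod triangle
    checks = []
    for pair, r_valid in validity.items():
        for other, r_other in heterotrait.items():
            if set(pair) & set(other):
                checks.append((pair, other, r_valid > r_other))
    return validity, checks


measures = {
    "p29_physical": ("physical", "PROMIS-29"),
    "p29_mental": ("mental", "PROMIS-29"),
    "global_physical": ("physical", "global"),
    "global_mental": ("mental", "global"),
}
# Hypothetical correlations, not values from the study.
corr = {
    frozenset(("p29_physical", "global_physical")): 0.75,
    frozenset(("p29_mental", "global_mental")): 0.60,
    frozenset(("p29_physical", "global_mental")): 0.55,
    frozenset(("p29_mental", "global_physical")): 0.58,
    frozenset(("p29_physical", "p29_mental")): 0.70,
    frozenset(("global_physical", "global_mental")): 0.65,
}
validity, checks = mtmm_summary(corr, measures)
supported = {}
for pair, _other, ok in checks:
    supported[pair] = supported.get(pair, 0) + int(ok)
print(supported)
```

With two traits and two methods, each validity coefficient is compared against four heterotrait correlations, so a count of four indicates that every comparison favors discriminant validity for that trait.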
We also computed product-moment correlations between retrospective ratings of changes and changes in the PROMIS-29 and PROMIS global physical and mental health measures. Finally, we examined predictors of the PROMIS-29 and PROMIS global physical and mental health summary scores at the 3-month follow-up to better understand what may underlie any differences in the two sets of physical and mental health scores. We fit ordinary least squares regression models that included baseline health, demographic characteristics (age, race/ethnicity, education), and indicators for 21 possible health conditions as right-hand side variables. We used Goodnight maximum R² stepwise regression to identify significant independent variables [20]. This method assesses the effect of switching different variables on the total amount of variance explained. The first variable selected is the one that produces the largest R² value. Once this variable is included in the model, a new variable is added that produces the largest incremental change in R². Variables are added (and/or deleted) at each step until the incremental change in R² no longer meets a previously determined level of significance (p < 0.05) with the addition (and/or deletion) of any new variable, or a specified number of variables that maximize R² have been entered.
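The selection idea described above can be sketched as a greedy forward search on R². This is a simplification and not the Goodnight maximum R² procedure itself: it only adds variables, never swaps or deletes them, and it stops at a fixed number of variables rather than applying a significance test. All names and data values are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(columns, y):
    """R^2 of an ordinary least squares fit of y on the columns plus an intercept."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * y[i] for i, row in enumerate(design)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def forward_select(predictors, y, max_vars):
    """At each step, add the predictor that yields the largest R^2."""
    selected, history = [], []
    while len(selected) < min(max_vars, len(predictors)):
        remaining = [name for name in predictors if name not in selected]
        best = max(
            remaining,
            key=lambda name: r_squared([predictors[n] for n in selected + [name]], y),
        )
        selected.append(best)
        history.append((best, r_squared([predictors[n] for n in selected], y)))
    return history


# Hypothetical toy data: follow-up score driven mostly by the baseline score.
predictors = {
    "baseline": [40.0, 45.0, 50.0, 55.0, 60.0, 42.0, 58.0, 47.0],
    "age": [30.0, 60.0, 45.0, 50.0, 35.0, 65.0, 40.0, 55.0],
    "condition_flag": [1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0],
}
residual = [0.3, -0.2, 0.1, -0.1, 0.2, -0.3, 0.0, 0.1]
followup = [
    0.8 * b - 0.1 * a + e
    for b, a, e in zip(predictors["baseline"], predictors["age"], residual)
]
history = forward_select(predictors, followup, max_vars=2)
for name, r2 in history:
    print(f"{name}: cumulative R^2 = {r2:.3f}")
```

In this toy example the baseline score enters first, mirroring the general pattern that a prior measurement of the same outcome tends to dominate such models.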
Discussion
The mean T-scores for the corresponding PROMIS-29 and PROMIS global physical and mental health scales were similar, but the PROMIS global mental health score was lower (worse mental health) than the PROMIS-29 mental health summary score by 3 T-score points in the MTurk sample and 4 points in the KP sample. In both samples, the PROMIS global mental health score was lower than the PROMIS-29 mental health summary score among those who reported that a doctor or other health professional told them they had anxiety (28% of the MTurk sample and 20% of the KP sample) or depression (35% of the MTurk sample and 20% of the KP sample). PROMIS-29 and PROMIS global mental health scores 6 months after intensive care for COVID-19 showed even larger differences (7 T-score points lower for PROMIS global mental health) than the current study [14]. So, the current study provides further evidence that the PROMIS global measure can yield lower mental health scores (indicating worse mental health) than the PROMIS-29 mental health summary score.
The correlations of 0.69–0.81 among physical health and 0.56–0.69 among mental health scales in this study are similar in magnitude to those reported by Schalet et al. [13] between the PROMIS global health scales and the Veterans RAND-12 physical and mental health scales (product-moment correlations of 0.69 between the physical health scales and 0.63 between the mental health scales). But the MTMM correlation matrices for the three survey administrations in MTurk and the single administration in KP, and the correlations among change in the measures between baseline and 3 months later in MTurk, showed that the PROMIS mental health measures correlated as highly with physical health as with the other mental health measure. Hence, this is the first study to evaluate and find a lack of discriminant validity for the PROMIS global mental health scale.
In contrast, correlations between the SF-12 version 2 physical component summary (PCS) and PROMIS global physical health scale (r = 0.78) and between the SF-12 version 2 mental component summary (MCS) and the PROMIS global mental health (r = 0.62) exceeded correlations between the SF-12 PCS and MCS (r = 0.26) and between the PROMIS global physical health and mental health scores (r = 0.55) in a sample of older adults in the New Zealand Health, Work and Retirement longitudinal study [21]. The authors concluded that the SF-12 PCS and PROMIS global physical health scale were similarly sensitive to hospital use and recurrent falls, but the SF-12 MCS was more sensitive to depression (CES-D score > 10) than the PROMIS global mental health scale. Schalet et al. [13] did not examine discriminant validity, but an MTMM matrix we created (see Supplemental Table 2) from that dataset supports discriminant validity for the physical health measures. Three of the four comparisons of the 0.62 validity diagonal correlation between the PROMIS global and VR-12 mental health scales support discriminant validity, but the 0.62 correlation was significantly smaller than the 0.69 correlation between the PROMIS global physical and mental scales (t = −4.56, p < 0.001).
It is worth noting that discriminant validity findings for the SF-12/VR-12 MCS comparisons with the PROMIS global mental health scale are somewhat more favorable in part because the SF-12 and VR-12 PCS and MCS scores were created to be uncorrelated with one another [22, 23]. However, when the correlation between physical and mental health is estimated, noteworthy positive correlations have been observed. For example, product-moment correlations between physical and mental health factors at each of 3 years (baseline, 2 years post-baseline, and 4 years post-baseline) ranged from 0.32 to 0.41 in the Medical Outcomes Study [24]. Similarly, a correlation of 0.53 between physical and mental health factors was reported in a study of 1053 older individuals (average age 64 years) sampled from an academic general medical clinic [25]. In addition, a correlation of 0.66 between RAND-36 physical and mental health was found in a sample of 255 females and 245 males stratified by age, race/ethnicity, and educational level to reflect the US population [26]. Finally, a correlation of 0.64 between the PROMIS global physical and mental health scales was observed in a recent study of 2,668 nonoperative patients at the time of their first visit to a multidisciplinary spine clinic [27]. This body of literature indicates that physical and mental health are positively correlated, which can make it challenging to demonstrate discriminant validity when the methods of measurement differ, as they do between the PROMIS-29 and the PROMIS global physical and mental health scores.
Correlations between mental health change scores and retrospective rating of change items in the MTurk sample were generally similar and small in magnitude, ranging from 0.09 (change in PROMIS global mental health with retrospective rating of change in cognition) to 0.18 (change in PROMIS-29 mental health summary score with retrospective rating of change in fatigue and with change in sleep). None of the correlations met the 0.371 level suggested for the use of anchors to estimate group-level minimally important differences [28]. This was in part because the majority (range of 51% for mood to 70% for social) reported on the retrospective items on the 3-month survey that they were about the same as 3 months ago, and the correlations were larger if those who did not change were excluded but they were still below the threshold (results not shown). In short, retrospective ratings and prospective change in PROMIS-29 and PROMIS global physical and mental health scores were only weakly associated with one another.
The regression models indicated that baseline health was by far the strongest predictor of the physical and mental health scales at the 3-month follow-up and only a few demographic and condition indicators were significantly uniquely predictive. There was one overlap in the conditions that predicted physical health (trouble sleeping) and the significant predictors of mental health differed. Depression was uniquely predictive of PROMIS-29 mental health while anxiety predicted the PROMIS global mental health scale score.
The results of this study indicate that conclusions about mental health in studies may differ based on whether the PROMIS-29 or PROMIS global mental health measure is used. Given the noteworthy difference in the PROMIS-29 mental health summary and the PROMIS global mental health scores, it is important to explore the reasons why in future research. While theoretically assessing the same construct, the measurement approach for the PROMIS-29 and PROMIS global health items is fundamentally different. The PROMIS-29 summary scores are weighted (factor scoring coefficients) combinations of PROMIS-29 domain scores while the PROMIS global mental health scale directly assesses mental health perceptions and is scored using item parameters from an IRT graded response model. When the PROMIS-29 is administered, a more nuanced and complete picture of HRQOL can be obtained by examining the 7 domain scores and the pain intensity item in addition to the physical and mental health summary scores. The 10 PROMIS global health items have the advantage of being brief, but the PROMIS-29 provides more detailed and rich information.
In conclusion, this study documents noteworthy differences in the PROMIS mental health summary scores estimated using a weighted combination of PROMIS-29 domain scores and the PROMIS 4-item global mental health scale. Investigations are needed to shed additional light on the implications of these differences and to provide guidance about the conditions for which one or the other scores (or use of both) is appropriate.