The most widely used self-reported health question elicits a rating on an excellent to poor response scale.1 This item provides a general perception of health that reflects both objective health conditions and the individual’s values for different aspects of health-related quality of life. Multiple studies have found this and other global health items to be predictive of health care utilization and mortality.2–5 The item is also used as a case mix variable for patient experiences with care measures.6

The Patient-Reported Outcomes Measurement Information System (PROMIS®) project was funded by the National Institutes of Health (NIH) in 2004 with the objective of developing, evaluating, and disseminating publicly available survey items assessing self-reported generic health (www.nihpromis.gov).7 The PROMIS vision was to create efficient measures that would be feasible to implement in busy office practices and that could provide a system of population health surveillance normed to the U.S. general population.

Four items are used to assess global physical health. Three of these are administered using five-category response scales, and one item (rating of pain on average) uses a response scale of 0–10:8

  1. In general, how would you rate your physical health? Excellent, Very Good, Good, Fair, Poor (global03)

  2. To what extent are you able to carry out your everyday physical activities such as walking, climbing stairs, carrying groceries, or moving a chair? Completely, Mostly, Moderately, A little, Not at all (global06)

  3. In the past 7 days, how would you rate your pain on average? Scale of 0 to 10, where 0 = no pain and 10 = worst pain imaginable (global07)

  4. In the past 7 days, how would you rate your fatigue on average? None, Mild, Moderate, Severe, Very severe (global08)

We recoded global07 from its numeric 0–10 scale to five categories, based on previous work.9 The 4-item global physical health scale had an internal consistency reliability (coefficient alpha) of 0.81. It is scored on a T-score metric, with the mean (50) and SD (10) relative to the U.S. general population.10 The scale has been included in several nationally representative surveys in the U.S., including the 2010 National Health Interview Survey (NHIS) and two cohorts of the HealthStyles survey: a non-probability mail sample in 2010 and a probability-based Internet panel in 2012.11 The average global physical health score in the HealthStyles samples was approximately 50, matching the PROMIS U.S. general population. The mean score in the NHIS was slightly higher, as it was administered using face-to-face interviews. The PROMIS global physical health scale was selected for the Healthy People 2020 initiative.

PROMIS also includes the self-reported general health item: "In general, would you say your health is: excellent, very good, good, fair, or poor?" (global01) This item was not used in scoring the PROMIS global health scale because it was highly correlated (polychoric r = 0.95) and “locally dependent” (residual correlation of 0.29 when a one-factor categorical confirmatory factor analytic model was estimated) with a similarly worded item in the global physical health scale ("In general, how would you rate your physical health?"). Because the excellent to poor self-reported health item is used on the majority of nationally representative health surveys in the U.S., there is interest in estimating the PROMIS global physical health scale from this item alone so that the scale score can be used for population health surveillance. This study provides estimates of PROMIS global physical health scale scores from the single general health item (global01).

METHODS

PROMIS items were administered via a Web-based survey to 19,601 persons in a national Internet panel maintained by Polimetrix7 (now YouGov/Polimetrix; see http://research.yougov.com/) and to 1,532 subjects from PROMIS research sites (n = 21,133 overall). Demographic quotas were specified for the PROMIS study participants selected from the Internet panel, but the sample is not a random sample of the U.S. general population. PROMIS measures were normed using post-stratification adjustment of the sample to match the 2000 U.S. census data on gender, age groups, race/ethnicity, education, marital status and income.10,12
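As an illustration of the idea behind post-stratification, the following minimal sketch computes one weight per stratum as the ratio of the population share to the sample share. It is a single-variable simplification; the PROMIS norming adjusted on several variables jointly, and the source does not describe the algorithm used. The function name, counts, and shares are hypothetical.

```python
def poststratification_weights(sample_counts, population_shares):
    """Return one weight per stratum: population share / sample share."""
    n = sum(sample_counts.values())
    weights = {}
    for stratum, count in sample_counts.items():
        sample_share = count / n
        weights[stratum] = population_shares[stratum] / sample_share
    return weights


# Hypothetical example: a sample that over-represents women.
sample_counts = {"female": 600, "male": 400}
population_shares = {"female": 0.5, "male": 0.5}
weights = poststratification_weights(sample_counts, population_shares)

# After weighting, each stratum contributes its population share of the total.
weighted_female = sample_counts["female"] * weights["female"]
weighted_male = sample_counts["male"] * weights["male"]
```

Respondents in over-represented strata receive weights below 1 and those in under-represented strata receive weights above 1, so weighted estimates approximate the target population distribution on the adjustment variable.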

The PROMIS global physical health scale was scored using item response theory (IRT) methods. IRT provides potential advantages over classical test theory, such as standard errors of measurement that vary by estimated physical health score and estimation of distances between response categories rather than assuming equal spacing between them. Details regarding IRT are discussed elsewhere.13

Analysis Plan

The demographic characteristics of the PROMIS sample are provided in Table 1. We estimate product–moment and Spearman rank-order correlations of the single item (global01 in Table 2) with the PROMIS global physical health scale. To evaluate the comparability of global01 with the similar item in the PROMIS global physical health scale (global03), we substitute global01 for global03 and recalibrate the scale using the IRT graded response model, which yields a discrimination parameter and threshold parameters for each item. The discrimination parameter (a) as shown in Table 2 is similar to an item-total correlation. Higher values of this parameter are associated with items that are better able to discriminate between contiguous levels of global physical health. The threshold parameters (b1–b4) represent the level of global physical health necessary to respond above threshold with a 0.50 probability.

Table 1 Characteristics of Patient-Reported Outcomes Measurement and Information System (PROMIS) Respondents
Table 2 Item Parameters for Two 4-Item Variants of Global Physical Health (Graded Response Model)

Note that we did not calibrate global01 and global03 together with the other three global physical health items because of inflation of discrimination parameters due to local dependency (see Table 9.1 of Hays14). We compare the item parameters (discrimination and thresholds) from the 4-item variant of the global physical health scale that uses the global01 item with those of the existing scale. In addition, we use the graded response model results to estimate category response curves for global01. These category response curves represent the probability of responding in a particular category conditional on the estimated global physical health score.
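The relationship between the item parameters and the category response curves can be sketched as follows. This is a minimal illustration of the logistic graded response model, not the IRTPRO implementation; the discrimination and threshold values are hypothetical and are not the Table 2 estimates. Categories are assumed to be ordered from worst (poor) to best (excellent), with thresholds in ascending order.

```python
import math


def boundary_probability(theta, a, b):
    """Probability of responding above a threshold b at health level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def category_probabilities(theta, a, thresholds):
    """Graded response model probabilities for len(thresholds) + 1 categories."""
    # Cumulative probabilities: 1 for "lowest category or above", 0 beyond the highest.
    cumulative = [1.0]
    cumulative += [boundary_probability(theta, a, b) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent cumulative values.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical parameters for a five-category item (z-score metric).
a = 2.0
thresholds = [-2.0, -1.0, 0.0, 1.0]  # b1..b4

probs_high = category_probabilities(3.0, a, thresholds)   # very good health
probs_low = category_probabilities(-4.0, a, thresholds)   # very poor health
```

Evaluating `category_probabilities` over a grid of theta values traces out one curve per response option. At theta equal to a threshold, `boundary_probability` returns 0.50, which matches the interpretation of the threshold parameters given above.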

Reliability can be estimated for different locations along the continuum of scores and for the average (marginal reliability) across the continuum from the graded response model. Information is analogous to reliability, and indicates the precision (reciprocal of the error variance) of an item or scale along the underlying continuum. We estimate information curves that provide reliability estimates for global01 and the existing four-item global physical health scale. The IRTPRO version 2.1 software program (Scientific Software International Inc., Skokie, IL, USA) was used to estimate the parameters of the graded response model.

We estimate mean scores on the existing PROMIS global health scale by response to the global health item (global01) using ANOVA and evaluate the significance of differences between least-squares means by global01 response category with the Tukey–Kramer adjustment for multiple comparisons using SAS software, Version 9.3 (SAS Institute Inc., Cary, NC, USA).
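For readers unfamiliar with the omnibus test, a one-way ANOVA F-statistic can be computed as in the sketch below. This is a plain-Python illustration on toy data, not the SAS procedure used in the study, and it does not implement the Tukey–Kramer adjustment. The function name and data are hypothetical.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    grand_mean = sum(all_values) / n_total

    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        mean_g = sum(g) / len(g)
        # Between-group variation: group mean versus grand mean.
        ss_between += len(g) * (mean_g - grand_mean) ** 2
        # Within-group variation: observations versus their group mean.
        ss_within += sum((x - mean_g) ** 2 for x in g)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy data: three groups of three scores each.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova(groups)
```

With five response categories, the between-groups degrees of freedom are 4, and the within-groups degrees of freedom equal the number of analyzed respondents minus 5.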

RESULTS

The demographics of the 21,133 respondents are shown in Table 1. The average age of the respondents in the sample was 53 years, and 52 % were female. The majority were white (80 %); 9 % were Hispanic, and 9 % African-American. Eighty-two percent had more than a high school degree. In addition, 65 % were married or living with someone, and 18 % reported having none of 25 chronic conditions (conditions assessed included hypertension, angina, coronary artery disease, heart failure, heart attack, stroke, liver disease, kidney disease, arthritis or rheumatism, osteoarthritis, migraines, asthma, chronic obstructive pulmonary disease, diabetes, cancer, depression, anxiety, alcohol or drug problems, sleep disorder, HIV/AIDS, spinal cord injury, multiple sclerosis, Parkinson’s disease, epilepsy, and amyotrophic lateral sclerosis).

The single item (global01) had a product–moment correlation of 0.81 (Spearman rho of 0.80) with the PROMIS four-item global health scale. The graded response model item parameters for the PROMIS four-item global health scale and the 4-item variant of the scale in which global01 is substituted for global03 are shown in Table 2.

The parameters for the two 4-item variants of global health were similar. For example, the discrimination parameter (a) for global01 and global03 was the same, and the corresponding threshold parameters (b1–b4) were very close to one another. Scale score estimates for the two variants of the 4-item scale were essentially identical (intraclass correlation = 0.98). Hence, it is immaterial whether global01 or global03 is used in the scale.

The category response curves for the PROMIS global health item (global01) are shown in Fig. 1. The curves provide strong support for differentiation along the physical health continuum for each response option. People with the most positive estimated physical health scores (about 2 SDs above the mean and higher) have the highest probability of selecting excellent, while those at the other extreme (2 SDs below the mean and lower) have the highest probability of selecting poor. The fair, good, and very good response options are monotonically ordered between these two extremes.

Fig. 1 Category Response Curves for PROMIS Global Health Item (global01)

Figure 2 provides the scale information and corresponding standard error of measurement (SEM = 1/√information) for the PROMIS 4-item global health scale on a z-score metric. Figure 3 provides the same detail for the global01 item. A SEM of 0.32 is equivalent to scale information of 10 and reliability of 0.90; a SEM of 0.45 is equivalent to scale information of 5 and reliability of 0.80. Information by level of physical health varied from about 1.5 to 6.8 for the scale (Fig. 2). The information for the single item varied from about 1.2 to 2.4 (Fig. 3). Hence, the reliability (1 − SEM²) for the PROMIS global health scale is 0.80 or higher from about −3 to 0.5 SDs relative to the mean, while the reliability for global01 is less than 0.80 throughout the range of physical health. The marginal reliability was 0.81 and 0.52 for the 4-item global physical health scale and the single item, respectively.
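The conversions among information, SEM, and reliability quoted above can be reproduced with the short sketch below. It assumes the z-score metric used in the text (score variance of 1); the function names are illustrative.

```python
import math


def sem_from_information(information):
    """Standard error of measurement on a z-score metric."""
    return 1.0 / math.sqrt(information)


def reliability_from_information(information):
    """Reliability = 1 - SEM**2, which simplifies to 1 - 1/information."""
    return 1.0 - 1.0 / information


# The two benchmark values quoted in the text.
sem_10 = sem_from_information(10)          # about 0.32
rel_10 = reliability_from_information(10)  # 0.90
sem_5 = sem_from_information(5)            # about 0.45
rel_5 = reliability_from_information(5)    # 0.80
```

Applying `reliability_from_information` to the upper end of the single-item information range (about 2.4) gives a value well below 0.80, consistent with the conclusion that the single item does not reach that level anywhere along the continuum.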

Fig. 2 Scale Information Curve for PROMIS 4-Item Global Physical Health Scale

Fig. 3 Information Curve for PROMIS Excellent to Poor Global Health Item

The ANOVA for the PROMIS global health scale score (dependent variable) by response category on global01 (independent variable) was statistically significant for the overall PROMIS sample (F = 9,836.82; df = 4 and 21,099; p < .0001). The estimated means are provided in Table 3. These T-score means range from 29 (poor self-rated health) to 62 (excellent self-rated health), a difference of greater than 3 standard deviations.

Table 3 PROMIS Global Physical Health Scale Means By Response to Excellent to Poor (global01) Item

DISCUSSION

This study provides estimated PROMIS global physical health scores from responses to the most widely used self-rated health item. Before discussing the implications of the study, it is important to recognize the tradeoffs in using a single item. While the item is strongly correlated with the 4-item global physical health scale score, its reliability is lower, and its standard error of measurement therefore higher, than those of the multi-item scale. In addition, future work is needed to compare the relative validity of the single item with the 4-item scale. Even with the seemingly strong correlation between the item and the scale, it is possible for the item to have different correlations with a criterion measure.15 For example, Hays et al.8 found that the PROMIS global physical health scale had a higher polyserial correlation than the single item with the EuroQol EQ-5D (0.82 vs. 0.65). In addition, the respondents to the survey were primarily members of an Internet panel. Despite this potential limitation, a previous study showed that post-stratification adjustment produced characteristics similar to those of the U.S. general population.10

The present study reveals that individuals with excellent self-rated health are about 1.2 SDs better than the U.S. general population average, while those with very good health are about 0.4 SDs better. A self-rating of good is about 0.3 SDs worse than the U.S. general population mean, while fair and poor are 1.2 and 2.1 SDs worse, respectively. These results have implications for the scoring of self-rated general health in other studies.
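The SD differences above translate directly to the T-score metric (mean 50, SD 10). The sketch below applies that conversion to the rounded SD offsets reported in this paragraph; the resulting values are therefore approximations, not the exact Table 3 means.

```python
# Rounded SD offsets from the U.S. general population mean, as reported in the text.
SD_OFFSETS = {
    "excellent": 1.2,
    "very good": 0.4,
    "good": -0.3,
    "fair": -1.2,
    "poor": -2.1,
}


def t_score(z, mean=50.0, sd=10.0):
    """Convert a z-score (SD units) to the T-score metric."""
    return mean + sd * z


approx_t_scores = {category: t_score(z) for category, z in SD_OFFSETS.items()}
```

The endpoints of `approx_t_scores` agree with the reported range of means, from about 29 for poor to about 62 for excellent.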

Some investigators who have administered the item have collapsed responses of fair or poor together and have collapsed responses of good, very good and excellent.16,17 This study clearly shows that dichotomizing the five response levels results in the loss of important information. The more accurate scoring resulting from this study allows for better surveillance of the overall health of the U.S. population and for evaluating the attainment of Healthy People 2020 objectives.

The excellent to poor health item is one of the items in the SF-36 health survey. The original possible scoring range of 0–100 for this SF-36 item assigned 100 to excellent and 0 to poor, with 84, 61, and 25 for very good, good, and fair, respectively. This scoring was based on the logic employed by Stewart, Hays and Ware,18 who “recoded the response choices of the overall health item … to better reflect the unequal intervals of the item” (p. 727). To derive the recoded values, the authors calculated the average score for the other four current health items for each response level of the excellent to poor health item. Recoding of the item was then based on transposing (interpolating) these means into the 0–100 possible range. The same methodology yielded very similar estimates (86, 64, and 28 were obtained for very good, good, and fair, respectively) in a sample of 1,844 adults from a general population mail survey in western Switzerland.19

If the estimated scores (means) for the excellent to poor response categories in Table 3 are transformed linearly to a possible range of 0–100, then very good is 76, good is 52, and fair is 26. The present study, for which the criterion for determining the distance between response categories is derived using a more accurate item response theory model than the simple summated score previously used for the SF-36, shows very good and good to be further away from excellent and closer to fair than is implied by the SF-36 scoring.
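The linear transformation described above can be sketched as follows. The input values here are approximate T-scores implied by the rounded SD differences reported in the text, not the Table 3 means, so the intermediate results will only approximate the values of 76, 52, and 26 given above. The function name is illustrative.

```python
def rescale_to_0_100(category_means):
    """Linearly map category means so the lowest is 0 and the highest is 100."""
    lowest = min(category_means.values())
    highest = max(category_means.values())
    span = highest - lowest
    return {
        category: 100.0 * (mean - lowest) / span
        for category, mean in category_means.items()
    }


# Approximate T-scores (assumed, rounded); substitute the Table 3 means for exact results.
approx_means = {
    "excellent": 62.0,
    "very good": 54.0,
    "good": 47.0,
    "fair": 38.0,
    "poor": 29.0,
}
rescaled = rescale_to_0_100(approx_means)
```

Because the mapping is linear, the relative spacing between categories is preserved; only the origin and unit change.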

The excellent to poor item has been administered for decades in the U.S. on the National Health Interview Survey and the Behavioral Risk Factor Surveillance System (BRFSS). The results of our study provide a basis for more accurate estimates of health in national surveys of the U.S. general population and for comparisons of subgroup differences (e.g., age, gender). The item has been shown to be predictive of important criterion variables such as health care utilization and mortality.5,20 It may also be useful for identifying patients for targeted interventions20 and as part of the evaluation and planning of care for patients in clinical practice.21 The item has been used as part of behavioral change strategies by comparing responses to it with health-related behaviors and clinical measures.22

In summary, the excellent to poor self-rated health item is useful for population-level monitoring. Because it assesses general health perceptions, it reflects what is important to the patient in evaluating their health. The item can provide a useful complement to the specific measures used routinely to evaluate patients in clinical practice.23 The study reported here provides essential information about how this item is scored and used for these purposes.