BACKGROUND

Inadequate pain assessment has been identified as a key barrier to appropriate pain management.1–3 Recently, important initiatives have aimed to increase awareness of pain as a clinical problem by promoting better pain assessment.4–6 These initiatives have led to widespread adoption of pain screening through measurement of current pain intensity.

In chronic pain, the most common type of pain seen in primary care, assessment of pain intensity alone is inadequate. Guidelines encourage comprehensive assessment that includes measurement of pain-related functioning, which may be even more relevant to patients’ overall quality of life than intensity.7,8 To facilitate chronic pain assessment, numerous multidimensional patient-reported measures have been developed;9–12 however, none of these have been widely adopted in the general medical settings where most chronic pain treatment is delivered.

In primary care, use of multidimensional pain measures is limited by factors such as instrument length and scoring complexity; however, a brief and straightforward multidimensional measure could potentially improve assessment of chronic pain. We sought to develop a very brief measure that would be feasible, valid, and sensitive to change in primary care. We started with the Brief Pain Inventory (BPI) because it is relatively easy to administer, score, and interpret; includes items assessing pain intensity and functional interference; and has been validated in many pain conditions.10,13–17 As its name implies, the BPI is shorter than other multidimensional pain measures, but it is still too lengthy for implementation in primary care practice. We hypothesized that a shortened scale based on the BPI could be developed that would be more feasible, but just as useful, for assessing chronic pain in primary care. Our objectives were to develop an ultra-brief scale derived from the BPI and to initially assess its reliability, validity, and responsiveness.

METHODS

Participants

We used data from two sources: 1) Stepped Care for Affective Disorders and Musculoskeletal Pain (SCAMP), a longitudinal study that enrolled a total of 500 patients with chronic musculoskeletal pain, and 2) Helping Veterans Experience Less Pain (HELP-vets), a cross-sectional study of 646 veterans receiving care at VA clinics. We used data from Study 1 to develop and initially validate the ultra-brief measure and data from Study 2 to confirm reliability and validity in an independent patient population.

Study 1 (SCAMP) enrolled 500 primary care patients with persistent back, hip, or knee pain of at least moderate severity, 250 of whom had concurrent depression.18 Participants were recruited from university (n = 300) and VA-affiliated (n = 200) internal medicine clinics in Indianapolis. Patients with concurrent depression were enrolled in a trial of depression and pain treatment vs. usual care (n = 250). Those without depression were followed in a parallel observational study (n = 250). The mean age of SCAMP participants was 59 years; 52% were women, 58% were white, and 38% were black. The mean numeric rating of current pain (on a 0–10 scale) was 6.1 (SD 1.9) at baseline.

Study 2 (HELP-vets) enrolled a random visit-based sample of 646 veterans from ambulatory care clinics at two VA hospitals and six affiliated community sites in three urban California counties. Patients with chronic illness were over-sampled by design. The mean age was 63 years and 95% were male. Self-reported race/ethnicity was 54% white, 30% black, and 10% Latino. Sixty-one percent of participants reported pain at the time of enrollment and 63% had one or more pain diagnoses (33% back pain, 45% other musculoskeletal pain, 12% neuropathic pain, 5% headache). The mean rating of current pain (on a 0–10 scale) was 3.1 (SD 3.2) overall and 5.1 (SD 2.6) among those with pain.

Measures

Study 1 participants completed the BPI, Chronic Pain Grade questionnaire (CPG), Roland disability scale, and SF-36 bodily pain scale at baseline; they completed the BPI, CPG, and pain global rating of change at 6 months. Study 2 was cross-sectional; participants completed the BPI, Functional Morbidity Index, and a single-item rating of overall pain-related distress.

  • The Brief Pain Inventory (BPI) includes two scales that assess pain intensity and pain-related functional impairment (physical and emotional).13,15 The four items of the BPI severity scale assess the intensity of current pain and pain at its least, worst, and average during the past week on scales from 0 (“no pain”) to 10 (“pain as bad as you can imagine”). The BPI interference scale assesses pain-related functional interference with seven items assessing different domains (general activity, mood, walking ability, normal work, relations with other people, sleep, and enjoyment of life) rated from 0 (“does not interfere”) to 10 (“interferes completely”).

  • The Chronic Pain Grade questionnaire (CPG) includes two three-item scales (intensity and disability) that are transformed into 0–100 scores.19 An algorithm classifies pain into four graded categories: 1) low disability-low intensity, 2) low disability-high intensity, 3) high disability-moderately limiting, and 4) high disability-severely limiting. The CPG has been validated in primary care, chronic pain, and general populations.20–22

  • The Roland Disability questionnaire is a pain-specific measure of physical disability validated in patients with back pain and other chronic pain conditions.23,24 It includes a checklist of 24 statements about pain effects on function; the score is the number of items endorsed.

  • The Short-Form 36-item questionnaire (SF-36) Bodily Pain Scale is a two-item scale assessing pain severity and interference.25,26 Responses are transformed into a 0–100 score.

  • The Pain Global Rating of Change is a single item assessing patients’ overall impression of change in their pain. Study 1 participants were asked whether their pain was worse, about the same, or better since the start of the study. Those who reported that pain was better were asked to rate the magnitude of improvement (a little, somewhat, moderately, a lot, or completely better). Global ratings of change may be more sensitive to improvement and better correlated with patient satisfaction than serial measures.27

  • The Functional Morbidity Index was developed to assess general functional status in older adults.28 Patients indicate whether they are able to perform four different activities independently, and if not, whether the impairment is due to a health problem.

  • Overall Pain Distress is a single item: “How much did overall pain distress or bother you during the past week?” Response options are not at all, a little bit, somewhat, quite a bit, and very much.

Item Selection

We used a consensus-based process, drawing on a literature review, expert opinion, and statistical data, to develop a shortened scale.29 Pre-specified criteria guided initial item selection. First, we decided to include at least one item representing each of three domains included in the BPI: pain intensity, physical functioning, and emotional functioning. We then selected items with the following characteristics: 1) easy to understand and applicable to patients with all types of pain; 2) good statistical characteristics (e.g., high response variability, high item-remainder correlation); and 3) similar performance in depressed and non-depressed patients.

We chose “pain average” for the intensity item because it had a good distribution of responses, lacking the ceiling and floor effects seen with “pain worst” and “pain least,” respectively. We did not select “pain now” because we wanted to avoid duplicating information provided by the “fifth vital sign” and to capture intermittent pain. Although the ideal reporting period for pain assessment is debated, recalled average pain over one week is a valid measure of pain intensity.30,31
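One way to quantify the floor and ceiling effects described above is the proportion of respondents at the lowest (0) and highest (10) response option. This is a minimal sketch with a hypothetical function name; the text does not state which cut-off, if any, was used to call a floor or ceiling effect, so the code reports proportions only.

```python
def floor_ceiling(scores, lowest=0, highest=10):
    """Return (share at the floor, share at the ceiling) for a list of item ratings."""
    n = len(scores)
    at_floor = sum(1 for s in scores if s == lowest)
    at_ceiling = sum(1 for s in scores if s == highest)
    return at_floor / n, at_ceiling / n


# Hypothetical "pain least" ratings: half of these patients sit at the floor.
print(floor_ceiling([0, 0, 2, 3, 0, 5, 0, 1]))  # (0.5, 0.0)
```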

BPI interference items include those assessing physical status (general activity, walking, normal work), emotional status (mood, relations with others, enjoyment of life), and sleep. For physical interference, we chose “interference with general activity” because it applies equally to all patients, as opposed to “interference with work” (which may be affected by occupation, employment status, etc.) and “interference with walking” (which may not apply to non-ambulatory patients or those with upper body pain).

For emotional interference, we considered both “interference with mood” and “interference with enjoyment of life.” In our experience, “interference with relations with other people” is more difficult than other items for patients to answer. We wanted a scale that would discriminate between chronic pain and depression, which commonly co-occur.32 In our sample of patients with and without comorbid depression, we found that “interference with enjoyment of life” was more independent of depression than “interference with mood.” We also considered “interference with sleep” in place of the emotional interference items.

We reached consensus on a preferred three-item scale (“pain average,” “interference with enjoyment of life,” and “interference with general activity”) and alternative three-item and four-item scales, which we then evaluated statistically.

Reliability and Validity

We assessed reliability (internal consistency) by calculating Cronbach’s coefficient alpha. To assess construct validity, we compared the PEG with measures of pain and function using Pearson correlation coefficients. We used multiple measures for construct validity assessment, including the BPI, because no criterion standard exists for pain. We hypothesized that coefficients would be slightly higher for comparisons with pain-specific functional measures than for those with pain severity measures (because two of the three PEG items assess function). We also expected that coefficients would be larger for comparisons with pain-specific functional measures than for comparisons with generic functional measures.

Responsiveness

Assessment of responsiveness, or sensitivity to change, requires an independent standard to define change.33 We used two different measurements to define the presence or absence of patient improvement: 1) global rating of change and 2) serial CPG grade. We categorized patients according to their pain trajectory as assessed by each of the two measures. Global rating of change categories were defined by the patient’s retrospective assessment at 6 months of the change in their pain since the trial began: 1) improved (“better”), 2) unchanged (“about the same”), and 3) worse (“worse”). CPG categories were defined by the change in CPG grade from baseline to 6 months: 1) improved (pain grade decreased by ≥1 level), 2) unchanged (pain grade at baseline = pain grade at follow-up), and 3) worse (pain grade increased by ≥1 level).
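The two classification rules above translate directly into code. In this sketch the function names are hypothetical, global ratings are assumed to arrive as the literal response strings, and CPG grades are assumed to be stored as integers 1 to 4 in the grade order listed for the CPG. With integer grades, a decrease of at least one level is simply a lower follow-up grade.

```python
GLOBAL_RATING_CATEGORY = {
    "better": "improved",
    "about the same": "unchanged",
    "worse": "worse",
}


def categorize_global_rating(response):
    """Map the 6-month global rating of change to a trajectory category."""
    return GLOBAL_RATING_CATEGORY[response.strip().lower()]


def categorize_cpg_change(baseline_grade, followup_grade):
    """Classify change in CPG grade (integers 1-4) from baseline to 6 months."""
    if followup_grade < baseline_grade:  # grade decreased by >= 1 level
        return "improved"
    if followup_grade > baseline_grade:  # grade increased by >= 1 level
        return "worse"
    return "unchanged"  # same grade at both time points


print(categorize_global_rating("About the same"))  # unchanged
print(categorize_cpg_change(3, 2))  # improved
```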

Using data from Study 1, we assessed responsiveness by calculating the following three metrics: 1) change score (difference between mean score at baseline and follow-up), 2) effect size (ES; change score divided by the standard deviation of the baseline score), and 3) standardized response mean (SRM; change score divided by the standard deviation of the change score). These calculations were performed for patients in the improved, unchanged, and worse categories. Confidence intervals for SRM were calculated as SRM ± 1.96/√n, where n is the number of patients in the group.34 We assessed responsiveness using all three metrics because they can produce differing results and because agreement is lacking on the preferred method.34,35 We compared responsiveness of the PEG, BPI severity, and BPI interference scales by comparing ES and SRM for each measure among patients in the improved category. Finally, we assessed responsiveness to varying degrees of improvement by comparing change scores for PEG and BPI scales to degree of improvement by global rating of change.
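The three responsiveness metrics and the SRM confidence interval can be computed as follows. This is a sketch, not the authors' code: it assumes paired baseline and 6-month scores for one trajectory group, uses the sample standard deviation, and defines change as baseline minus follow-up so that improvement (lower scores) is positive; that sign convention is an assumption. Function names and example data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def srm_confidence_interval(srm, n):
    """Approximate 95% CI for the SRM: SRM +/- 1.96 / sqrt(n)."""
    half_width = 1.96 / sqrt(n)
    return srm - half_width, srm + half_width


def responsiveness(baseline, followup):
    """Change score, effect size, and standardized response mean for one group.

    baseline, followup: paired scores (same patients, same order).
    Change is baseline minus follow-up, so improvement is positive.
    """
    changes = [b - f for b, f in zip(baseline, followup)]
    change_score = mean(baseline) - mean(followup)
    effect_size = change_score / stdev(baseline)  # ES: change / SD of baseline scores
    srm = mean(changes) / stdev(changes)  # SRM: change / SD of individual changes
    return {
        "change": change_score,
        "es": effect_size,
        "srm": srm,
        "srm_ci": srm_confidence_interval(srm, len(changes)),
    }


print(responsiveness([6, 7, 8, 5], [3, 5, 4, 4]))
```

As a rough arithmetic check, if about 66 patients were in the improved group by global rating (31.4% of 210, an inference rather than a reported count), srm_confidence_interval(1.20, 66) gives approximately (0.96, 1.44), in line with the interval reported for the PEG.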

RESULTS

Item Selection

Using baseline data from the Study 1 sample, we assessed the preferred and alternate scales. Results were similar for all scales evaluated (Table 1). Our preferred scale demonstrated initial characteristics similar to or better than the alternatives, so we chose it as our final scale. From here on, we refer to this final abbreviated scale as the PEG, an acronym representing the three items: “Pain average,” “interference with Enjoyment of life,” and “interference with General activity.” Principal components analysis of the PEG in both samples demonstrated a single factor, accounting for 66% of the variance in Study 1 and 81% in Study 2.

Table 1 Reliability and Item-total Correlations for PEG and Alternate Scales in Study 1 Sample (n = 500)

The PEG comprises 1 intensity item and 2 interference items (Fig. 1). Consistent with BPI scoring, we calculated the average of individual item scores to get an overall PEG score (potential range 0–10). Table 2 shows means and standard deviations (SD) for each item and the full three-item scale in both populations.
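The scoring rule can be stated in a few lines. This is a minimal sketch, assuming all three items are answered and rated on the 0–10 scale; the function name is hypothetical, and because the text does not describe how missing items were handled, the code rejects incomplete or out-of-range input rather than guessing.

```python
def peg_score(pain_average, enjoyment_interference, activity_interference):
    """PEG score: mean of the three 0-10 item ratings (possible range 0-10)."""
    items = (pain_average, enjoyment_interference, activity_interference)
    for value in items:
        if value is None or not 0 <= value <= 10:
            raise ValueError(f"each PEG item must be rated 0-10, got {value!r}")
    return sum(items) / len(items)


print(peg_score(6, 5, 7))  # 6.0
```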

Figure 1

The PEG three-item scale. *Items from the Brief Pain Inventory reproduced with permission from Dr. Charles Cleeland.

Table 2 PEG and Individual Item Statistics at Baseline in Study 1 and Study 2

Reliability and Validity

Reliability of the PEG was 0.73 in Study 1 and 0.89 in Study 2. Table 3 shows correlation matrices for the PEG, BPI scales, and other pain and function measures in both study populations. Overall, construct validity of the PEG was good (r = 0.60–0.89 in Study 1 and r = 0.77–0.95 for pain-specific measures in Study 2), with correlations comparable to those of the BPI scales. As expected, PEG correlations were slightly higher for BPI interference (r = 0.89 and 0.95) than for BPI severity (r = 0.69 and 0.84) and for CPG disability (r = 0.67) than for CPG intensity (r = 0.64). Correlations were higher for the pain-specific function measures than for the Functional Morbidity Index (r = 0.54), a generic measure of function.

Table 3 Correlation between the PEG, BPI Scales, and other Measures at Baseline

Responsiveness

Six-month follow-up data were available for 210 Study 1 clinical trial participants. The proportion with pain improvement was approximately the same according to global rating of change (31.4%) and serial CPG pain grade (29.5%); however, more patients were classified as worse by global rating (29.0%) than by CPG grade (15.0%). Table 4 shows PEG scores and measures of responsiveness for patients classified as improved, unchanged, and worse at 6 months. Confidence intervals for the improved and unchanged groups did not overlap, but the unchanged and worse groups were not statistically different from each other. The improved group according to global rating of change had a mean improvement of 3.0 points (SD 2.5) on the PEG. Similarly, the improved group according to serial CPG grade had a mean change of 2.6 points (SD 2.7).

Table 4 Responsiveness of PEG among Patients Classified by Pain Global Rating of Change and Serial CPG Grade at 6 Months (n = 210)

The SRMs among participants who improved at 6 months according to global rating of change were similar for the PEG (1.20, 95% CI 0.96, 1.44), BPI severity (1.04, 95% CI 0.80, 1.28), and BPI interference (1.13, 95% CI 0.89, 1.37). Results were similar for improvement according to serial CPG grade (data not shown). For all measures of improvement, ES and SRM were consistent with a large effect. Figure 2 shows 6-month change scores for the PEG, BPI severity scale, and BPI interference scale plotted against global rating of change.

Figure 2

Mean change in PEG and BPI scales compared with global rating of change at 6 months (n = 210).

DISCUSSION

We demonstrated that the PEG, an ultra-brief three-item scale derived from the BPI, was a reliable and valid measure of pain among primary care patients with chronic musculoskeletal pain and diverse VA ambulatory patients. The PEG appears comparable to the BPI in terms of responsiveness to change. These findings support our hypothesis that an abbreviated scale derived from the BPI may be both useful and practical for chronic pain assessment in primary care and other ambulatory care settings, such as medical and surgical specialty clinics.

Strengths of this study include the confirmation of reliability and validity in an independent patient population, the diversity of the study populations, and the availability of multiple pain and functional measures with which to assess construct validity. Our choice of the BPI as the basis for our abbreviated scale development is another strength. The BPI is a widely used instrument that has been validated in numerous patient populations, clinical settings, and languages. BPI items are rated from 0–10, a format that has become familiar to patients and clinicians since assessment of pain with numeric scales has been broadly implemented in US health care settings. We took advantage of our collective experience with the BPI in observational and interventional research by employing a consensus-based process for scale shortening, consistent with recommendations to avoid over-reliance on statistical techniques.29

We believe our use of two different ambulatory study populations is a strength; however, each study has its own limitations. Study 1 was a sample of patients with chronic back and lower extremity musculoskeletal pain and included an over-representation of patients with depression (50% by design). The more clinically diverse patient population of Study 2, including ambulatory VA patients with and without chronic pain, enhances the generalizability of our findings. However, Study 2 included fewer pain measures with which to assess construct validity and was cross-sectional; therefore, we were able to assess responsiveness only in the first sample. Forty percent of Study 1 patients and 100% of Study 2 patients were recruited from VA clinics, so our findings may be less generalizable to non-VA settings.

We found that the PEG differentiated well between patients who improved and those who did not. According to responsiveness metrics, patients in the improved category had a large improvement in PEG score, whereas those in the unchanged category had a minimal change. Proper interpretation of the magnitude of change according to SRM and one-group pre-post ES is not entirely clear, although authors have suggested that Cohen’s definition of small (0.2), moderate (0.5), and large (0.8) effects can be applied to interpretation of both responsiveness measures.36,37 We did not find evidence of PEG responsiveness in the worse direction (i.e., change scores did not differ significantly between those who were unchanged and those who were worse). Our ability to assess sensitivity to worsening is limited because we evaluated responsiveness in a single study population that likely had a ceiling effect for worsening due to high baseline pain severity.
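Applying Cohen's conventional cut-offs to an ES or SRM amounts to a threshold lookup. In this sketch the function name is hypothetical, and the label for values below 0.2 ("negligible") is an assumption, since the cited convention names only small, moderate, and large effects.

```python
def describe_effect(value):
    """Label an ES or SRM with Cohen's conventional magnitudes (sign ignored)."""
    magnitude = abs(value)
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "moderate"
    if magnitude >= 0.2:
        return "small"
    return "negligible"  # below Cohen's "small"; this label is an assumption


# SRMs reported for the PEG, BPI severity, and BPI interference among improved patients.
for srm in (1.20, 1.04, 1.13):
    print(srm, describe_effect(srm))
```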

The competing demands of primary care, in which visits are short and pain is only one of several problems warranting attention, make efficiency of assessment a paramount concern.38,39 A balance must be found between feasibility and key characteristics such as reliability, validity, and responsiveness. For example, ultra-brief depression measures containing two to three items perform better than single-item depression measures.40 We also evaluated a four-item abbreviated scale, but found that adding an item contributed little. An abbreviated version of the BPI that eliminated a few items has been previously published,41 but the PEG is the first ultra-brief scale based on the BPI.

New assessment strategies are needed to support improved chronic pain management in primary care. We believe the PEG, which includes items assessing pain intensity, emotional function, and physical function, is an important step forward. However, further studies are needed to confirm our findings and validate the PEG in additional patient populations. Prospective research should determine whether serial pain measurement can improve the quality of clinical decision-making and pain outcomes in primary care. Given the huge clinical and societal burden associated with pain, developing efficient and effective strategies to enhance care is an important priority.