Introduction

Pain has been defined as “an unpleasant sensory or emotional experience associated with actual or potential tissue damage, or described in terms of such damage” by the International Association for the Study of Pain [1]. This definition underscores that pain is a subjective experience; therefore, unlike other chronic diseases, such as hypertension or hyperlipidemia, there is no single objective measurement to best characterize the extent of the problem or to evaluate treatment outcomes. Measuring a patient’s pain must correlate objective data with the patient’s subjective reporting to provide a comprehensive outcome representing the pain state. The purpose of this review is to explore the difficulties and opportunities unique to pain outcome measures.

Complicating the measurement of pain is the notion that the subjective experience of pain is often confused with nociception. Nociception involves peripheral signals generated by specialized receptors (nociceptors) in response to noxious stimuli. Pain requires a functioning central nervous system (e.g., brain) to interpret these nociceptive signals and produce a subjective experience. There is often a wide variability in how much pain a given stimulus or injury will cause. This variability is influenced by genetics, mood, beliefs, early life experiences with pain, sex, ethnicity, and other factors [2].

Chronic pain is often associated with an overall reduction in the patient’s quality of life encompassing domains such as depression, anxiety, impaired social and physical function, and sleep disturbance. Moreover, there appears to be relative independence between pain and these coexisting stressors. Therefore, to capture the pain experience, it is necessary to also define and characterize these related domains.

Recognizing that pain is challenging to accurately measure, why then must we strive to better evaluate outcomes in pain medicine? The Institute of Medicine estimates that 100 million Americans have chronic pain with a cost exceeding half a trillion dollars per year [3]. Current practice relies on evidence-based medicine to support clinical decision-making and to convince colleagues, patients, and payers of the most efficacious treatments. The gold standard in medicine has been the large-scale, randomized controlled trial. There is, unfortunately, a dearth of these studies in pain medicine, making it all the more imperative to accurately and consistently measure outcomes moving forward. Standardization of outcome reporting will allow for comparison and systematic review of the studies that do exist to meet the demand for evidence-based pain treatment and may help to answer the most pressing questions in the field of pain: How do we know that we have helped a patient with chronic pain, and how do we determine which treatment, and at what cost, is most appropriate for a specific patient?

Methods

As this article is intended as a current review, pertinent citations are included for each measure discussed with an emphasis on recent evidence and guidelines.

Considerations in Selecting an Outcome Measure

Any tool used to measure pain should be appropriate for the provider and patient needs. It is of little use to have a patient fill out multiple forms if the provider lacks the staff or infrastructure to use the data. This underscores the need to allocate resources efficiently when determining appropriate outcome measures. In defining a standard set of outcome measures, the Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT) consortium granted most weight to the following criteria [4].

(A) Reliability. The instrument should demonstrate test–retest reliability when a patient’s status does not change over time. If the scale is rated by clinicians rather than self-reported by patients, it should have inter-rater reliability: clinicians observing the same patient should provide similar scores. If the scale contains multiple items measuring the same domain, it should have internal reliability: the item scores should correlate.

(B) Validity. The scale should measure what it is intended to measure. The scale should display convergent validity, in that it must agree with other similar indicators, and discriminant validity, in that it must be distinguishable from related conditions.

(C) Responsiveness. The scale must display the ability to detect changes over time and to distinguish between treatments. This requisite is of particular interest for clinical trials, wherein a treatment effect is investigated.

(D) Appropriateness. The scale’s content should be in keeping with the measured outcome and relevant to the patient population being studied. The outcome measure must be scaled to the target patient population so that scores do not aggregate in a restricted area of the scale, and should be at intervals that allow statistical flexibility.

(E) Burden. The scale should be easy to administer, complete, and score. Desire for additional data must be balanced with time constraints and patient adherence. For example, daily as opposed to return visit assessments can yield excellent longitudinal data but may require use of paper diaries, which are prone to backfilling and recall bias, or daily phone calls, which can present an inconvenience to the patient and require significant staffing.
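The reliability criterion above is usually quantified with a correlation coefficient (test–retest) and with Cronbach’s alpha (internal consistency). The following is a minimal sketch of both calculations, not a validated statistical tool. The function names and the example scores are hypothetical and do not come from IMMPACT or any cited study.

```python
from statistics import mean, pvariance


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def cronbach_alpha(items):
    """Cronbach's alpha for internal consistency.

    `items` holds one list of scores per questionnaire item, with the
    same patients in the same order in every list.
    """
    k = len(items)
    # Total score per patient across all items.
    totals = [sum(scores) for scores in zip(*items)]
    sum_item_variances = sum(pvariance(scores) for scores in items)
    return (k / (k - 1)) * (1 - sum_item_variances / pvariance(totals))


if __name__ == "__main__":
    # Hypothetical NRS scores for five stable patients at two visits.
    visit_1 = [2, 5, 7, 4, 8]
    visit_2 = [3, 5, 6, 4, 8]
    print("test-retest r:", round(pearson_r(visit_1, visit_2), 2))

    # Hypothetical three-item subscale answered by the same five patients.
    subscale = [[1, 3, 4, 2, 5], [2, 3, 5, 2, 4], [1, 2, 4, 3, 5]]
    print("Cronbach's alpha:", round(cronbach_alpha(subscale), 2))
```

Values closer to 1 indicate higher reliability in both cases; what counts as acceptable depends on the instrument and the setting.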

Univariable Measures

Unidimensional scales measure pain as a single quality varying only in intensity and, therefore, report a single outcome score. These methods are most effectively used in clinics and acute settings to provide information about current pain and need for rescue analgesics, such as postoperatively. Examples include the following.

Verbal Rating Scale

The Verbal Rating Scale (VRS) consists of a series of categorical descriptors ordered in increasing intensity (i.e., none, mild, moderate, and severe). The advantages of VRS are that it is easy to administer and report, particularly for elderly patients [5]. Disadvantages are that it has fewer response choices (shortened scale) and the categorical options limit statistical analysis. It has demonstrated ability to distinguish treatment effect, test–retest reliability, and convergent validity in cancer pain, analgesic trials, and evoked pain studies [6].

Visual Analog Scale

The Visual Analog Scale (VAS) is typically a 10-cm line anchored at one end by the label “no pain” and at the other end by a label of “worst pain”. The patient marks a point on the line to indicate their pain level, and the clinician measures the distance from the “no pain” anchor to the mark in millimeters, yielding a score on a 101-point scale (0 to 100) [7]. The advantages of VAS are that there is good evidence for responsiveness, validity, and test–retest reliability, and that scores can be treated as ratio data [8]. The limitations are that it can be more time consuming than other instruments in this class and elderly people may have difficulty using the scale [9].

Numerical Rating Scale

The Numerical Rating Scale (NRS) is the most frequently used univariable instrument. It consists of a rating scale from 0 to 10, with 0 signifying “no pain” and 10 signifying “worst pain”. Patients may respond orally or by circling the appropriate number. A similar scale with 0 to 100 is also used. The NRS minimizes patient and provider burden during data collection and compliance is excellent. In contrast to VAS, it can be administered via a phone interview; however, scores cannot be treated as ratio data. It demonstrates sensitivity to change, test–retest reliability, and correlates well with other measures of pain intensity [6]. The NRS is recommended by IMMPACT as a core domain measure for future chronic pain clinical trials [10].

Patient Global Impression of Change

The Patient Global Impression of Change (PGIC) represents an attempt to capture pain improvement more broadly using a single item measure. The patient is asked to rate their current status compared to a previous time point from best to worst (i.e., very much improved, much improved, minimally improved, same, minimally worse, much worse, or very much worse). This scale is applicable to many conditions and treatments but lacks sensitivity [11]. It is recommended by IMMPACT as a core domain measure and can be particularly helpful in gauging the clinical importance of changes in other measures [12].

Rescue Medication Use

While not a true pain outcome scale, rescue analgesic medications can be used as a surrogate for pain, particularly when use is triggered by meeting or exceeding a set pain score (e.g., medication X to be administered for NRS >7).

Emotion Measures

Pain and emotional distress are related, yet there is evidence that they are relatively independent [13]. Emotional assessment instruments, either as part of a broader multidimensional pain measure or as a specialized emotion scale, can elucidate the interplay of emotion and pain and help guide therapy, particularly when emotional distress is the primary concern. Most commonly, depression, anxiety, and fear are found to coexist and can significantly affect pain and treatment outcomes. Measurements of depression include the Patient-Reported Outcomes Measurement Information System (PROMIS®) Emotional Distress–Depression Item Bank (NIHPromis.org, Silver Spring, Maryland, USA) [14], Beck Depression Inventory (BDI) [15], Zung Self-Rating Depression Scale [16], and Hamilton Rating Scale for Depression [17]. Anxiety and fear measures include the PROMIS Emotional Distress–Anxiety Item Bank [14], Pain Anxiety Symptoms Scale [18], State-Trait Anxiety Inventory [19], and Fear-Avoidance Beliefs Questionnaire (FABQ) [20]. Of these, the BDI has been most extensively studied, demonstrating internal consistency (Cronbach’s alpha 0.73–0.95), test–retest reliability (Pearson’s r 0.80–0.90), and convergent validity (mean Pearson’s r = 0.60), leading it to be recommended by IMMPACT as a core outcome for Health-Related Quality of Life (HRQoL) as part of future clinical trials in treatments of chronic pain [10].

Multidimensional Measures

Chronic pain requires a more comprehensive assessment than a univariable or single domain measure can provide. This assessment should include reports of several dimensions of pain (quality, intensity, location), disability, emotional affect, and effect on quality of life. This complex approach to the pain experience is much more likely to reflect the impact of pain on a patient’s life. Commonly used scales include the following.

Brief Pain Inventory

The Brief Pain Inventory (BPI) was developed by the Pain Research Group of the World Health Organization (WHO) Collaborating Centre for Symptom Evaluation in Cancer Care to measure both the sensory dimension of pain (intensity) and the reactive dimension (interference in patient’s life) [21]. The BPI has been used mostly for cancer pain and consists of a 17-item scale that typically takes under 15 min to complete. It has been validated in multiple languages and demonstrates good sensitivity to pharmacologic treatment effects. The BPI interference scale, in particular, has been validated as a measure of physical functioning in multiple domains and is recommended by IMMPACT as a core HRQoL measure [10].

McGill Pain Questionnaire

The McGill Pain Questionnaire (MPQ) was developed to specify the qualities of pain [22]. Pain is scaled in three dimensions (sensory, affective, and evaluative) and the questionnaire consists of 20 sets of words spanning these dimensions, with each set having from two to six descriptors that vary in intensity. Multiple studies have supported the reliability and validity of the MPQ for specific pain syndromes [23] and it is available in multiple languages. It takes approximately 15 min to complete. The Short-Form McGill Pain Questionnaire (SF-MPQ) was developed for research purposes and consists of 15 words from the sensory and affective categories of the standard long form with a four-point rating scale for each, a pain intensity VAS score, and an overall assessment of pain VRS score [24].

West Haven-Yale Multidimensional Pain Inventory

The West Haven-Yale Multidimensional Pain Inventory (WHYMPI) best assesses adaptation to chronic pain [25]. It can yield clinically useful information regarding pain-coping styles, such as adaptive copers, interpersonally distressed, or dysfunctional copers. It is composed of 52 items with 12 subscales, including perceived interference of pain, response from significant others, pain intensity, emotional affect, perceived control, and participation in social or work activities. Patients respond to the questions on a seven-point scale. The WHYMPI has been validated for diverse pain syndromes and is sensitive to treatment effects. The WHYMPI interference scale correlates with physical functioning and is recommended by IMMPACT as an alternative to the BPI [10].

Medical Outcome Study 36-Item Short-Form Health Survey and Treatment Outcomes of Pain Survey

The 36-Item Short-Form Health Survey (SF-36) is a frequently used measure of function and quality of life in a variety of patient populations [26]. It consists of eight subscales: physical function, limitations due to physical problems, social function, pain, limitations due to emotional problems, general mental health, vitality, and general health perceptions. It takes approximately 10 min to complete and scores can be compared across multiple populations. While widely used, it features only two questions related to pain and there are concerns about insensitivity to change when measuring an individual patient.

The Treatment Outcomes of Pain Survey (TOPS) is an extension of the SF-36 specifically designed for patients with chronic pain [27, 28]. TOPS derived many of its questions from other previously discussed measures, including the SF-36, WHYMPI, BPI, and FABQ. It consists of 120 items with a 61-item follow-up and addresses pain symptoms, function, perceived disability, objective disability, satisfaction with treatment, fear avoidance, coping, life control, limitations, demographics, and substance abuse history. The scale scores are quite comprehensive, and have been found sensitive to change and have good validity; however, adherence is limited by increased questionnaire length.

Measurement of Pain in Children

Pain instruments used with children or patients with significant impairment must be compatible with cognitive abilities. The patient should be able to meaningfully interpret the scale and understand its intervals, and this ability must be assured before using the scales. This goal can often be achieved through modification of adult scales. The Colored Analog Scale (CAS) replaces a VAS with gradually increasing red coloring to indicate increasing intensity of pain, whereas the Wong-Baker FACES™ Pain Rating Scale (Wong-Baker Foundation, Oklahoma City, OK, USA) replaces a VAS with varying facial expressions from crying to smiling. A major disadvantage of these scales, however, is difficulty separating pain from other sources of sadness, anxiety, or anger.

For nonverbal adults or infants when self-report is not possible, several tools have been proposed to evaluate facial or body movements as proxies for pain [29, 30]. While these measures may be necessary clinically, they are unlikely to meet the scientific standard for reporting.

Objective Measures

Several physiologic variables have been suggested as surrogates for pain, including autonomic activity, such as skin conductance [31] and heart rate [32], or biomarkers of pain intensity [33]. Caution with interpreting these peripheral measures is urged as they can be influenced by many forms of arousal other than pain and can be modulated by nonanalgesic medications. Physical function tests, such as range of motion and strength, have been used as proxies for pain, including the timed “Up and Go” test for osteoarthritis [34], loaded forward-reach test for low back pain [35], and grip strength for rheumatoid arthritis [36]; however, these only modestly predict self-reported pain scores, suggesting that other factors heavily influence the subjective experience of pain. More recently, attempts to objectively measure pain have focused on the brain using a neuroimaging approach. Indeed, recent studies suggest that brain imaging can be used to objectively distinguish the presence of evoked painful stimuli [37] as well as the presence of chronic low back pain [38]. Despite these promising early reports using neuroimaging as an objective biomarker of pain, there is still much research to be done to validate its use. Furthermore, given the expense and time involved, it is more likely that neuroimaging will primarily be used to help guide further research and understanding of brain mechanisms involved in pain, at least for the foreseeable future. All of these data further emphasize the complex interplay between sensory, cognitive, and affective components of pain, and reinforce the message that it is unlikely that an objective clinical measure for pain will soon emerge for daily use.

Clinical Trials and Outcomes Data

In addition to the clinical need to provide and document appropriate care for pain, there is clearly an impetus to provide the evidence necessary to guide and justify appropriate treatments. This has resulted in efforts involving academia, the pharmaceutical industry, and government agencies to define and standardize outcome measures, both for pain and for related disease states. IMMPACT defined six core outcome domains that should be considered when designing clinical trials, including pain, physical functioning, emotional functioning, participant ratings of improvement, symptoms and adverse events, and participant disposition [39]. IMMPACT went on to define specific validated measures for each of the core outcome domains in the follow-up IMMPACT-II, including NRS, use of rescue analgesics, WHYMPI interference scale, BPI interference items, BDI, Profile of Mood States, PGIC, passive capture of adverse events, participant disposition, and tailored measures specific to the study population [4].

The National Institutes of Health recently funded PROMIS with the goal of developing valid, reliable, and standardized questionnaires to measure patient-reported outcomes. These assessment instruments were developed between 2004 and 2009 to yield calibrated item banks measuring domains, such as pain, fatigue, physical function, depression, anxiety, and social function. These banks can be used to produce short forms or computerized adaptive tests for researcher and clinician use, and are available at http://www.assessmentcenter.net [14]. The second phase of PROMIS is ongoing and focuses on the development of new tools to measure patient-reported outcomes (PROs) and validation of the current item banks.

Clinical Versus Statistical Significance

Outcome measures for pain provide a metric by which treatments and progression can be compared. Ideally, an effective intervention should demonstrate both a clinically and statistically significant difference versus alternative treatment or placebo. However, clinical and statistical significance are not always linked. For example, as sample size increases, progressively smaller differences reach statistical significance regardless of their clinical relevance. Thus, to interpret the results of a clinical trial, the clinically relevant effect size must first be determined. Studies suggest that for pain, a 30% reduction, corresponding with a PGIC of “much improved” or “very much improved”, a two-point reduction on NRS [12, 40], or a 35-mm reduction on VAS represents a satisfactory result for the patient [41]. The most recent IMMPACT consensus statement addresses clinical importance of outcomes and advocates the use of at least two measures from different core domains with the inclusion of at least one “anchor”-based measure to relate changes in scores to a standard that differs from the measure itself (for example, relating NRS scores to PGIC) [10].
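The distinction between clinical and statistical significance can be illustrated with a short calculation. The sketch below is illustrative only. It assumes that meeting either the two-point or the 30% NRS threshold counts as a clinically important change, and it uses a simple two-sample z-test with equal group sizes and a common standard deviation. The function names, the 0.5-point difference, and the standard deviation of 2.5 are hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def is_clinically_important(baseline_nrs, followup_nrs,
                            min_points=2, min_fraction=0.30):
    """True if the NRS drop meets either threshold (an assumption here)."""
    drop = baseline_nrs - followup_nrs
    if drop >= min_points:
        return True
    # Relative reduction is undefined when baseline pain is zero.
    return baseline_nrs > 0 and drop / baseline_nrs >= min_fraction


def two_sided_p(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in group means (z-test)."""
    standard_error = sd * sqrt(2 / n_per_group)
    z = abs(mean_diff) / standard_error
    return 2 * (1 - NormalDist().cdf(z))


if __name__ == "__main__":
    # A patient going from NRS 8 to NRS 5 meets the two-point threshold.
    print(is_clinically_important(8, 5))

    # The same small 0.5-point group difference (hypothetical SD of 2.5)
    # yields a smaller p-value as enrollment grows.
    for n in (50, 500, 5000):
        print(n, two_sided_p(mean_diff=0.5, sd=2.5, n_per_group=n))
```

In this sketch the 0.5-point difference never approaches the two-point threshold, yet its p-value shrinks as the groups grow, which is why the clinically relevant effect size needs to be fixed before the trial results are interpreted.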

Discussion

The assessment of pain remains a challenge, but the development and adoption of appropriate outcome measures are improving. Most clinicians and researchers recognize that chronic pain is a multidimensional experience requiring appropriate attention to sensory, emotional, functional, and cognitive aspects in addition to the univariable pain intensity scores frequently used in the acute setting. Given the multitude of instruments available to assess pain outcomes, deciding upon a specific tool for any given situation can be difficult. Indeed, a recent systematic review of pain outcomes in chronic low back pain demonstrated 75 different outcome measures cited to evaluate therapy, and the reader is referred to this article for a more in-depth discussion of the validity, reliability, and responsiveness of each in this context [42]. In summary, the authors of the study recommend use of the VAS or NRS for pain responsiveness, the Oswestry Disability Index or Roland Morris Disability Index for physical functioning, and the SF-36 for quality of life measures. Regardless of the measures chosen, each scale represents a compromise between sensitivity and specificity, and between comprehensiveness and burden. It can be tempting to administer a barrage of measures, but this approach can significantly increase the burden on both patient and staff and lead to decreased compliance. The key to choosing an instrument is to be sure that it measures the appropriate domain of interest and to balance the quality and quantity of information.

The results of IMMPACT and PROMIS have suggested core outcome domains, validated measures, and item banks that can be easily accessed by researchers and clinicians alike. In addition, specific pain conditions may require tailored measurements for that population and outcome. For example, the study of acute post-surgical pain may focus on intensity of pain and need for rescue analgesics, while chronic pain conditions are more likely to require multidimensional assessment. Use of standardized outcomes and measurements, and making these readily accessible to providers and patients, holds significant promise to ensure the best delivery of care and the advancement of pain medicine.