Original Article
The SF-36 summary scores and their relation to mental disorders: Physical functioning may affect performance of the summary scores

https://doi.org/10.1016/j.jclinepi.2006.04.003

Abstract

Background and Objectives

The Medical Outcomes Study 36-item Short-Form Health Survey (SF-36) has been widely used as a generic measure of health status. It can be scored to provide either a profile of eight scores or two summary measures of health. Several studies demonstrated shortcomings of the summary scores in accurately reflecting patients' physical and mental health on the basis of subscale scores. The objective of this study was to compare and evaluate different scoring algorithms for the summary scores.

Methods

The analysis was based on data on 4,052 respondents from the German National Health Interview and Examination Survey. Mental disorders were assessed using a structured clinical interview. Logistic regression and receiver operating characteristic analyses were used to evaluate the association between the mental component scores and mental disorders.

Results

Subjects with mental disorders reported poorer quality of life on all SF-36 subscales and component scores compared to those without mental disorders. The presence of physical disorders resulted in different summary scores. The screening accuracy in detecting subjects with mental disorders was satisfactory for both mental summary scores.

Conclusions

The summary scores should be evaluated in relation to the profile of the eight subscales. Physical functioning should be evaluated carefully when comparing health status using summary scores.

Introduction

The Medical Outcomes Study 36-item Short-Form Health Survey (SF-36) [1] is the most widely used generic instrument for measuring quality of life (QOL). It is recommended for use in health policy evaluations, general population surveys, clinical research, and clinical practice [2], [3]. Moreover, it has proven useful for comparing the relative burden of different diseases on QOL [4], [5]. The SF-36 has been found to correlate substantially with the frequency and severity of many specific symptoms and problems. For example, Leidy et al. [6] found that several of the SF-36 scales were related to measures of depression and were sensitive to changes in depression over time. Russo et al. [7] reported that the SF-36 scales were related to psychiatric symptoms as measured by the Brief Psychiatric Rating Scale. The instrument has demonstrated sound psychometric properties across diverse clinical populations.

Summary scores were developed to aggregate the most highly correlated (redundant) subscales and to simplify analyses without substantial loss of information. There are two broad approaches to constructing them. In the first, the Medical Outcomes Study (MOS) approach, Ware et al. [8] used principal components analysis with orthogonal factor rotation to derive two component summary scores for the SF-36: the mental component summary (MCS) and the physical component summary (PCS). All eight subscales are used to compute both the physical and the mental summary score, which yields two measures that are statistically independent. As a result of this scoring approach, three of the four “mental” scales have negative standardized scoring coefficients in the PCS, and the four “physical” scales have negative standardized scoring coefficients in the MCS. These two factors were found to account for more than 80% of the reliable variance of the standard eight subscales and were easily interpreted in a general population [8].
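The general form of such a norm-based component score can be written compactly. The notation below is an assumption introduced for illustration and is not taken from the article: x_k is a respondent's score on subscale k, \mu_k and \sigma_k are the reference-population mean and standard deviation of that subscale, and w_k^{(C)} is the standardized scoring coefficient of subscale k for component C. Under the MOS approach, all eight coefficients are nonzero for each component and some are negative.

```latex
% Standardize each of the eight subscales against the reference population
z_k = \frac{x_k - \mu_k}{\sigma_k}, \qquad k = 1, \dots, 8

% Weighted aggregate for component C, where C is either PCS or MCS
A^{(C)} = \sum_{k=1}^{8} w_k^{(C)} \, z_k

% Rescale to a T-score with reference mean 50 and standard deviation 10
T^{(C)} = 50 + 10 \, A^{(C)}
```

Because a below-average physical scale has z_k < 0, a negative coefficient w_k^{(MCS)} < 0 turns that deficit into a positive contribution to the MCS aggregate.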

Hays et al. [9] presented an alternative instrument to the MOS SF-36, namely the RAND-36, which differs slightly in the item summation but not in the wording of the items or the structure of the instrument. The scoring approach for its summary scores is based on the assumption that the physical and mental health factors are correlated. Therefore, only four scales contribute to the PCS and four to the MCS. The scoring coefficients of the scales were obtained from oblique factor analysis (with Promax rotation).

Summary scores have some methodological features that make them more advantageous for clinical research. These features include smaller confidence intervals (CIs), the elimination of floor and ceiling effects, simpler analysis by reducing the number of statistical tests required, and avoiding the problem of multiple testing.

The MOS summary scores have been used as primary or secondary outcome measures in clinical trials. Although a number of studies have confirmed the validity of these measures [10], there is an ongoing discussion about the scaling of the summary scores [10], [11], [12], [13], [14], [15]. Several studies showed that discrepancies between the subscale profile and the component scores of the SF-36 are attributable to the way in which the summary scores are calculated. The authors of these studies argued that the main problem in the scoring algorithm derives from the use of negatively weighted subscale factor score coefficients, which sometimes leads to clinically counterintuitive study results. As pointed out by Simon et al. [11], the physical functioning subscale makes a significant negative contribution to the computed MCS. Therefore, a condition associated with severe physical limitations and modest psychological distress may appear to have no impact on overall mental health.
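The effect described by Simon et al. can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration: the weights are rounded, hypothetical values that only mirror the sign patterns described in the text (they are not the published MOS or RAND-36 coefficients), the choice of which four scales feed the correlated-factor MCS is assumed, and the respondent profile is invented. Subscale scores are given directly as z-scores relative to a reference population.

```python
from typing import Dict

# Hypothetical MCS weights using all eight scales, with negative weights on
# the four "physical" scales (PF, RP, BP, GH). Illustrative values only.
ALL_SCALES_MCS: Dict[str, float] = {
    "PF": -0.23, "RP": -0.12, "BP": -0.10, "GH": -0.02,
    "VT": 0.24, "SF": 0.27, "RE": 0.43, "MH": 0.49,
}

# Hypothetical MCS weights using only four scales (assumed here to be the
# four "mental" scales), as in a correlated-factor scoring approach.
FOUR_SCALES_MCS: Dict[str, float] = {
    "VT": 0.25, "SF": 0.25, "RE": 0.25, "MH": 0.25,
}


def t_score(z: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of z-scores rescaled to mean 50, SD 10."""
    return 50.0 + 10.0 * sum(w * z[scale] for scale, w in weights.items())


# Invented profile: severe physical limitations, modest psychological distress.
profile: Dict[str, float] = {
    "PF": -2.0, "RP": -1.5, "BP": -1.0, "GH": -0.5,
    "VT": -0.3, "SF": -0.3, "RE": -0.3, "MH": -0.3,
}

mcs_all = t_score(profile, ALL_SCALES_MCS)
mcs_four = t_score(profile, FOUR_SCALES_MCS)

print(f"MCS with negative physical weights: {mcs_all:.1f}")
print(f"MCS from four mental scales only:   {mcs_four:.1f}")
```

With these illustrative numbers, the eight-scale score lands above the reference mean of 50 even though every mental scale is below average, whereas the four-scale score falls below 50. The size of the gap depends entirely on the assumed weights and profile.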

The purpose of this study was to compare and evaluate the two different scoring algorithms for the summary scores in a representative community sample. We compared the performance of the RAND and the MOS scoring for subjects suffering from psychological distress and physical limitations. In addition, we assessed the validity of the MOS and the RAND MCS with respect to mental disorders.
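According to the abstract, this validity assessment used receiver operating characteristic (ROC) analysis. A minimal sketch of the area under the ROC curve for a score where lower values indicate worse mental health is given below. The function name and the data are hypothetical and are not taken from the survey. The area is computed as the proportion of case and non-case pairs in which the case has the lower score, with ties counted as one half.

```python
from typing import Sequence


def auc_lower_is_case(scores: Sequence[float], is_case: Sequence[bool]) -> float:
    """Area under the ROC curve when LOWER scores suggest a disorder.

    Equals the probability that a randomly chosen case scores lower than a
    randomly chosen non-case (ties count 0.5).
    """
    cases = [s for s, c in zip(scores, is_case) if c]
    non_cases = [s for s, c in zip(scores, is_case) if not c]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")

    concordant = 0.0
    for case_score in cases:
        for other_score in non_cases:
            if case_score < other_score:
                concordant += 1.0
            elif case_score == other_score:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


# Invented MCS values and diagnostic status for eight respondents.
mcs_scores = [31.0, 38.5, 44.0, 52.0, 47.5, 55.0, 58.0, 41.0]
has_disorder = [True, True, True, False, False, False, False, False]

auc = auc_lower_is_case(mcs_scores, has_disorder)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation. Computing the same statistic for two competing MCS scorings on the same respondents is one simple way to compare their screening accuracy.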

Data

Data for this report were taken from the German National Health Interview and Examination Survey (GHS). The survey methods are described in detail elsewhere [16], [17], [18] and are summarized here.

The GHS is based on a stratified, multistage, cross-sectional, national representative sample of 7,124 individuals aged 18–79 years from the noninstitutionalized population of Germany. The main survey consisted of a comprehensive health status examination by a medical doctor; respondents also

Results

Of 4,181 participants examined in the mental health supplement, 129 (3.1%) subjects were excluded due to missing values for the SF-36. The remaining 4,052 subjects were included in the present analyses.

In Table 1 the patterns of scale weights are presented for each summary score [8], [9]. As can be seen from the table, the physical functioning scale makes a significant negative contribution to the MOS MCS. On the other hand, the role emotional functioning scale makes a significant positive

Discussion and conclusion

The SF-36 is a widely used generic health status measure. Nevertheless, there is an ongoing discussion about the scaling of the general scores. The main problem in the original scoring algorithm derives from the use of negatively weighted subscale factor score coefficients. The aim of this study was to evaluate the effects of negative weighting for the assessment of mental health. We compared the validity of two mental scoring algorithms as screening measures for mental disorders in a

Acknowledgment

We thank Hans-Ulrich Wittchen, PhD, Heribert Stolzenberg, PhD, and Bärbel-Maria Kurth, PhD, for their assistance with the GHS public use databases.

References (36)

  • J. Russo et al. The MOS 36-item short form health survey: reliability, validity, and preliminary findings in schizophrenic outpatients. Med Care (1998)
  • J.E. Ware et al. SF-36 physical and mental summary scales: a user's manual (1994)
  • R.D. Hays et al. RAND-36 Health Status Inventory (1998)
  • J.E. Ware et al. Interpreting SF-36 summary health measures: a response. Qual Life Res (2001)
  • G.E. Simon et al. SF-36 summary scores: are physical and mental health truly distinct? Med Care (1998)
  • D. Wilson et al. The SF-36 summary scales: problems and solutions. Soz Praventivmed (2000)
  • M.W. Nortvedt et al. Performance of the SF-36, SF-12, and RAND-36 summary scales in a multiple sclerosis population. Med Care (2000)
  • C. Taft et al. Do SF-36 summary component scores accurately summarize subscale scores? Qual Life Res (2001)