Original Article
Psychometric and Clinical Tests of Validity of the Japanese SF-36 Health Survey

https://doi.org/10.1016/S0895-4356(98)00096-1

Abstract

Cross-sectional data from a representative sample of the general population in Japan were analyzed to test the validity of Japanese SF-36 Health Survey scales as measures of physical and mental health. Results from psychometric and clinical tests of validity were compared. Principal components analyses were used to test for the hypothesized physical and mental dimensions of health and the pattern of scale correlations with those components. To test the clinical validity of SF-36 scale scores, self-reports of chronic medical conditions and the Zung Self-Rating Depression Scale were used to create mutually exclusive groups differing in the severity of physical and mental conditions. The pattern of correlations between the SF-36 scales and the two empirically derived components generally confirmed hypotheses for most scales. Results of psychometric and clinical tests of validity were in agreement for the Physical Functioning, Role-Physical, Vitality, Social Functioning, and Mental Health scales. Relatively less agreement between psychometric and clinical tests of validity was observed for the Bodily Pain, General Health, and Role-Emotional scales, and the physical and mental health factor content of those scales was not consistent with hypotheses. In clinical tests of validity, the General Health, Bodily Pain, and Physical Functioning scales were the most valid scales in discriminating between groups with and without a severe physical condition. Scales that correlated highest with mental health in the components analysis (Mental Health and Vitality) also were most valid in discriminating between groups with and without depression. The results of this study provide preliminary interpretation guidelines for all SF-36 scales, although caution is recommended in the interpretation of the Role-Emotional, Bodily Pain, and General Health scales pending further studies in Japan.

Introduction

Increasing interest in evaluating and comparing health status across countries has led to an increase in research devoted to the translation and cross-cultural adaptation of health status questionnaires. In conducting this type of research, two objectives must be addressed simultaneously: health status questionnaires developed and validated in one country should measure the same health attributes when translated for use in other countries, but should also be meaningful within the culture of other countries [1]. The IQOLA Project was initiated to translate and validate the SF-36 Health Survey for use in cross-cultural research, following a comprehensive methodological approach that addressed both objectives [2, 3, 4, 5].

Results of studies to date in many Western European countries and Australia suggest that the SF-36 can be successfully translated, validated, and normed for use in other countries (in addition to the articles in this volume, see [6, 7, 8, 9, 10, 11, 12, 13]). However, there are fewer linguistic and cultural differences between the United States and Western European countries than between the United States and Asian countries. The SF-36 Health Survey has been translated in Japan and other Asian countries using the IQOLA methodology. A description of the translation process within Japan and information on tests of scaling assumptions and reliability of the Japanese SF-36 are reported elsewhere in this issue [14]. This article reports results from empirical tests of the validity of the Japanese SF-36 Health Survey.

Validity studies evaluate whether a scale score measures what it is supposed to measure; that is, whether it has the intended meaning [15]. Results from validity studies provide information that can be used in establishing interpretation guidelines for scales [5]. A comprehensive approach to studying the validity of the SF-36 includes factor analysis and criterion-based approaches that use tests closely approximating the intended applications of the SF-36 [5]. Factor analysis is a traditional psychometric approach to validation, in which the congruence between hypothesized constructs and the scales constructed to measure these constructs is evaluated. Previous factor analytic studies investigating the dimensionality of the eight SF-36 scales have confirmed the presence of two distinct physical and mental health components in the United States [16, 17, 18] and in nine Western European countries [19]. It has also been shown that such traditional psychometric approaches can be supplemented to address other key validity issues, such as the relation of SF-36 scales to clinical criteria [16]. Thus, by examining the relationship of each SF-36 scale to both physical and mental health factors and to clinical variables, using a strong theoretical foundation, the validity of each scale can be evaluated and more comprehensive guidelines for interpretation can be developed.

In this study, we examined whether the SF-36 scale scores reproduce the hypothesized physical and mental dimensions of health in Japan, as previously demonstrated in factor analytic studies of the SF-36 [16, 17, 19], and whether the pattern of relationships between those factors and SF-36 scales is predictive of their associations with external criteria of physical and mental health.
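To make the idea of a clinical (known-groups) test concrete, the following is a minimal sketch, not the authors' procedure. It assumes that a scale's ability to discriminate between mutually exclusive clinical groups is summarized by a one-way ANOVA F statistic, and that scales are compared by the ratio of their F statistics. The choice of statistic, the function name `one_way_f`, the scale labels, and all scores are hypothetical illustrations, not study data.

```python
def one_way_f(groups):
    """One-way ANOVA F statistic for a list of groups of scale scores."""
    k = len(groups)                          # number of clinical groups
    n = sum(len(g) for g in groups)          # total respondents
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical scores (0-100) for respondents without / with a severe
# physical condition. These numbers are invented for illustration only.
scale_a = [[80, 85, 90], [50, 55, 60]]   # separates the groups strongly
scale_b = [[70, 75, 80], [65, 70, 75]]   # separates the groups weakly

f_a = one_way_f(scale_a)
f_b = one_way_f(scale_b)

# Validity of scale B relative to the best-discriminating scale (A).
relative_validity_b = f_b / f_a
print(f"F(A) = {f_a:.1f}, F(B) = {f_b:.1f}, "
      f"relative validity of B = {relative_validity_b:.3f}")
```

Under this assumed criterion, a larger F means the scale separates the clinical groups more sharply, so the ratio gives a simple way to rank scales against the most discriminating one.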

Sample and Data Collection

Data for this analysis came from a survey of the Japanese general population conducted in 1995. A two-stage stratified random sampling frame was constructed to select a representative sample of the Japanese general population aged 16 and older. First, a total of 300 districts were randomly selected from 50 strata, defined by 10 major regions of Japan and five city sizes. Then, within each of the 300 districts, 15 respondents were randomly selected. Of the 4500 subjects selected for this study,

Psychometric Validity

As hypothesized, eigenvalues for the first two components exceeded one, supporting the two-dimensional model. The two components explained 66% of the total variance and 81.9% of the reliable variance in SF-36 scale scores, and explained at least two-thirds of the reliable variance in each SF-36 scale score.
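The eigenvalue criterion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reproduction of the study's analysis: the 4 × 4 correlation matrix is invented (two "physical" and two "mental" scales rather than the eight SF-36 scales), component rotation is omitted, and the eigenvalues are obtained with a cyclic Jacobi routine chosen only so that the example needs nothing beyond the Python standard library. The function name `jacobi_eigenvalues` is hypothetical.

```python
import math


def jacobi_eigenvalues(matrix, max_sweeps=50, tol=1e-20):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations,
    returned largest first."""
    a = [row[:] for row in matrix]           # work on a copy
    n = len(a)
    for _ in range(max_sweeps):
        # Stop once the off-diagonal mass is negligible.
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes element (p, q).
                if diff == 0:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):           # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):           # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


# Invented correlation matrix: scales 1-2 "physical", scales 3-4 "mental".
# These values are illustrative assumptions, not the study's correlations.
corr = [
    [1.0, 0.6, 0.2, 0.2],
    [0.6, 1.0, 0.2, 0.2],
    [0.2, 0.2, 1.0, 0.6],
    [0.2, 0.2, 0.6, 1.0],
]

eigenvalues = jacobi_eigenvalues(corr)

# Retain components whose eigenvalue exceeds one.
retained = [ev for ev in eigenvalues if ev > 1.0]

# Each standardized scale contributes one unit of variance, so the total
# variance equals the number of scales.
explained = sum(retained) / len(corr)

print([round(ev, 3) for ev in eigenvalues])
print(f"{len(retained)} components retained, "
      f"{explained:.0%} of total variance explained")
```

By construction, this toy matrix has eigenvalues 2.0, 1.2, 0.4, and 0.4, so two components exceed one and together account for 80% of the total variance. The share of reliable variance reported in the text additionally depends on scale reliabilities, which this sketch does not model.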

The pattern of correlations between the eight SF-36 scales and the two rotated components was as hypothesized for four scales (Table 1). The Role-Physical and Physical Functioning scales had

Discussion

The SF-36 Health Survey has been shown to yield reliable scores for scales measuring eight dimensions of health status and for summary measures of physical and mental health components in many countries [5, 17, 22, 23]. Factor analyses, like those reported here, have proven useful in the construct validation of the SF-36 and in developing interpretation guidelines [16, 17, 18, 19]. However, because of the circularity inherent in interpreting summary components on the basis of their correlations

Acknowledgements

This study was supported in part by Grants for Scientific Research Expenses for Health and Welfare Programs; Funds for Comprehensive Research on Long Term Chronic Disease. The authors thank Nippon Research Center and IBRD for their assistance in the general population survey. The authors would also like to acknowledge Joseph Green, Ph.D., for his thoughtful suggestions.

References (24)

  • J.E. Ware et al. The MOS 36-Item Short-Form Health Survey (SF-36): I. Conceptual framework and item selection. Med Care (1992)
  • J.E. Ware et al. SF-36 Health Survey Manual and Interpretation Guide (1993)