Original Article
Use of Structural Equation Modeling to Test the Construct Validity of the SF-36 Health Survey in Ten Countries: Results from the IQOLA Project

https://doi.org/10.1016/S0895-4356(98)00110-3

Abstract

A crucial prerequisite to the use of the SF-36 Health Survey in multinational studies is the reproduction of the conceptual model underlying its scoring and interpretation. Structural equation modeling (SEM) was used to test these aspects of the construct validity of the SF-36 in ten IQOLA countries: Denmark, France, Germany, Italy, the Netherlands, Norway, Spain, Sweden, the United Kingdom, and the United States. Data came from general population surveys fielded to gather normative data. Measurement and structural models developed in the United States were cross-validated in random halves of the sample in each country. SEM analyses supported the eight first-order factor model of health that underlies the scoring of SF-36 scales and two second-order factors that are the basis for summary physical and mental health measures. A single third-order factor was also observed in support of the hypothesis that all responses to the SF-36 are generated by a single, underlying construct: health. In addition, a third second-order factor, interpreted as general well-being, was shown to improve the fit of the model. This model (including eight first-order factors, three second-order factors, and one third-order factor) was cross-validated using a holdout sample within the United States and in each of the nine other countries. These results confirm the hypothesized relationships between SF-36 items and scales and justify their scoring in each country using standard algorithms. Results also suggest that SF-36 scales and summary physical and mental health measures will have similar interpretations across countries. The practical implications of a third second-order SF-36 factor (general well-being) warrant further study.

Introduction

The measurement of health status begins with a model of health that determines the content of a questionnaire and serves as the “blueprint” for scoring multi-item scales and summary measures. This model also provides a confirmatory strategy for evaluating the construct validity of a questionnaire. The reproducibility of such models, an issue of much debate in the international literature (see, for example, [1, 2, 3, 4]), is a crucial prerequisite to the scoring and interpretation of a questionnaire in multinational comparisons of health and trials of treatment effectiveness. To date, such comparisons have been hampered by a lack of standardization of questionnaire content across nations [5] and a lack of systematic, comprehensive approaches to the validation of translations of health status measures [6, 7]. The International Quality of Life Assessment (IQOLA) Project translated the SF-36 Health Survey to test the feasibility of standardizing comparisons of health across countries [8].

Reported here are the results of the use of structural equation modeling (SEM) to test the cross-cultural construct validity of the SF-36 by comparing its factor structure across ten IQOLA countries: Denmark, France, Germany, Italy, the Netherlands, Norway, Spain, Sweden, the United Kingdom, and the United States. Confirmatory factor analysis (CFA) has been used to support the cross-cultural construct validity of measures used in psychology and education (e.g., [9, 10, 11, 12]), health care utilization [13], epidemiology [14], and physical [15] and mental health status [16]. However, we know of no other instance where this method has been applied to test the cross-cultural construct validity of a comprehensive, generic model of health status or where it has been applied to evaluate the similarity of questionnaire structure across more than three different cultures. The analyses reported here provide information about the generalizability of the structure of health in ten nations differing in language, geography, and economic and political systems.

The basic question addressed in this research is whether a questionnaire designed as a generic measure of health status in one country can be translated with comparable validity. We hypothesized that the SF-36 model of the relationships of items to scales and scales to each other that fit the data in one country would be replicated in nine other countries sufficiently to warrant its use and interpretation in those countries. Based on the conceptual background of the content of the SF-36 [17] as well as the results of previous research in the United States [18, 19, 20], we hypothesized that a model with the following characteristics would fit SF-36 data: (1) the model would include eight first-order factors corresponding to the eight scales of the SF-36 (described below under Methods); (2) items used in scoring each scale would have stronger relationships with their corresponding first-order factors than with competing first-order factors; (3) second-order physical and mental factors would account for correlations among the eight scales; and (4) a single third-order health factor would account for the correlations among the second-order factors. We also tested the fit of models that added a third second-order factor as explained below.
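The four hypothesized characteristics can be written compactly as a higher-order factor model. The notation below is a generic sketch and not the authors' own: the symbols, the number of items p, and the function k(j) mapping item j to its hypothesized scale are assumptions introduced here for illustration only.

```latex
\begin{aligned}
\mathbf{y} &= \boldsymbol{\Lambda}\,\boldsymbol{\eta}_1 + \boldsymbol{\varepsilon},
  & \mathbf{y} &\in \mathbb{R}^{p},\ \boldsymbol{\eta}_1 \in \mathbb{R}^{8}
  && \text{(items on eight first-order factors)} \\
\boldsymbol{\eta}_1 &= \boldsymbol{\Gamma}_2\,\boldsymbol{\eta}_2 + \boldsymbol{\zeta}_1,
  & \boldsymbol{\eta}_2 &\in \mathbb{R}^{2}
  && \text{(scales on physical and mental factors)} \\
\boldsymbol{\eta}_2 &= \boldsymbol{\gamma}_3\,\eta_3 + \boldsymbol{\zeta}_2,
  & \eta_3 &\in \mathbb{R}
  && \text{(second-order factors on health)} \\
|\lambda_{j,k(j)}| &> |\lambda_{j,k'}|
  & &\text{for all } k' \neq k(j)
  && \text{(hypothesis 2)}
\end{aligned}
```

Under this sketch, the alternative models with a third second-order factor correspond to letting the second-order vector have three components instead of two, with everything else unchanged.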

Section snippets

Samples

Samples of respondents were drawn within each country according to methods specified in detail elsewhere [21, 22, 23, 24, 25]. Nationally representative samples were selected in Denmark, France, Germany, Italy, Norway, Spain, and the United States. A supplemental sample of the elderly in France (N = 348) and respondents who required assistance completing the questionnaire in Italy (N = 548) were not included in the analysis. The Spanish data were limited to those patients who had complete data
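The cross-validation in random halves of each country's sample, mentioned in the abstract, can be illustrated with a short sketch. This is only an assumption about how such a split might be done, not the procedure the authors used; the function name, the seed, and the respondent identifiers are hypothetical.

```python
import random


def split_random_halves(respondent_ids, seed=0):
    """Shuffle respondent IDs and split them into two halves.

    The first half can serve for fitting a model and the second
    as a holdout for cross-validation. The seed is an illustrative
    assumption that makes the split reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(respondent_ids)
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]


# Hypothetical respondent identifiers 1..10 for one country.
calibration, holdout = split_random_halves(range(1, 11), seed=42)
```

With an odd number of respondents, the holdout half receives the extra case.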

First-Order Factor Structure

Figure 2 describes the pattern of relationships between the SF-36 items and the eight first-order factors. Dashed lines indicate secondary loadings that were consistently greater than 0.30 across countries. As summarized in Table 1, standardized regression coefficients to predict items from factors were approximately 0.40 or higher for each item with its hypothesized factor and lower with competing factors across countries. However, there were a few exceptions to this pattern.
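The loading pattern described above (about 0.40 or higher on the hypothesized factor, lower on competing factors, with secondary loadings above 0.30 flagged) can be expressed as a simple check. The sketch below is illustrative only: the function name, the exact cutoffs as hard thresholds, and the example loadings are assumptions, not values from Table 1.

```python
PRIMARY_MIN = 0.40     # assumed cutoff for the hypothesized factor
SECONDARY_FLAG = 0.30  # assumed cutoff for flagging secondary loadings


def check_item(loadings, hypothesized):
    """Evaluate one item's standardized loadings.

    loadings: dict mapping factor label -> standardized coefficient.
    hypothesized: label of the factor the item is scored on.
    """
    primary = loadings[hypothesized]
    competing = {f: v for f, v in loadings.items() if f != hypothesized}
    return {
        # Loads at or above the cutoff on its own factor.
        "convergent": abs(primary) >= PRIMARY_MIN,
        # Loads lower on every competing factor.
        "discriminant": all(abs(v) < abs(primary) for v in competing.values()),
        # Competing factors with a notable secondary loading.
        "secondary": sorted(
            f for f, v in competing.items() if abs(v) > SECONDARY_FLAG
        ),
    }


# Hypothetical loadings for one item; factor labels are illustrative.
example = {"PF": 0.72, "RP": 0.12, "VT": 0.35}
result = check_item(example, "PF")
```

An item that passes both the convergent and discriminant checks but has a non-empty `secondary` list corresponds to the dashed-line case described for Figure 2.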

The “vigorous

Discussion

These results support the model of relationships between SF-36 items and factors, and among the factors themselves, that was developed in the U.S. database and tested in databases from nine other countries. In general, results across ten countries confirmed the eight first-order factor structure of the SF-36. With few exceptions, standardized regression coefficients to predict items from factors were 0.40 or higher for each item with its hypothesized factor (demonstrating convergent validity) and lower with competing

References (65)

  • W. Kuyken et al.

    Quality of life assessment across cultures

    Int J Ment Health

    (1994)
  • U.S. Congress, Office of Technology Assessment. International Health Statistics: What the Numbers Mean for the United...
  • R.T. Anderson et al.

    Critical review of the international assessments of health-related quality of life

    Qual Life Res

    (1993)
  • M. Bullinger et al.

    Developing and evaluating cross-cultural instruments from minimum requirements to optimal models

    Qual Life Res

    (1993)
  • J.E. Ware et al.

    The IQOLA Project Group. The SF-36 Health Survey: Development and use in mental health research and the IQOLA Project

    Int J Ment Health

    (1994)
  • M. Almagor et al.

    The two-factor model of self-reported mood: A cross-cultural replication

    J Pers Assess

    (1989)
  • N.A. Fouad et al.

    Convergent validity of the Spanish and English forms of the Strong-Campbell Interest Inventory for bilingual Hispanic high school students

    J Coun Psychol

    (1984)
  • L.-M.P. Lee et al.

    Confirmatory factor analyses of the Wechsler Intelligence Scale for children-revised and the Hong Kong–Wechsler Intelligence Scale for children

    Educ Psychol Meas

    (1988)
  • R.J. Vallerand et al.

    The Academic Motivation Scale: A measure of intrinsic, extrinsic, and amotivation in education

    Educ Psychol Meas

    (1992)
  • T. Bice et al.

    Cross-national comparative research on the utilization of medical services

    Med Care

    (1971)
  • W.W. Dressler et al.

    Comparative research in social epidemiology: Measurement issues

    Ethn Dis

    (1991)
  • J. Liang et al.

    The structure of self-reported physical health among the aged in the United States and Japan

    Med Care

    (1991)
  • J. Liang et al.

    The structure of the mental health inventory among Chinese in Taiwan

    Med Care

    (1992)
  • J.E. Ware et al.

    The MOS 36-Item Short-Form Health Survey (SF-36). I. Conceptual framework and item selection

    Med Care

    (1992)
  • C.A. McHorney et al.

    The MOS 36-Item Short-Form Health Survey (SF-36). II. Psychometric and clinical tests of validity in measuring physical and mental health constructs

    Med Care

    (1993)
  • C.A. McHorney et al.

    The MOS 36-Item Short-Form Health Survey (SF-36). III. Tests of data quality, scaling assumptions and reliability across diverse patient groups

    Med Care

    (1994)
  • C.A. McHorney et al.

    Comparisons of the costs and quality of norms for the SF-36 Health Survey collected by mail versus telephone interview: Results from a national survey

    Med Care

    (1994)
  • J.E. Brazier et al.

    Validating the SF-36 Health Survey Questionnaire: New outcome measure for primary care

    Br Med J

    (1992)
  • L. Thalji et al.

    1990 National Survey of Functional Health Status: Final Report

    (1991)
  • R.D. Hays et al.

    The structure of self-reported health in chronic disease patients

    Psychol Assess

    (1990)
  • J.E. Ware et al.

    Conceptualization and Measurement of Health for Adults in the Health Insurance Study, Vol. VI, R-1987/6-HEW

    (1980)
  • J.E. Ware et al.

    SF-36 Health Survey Manual and Interpretation Guide

    (1993)