There are well-documented gender differences in health and health behaviors, as reflected in various measures of morbidity, mortality and health care utilization [1]. In 2002, age-adjusted death rates for 8 out of 10 leading causes of death in the United States were greater for men than for women, and men were more likely to suffer from chronic conditions that are more severe [1]. Men have more life-threatening chronic diseases, including coronary heart disease, cancer, cerebrovascular disease, emphysema, cirrhosis of the liver, kidney disease and atherosclerosis, while women have higher rates of chronic disabling disorders [3]. Women are more likely to suffer from autoimmune and rheumatologic disorders and from other non-life-threatening diseases, such as anemia, thyroid conditions, gallbladder conditions, migraines, arthritis and eczema [3]. Women are also more likely to experience depressive and anxiety disorders, while men more commonly experience antisocial behavior, substance abuse and suicide [1]. Differences in morbidity and mortality between men and women create a complex relationship between gender and health.
Preference-scored, self-reported health-related quality-of-life (HRQoL) indexes are widely used to summarize the overall health of individuals or groups and to estimate quality-adjusted life-years for use in cost-effectiveness analyses [5]. Although men may seem less healthy than women based on their lower life expectancy at all ages and their greater likelihood of suffering from life-threatening diseases, several studies have found women to have lower HRQoL scores than men [5]. However, the evidence to date is not clear, because population-based analyses of HRQoL more often control for potential confounding by gender than study gender as an independent variable. No study has systematically examined the relationship of gender and HRQoL across surveys and measures.
Research on HRQoL and on simpler measures of self-reported health status demonstrates that self-rated health differs across gender and other sociodemographic and socioeconomic status (SES) characteristics, such as race, marital status, education and income [6]. Liu and Umberson (2008) found the self-reported health of those widowed, divorced and separated to be poor relative to those who are married, especially among women [12]. The percentage of women who are married decreases with age and is lower than that of men after age 45 [15]. Moreover, HRQoL varies significantly by SES [13], and it is well known that women have lower individual income than men [14]. Hence, gender differences in HRQoL may be partly due to differences in sociodemographic and SES factors.
Several publicly available, nationally representative surveys of HRQoL present an opportunity to examine the consistency of gender differences in HRQoL. Each survey utilized at least one of five commonly used HRQoL indexes: the Short Form 6 dimension (SF-6D) [16], the EuroQol 5 dimension (EQ-5D) [18], the Health Utilities Index Mark 2 (HUI2) [20] and Mark 3 (HUI3) [21], and the Quality of Well-Being Scale Self-Administered form (QWB-SA) [22]. Each index, except for the QWB-SA, was used by at least two different surveys. Each survey also collected information on respondent sociodemographic and SES characteristics.
In this study, parallel analyses within each dataset are performed, as well as analyses that pool the surveys. The following hypotheses are tested: (1) women report lower HRQoL than men on the five HRQoL indexes, and (2) differences in sociodemographic and SES characteristics between men and women explain gender differences in HRQoL.
The five health indexes, SF-6D, EQ-5D, HUI2, HUI3 and QWB-SA, are preference-based generic HRQoL measures anchored on a cardinal scale by 0 (dead) and 1 (full health). The EQ-5D, HUI2 and HUI3 allow health states with utilities less than zero (worse than dead). Although the five indexes are assumed to be measuring the same concept of health on the same theoretical scale, they in fact differ in construction, in coverage of health domains and in their numerical ranges.
The SF-6D index was developed as a summary utility scale based on items from the SF-36v2™ or SF-12v2™ (a subset of SF-36v2™) questionnaires [16]. It refers to health in the “past 4 weeks” and covers six health domains: physical function, role limitation, social function, pain, mental health and vitality. The SF-6D index produces single summary utility scores ranging from 0.35 to 1.0. In this study, the SF-12v2™ version of SF-6D is used.
The EuroQoL EQ-5D refers to health “today” and incorporates five domains of health: mobility, self-care, usual activities, pain/discomfort and anxiety/depression. This index produces summary utility scores that range from −0.11 to 1 [18]. US weights are used in this study [19].
The HUI2 and HUI3 indexes refer to health “in the past week.” HUI2 has six domains (sensation, mobility, emotion, cognition, self-care, pain), and HUI3 has eight domains (vision, hearing, speech, ambulation, dexterity, emotion, cognition, pain), for which data are collected using the proprietary Health Utilities Index questionnaire [20]. Summary utility scores range from −0.03 to 1.0 for HUI2 and from −0.36 to 1.0 for HUI3.
The QWB-SA index refers to the past three days and covers four domains: mobility/self-care, physical activity, self-care/usual activity and acute/chronic symptoms [22]. The QWB-SA produces a summary utility score ranging from 0.09 to 1.0.
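The recall periods and score ranges of the five indexes can be collected in one lookup structure, which makes the differences in numerical range easy to check. The following is a minimal sketch: the values are those stated above, while the dictionary layout and the helper names `is_valid_score` and `allows_worse_than_dead` are illustrative assumptions, not part of any index's official scoring software.

```python
# Recall periods and score ranges as stated in the text.
# The layout and helper names are illustrative assumptions.
INDEXES = {
    "SF-6D":  {"recall": "past 4 weeks", "min": 0.35,  "max": 1.0},
    "EQ-5D":  {"recall": "today",        "min": -0.11, "max": 1.0},
    "HUI2":   {"recall": "past week",    "min": -0.03, "max": 1.0},
    "HUI3":   {"recall": "past week",    "min": -0.36, "max": 1.0},
    "QWB-SA": {"recall": "past 3 days",  "min": 0.09,  "max": 1.0},
}


def is_valid_score(index, score):
    """Return True if the score lies inside the stated range of the index."""
    spec = INDEXES[index]
    return spec["min"] <= score <= spec["max"]


def allows_worse_than_dead(index):
    """Return True if the index permits utilities below 0 (dead)."""
    return INDEXES[index]["min"] < 0


worse_than_dead = sorted(name for name in INDEXES if allows_worse_than_dead(name))
```

A range check of this kind is useful mainly as a data-cleaning step before pooling scores from different surveys.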
Data and variables
These analyses use four publicly available data sets that were collected during similar time frames and contain the HRQoL measures of interest, as described in Table 1. All four data sets contain survey weights for producing estimates generalizable to the non-institutionalized adult population in the United States. All participants gave written informed consent at the inception of each of the four surveys. The use of these publicly available data sets for this study was approved by the University of Wisconsin-Madison Health Sciences Institutional Review Board.
Table 1 Descriptions of four US representative surveys

| | USVEQ | MEPS | NHMS | JCUSH |
|---|---|---|---|---|
| Year of administration | June–October 2002 | Full year of 2003 | June 2005–August 2006 | November 2002–March 2003 |
| Form of administration | PAPI | SAQ | CATI | CATI |
| Age of respondents | 18+ years old | 18–90 years old | 35–89 years old | 18+ years old |
| No. of US respondents (proxy, non-proxy) | | | | |
| No. of US respondents (non-proxy, ages 35–89, Blacks + Whites only) | 2,471 (includes 719 non-Hispanic Blacks, 1,752 Hispanics and Whites) | 13,195 (includes 38 Hispanic Blacks, 2,582 Hispanic Whites) | 3,648 (includes 23 Hispanic Blacks, 47 Hispanic Whites) | 3,186 (includes 3 Hispanic Blacks, 59 Hispanic Whites) |
| Response rate (proxy, non-proxy) | | | | |
The US Valuation of the EuroQol EQ-5D Health States Survey (USVEQ) produced a nationally representative sample of civilian non-institutionalized residents ages 18 and older within the 50 US states and the District of Columbia [19]. Data were collected between June and October of 2002. The survey was administered by paper and pencil in face-to-face interviews (PAPI). Survey weights were post-stratified by age (18+), gender and race (Hispanic, non-Hispanic black and other).
The Medical Expenditure Panel Survey (MEPS) has been administered annually since 1996 to the US non-institutionalized population ages 18–90 years (for confidentiality purposes, MEPS coded all ages >85 as ‘85’) to obtain information on health care utilization and expenditures via mailed self-administered questionnaire (SAQ) [25]. Survey weights were post-stratified by six variables (race/ethnicity: Hispanic, black non-Hispanic, and other; sex; age; poverty status; census region; metropolitan statistical area). The 2003 MEPS data are used in the current study to best match the time frame of the other nationally representative data sets.
The National Health Measurement Study (NHMS) is a nationally representative computer-assisted telephone interview (CATI) survey of non-institutionalized adults between the ages of 35 and 89 residing in the continental US [6]. The survey was conducted between June 2005 and August 2006. The survey weights were post-stratified by gender, race (black, white and other) and age (35–44, 45–66, 65+).
The Joint Canada/US Survey of Health (JCUSH) is a cross-sectional random-digit-dialed telephone survey conducted in both Canada and the United States and administered via CATI [24]. Data collection took place between November 2002 and March 2003. The study included people ages 18 and over in both countries and excluded people who were institutionalized or living in United States or Canadian territories. JCUSH survey weights were post-stratified by age (18–44, 44–64, 65+) and gender.
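All four surveys supply post-stratified weights. As a reminder of what that adjustment does, the sketch below rescales base weights so that the weighted count in each post-stratum matches a known population total. It is a generic illustration only: the function name, the toy strata and the population totals are hypothetical, and the surveys' actual weighting procedures involve further steps (such as nonresponse adjustment) that are not shown.

```python
from collections import Counter


def poststratify(base_weights, strata, population_totals):
    """Rescale weights so each stratum's weighted count equals its population total."""
    weighted_count = Counter()
    for weight, stratum in zip(base_weights, strata):
        weighted_count[stratum] += weight
    return [
        weight * population_totals[stratum] / weighted_count[stratum]
        for weight, stratum in zip(base_weights, strata)
    ]


# Hypothetical toy sample: four respondents in three age-by-gender strata.
strata = [("18-44", "F"), ("18-44", "F"), ("18-44", "M"), ("65+", "F")]
base_weights = [1.0, 1.0, 1.0, 1.0]
# Hypothetical population totals for the same strata.
population_totals = {("18-44", "F"): 100.0, ("18-44", "M"): 90.0, ("65+", "F"): 40.0}

weights = poststratify(base_weights, strata, population_totals)
```

After the adjustment, the two women aged 18–44 share the population total of their stratum equally, while the single respondents in the other two strata each carry the full total of theirs.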
The current analyses target non-proxy US respondents whose ages were between 35 and 89 years and who reported their race/ethnicity as either white/Caucasian or black/African American (Table 1). The ‘other’ race/ethnicity category was excluded from our analyses because the data contained too few individuals from each of the many racial and ethnic subgroups that it comprises. NHMS, JCUSH and MEPS specified their race categories as white/Caucasian and black/African American, while the race categories in USVEQ were black/African American (non-Hispanic), white (non-Hispanic) and Hispanic/Latino. For comparability with the other surveys, we coded Hispanic/Latino from USVEQ as white/Caucasian. The majority (at least 67%) of Hispanic respondents in the other three studies self-identified as white.
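The inclusion rules just described can be expressed as a small filter. This is a sketch under stated assumptions: the record fields (`survey`, `proxy`, `age`, `race`) and the lower-case race labels are hypothetical and do not correspond to variable names in any of the four public-use files.

```python
def harmonize_race(survey, race):
    """Map a survey-specific race label to 'white' or 'black'; None means excluded."""
    race = race.lower()
    if survey == "USVEQ" and race == "hispanic":
        return "white"  # USVEQ Hispanic/Latino respondents are coded as white
    if race in ("white", "black"):
        return race
    return None  # the 'other' category is excluded


def in_analytic_sample(record):
    """Non-proxy respondents, ages 35-89, black or white after harmonization."""
    return (
        not record["proxy"]
        and 35 <= record["age"] <= 89
        and harmonize_race(record["survey"], record["race"]) is not None
    )


# Hypothetical records illustrating each rule.
records = [
    {"survey": "USVEQ", "proxy": False, "age": 52, "race": "Hispanic"},
    {"survey": "MEPS", "proxy": True, "age": 60, "race": "white"},
    {"survey": "NHMS", "proxy": False, "age": 34, "race": "black"},
    {"survey": "JCUSH", "proxy": False, "age": 70, "race": "other"},
]
kept = [r for r in records if in_analytic_sample(r)]
```

Only the first record passes: the others fail on proxy status, age and race category, respectively.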
Our analyses adjusted for sociodemographic and SES variables. The variables in the different data sets were coded as similarly as possible. Table 2 describes the coding and shows the distributions of the resulting independent variables by dataset and gender.
Table 2 Weighted proportions (shown in percents) of sample characteristics by dataset and gender
Rows: Total N (unweighted); Married/living with partner; Less than high school (HS); Some post-HS education; College degree or higher; Lowest income level; 2nd lowest income level; 3rd lowest income level; Highest income level
The datasets differed only slightly in coding of the income variable. All studies but MEPS measured total household income, whereas MEPS measured family income as a percentage of the poverty line, which depends on the number of family members. According to the Department of Health and Human Services Poverty Guidelines, income as a percentage of the Federal Poverty Line (FPL) is classified as poor (<100%), near poor (100–125%), low (125–200%), middle (200–400%) or high (400% or greater). FPL varies according to the number of members in a family [26]. The number of family members and the midpoints of the FPL percentage ranges were used to convert the MEPS income classifications into dollar values. These were grouped into four income categories comparable to those of the other three studies.
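The conversion of MEPS poverty classes into dollar values can be sketched as follows. The midpoints follow from the FPL ranges given above. Everything else is an assumption made for illustration: the value used for the open-ended "high" class, the guideline figures passed to `poverty_guideline`, and the dollar cut points in `CUTS` are placeholders and are not the values used in the study.

```python
# Midpoints of the FPL percentage ranges, as fractions of the poverty line.
# "high" (400% or greater) is open-ended; 5.0 is a hypothetical stand-in.
FPL_MIDPOINT = {
    "poor": 0.50,        # <100%
    "near poor": 1.125,  # 100-125%
    "low": 1.625,        # 125-200%
    "middle": 3.00,      # 200-400%
    "high": 5.00,        # assumption, see above
}


def poverty_guideline(family_size, base=8980, per_additional=3140):
    """Poverty line in dollars for a family; default figures are placeholders."""
    return base + per_additional * (family_size - 1)


def meps_income_dollars(fpl_class, family_size):
    """Approximate family income: class midpoint times the family's poverty line."""
    return FPL_MIDPOINT[fpl_class] * poverty_guideline(family_size)


# Hypothetical dollar cut points separating the four income levels.
CUTS = (20000, 35000, 75000)


def income_level(dollars):
    """Return 1 (lowest) through 4 (highest) income level."""
    return 1 + sum(dollars >= cut for cut in CUTS)


example = meps_income_dollars("middle", 4)
```

With these placeholder figures, a "middle" family of four is assigned three times its poverty line and falls into the third income level.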
The categorization of age was the same in all four data sets. Marital status and education questions had variations in phrasing and answer options. Categories were created to be relatively similar across surveys.
We fit survey-weighted least squares (WLS) regression models separately for each data set, with HRQoL scores as the outcome variables. Gender, age and race were included in all models (model 1). Marital status (model 2), education (model 3) and income (model 4) were then added one at a time, and finally all three were added simultaneously (model 5). All covariates were modeled as indicator variables. Women served as the reference category (men were coded as “1” and women as “0”), so a positive gender coefficient indicates higher HRQoL for men.
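The logic of comparing the gender coefficient across nested models can be illustrated with a small weighted least squares solver. The sketch below is not the SAS procedure used in the study: it is a pure-Python solution of the weighted normal equations, it gives point estimates only (design-based standard errors are not computed), and the toy data are constructed so that the entire gender gap is produced by a hypothetical low-income indicator.

```python
def wls(X, y, w):
    """Solve the weighted normal equations (X'WX) b = X'Wy by Gauss-Jordan elimination."""
    p = len(X[0])
    # Augmented matrix [X'WX | X'Wy].
    A = [
        [sum(wi * xi[j] * xi[k] for xi, wi in zip(X, w)) for k in range(p)]
        + [sum(wi * xi[j] * yi for xi, yi, wi in zip(X, y, w))]
        for j in range(p)
    ]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[j][p] for j in range(p)]


# Toy data: (male, low_income, survey weight). Women are more often low income.
people = [(0, 1, 2.0), (0, 1, 1.0), (0, 1, 1.0), (0, 0, 2.0),
          (1, 1, 1.0), (1, 0, 2.0), (1, 0, 1.0), (1, 0, 2.0)]
# HRQoL depends on income only; there is no direct gender effect by construction.
y = [0.85 - 0.05 * low for _, low, _ in people]
w = [weight for _, _, weight in people]

# Gender only (analogous to an unadjusted model).
unadjusted = wls([[1.0, male] for male, _, _ in people], y, w)
# Gender plus income (analogous to adding an SES covariate).
adjusted = wls([[1.0, male, low] for male, low, _ in people], y, w)
```

In this constructed example the coefficient on the male indicator is 0.025 before adjustment and essentially zero once income enters the model, which is the attenuation pattern that models 2 through 5 are designed to detect.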
Estimates of gender difference were standardized via model-based adjustment to the marginal Census 2000 proportions (www.factfinder.census.gov) of variables not already used for post-stratification, as shown in Table 2. This was achieved by including interactions of gender with these sociodemographic and SES variables centered at their percentage representation in the Census.
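One way to write a model of this kind is given below. The notation is introduced here for illustration and is an assumption about the specification rather than a reproduction of it: $Y_i$ is the HRQoL score, $M_i$ is the indicator for men, $X_{ik}$ are the covariate indicators and $p_k$ is the Census 2000 proportion for category $k$.

```latex
Y_i = \beta_0 + \beta_1 M_i + \sum_k \gamma_k X_{ik}
      + \sum_k \delta_k \, M_i \,(X_{ik} - p_k) + \varepsilon_i

% Gender difference for a covariate profile x:
\Delta(x) = \beta_1 + \sum_k \delta_k \,(x_k - p_k)

% Averaged over a population in which E[X_k] = p_k for every k:
E[\Delta(X)] = \beta_1
```

Under this parameterization the centered interaction terms average to zero in a population whose covariate proportions equal $p_k$, so $\beta_1$ can be read directly as the gender difference standardized to the Census marginals.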
Additional analyses were conducted by pooling data sets: MEPS and NHMS (for SF-6D); USVEQ, MEPS and NHMS (for EQ-5D); JCUSH, NHMS and USVEQ (for HUI3); and USVEQ and NHMS (for HUI2). In the WLS analyses of the pooled data, each dataset was specified as a sampling stratum, and indicator variables for the particular data sets were included in the models.
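The dataset indicator variables used in the pooled models can be constructed as in the sketch below. The record fields, the choice of reference dataset and the function name are hypothetical. Treating each dataset as a sampling stratum affects variance estimation, which is not shown here.

```python
def pooled_design_rows(records, reference):
    """Build rows [intercept, male, dataset indicators...] omitting the reference dataset."""
    datasets = sorted({r["dataset"] for r in records} - {reference})
    rows = [
        [1.0, float(r["male"])] + [1.0 if r["dataset"] == d else 0.0 for d in datasets]
        for r in records
    ]
    return datasets, rows


# Hypothetical pooled records for an EQ-5D analysis (USVEQ, MEPS and NHMS).
records = [
    {"dataset": "MEPS", "male": 1},
    {"dataset": "NHMS", "male": 0},
    {"dataset": "USVEQ", "male": 1},
]
indicator_names, design = pooled_design_rows(records, reference="MEPS")
```

Each non-reference dataset receives its own column, so the gender coefficient in the pooled model is estimated net of average level differences between surveys.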
All analyses were conducted using SAS/STAT® System for Windows (version 9.1) applying procedures PROC SURVEYFREQ, SURVEYMEANS and SURVEYREG, which incorporated survey weights to produce US nationally representative estimates (Copyright 2002–2003 SAS Institute Inc., Cary, NC, USA).
This study confirms that women self-report worse health than men on five commonly used HRQoL indexes. This difference in HRQoL appears to be explained in large part by differences in sociodemographic and SES characteristics between men and women in the US population. Large differences in HRQoL by SES have been previously documented [13]. The lower average income of women appears to account for much of their disadvantage in HRQoL.
The gender-associated differences seen in HRQoL depend somewhat on the particular HRQoL index used. The estimates range from 0.02 to 0.03 (higher HRQoL for men) without adjustment and from −0.01 to 0.03 with adjustment for marital status, education and income. Only the SF-6D and QWB-SA displayed gender differences whose magnitudes approached 0.03, a difference considered substantively important for preference-based HRQoL indexes [10].
The gender differences on all HRQoL indexes followed parallel trajectories of change with adjustment for sociodemographic and SES variables. Taking age and race differences into account generally did not explain the gender difference in HRQoL and produced estimates that were quite similar across measures (except somewhat higher for SF-6D and lower for EQ-5D). Once further adjustments for marital status, education and income were taken into account, gender differences became small on most HRQoL measures. Taking income differentials into account resulted in estimates that were again quite consistent across measures with a few exceptions. EQ-5D retained a statistically significant gender difference in the pooled analysis due to the large sample size of MEPS, although the magnitude of the difference was similar to the non-significant differences in other measures. In contrast, after controlling for income or marital status or all covariates simultaneously, the gender difference in SF-6D remained moderate and statistically significant in MEPS and pooled analyses, as did the difference in QWB-SA from NHMS.
Several circumstances may have contributed to the greater and persistent gender differences in SF-6D and QWB-SA. It is well known that the five HRQoL measures differ in construction (e.g., HRQoL domains, time frames, elicitation methods and scoring equations) and distributional properties (e.g., ceiling/floor effects, numerical range), although they purport to represent the same evaluation of a given level of health [6]. The observed gender differences in HRQoL reported here may then be in part an artifact of the particular HRQoL index used. For example, SF-6D and QWB-SA have minimal ceiling effects compared to the other three indexes [6], which may lead to these two measures identifying gender differences between relatively healthy men and women. The QWB-SA is also the only measure that incorporates in its summary score a list of 58 acute/chronic symptoms and health conditions, and such symptoms may differ by gender. The fact that the remaining gender difference in SF-6D after adjustment for marital status and income (in the pooled analysis) arose almost entirely from MEPS and not from the NHMS raises the possibility that differences in the categorization of adjustment variables, such as the different measurement of income in MEPS, may have led to differences in results.
MEPS and NHMS also differ in mode of administration. Hanmer et al. (2007) [7] showed that self-administered surveys (e.g., MEPS) yield lower HRQoL scores than telephone surveys (e.g., NHMS) on the EQ-5D, Visual Analog Scale, HUI3 and general self-rated health question. Hays et al. (2009) [29] found that HRQoL scores are more positive for phone administration following mail administration, with the following differences between modes: 0.06 (SF-6D), 0.03 (QWB-SA), 0.08 (EQ-5D), 0.04 (HUI2) and 0.10 (HUI3). Further analyses (not shown) revealed that distributions on domains of SF-6D in MEPS and NHMS are similar except for the mental health domain. Greater proportions of both men and women reported lower levels of mental health in MEPS than in NHMS; however, the proportions differed more for women than for men between the two datasets. This suggests that the differences in SF-6D results between MEPS and NHMS may be related to a greater mode effect among women on SF-6D.
Alternatively, persistent gender differences in SF-6D and QWB-SA may reflect unique health variation captured by these indexes but not by EQ-5D, HUI2 or HUI3, or gender biases in item responses. Fleishman and Lawrence (2003) found, based on the MEPS data, that some questions on the SF-12v2™ are prone to differential item functioning (DIF) between the genders, i.e., different responses to the questions by men and women having the same underlying health [30]. In particular, the two questions that are used to estimate the mental health component of SF-6D (“felt downhearted”, “had energy”) were found to have DIF by gender [17]. Fleishman and Lawrence (2003) showed that adjusting for DIF generally reduced gender differences in mental health [30].
HUI3 in NHMS and JCUSH was the only HRQoL index that showed a reverse direction of the gender difference, though not statistically significant, with simultaneous adjustment for age, race, marital status, education and income. HUI3 was also more affected by adjustments for SES than were other measures. The relatively greater sensitivity to SES adjustments on HUI3 is consistent with the findings of Robert et al. (2009) [13], who showed that income-associated disparities appear wider for the HUI3 than for SF-6D or EQ-5D.
There may be several reasons for minor differences in results between the studies. Random variation, differences in sampling and mode effects (JCUSH and NHMS were phone surveys; MEPS and USVEQ were self-administered) may have led to such differences [7]. Our analysis aims to represent the sociodemographic and SES mix of the US population; gender differences may be smaller or larger in certain subgroups or in populations with a different sociodemographic and SES mix. Future research on such interactions is warranted. Furthermore, the analysis adjusted the results to marginal percentages in age, race and socioeconomic subgroups, but not to percentages in subgroups formed by 2- or 3-way cross-tabulation of these variables, which may also have been affected by response biases.
Small gender differences in HRQoL that persist even after adjustments for available sociodemographic and SES characteristics indicate either that this variation has been incompletely measured, or that other factors are contributing to gender differences in HRQoL. We may not be measuring accurately the full distribution of marital status, education and income, especially as we have collapsed these variables in our attempt to achieve consistency across data sets. In addition, it may be that dynamics of marital status and income over the life course contribute significantly to gender differences in HRQoL—dynamic effects that our cross-sectional analyses cannot detect. Clearly, age, race, marital status, education and income are likely not the only sociodemographic and SES variables that contribute to HRQoL differences between men and women.
Our analysis focuses on whether there are gender differences in HRQoL and whether these differences are related to sociodemographic and SES differentials between men and women. Alternatively, the analysis could have examined whether other measures of self-perceived health (e.g., general self-rated health question, symptoms/conditions and morbidity indexes) explain gender differences in HRQoL. Such analyses would aim to answer different questions and may lead to different conclusions regarding gender differences. Several considerations would affect the interpretation of analyses attempting to adjust for such measures, including measurement error or bias in the adjustment variables themselves. HRQoL measures have been developed to capture the multiple dimensions of the health experience beyond the presence of disease. However, examining gender differences in HRQoL among men and women with the same illness or disease may be a fruitful direction for future research.
This study has several limitations. These analyses are based on cross-sectional data, so it was not possible to assess gender differences in changes in HRQoL over time. The inability to consistently control for household or family size in measuring household income may be a weakness of this study. Additionally, this study is based on a non-institutionalized sample, and at least two institutionalized segments of the US population, those who are hospitalized or in nursing homes, are likely to have worse HRQoL than people in our study and are predominantly women [32]. Due to sample age and race restrictions in the analyses, only community-residing 35–89-year-old black and white US adults are represented in the results.
The primary strength of the study is the simultaneous use of four recently conducted, large nationally representative surveys among US adults and several commonly used preference-based measures of HRQoL. To further understand the scope of gender differences in HRQoL, it is important to conduct future research on gender and HRQoL in other subpopulations of the United States, including residents ages <35, other racial subgroups of the population and people living in institutions. Incorporating longitudinal data would allow an assessment of how aging affects changes in HRQoL in men and women. It is also important to assess whether gender differences in HRQoL are larger in some subgroups of the population.