Introduction
Subjective quality of life (QoL) is an important and widely used measure of health [1]. Quality-of-life assessments generally require respondents to rate their physical or psychological health status, or overall life satisfaction, on an ordinal Likert scale ranging from ‘poor’ or ‘very poor’ to ‘very good’ or ‘excellent’. Single items or overall measures can be very useful indicators of health and health inequalities [2, 3]. Additionally, the brevity of single-item measures can reduce respondent burden and survey costs [3]. However, they are prone to greater measurement error, which, if overlooked, may lead to inaccurate assumptions and conclusions.
Self-assessed scale measures can fail to provide meaningful results when there are differences in reporting behaviours across populations. Depending on their experiences and expectations, individuals interpret and respond to scale categories in different ways. Regardless of their underlying state of being, some people have a tendency to respond in the affirmative rather than to disagree, while others have a tendency to use the extreme or middle points of a scale. When this behaviour is systematic across population groups, it can lead to distorted or biased research findings. A number of terms have been used to describe these differences in scaling behaviour, including ‘scale of reference bias’ [4], ‘response category cut-point shift’ [5], ‘reporting heterogeneity’ [6, 7], ‘differential item functioning’ [8, 9], and ‘scale perception bias’ [10].
In Western societies, people are generally positive about their overall QoL and will typically rate themselves towards the healthier end of a scale [11, 12]. However, differences in scale rating of QoL have been observed across age, gender, socio-economic, cultural, and language groups [6, 12–14]. What makes subjective QoL so challenging to measure is that there is no universal agreement on how it is defined. As a result, many different instruments have been developed, each derived from a different conceptual understanding of QoL [15, 16]. Patients or survey respondents asked to rate their QoL may also interpret it differently, based on their own definition of QoL, which is not necessarily in accord with the definition presupposed by the researchers [17].
Given the importance of QoL as a health measure [1], disentangling reporting behaviour, incongruent interpretations of QoL, and population thresholds from latent well-being is essential for meaningful interpretation and comparison of subjective QoL data. The use of anchoring vignettes is one method for revealing scale perception bias and evaluating otherwise incomparable data. Vignettes are descriptions of hypothetical persons or situations that respondents are asked to rate on the same construct, and on the same scale, as a question about their own experience [18]. The vignettes act as a set of reference points that expose each individual's thresholds on a common scale, allowing the individual's self-assessed responses to be placed on the same dimension.
To date, few studies have used anchoring vignettes in the interpretation of QoL outcomes. Murray et al. [5] first applied vignettes to measure self-rated health across the WHO Multi-country Household Study on Health and Responsiveness. The methodology has since been applied to QoL measures, including self-rated health and life satisfaction, in only a few instances, which is surprising given the large number of studies that have investigated QoL outcomes [19–25]. Often, researchers either fail to investigate the presence of scale bias and report biased results, or attempt to remove the bias by discarding data or analysing groups separately, thereby avoiding comparisons [26]. This is an unnecessary loss that can be avoided by applying the anchoring vignette approach.
The low uptake of anchoring vignettes may be due to the perceived technicality of the approach. Nonparametric rescaling of data and sophisticated multilevel regression modelling have been proposed as analysis methods [27, 28]. Nonparametric models recalibrate the distribution of responses to a comparable scale by adjusting for the individual's scale behaviour. In other words, the thresholds the individual used when rating the hypothetical vignettes are used to reinterpret and rescale their responses to a question about their own perceptions. Parametric models go further than simply rescaling the data: they provide parameter estimates and adjust for the variance of the individual thresholds in the scale responses. As both parametric and nonparametric methods have strengths and weaknesses, we apply both to compare the association between QoL and transport outcomes.
The Sydney Travel and Health Study (STAHS) is a longitudinal study of residents living in the inner-city suburbs of Sydney, Australia, which aims to measure the health (including QoL), transport, and economic impacts of new cycling infrastructure [29]. How QoL is affected by changes in the urban built environment, such as traffic and transport, is an increasingly important issue in public health [30]. The detrimental effect of commuting stress on physical and psychological well-being is increasingly recognised [31, 32], while the benefits of more active modes of travel (primarily cycling and walking) are also gradually being understood [33, 34]. However, very few studies have investigated QoL and transportation and compared differences between travel modes, specifically between active travel modes, and fewer still have included cycling. No study of transport and QoL has yet used anchoring vignettes to adjust for scale perception bias.
With this in mind, the two primary purposes of this paper were to (1) examine scale perception bias in two single-item QoL questions: overall QoL and health satisfaction; and (2) model the relationship between commuting travel mode and QoL in the STAHS using nonparametric and parametric multilevel ordinal logistic regressions to adjust for these biases.
Analysis
Data analysis proceeded in three stages: data assumptions were tested; differences in reporting behaviours were investigated; and associations between QoL and transport modes were modelled using the two corrected approaches and compared with standard ordinal logistic regression analysis.
The distributions of the QoL and vignette variables were examined, and the two lowest QoL categories (i.e. very poor and poor) were collapsed. The correlations of overall QoL and health satisfaction with the WHOQOL physical health domain were tested (Spearman’s rho). The underlying assumptions of the vignettes were then evaluated. Lacking an objective measure of QoL, we examined whether respondents rated the three vignettes consistently in the intended order. We also hypothesised that self-reported responses would correlate more strongly and positively with ratings of vignette 1 than with ratings of vignette 3, and tested these correlations. Finally, we tested vignette equivalence against the expected pattern V1 ≥ V2 ≥ V3 and removed cases in which this order was violated.
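A minimal Python sketch of these preparatory steps follows. It assumes QoL is coded 1 (very poor) to 5 (very good) and that vignette ratings use the same coding, with higher values meaning better health; the function names, the toy data, and the generic `domain` score standing in for the WHOQOL physical health domain are hypothetical, and the study's own analyses were run in Stata.

```python
import numpy as np
from scipy.stats import spearmanr


def collapse_lowest(qol):
    # Merge codes 1 ('very poor') and 2 ('poor'); shift higher codes down by one
    qol = np.asarray(qol)
    return np.where(qol <= 2, 1, qol - 1)


def vignette_order_ok(v1, v2, v3):
    # True for respondents whose vignette ratings follow V1 >= V2 >= V3
    v1, v2, v3 = map(np.asarray, (v1, v2, v3))
    return (v1 >= v2) & (v2 >= v3)


# Hypothetical toy data: QoL codes, vignette ratings, and a domain score
qol = np.array([1, 2, 3, 4, 5, 4, 3, 5])
v1 = np.array([5, 4, 4, 5, 5, 3, 4, 5])
v2 = np.array([3, 3, 4, 3, 4, 4, 2, 3])
v3 = np.array([1, 2, 2, 1, 2, 2, 1, 1])
domain = np.array([40, 45, 60, 70, 85, 66, 52, 90])

keep = vignette_order_ok(v1, v2, v3)  # the sixth respondent (V1 < V2) is dropped
rho, p_value = spearmanr(collapse_lowest(qol[keep]), domain[keep])
print(f"kept {keep.sum()} of {len(keep)}; Spearman rho = {rho:.2f}")
```

Filtering before computing the correlation mirrors the order of steps described above; in practice the same `keep` mask would be applied to every later analysis.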
To illustrate scale perception bias, the ratings of each vignette were compared between demographic groups (χ² test). Because the vignettes describe fixed levels of health, there should be no difference between groups; for example, men and women should rate a given vignette in the same way. Significant associations (p < 0.05) would therefore suggest different reporting behaviour between demographic groups. Income and education variables were also tested in their un-collapsed categories. The interaction between the QoL and demographic variables was then modelled using ordinal logistic regression.
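As a hedged illustration of the χ² comparison, the Python sketch below tests whether the distribution of ratings for a single vignette differs between two groups using SciPy's `chi2_contingency`. The counts are invented for illustration only and do not come from the STAHS data.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts for one vignette: rows are groups (men, women),
# columns are the rating given to the vignette (1-5)
table = np.array([
    [2, 10, 25, 18, 5],
    [6, 20, 22, 8, 4],
])

# Rating levels with zero total count must be dropped first, because
# chi2_contingency rejects tables with zero expected frequencies
chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p_value:.3f}")
```

Under the assumption that a vignette describes the same level of health for everyone, a small p-value here points to group differences in how the scale is used rather than in health itself.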
Finally, the association between QoL and transport modes was analysed in three ways. First, a standard ordinal regression model was constructed, adjusting for age, sex, income, and education. We called this the unadjusted model to differentiate it from the models correcting for scale perception bias. Second, scale biases were corrected using the nonparametric approach described by King and Wand [27]. The QoL variables (overall QoL, health satisfaction) were rescaled according to the thresholds the respondent used to rate the vignettes. The new QoL variables contained seven categories (twice the number of vignettes plus one, i.e. 2 × 3 + 1). If the self-rated response X was greater than the levels described by all the vignettes, such that X > V1 > V2 > V3, the new self-response Q was assigned the highest category, 7, and so forth (see Table 1 for full details). Where vignette ratings were tied, more than one category could be valid; for example, for V1 > X > V2 = V3, where V2 and V3 received the same rating, categories 3, 4, and 5 all apply. To deal with these inconsistencies, tied responses were assigned the mean of all possible categories for the given response. Inconsistent responses that violated vignette assumptions were excluded (n = 12). The rescaled variable was then analysed in the same way as the standard model.
Table 1
Nonparametric rescaling of quality-of-life (QoL) variables through the use of anchoring vignettes
Response pattern | Vignette ratings | Rescaled category (Q) |
X > V1 > V2 > V3 | Ordered | 7 |
X = V1 > V2 > V3 | Ordered | 6 |
V1 > X > V2 > V3 | Ordered | 5 |
V1 > X = V2 > V3 | Ordered | 4 |
V1 > V2 > X > V3 | Ordered | 3 |
V1 > V2 > X = V3 | Ordered | 2 |
V1 > V2 > V3 > X | Ordered | 1 |
X > V1 > V2 = V3 | Tied | 7 |
X > V1 = V2 = V3 | Tied | 7 |
X > V1 = V2 > V3 | Tied | 7 |
X = V1 > V2 = V3 | Tied | 6 |
X = V1 = V2 > V3 | Tied | 3, 4, 5, 6 |
X = V1 = V2 = V3 | Tied | 2, 3, 4, 5, 6 |
V1 > X > V2 = V3 | Tied | 3, 4, 5 |
V1 > X = V2 = V3 | Tied | 2, 3, 4 |
V1 = V2 > X > V3 | Tied | 3 |
V1 = V2 > X = V3 | Tied | 2 |
V1 = V2 > V3 > X | Tied | 1 |
V1 = V2 = V3 > X | Tied | 1 |
V1 > V2 = V3 > X | Tied | 1 |
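The lookup in Table 1 can be applied mechanically. The Python sketch below encodes the table's twenty patterns verbatim, builds the pattern for each respondent, assigns tied patterns the mean of their admissible categories, and returns `None` for responses that violate V1 ≥ V2 ≥ V3. It assumes ratings are numeric with higher values meaning better health; the names `TABLE_1`, `pattern`, and `rescale` are hypothetical.

```python
from statistics import mean

# Table 1 as a lookup: response pattern -> admissible rescaled categories
TABLE_1 = {
    "X > V1 > V2 > V3": [7], "X = V1 > V2 > V3": [6],
    "V1 > X > V2 > V3": [5], "V1 > X = V2 > V3": [4],
    "V1 > V2 > X > V3": [3], "V1 > V2 > X = V3": [2],
    "V1 > V2 > V3 > X": [1],
    "X > V1 > V2 = V3": [7], "X > V1 = V2 = V3": [7],
    "X > V1 = V2 > V3": [7], "X = V1 > V2 = V3": [6],
    "X = V1 = V2 > V3": [3, 4, 5, 6], "X = V1 = V2 = V3": [2, 3, 4, 5, 6],
    "V1 > X > V2 = V3": [3, 4, 5], "V1 > X = V2 = V3": [2, 3, 4],
    "V1 = V2 > X > V3": [3], "V1 = V2 > X = V3": [2],
    "V1 = V2 > V3 > X": [1], "V1 = V2 = V3 > X": [1],
    "V1 > V2 = V3 > X": [1],
}


def pattern(x, v1, v2, v3):
    # Sort from highest to lowest rating; sorted() is stable (also with
    # reverse=True), so tied items keep the order X, V1, V2, V3 used in Table 1
    ranked = sorted([("X", x), ("V1", v1), ("V2", v2), ("V3", v3)],
                    key=lambda item: item[1], reverse=True)
    parts = [ranked[0][0]]
    for (_, prev), (name, value) in zip(ranked, ranked[1:]):
        parts += ["=" if value == prev else ">", name]
    return " ".join(parts)


def rescale(x, v1, v2, v3):
    # Responses violating V1 >= V2 >= V3 are excluded
    if not v1 >= v2 >= v3:
        return None
    # Tied patterns receive the mean of their admissible categories
    return mean(TABLE_1[pattern(x, v1, v2, v3)])


print(rescale(4, 5, 3, 1), rescale(3, 4, 3, 3), rescale(3, 3, 3, 2))
```

For example, a respondent rating themselves 3 and the vignettes 4, 3, 3 produces the pattern `V1 > X = V2 = V3`, whose admissible categories 2, 3, and 4 average to 3. Encoding the table directly, rather than re-deriving categories, keeps the sketch faithful to the tie rules stated in Table 1.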
In the final, parametric model, the observed QoL response was allowed to vary according to the thresholds the respondent used, and individual thresholds were treated as a function of the covariates (as determined by the vignette anchor points). We first fitted a hierarchical ordinal probit model in Stata using the gllamm command, following the example provided by Rabe-Hesketh and Skrondal [37]. We then applied a cumulative logit link. Logit models are more useful for explaining health outcomes since, unlike probit models, their coefficients can be interpreted as odds ratios. Model fit was then compared using the Akaike information criterion (AIC) [38] and the Bayesian information criterion (BIC) [39], where the smallest value indicates the model with the least information loss. Because the models were non-nested, and the complex design of the parametric model relied on transformed data that differentiated it from the previous models, the information criteria were weighted to the sample to reduce the penalty on the parametric model [40].
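To make the structure of the parametric model concrete, the sketch below writes out a compound hierarchical ordered logit likelihood in the spirit of the model proposed by King and colleagues: self-reports and vignette ratings share respondent-specific thresholds that depend on covariates, each vignette has a fixed location, and the self-report location depends on covariates. This is not the authors' gllamm specification. The covariates (`cyclist`, `male`), the parameterisation, and the simulated data are assumptions, and AIC and BIC are computed in their standard unweighted form (the sample weighting described above is not reproduced).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit


def thresholds(W, gamma):
    # Respondent-specific cut-points: tau_1 = W @ gamma[0]; each later cut-point
    # adds exp(W @ gamma[k]) > 0, so tau_1 < tau_2 < ... < tau_{K-1}
    first = W @ gamma[0]
    steps = np.exp(W @ gamma[1:].T)
    return np.column_stack([first, first[:, None] + np.cumsum(steps, axis=1)])


def category_probs(eta, tau):
    # Cumulative logit: P(Y <= k) = expit(tau_k - eta); differences give P(Y = k)
    n = len(eta)
    cdf = np.column_stack([np.zeros(n), expit(tau - eta[:, None]), np.ones(n)])
    return np.diff(cdf, axis=1)


def neg_log_lik(params, y, Z, X, W, K):
    n, q = X.shape
    J, p = Z.shape[1], W.shape[1]
    beta, theta = params[:q], params[q:q + J]
    gamma = params[q + J:].reshape(K - 1, p)
    tau = thresholds(W, gamma)  # shared by the self-report and every vignette
    rows = np.arange(n)
    # 1e-12 guards against log(0) for numerically empty categories
    ll = np.log(category_probs(X @ beta, tau)[rows, y - 1] + 1e-12).sum()
    for j in range(J):  # vignette j sits at the same location theta_j for everyone
        probs = category_probs(np.full(n, theta[j]), tau)
        ll += np.log(probs[rows, Z[:, j] - 1] + 1e-12).sum()
    return -ll


# Simulated data (hypothetical): K = 4 response levels, 3 vignettes
rng = np.random.default_rng(1)
n, K = 400, 4
cyclist = rng.integers(0, 2, n)
male = rng.integers(0, 2, n)
X = np.column_stack([cyclist, male]).astype(float)     # no intercept; thresholds carry it
W = np.column_stack([np.ones(n), male]).astype(float)  # sex may shift the cut-points
true_gamma = np.array([[-1.0, 0.4], [0.0, 0.0], [0.0, -0.3]])
tau_true = thresholds(W, true_gamma)


def draw(eta):
    # Response = 1 + number of cut-points below the logistic latent variable
    latent = eta + rng.logistic(size=n)
    return 1 + (latent[:, None] > tau_true).sum(axis=1)


y = draw(X @ np.array([0.6, 0.0]))
Z = np.column_stack([draw(np.full(n, t)) for t in (1.5, 0.0, -1.5)])  # V1 best, V3 worst

x0 = np.zeros(X.shape[1] + Z.shape[1] + (K - 1) * W.shape[1])
fit = minimize(neg_log_lik, x0, args=(y, Z, X, W, K), method="BFGS")
k, loglik = len(fit.x), -fit.fun
aic = 2 * k - 2 * loglik
bic = k * np.log(n) - 2 * loglik
print(f"beta = {fit.x[:2].round(2)}, AIC = {aic:.1f}, BIC = {bic:.1f}")
```

Because the vignettes pin down where each respondent's cut-points lie, a covariate such as `male` can shift the thresholds without being mistaken for a difference in QoL itself; that separation is what distinguishes this model from a standard ordinal regression.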
In each model, the linearity of age was tested and confirmed as appropriate. Interaction terms were tested and effect modification was rejected. For each model, the proportional odds assumption for ordinal logistic regression was tested, and no violation was observed. For missing income data (9%), it was assumed that full-time students, unemployed people, welfare recipients, and homemakers were less likely to be in the high income bracket. Otherwise, cases with missing demographic data (income n = 3; education n = 6) were excluded, and only unique data were retained. All statistical analyses were conducted using Stata version 13 (StataCorp LP, College Station, TX).
Discussion
This study sought to adjust for the presence of scale perception bias in the self-rating of QoL in a sample of Australian city dwellers in order to appropriately analyse the relationship between commuting mode and QoL. Simple nonparametric rescaling of the data and parametric multilevel modelling were used to detect and adjust for differences in rating behaviour across demographic groups. The vignettes provided fixed thresholds against which findings could be compared. Application of the vignette methodology to the association between travel mode and QoL revealed findings that were not detected through conventional modelling. Using anchoring vignettes, we were able to detect significant differences in overall QoL and health satisfaction between bicycle commuters and those who commuted on foot, by motor vehicle, or by public transport.
Demographic differences often exist across different modes of travel. For example, in Australia a higher proportion of men than women commute to work or study by bicycle or drive to work, while women are more likely to take public transport [42]. These mode share differences were reflected in this study. Because of these demographic differences in mode share, differences in scale perception of QoL between demographic groups had a greater confounding effect on the relationship between travel mode and QoL than would have been observed had mode share been more equal across groups.
To date, very little research has investigated the relationship between travel mode and well-being. Transportation appraisals and transport policy decisions too often fail to include the user's experience of the journey, and efforts to translate subjective metrics of that experience (comfort, convenience, QoL) into financial costs and benefits that can be compared alongside traditional measures such as travel time costs have been unconvincing [43–45]. The association between transport and QoL, health, and well-being is, however, an emerging area of interest [45, 46]. The effect of travel on overall QoL and health has broader implications for infrastructure and urban planning and is particularly important in terms of sustainable transport investment. In many cities, such as Sydney, Australia, where these data were collected, commuting by bicycle is inhibited by a lack of cycling infrastructure and safe routes for travel. This has the potential to negatively affect QoL. However, there is good evidence that moderately intense physical activity is associated with improved QoL and health satisfaction [47]. Cycling offers other benefits that may not be attained through other travel modes, such as the mental health benefits of being outdoors, greater control and predictability of the journey, a sense of fun and excitement, and personal cost savings [48, 49]. The higher intensity of cycling compared with walking may be what differentiates these modes in terms of QoL benefits. More research is needed to explore causal associations between cycling and QoL.
The results of this study also illustrate the importance of measuring QoL appropriately. In the Canadian Community Health Survey, Layes et al. [13] observed that health status consistently varied across age and socio-economic levels as a result of reporting behaviour. The authors concluded that ‘it might be misleading to take self-rated health at face value as a measure of health status’ [13]. For this QoL measure to continue to play an important role in population health research and policy development, they recommended that ‘its users must acknowledge and understand the determinants of self-rated health, including reporting behaviour’. QoL measures, particularly single items, face the problem of being undefined and are therefore more ambiguous. While there are many reasons for using single-item QoL measures, we would argue that any comparison across individuals or populations requires a common reference point. The application of anchoring vignettes is one useful way of adjusting for reporting differences in scale threshold use, and of creating definitive parameters for abstract concepts such as QoL.
The standard ordinal logistic regression approach first used to analyse our data was unable to reveal actual associations because of scale biases. Logistic regression has been promoted as an effective method for identifying reporting biases [26, 50]. Yet without some method to adjust for these scale biases, findings remain distorted. Two approaches were used in this study to adjust for scale bias, following those first proposed by King and colleagues [8, 51]. Parametric models provide greater precision than nonparametric rescaling, yet both approaches supported the same outcome. One issue with the nonparametric approach is that tied responses must be rescaled, which becomes problematic when more than one scale category is possible. Nevertheless, the simpler rescaled model is preferable to not adjusting for scale bias at all. Parametric approaches require larger datasets and more sophisticated analysis. Nonparametric models, which recalibrate the distribution of responses according to a common reporting scale, are simpler to replicate and suitable for less sophisticated statistical software, although they require vignette questions to be asked of all respondents.
The QoL variables used in this analysis were taken from the two umbrella items in the WHOQOL-BREF. We tested the use of levels of health as vignette equivalences for health satisfaction and overall QoL, on the assumption that scale perception bias for overall QoL could likewise be identified by anchoring responses to health-specific scenarios. To confirm this, the correlations between the single-item overall QoL and health satisfaction variables and the health domains of the WHOQOL-BREF were tested.
The WHOQOL-BREF is designed for cross-country population use. While the content of the WHOQOL-BREF may be cross-culturally valid, differences in the interpretation of scales across populations are still likely to influence results, as observed in this study. The use of appropriate vignettes would address this limitation in the ability to compare findings across population groups.
The STAHS sample used in this analysis is a small sample of Australian inner-city residents. The sample was highly educated and therefore not representative of the larger population. The sample was useful for this analysis because respondents were exposed to a number of public transport options and were included if they had ever ridden a bicycle. Thus, their choice of transport was not necessarily constrained in the ways it may be in communities with fewer transport options. This enabled us to investigate the association between QoL and a range of transport choices; however, their level of QoL may be unrepresentative of the wider population.