


01-12-2012 | Issue 10/2012 | Open Access

Quality of Life Research 10/2012

Assessing measurement invariance of a health-related quality-of-life questionnaire in radiotherapy patients

Bellinda L. King-Kallimanis, Claartje L. ter Hoeven, Hanneke C. de Haes, Ellen M. Smets, Caro C. E. Koning, Frans J. Oort


Many questionnaires have been developed to assess the different facets of cancer patients’ experiences before, during, and after their treatment. For example, there are questionnaires to assess their satisfaction with care, their health-related quality-of-life (HRQoL), or their preferred communication style with their oncologists. When developing scales for a general cancer population or testing differences between groups using well-established questionnaires, an important question to keep in mind is whether members of different groups assign the same meaning to questionnaire items. In other words, if two patients have the same level of overall satisfaction, will they respond to an observed item in the same way, or will specific characteristics, like gender or treatment regime, influence their response to the item? If it can be shown that these characteristics do not affect responses to observed items, then the assumption of measurement invariance has been met.
The assumption of measurement invariance requires that the relationships between the observed items and the latent construct remain constant regardless of respondents’ group membership, for example, age, race, or disease characteristics or the measurement occasion [1, 2]. If this assumption is violated, then the results from cross-group comparisons of the construct may be incorrect. This is because mean differences should represent true differences in the construct of interest and not reflect anything else. For example, it may be that a male patient and female patient share the same underlying level of Physical HRQoL. However, when asked a question about carrying groceries, the male who does not do the shopping may respond that he has no difficulty with this activity, whereas the female may indicate that she has great difficulty. The responses given to this grocery item are related not only to Physical HRQoL but also to gender roles. In this example, it is clear to see how gender roles and Physical HRQoL can become entangled. However, it may not always be obvious how patient characteristics might affect certain items. In a study by Reker and Fry [3], bias with respect to age was found in personal meaning measures. The authors concluded that bias in the Self-Transcendence Scales stemmed from older adults using events from the past as their frame of reference, whereas younger adults used present and future events as their frame of reference. When developing items for a scale, this type of bias will be difficult to anticipate and success can only be evaluated after scale development and piloting. If invariance testing yields positive results, in that the measurement is invariant, we can be confident that our results are not distorted because of different functioning of the measurement as a result of group membership. Unfortunately, measurement invariance of self-report questionnaires is often not investigated.
Establishing that a scale has good reliability and validity does not ensure that the scale will not violate the assumption of invariance. The European Organization for Research and Treatment of Cancer (EORTC) QLQ-C30, a measure developed to assess HRQoL, is considered to have excellent reliability and validity [4]. Specifically, it is a generic HRQoL measure for use with all cancer patients, with additional modules for specific cancer diagnoses (e.g., breast [5] and lung cancer [6]). Despite having excellent reliability and validity, the factor structure of this scale has received little attention [7, 8], and most measurement invariance testing has been conducted using item response theory (IRT) [9], with the primary focus on language translation [10, 11]. While the measure was designed for a general and therefore heterogeneous cancer population, it is possible that this heterogeneity will lead to a violation of measurement invariance. If we are interested in, for example, differences in HRQoL based on different treatment stages or information preferences, then before differences can be investigated, we must check whether measurement bias with respect to these variables is present. For example, patients who have already received treatment for their diagnosis may have experienced an unmeasured response shift [12]. This in turn could result in a shift in internal standards when responding to HRQoL items, whereas patients who have yet to be treated will not have experienced this phenomenon. In regard to patients’ information preferences, it is conceivable that patients with high compared to low levels of information preferences may respond to HRQoL items using a different frame of reference. This might be because patients who want more information may want this information to inform their family and friends of their treatment and prognosis [13]; they might therefore have a different frame of reference toward social functioning.
Thus, before we investigate the relationships between these variables and HRQoL, we need to be sure that differences in HRQoL mean the same thing for patients in different treatment stages or with different information preferences.
Testing the assumption of measurement invariance in different situations and with different groups of people has been greatly facilitated by the development of several analytic techniques including IRT and structural equation modeling (SEM)/confirmatory factor analysis (CFA) [14]. Within the framework of SEM, there are three approaches available to assess whether the assumption of invariance holds. In cross-sectional research, multi-group CFA comparisons are the most frequently used method. However, to conduct such an analysis, a large sample is required as the sample must be split by group membership. Also, if the potential violator of invariance is continuous, the variable must be transformed to a discrete variable to create multiple groups, and this transformation entails a loss of information. Restricted factor analysis (RFA) is one alternative. The RFA specification allows for multiple grouping variables to be tested simultaneously (e.g., sex and race), and continuous variables can be included as originally measured (e.g., age). These additional variables are modeled as single indicator exogenous variables in the RFA model and tested as possible violators of invariance [15, 16]. The RFA model is equivalent in overall fit and yields the same results as the third alternative, the multiple indicator, multiple cause (MIMIC) model. The difference between these two models is in how the relationships between the exogenous variables and the common factor(s) are modeled. In the MIMIC model, these relationships are causal, whereas in RFA the relationships are not necessarily causal [17]. As we do not necessarily expect causal relationships between the exogenous variables and HRQoL, and as RFA has been shown in simulation studies to be a robust method [15, 16], we will use RFA.
By using the RFA approach, we can obtain further insight into the psychometric properties of the EORTC QLQ-C30 in a heterogeneous cancer sample. To achieve this we include and study simultaneously multiple variables that have the potential to violate the assumption of measurement invariance. Therefore, the aim of this paper is to investigate whether HRQoL scales are invariant with respect to age, sex, previous treatment for cancer, and patients’ information preferences. If any of the observed scales are biased with respect to the exogenous variables, group comparisons in relation to the variables investigated will be less meaningful. So, in doing this, we aim to better understand the construct of HRQoL as measured in a heterogeneous cancer population.


Participants and procedure

The current study constitutes a part of a larger research project, involving the use of several questionnaires as well as videotaping of the patients’ initial and first follow-up consultations with the radiotherapist. Fifteen radiation oncologists of the radiotherapy department of the Academic Medical Centre in Amsterdam, the Netherlands, were invited to participate in the project. All agreed. Their consecutive newly registered patients were contacted by mail inviting them to participate in the study. Exclusion criteria were: (1) having undergone radiotherapy treatment before; (2) age < 18; (3) being unable to understand the Dutch language; and (4) having a cognitive limitation or cerebral malignancy. Patients were asked for written informed consent, and were invited to fill out a questionnaire at home, prior to their first visit to a radiation oncologist. Non-responding patients were asked to fill in some background variables and one item measuring overall information need. The study was approved by the hospital’s medical ethics committee. For further study details, see [18, 19].



The Dutch language version of the EORTC QLQ-C30 was used to measure HRQoL [4]. It consists of 30 items. Fifteen items are used to create five subscales related to functioning: Physical Functioning (5 items), Role Functioning (2 items), Emotional Functioning (4 items), Cognitive Functioning (2 items), and Social Functioning (2 items). Two items are used to measure Global Health Status. Thirteen items relate to symptoms experienced by the patient, seven of which are used to create three subscales: Fatigue (3 items), Nausea and Vomiting (2 items), and Pain (2 items). The remaining six items are single-item symptom scales. In this paper we focus only on the multi-item scales and not the single-item scales. Higher scores indicate better HRQoL in regard to functioning; higher scores indicate worse HRQoL in regard to symptoms.
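The item counts above can be checked with a few lines of code. This is only a bookkeeping sketch: the dictionary and function names are illustrative and not part of the EORTC scoring manual, and no scoring rules are implied.

```python
# Item counts per multi-item scale, transcribed from the description above.
# All names here are illustrative; they are not taken from any EORTC software.
FUNCTIONING_SCALES = {
    "Physical Functioning": 5,
    "Role Functioning": 2,
    "Emotional Functioning": 4,
    "Cognitive Functioning": 2,
    "Social Functioning": 2,
}
GLOBAL_SCALES = {"Global Health Status": 2}
SYMPTOM_SCALES = {"Fatigue": 3, "Nausea and Vomiting": 2, "Pain": 2}
SINGLE_ITEM_SYMPTOM_SCALES = 6  # each single-item scale contributes one item


def total_items() -> int:
    """Total number of questionnaire items implied by the scale composition."""
    multi = (
        sum(FUNCTIONING_SCALES.values())
        + sum(GLOBAL_SCALES.values())
        + sum(SYMPTOM_SCALES.values())
    )
    return multi + SINGLE_ITEM_SYMPTOM_SCALES


def multi_item_scale_names() -> list[str]:
    """Names of the multi-item scales, the only ones analyzed in this paper."""
    return [*FUNCTIONING_SCALES, *GLOBAL_SCALES, *SYMPTOM_SCALES]


print(total_items())                  # number of items across all scales
print(len(multi_item_scale_names()))  # number of multi-item scales
```

The counts add up to the 30 items of the questionnaire and to the nine multi-item scales that enter the measurement models.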

Information preference

To measure a general level of information preference, we used one item from the Information Styles Questionnaire [20]. This item asked patients to indicate their information preference concerning disease and treatment on a 10-point response scale, ranging from 0 (no information needed) to 10 (prefers to receive all available information).

Previous treatment

Patients’ medical records were examined to identify whether they had received either chemotherapy or surgery to treat the same tumor that was being treated by radiotherapy. This information was dichotomized: no previous treatment versus previous treatment (chemotherapy/surgery).

Patient characteristics

We considered patients’ gender (0 = male; 1 = female) and their age (continuous).

Statistical analysis

To investigate measurement invariance, we used a two-step procedure: Step 1 involved establishing a measurement model using CFA, and Step 2 tested the assumption of invariance with respect to specific patient variables (exogenous variables) by extending the CFA to an RFA model. Maximum likelihood estimation was used and all analyses were conducted using the computer program Mx 3.2 [21].

Step 1: Establishing a measurement model

As there is no agreed upon CFA structure for the EORTC QLQ-C30, we aimed to find a satisfactory measurement model for the EORTC QLQ-C30 scales using CFA in our sample. For simplicity we focused only on the multi-item scales and investigated measurement invariance at the scale level. While the use of the symptom scales is controversial, we aimed to include them in our measurement model. Therefore, we fit three measurement models, two of which included the symptom scales, and a third that focused solely on the functioning scales. In Model 1.1, all nine scales loaded on one general HRQoL common factor. In Model 1.2, the nine scales loaded on two common factors: Functioning HRQoL, which included the five functioning scales and the Global Health Status scale, and Symptom HRQoL, which included the three symptom scales. Finally, in Model 1.3, the symptom scales were removed, and the five functioning scales and the Global Health Status scale loaded on one common factor, Functioning HRQoL.
To assess the overall goodness-of-fit of our models, the Chi-square test of exact fit, the root mean square error of approximation (RMSEA), and expected cross-validation index (ECVI) were considered [22]. A non-significant Chi-square value indicates good fit; however, it is sensitive to small deviations between the model and data. Therefore, we also considered the RMSEA and ECVI. An RMSEA value of <0.08 indicates satisfactory fit and a value of <0.05 indicates close fit [22]. The ECVI is used to assess the fit of nested alternative models, in other words it cannot be used as a stand-alone index; smaller values indicate improved model fit [22]. In addition to these overall model fit statistics, we also considered the standardized residuals to identify potential sources of model misfit and if required, guide appropriate model modifications.
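As a rough illustration of how the RMSEA cut-offs above are applied, the sketch below computes the RMSEA point estimate from a Chi-square value using one common definition, sqrt(max(χ² − df, 0) / (df · (N − 1))), and classifies it with the thresholds quoted in the text. The function names and the example values are hypothetical, and Mx may use a slightly different variant of the formula.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """RMSEA point estimate (one common definition; assumes df > 0 and n > 1)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def classify_rmsea(value: float) -> str:
    """Apply the cut-offs quoted in the text: <0.05 close, <0.08 satisfactory."""
    if value < 0.05:
        return "close"
    if value < 0.08:
        return "satisfactory"
    return "unsatisfactory"


# Hypothetical inputs, not values from this study.
example = rmsea(chi2=20.0, df=10, n=101)
print(round(example, 3), classify_rmsea(example))
```

A model whose Chi-square does not exceed its degrees of freedom gets an RMSEA of zero under this definition, which is why the lower bound of an RMSEA confidence interval can be exactly 0.000.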
If a new model was specified, we investigated the change in overall model fit by using both the Chi-square difference test and ECVI difference test. The Chi-square difference test is the difference in Chi-square values between the alternative and null models; if the difference is significant, the re-specification has improved model fit. The ECVI difference test is the difference in ECVI values for the alternative and null models; if the 90% CI does not include zero, then the re-specification has improved model fit [22]. It complements the Chi-square difference test, but it penalizes models containing more free parameters.

Step 2: Testing invariance with respect to exogenous variables

Using the final model from Step 1, we included all additional exogenous variables in the model. These included age, sex, previous treatment, and information preferences. These additional variables were allowed to correlate with the latent variable(s), but all direct effects of these variables on the observed scales were fixed to zero. A violation of invariance is indicated by a significant direct effect of an exogenous variable on an observed variable.
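The specification just described can be written compactly. The notation below is an assumption introduced for illustration (the paper itself gives no equations): x is the vector of observed scales, ξ the common factor(s), v the vector of exogenous variables, Λ the factor loadings, B the matrix of direct effects, τ the intercepts, and ε the residuals.

```latex
% RFA model sketch (notation assumed for illustration)
\begin{align}
  x &= \tau + \Lambda\,\xi + B\,v + \varepsilon, \\
  \operatorname{Cov}(\xi, v) &= \Phi_{\xi v} \quad \text{(free)}, \qquad
  \operatorname{Cov}(\varepsilon, v) = 0, \\
  H_0 &: B = 0 \quad \text{(measurement invariance with respect to } v\text{)}.
\end{align}
```

Under the null model every element of B is fixed to zero; freeing a single element b_{jk} gives the one degree of freedom test of whether exogenous variable k biases scale j.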
In order to identify significant direct effects, a series of iterative tests was conducted. We fit models where the direct effect between the exogenous variable and the observed scale under consideration was freed. For example, when investigating invariance associated with sex, we fit a series of models with an additional parameter for the effect of sex on each one of the EORTC QLQ-C30 scales included in the final measurement model from Step 1. This resulted in a series of one degree of freedom Chi-square difference tests. If any of these tests was significant at a Bonferroni corrected significance level [23] and the observed parameter change (OPC) was greater than 0.1, then we considered the scale to be non-invariant in relation to sex. The OPC is the difference between the standardized parameter in the null model (equal to zero in this example) and the standardized parameter estimated in the test. We rely on Chi-square difference tests and OPCs rather than modification indices and expected parameter change because the modification indices in particular can be influenced by misspecification elsewhere in the model [24]. We used a cut-point for the OPC of 0.1, which was based on Cohen’s small effect sizes [25].
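The decision rule described above can be sketched as follows. This is a minimal illustration, not the authors' Mx code: the function names are hypothetical, the number of tests used for the Bonferroni correction (`n_tests`) is an assumption because the text does not state it, and the Chi-square values in the example are invented.

```python
import math


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a Chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def flags_bias(chi2_diff: float, opc: float, n_tests: int,
               alpha: float = 0.05, opc_cut: float = 0.1) -> bool:
    """True if the 1-df difference test is significant after Bonferroni
    correction AND the observed parameter change exceeds the cut-point."""
    p_value = chi2_sf_1df(chi2_diff)
    return p_value < alpha / n_tests and abs(opc) > opc_cut


# Hypothetical Chi-square differences; n_tests=6 is an assumed number of tests.
print(flags_bias(chi2_diff=12.0, opc=-0.30, n_tests=6))  # large test statistic, large OPC
print(flags_bias(chi2_diff=5.0, opc=-0.15, n_tests=6))   # OPC large, test not significant
print(flags_bias(chi2_diff=12.0, opc=0.05, n_tests=6))   # test significant, OPC too small
```

Requiring both conditions guards against flagging effects that are statistically detectable but negligible in size, and against sizeable-looking changes that the data cannot distinguish from zero.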



The questionnaire was sent to 293 eligible radiotherapy patients, of whom 159 (54%) agreed to participate. Non-responding patients were on average older than the participating patients (M = 66.61, SD = 13.49 versus M = 62.98, SD = 12.64; P = 0.019); no gender differences were found. In this paper we focus only on the assessment before patients’ initial radiotherapy consultation. Four patients had missing values on treatment information and were therefore excluded from further analyses, leaving 155 participants in the sample. Characteristics of the participating patients are given in Table 1. In Table 2 we present the correlations, means, and standard deviations of the variables included in the RFA analysis. As can be seen, the assumption of multivariate normality was violated; therefore, the resulting test statistic may not have a central Chi-square distribution, and the standard errors may not be correct [26]. Hoogland and Boomsma [27] suggest that this does not seriously bias the estimates of the model parameters in samples larger than 200; however, little is known about the effects of violations of multivariate normality in smaller samples such as the present sample.
Table 1
Descriptive statistics for demographic variables of radiotherapy patients (N = 155)
Number (%)
60 (38.71%)
Age, mean (SD): 62.98 (12.64)
Previous treatment (yes), number (%): 65 (41.94%)
Information preference, mean (SD): 8.68 (1.98)
Cancer site, number (%) per site: 33 (21.43%); 24 (15.58%); 19 (12.34%); 16 (10.39%); 13 (8.44%); 7 (4.55%); 6 (3.90%); 6 (3.90%); 31 (20.00%)
Other cancer; e.g., gallbladder, testicular, pancreas, and non-Hodgkin lymphoma
Table 2
Correlations, means, and standard deviations for all observed variables
Column headings include Prev. tx (previous treatment) and Info. pref. (information preference)
Mean (SD), in row order: 7.97 (2.05); 6.97 (3.11); 7.45 (1.94); 8.61 (1.86); 8.09 (2.36); 7.09 (2.17); 62.98 (12.64); 0.39 (0.49); 0.42 (0.49); 8.56 (2.26)
*P <0.05; **P <0.01; ***P <0.001

Step 1: Establishing a measurement model

We tested our three possible measurement models, and found all three to have unsatisfactory overall model fit (Models 1.1, 1.2, and 1.3; Table 3). Modifications suggested by the standardized residuals for Models 1.1 and 1.2 were difficult to interpret. Therefore, no modifications were made to these models. Model 1.3 included one source of misfit, a residual covariance between Physical Functioning and Emotional Functioning. With the inclusion of this additional parameter the model had satisfactory overall fit (Model 1.F, Table 3; Fig. 1) and all factor loadings were significant.
Table 3
Overall goodness-of-fit and Chi-square difference test results for EORTC QLQ-C30
Model: RMSEA 90% CI; ECVI 90% CI
Measurement model 1: (0.056; 0.120); (0.499; 0.780)
Measurement model 2: (0.058; 0.120); (0.501; 0.790)
Measurement model 3: (0.053; 0.150); (0.247; 0.440)
Final measurement model: (0.000; 0.131); (0.231; 0.368)
Addition of exogenous variables: (0.078; 0.135); (0.729; 1.079)
Age and physical functioning: (0.041; 0.107); (0.612; 0.879)
Previous treatment and emotional functioning: (0.024; 0.098); (0.589; 0.827)
Information preferences and global health status: (0.000; 0.090); (0.594; 0.778)
Comparison models: ECVI difference 90% CI
1.F versus 1.3: (0.018; 0.178)
2.1 versus 2: (0.070; 0.299)
2.F versus 2.1: (0.001; 0.127)
2.F versus 2.3: (−0.003; 0.1074)
df degrees of freedom, RMSEA root mean square error of approximation, ECVI expected cross-validation index

Step 2: Testing invariance with respect to exogenous variables

We added the exogenous variables to the final model of Step 1 (Model 2, Table 3). The overall fit of the new model was not satisfactory. Although the parameter estimates of this model cannot be trusted due to poor fit, we still looked at the strength of the correlations between the exogenous variables and Functioning HRQoL before investigating invariance. The largest correlation observed in this model was between sex and Functioning HRQoL (−0.20). The correlations of information preferences (−0.09), age (0.09), and previous treatment (0.05) with Functioning HRQoL were considered to be small.
In the series of tests investigating invariance, two violations of invariance were identified as the OPC was greater than 0.1, and both the Chi-square difference test and ECVI difference tests were significant. The first significant direct effect was with age on Physical Functioning (Model 2.1). The next and last significant direct effect identified was with previous treatment on Emotional Functioning (Model 2.F). After this iteration, the next largest direct effect was of information preferences on Global Health status (Model 2.3). The OPC was greater than 0.1 (−0.15); however, both the Chi-square difference test (according to Bonferroni adjustment) and ECVI difference test were not significant, therefore this direct effect was not included in the final model.
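The ECVI difference criterion used in these comparisons can be illustrated with the intervals that survive in Table 3. The sketch assumes that the interval listed next to each comparison row in Table 3 is the 90% CI of the ECVI difference; the function name is hypothetical.

```python
def excludes_zero(lower: float, upper: float) -> bool:
    """True if a confidence interval lies entirely above or below zero."""
    return lower > 0.0 or upper < 0.0


# Intervals as printed next to the comparison rows of Table 3
# (assumed to be 90% CIs of the ECVI difference).
ecvi_difference_ci = {
    "1.F versus 1.3": (0.018, 0.178),
    "2.1 versus 2": (0.070, 0.299),
    "2.F versus 2.1": (0.001, 0.127),
    "2.F versus 2.3": (-0.003, 0.1074),
}

for comparison, (lower, upper) in ecvi_difference_ci.items():
    verdict = "improved fit" if excludes_zero(lower, upper) else "no clear improvement"
    print(f"{comparison}: {verdict}")
```

Only the last interval includes zero, which agrees with the decision not to add the direct effect of information preferences on Global Health Status.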
After the inclusion of these two direct effects, the overall fit of the model was satisfactory (Model 2.F, final parameter estimates Fig. 1). We re-examined the correlations between the exogenous variables and Functioning HRQoL to investigate the impact the inclusion of the additional direct effects had on the correlations. The correlations between age, gender, and information preferences and Functioning HRQoL increased. The largest increase was between age and Functioning HRQoL, the correlation increased from 0.09 to 0.17. There were slight increases in the correlations between gender and Functioning HRQoL (−0.20 to −0.22) and between information preferences and Functioning HRQoL (−0.09 to −0.10). There was no change in the correlation between previous treatment and Functioning HRQoL (0.05). See Fig. 1 for all final model parameter estimates.
We can now interpret how the two direct effects impact the observed scales (Physical Functioning and Emotional Functioning) in relation to the correlations between age and previous treatment and Functioning HRQoL. The direct effect of age on Physical Functioning was negative (−0.30) and suggests that older patients report worse Physical Functioning than would be expected given the weak positive correlation between age and Functioning HRQoL. The direct effect of previous treatment on Emotional Functioning was also negative (−0.20) and suggests that patients who have previously received treatment related to their current diagnosis report worse Emotional Functioning than would be expected given that there is almost no relationship between previous treatment and Functioning HRQoL.
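To see why a direct effect changes the interpretation, consider the model-implied association between an exogenous variable and a scale in a fully standardized solution: loading × factor correlation + direct effect. The sketch below applies this to age and Physical Functioning. The factor correlation (0.17) and direct effect (−0.30) are the values reported above; the loading of 0.7 is purely hypothetical, because the estimated loadings appear only in Fig. 1, and the function name is illustrative.

```python
def implied_association(loading: float, factor_corr: float,
                        direct_effect: float = 0.0) -> float:
    """Model-implied standardized association between an exogenous variable
    and an observed scale: the part transmitted through the common factor
    plus the direct effect (zero under measurement invariance)."""
    return loading * factor_corr + direct_effect


HYPOTHETICAL_LOADING = 0.7  # assumption for illustration, not an estimate from this study

via_factor_only = implied_association(HYPOTHETICAL_LOADING, 0.17)
with_direct_effect = implied_association(HYPOTHETICAL_LOADING, 0.17, -0.30)

print(round(via_factor_only, 3))     # association expected if the scale were unbiased
print(round(with_direct_effect, 3))  # association once the direct effect is included
```

With these inputs the expected association is weakly positive when only the common factor is considered, but turns negative once the direct effect is included, which is the pattern described in the paragraph above.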


In this study, we investigated the assumption of measurement invariance for the EORTC QLQ-C30 in a heterogeneous population of cancer patients who were about to begin their first session of radiotherapy. Applying RFA, we investigated age, sex, previous treatment for cancer, and information preferences regarding treatment as potential violators of invariance. Two violations were identified: one in Physical Functioning with regard to age and one in Emotional Functioning with regard to previous treatment.
In the first step, we were able to fit a measurement model to the EORTC QLQ-C30 that had satisfactory fit. Our final measurement model did not include any of the symptom scales of the EORTC QLQ-C30. However, Fayers and Hand have argued that these symptom scales should not be used as manifestations of underlying HRQoL but rather as manifestations of treatment [28]. This is because one would expect a different factor structure for symptoms dependent on the type of treatment the patient was undergoing. While this substantive debate is beyond the scope of this paper, in the current sample, the patients are in different stages of treatment, and this may explain why the inclusion of symptom scales did not lead to a satisfactory model. Once we had identified a satisfactory measurement model, we were able to investigate invariance in the EORTC QLQ-C30; it was in Step 2 that we identified the two violations of invariance.
The direct effect between age and Physical Functioning suggested that if younger and older patients had the same underlying Functioning HRQoL, older patients reported their Physical Functioning to be worse than younger patients. This result was found in another study where measurement invariance was investigated in regard to the SF-36 [29] (HRQoL measure) in a sample of cancer patients [30]. The authors suggested that because Physical Functioning is the most objective HRQoL scale, leaving little room for individualized interpretation, it is conceivable that it is the other scales that are biased because there is more room for subjective interpretation. An alternative model could be fitted that allowed direct effects between age and the other scales, excluding Physical Functioning; however, this model would include many additional parameters and as a result be less parsimonious. Therefore, we opt for parsimony and the model with the fewest instances of measurement bias. As a result of our finding, care should be taken when making age comparisons with any of the EORTC QLQ-C30 scales. Recently, the EORTC QLQ-ELD15 [31] was developed specifically for older adults, though it is ideal to have observed variables that are invariant to the effects of age.
The direct effect between previous treatment and Emotional Functioning was also negative. In other words, radiotherapy patients who had undergone a previous treatment (chemotherapy/surgery) evaluated their Emotional Functioning worse than those who did not undergo treatment before starting radiotherapy, even when their underlying Functioning HRQoL was similar. The different interpretation of Emotional Functioning might be due to resource depletion [32]. According to resource models, self-regulatory resources can be depleted or fatigued by self-regulatory demands. Muraven et al. [33] found that one route to self-regulatory failure is prior self-regulatory activities. In their laboratory studies, participants who were asked to employ a form of self-regulation (e.g., mental control or regulation of emotional expression) were less able to self-regulate after that (see also [34]). Previous treatment can be regarded as a prior self-regulatory effort, where emotions needed to be regulated. To undergo more treatment might decrease Emotional Functioning, because regulatory resources are depleted. This depletion may result in a different frame of reference in regard to Emotional Functioning for patients who have already undergone treatment. Interestingly, the latent construct of Functioning HRQoL was not reliant on self-regulatory efforts as evidenced by the very small relationship between previous treatment and Functioning HRQoL. To better understand this relationship, more research with longitudinal data is needed.
Previous research has shown that the EORTC QLQ-C30 has excellent psychometric properties [35] and is used extensively to assess HRQoL [36–38]. For the aim of our study, investigating invariance, we believe the model we used was a good representation of the functioning scales of the EORTC QLQ-C30. The two instances of non-invariance detected do not suggest that Physical Functioning and Emotional Functioning are not valid indicators of Functioning HRQoL, but rather that care should be taken when using the functioning scales to compare younger and older adults and patients at different stages of treatment. While our sample size was small, which limits generalization, the direct effect between age and Physical Functioning has been identified in previous research, indicating that it is certainly worth further investigation in a longitudinal study. In addition to this, it would be worthwhile to also consider invariance of the EORTC QLQ-C30 with respect to different cancer diagnoses, different treatment regimes, and different stages of cancer. Focusing on these specific variables would lead to greater confidence when comparing differences in HRQoL in relation to these variables.
Accounting for violations of the assumption of measurement invariance in our study led to a significant improvement in overall model fit. The inclusion of patient characteristic variables in our model initially resulted in a model where the estimates could not be confidently interpreted. However, after the inclusion of direct effects accounting for bias, our model fit was satisfactory and conclusions regarding the model could be drawn. It is important to note that a single violation of invariance may not be enough to cause unsatisfactory model fit, but could have a substantial impact on the conclusions drawn. In other words, if the assumption of measurement invariance is ignored, the researcher cannot be sure whether differences observed are related to true differences in HRQoL, or whether these differences are related to how patients interpret the HRQoL items.


The Dutch Cancer Society funded this study (grant UVA 2005-3199). We would like to thank the patients who participated in this study.

Conflict of interest

The authors of this paper have no conflicts of interest to report that may bias the findings of this study.

Open Access

This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
