Introduction
Pulmonary embolism (PE) describes an obstruction of the pulmonary arteries, mostly originating from deep venous thrombosis of the leg or pelvic veins [1]. PE is among the most common acute cardiovascular diseases, after myocardial infarction and stroke, and incidence rates are increasing [2, 3]. Due to improved therapy and disease management, more patients survive an acute PE event [2]. After PE, patients can suffer from lasting symptoms, right heart failure and chronic thromboembolic pulmonary hypertension (CTEPH) [4]. While CTEPH is a severe but rather rare secondary disease [2], more than half of patients struggle with persistent or deteriorating dyspnoea and poor physical performance 6 months to 3 years after PE [5]. Some studies also report mental health problems such as anxiety disorders and depression after PE [6–8]. Since PE negatively affects different dimensions of health, it is important to consider health-related quality of life (HrQoL) as an outcome of PE health care. The disease-specific Pulmonary Embolism Quality of Life (PEmb-QoL) questionnaire was developed and validated in 2010 and covers six dimensions of HrQoL after PE with 40 items [9, 10]. Originally developed in Dutch and translated into English, it has subsequently been translated into and validated in four further languages: Norwegian [11], Chinese [12], French [13] and German. The German version was translated by Frey et al. and has been shown to meet standard psychometric criteria of reliability and validity [14, 15]. One study estimated the minimal clinically important difference (MCID) of the PEmb-QoL at 15 units, which is important for assessing the relevance of observed changes [16]. Some other longitudinal studies, which did not primarily aim to examine the responsiveness of the PEmb-QoL, indicate its ability to detect change over time [17, 18].
Moreover, the structural validity of the PEmb-QoL questionnaire has not been comprehensively investigated. In the development and validation of a questionnaire, exploratory factor analysis (EFA) is a commonly employed method that is useful for discovering a set of unknown factors. To confirm a hypothesis about the number of underlying factors of the instrument, a confirmatory factor analysis (CFA) should be conducted [19, 20]. In addition, the dimensions of the original PEmb-QoL questionnaire were created based on the content of the items and not on EFA. The EFA in the first validation study by Klok et al. already yielded a structure that differed slightly from the proposed factorial structure [9]. Other validation studies using EFA also reported different factor structures, e.g. three [13] or four [12] factors instead of the six original dimensions.
To the best of our knowledge, the PEmb-QoL questionnaire is the only existing and widely used disease-specific instrument for measuring HrQoL after PE; therefore, it is essential to comprehensively investigate its psychometric properties. Thus, the specific aim of the present study is to determine acceptability, reliability, responsiveness and structural validity using CFA of the German version of the PEmb-QoL questionnaire.
Discussion
Compared to the other German validation study by Frey et al., our sample was slightly larger, patients were younger with a higher proportion of women, and notably more patients had bilateral PE and cancer [14]. Except for sharing the same median age, the same differences exist between our sample and that of the French validation study [13]. The Norwegian and the Chinese validation studies included notably younger patients (mean ages of 56 and 52 years, compared with 63 years in our sample) [11, 12]. The time between PE and study participation in the other studies differed considerably from that in our study. Frey’s study and the French validation study, for example, had a median time since PE occurrence of 15 months, and the Norwegian study a median of 3.6 years [11, 13, 14].
In the present study, the proportion of missing dimension scores (more than 50% missing items in one dimension) was < 10% for all dimensions, which indicates good acceptability. All dimensions except emotional complaints had substantial floor effects. Social limitations showed the highest floor effect with 53.2%, but this dimension comprises only a single question. Ceiling effects were observed in one dimension only: work-related problems. These results are in line with two other validation studies of the German version of the PEmb-QoL [14, 15]. Since floor effects may limit the ability to detect small changes, an analysis of the responsiveness of the PEmb-QoL is crucial for the questionnaire’s applicability in long-term and intervention studies.
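To illustrate how floor and ceiling effects of this kind can be quantified, the following minimal Python sketch computes the percentage of respondents at the lowest and highest possible score of one dimension. The function name, the example scores and the 1 to 5 score range are hypothetical and not taken from the study. The 15% threshold is a commonly used convention that is assumed here, not necessarily the criterion applied in our analysis.

```python
def floor_ceiling(scores, min_score, max_score):
    """Return (floor %, ceiling %) for one dimension.

    Floor is taken as the share of respondents at the lowest possible
    score, ceiling as the share at the highest possible score.
    Missing scores (None) are excluded from the denominator.
    """
    valid = [s for s in scores if s is not None]
    if not valid:
        raise ValueError("no valid scores")
    n = len(valid)
    floor_pct = 100.0 * sum(1 for s in valid if s == min_score) / n
    ceiling_pct = 100.0 * sum(1 for s in valid if s == max_score) / n
    return floor_pct, ceiling_pct


# Hypothetical dimension scores on an assumed 1 to 5 scale.
scores = [1, 1, 2, 3, 1, 5, None, 1, 4, 2]
floor_pct, ceiling_pct = floor_ceiling(scores, min_score=1, max_score=5)

THRESHOLD = 15.0  # assumed convention, see lead-in
print(f"floor {floor_pct:.1f}% (substantial: {floor_pct > THRESHOLD})")
print(f"ceiling {ceiling_pct:.1f}% (substantial: {ceiling_pct > THRESHOLD})")
```

For a single-item dimension such as social limitations, the dimension score can only take as many values as the item has response options, which makes a high share of respondents at the extreme more likely.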
Internal consistency was acceptable to good. Cronbach’s alpha for limitations in ADL and work-related problems was close to the recommended upper limit, which may indicate redundant items. The high average inter-item correlations of 0.71 and 0.80 for limitations in ADL and work-related problems are also consistent with possible redundancy among items. Frey et al. found similarly high values for those two dimensions [14].
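The two statistics discussed above can be sketched as follows. This is a minimal illustration using the standard formula for Cronbach's alpha and the mean of all pairwise Pearson correlations. The function names and the item data are hypothetical, and the sketch assumes complete data (no missing items).

```python
from itertools import combinations
from statistics import mean, variance


def cronbach_alpha(items):
    """items: list of item columns, each holding the scores of the
    same respondents in the same order."""
    k = len(items)
    totals = [sum(resp) for resp in zip(*items)]  # sum score per respondent
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


def pearson(x, y):
    """Pearson correlation of two equally long score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def mean_inter_item_correlation(items):
    """Average of the Pearson correlations over all item pairs."""
    return mean(pearson(a, b) for a, b in combinations(items, 2))


# Hypothetical responses of five respondents to three items.
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
alpha = cronbach_alpha(items)
r_bar = mean_inter_item_correlation(items)
print(f"alpha = {alpha:.2f}, mean inter-item r = {r_bar:.2f}")
```

When items correlate very highly with each other, alpha approaches 1, which is why a very high alpha together with a high average inter-item correlation is read as a sign of redundancy rather than of better measurement.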
For the analysis of test–retest reliability, the time interval for the retest was on average 2 weeks. Since only three participants reported change, we assumed this interval to be appropriate for avoiding both memory effects and real change in PE-related health status. The ICCs were in a good range (> 0.75) except for social limitations. The low ICC for social limitations may be related to the contact restrictions due to Covid-19 in Germany at the time of the retest, which may have led respondents to interpret the question differently. Alternatively, it may reflect a problem with the wording of this question in the German version, because Frey et al. also found a low ICC for social limitations [14].
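As an illustration of how a test–retest ICC can be obtained, the sketch below computes a two-way random-effects, absolute-agreement, single-measurement ICC, often labelled ICC(2,1), from the ANOVA mean squares. The choice of this ICC form is an assumption made for the example and is not stated in this section. The function name and the data are hypothetical.

```python
def icc_2_1(ratings):
    """ratings: one row per participant, one column per measurement
    occasion (e.g. [test, retest]). Returns ICC(2,1)."""
    n = len(ratings)        # participants
    k = len(ratings[0])     # measurement occasions
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                # between participants
    ms_cols = ss_cols / (k - 1)                # between occasions
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical test and retest scores of six participants.
ratings = [[20, 22], [35, 33], [50, 55], [10, 12], [42, 40], [28, 30]]
icc = icc_2_1(ratings)
print(f"ICC(2,1) = {icc:.2f}")
```

Because this form penalises systematic shifts between occasions, a general change in how a question is understood at retest would lower the ICC even if the ranking of participants stayed the same.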
We investigated the responsiveness of the PEmb-QoL for the time intervals 3 to 6, 6 to 12 and 3 to 12 months. The size of the SRM was at least moderate for all dimensions and time intervals, except for the PEmb-QoL summary score, which showed a lower SRM of 0.35 for the time interval of 3 to 12 months. The group that remained stable showed only low or no effects. As expected, the SRM was notably higher for the CRQ dyspnoea dimension than for the EQ-VAS. The CRQ dyspnoea dimension includes questions about specific symptoms that are relevant for patients after PE, whereas the EQ-VAS is only a global rating of subjective health status. Of interest, the SRM for the PEmb-QoL summary score was high for both the EQ-VAS and the CRQ, supporting the assumption that it may be suitable for representing overall PE-related quality of life. These results are supported by other studies that did not primarily examine responsiveness but used the PEmb-QoL questionnaire in a longitudinal study design. Kahn et al. used various HrQoL and health status questionnaires 1, 3, 6 and 12 months after PE. Their results showed that the PEmb-QoL summary score improved in line with the scores of the Mental Component Summary (MCS) and Physical Component Summary (PCS) of the SF-36 and the University of California at San Diego Shortness of Breath Questionnaire (SOBQ) [17]. In addition, Chuang et al. reported mild to moderate effect sizes of the PEmb-QoL questionnaire 1 month after PE when a clinical event (e.g. bleeding, stroke, recurrent PE) had occurred in the meantime [18]. Together with the results on test–retest reliability, our study provides supportive evidence that the PEmb-QoL questionnaire is an appropriate instrument for longitudinal studies. However, we did not have an objective external criterion for defining change, and responsiveness is a property of one particular instrument applied to a particular sample and cannot be seen as absolute [39].
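For readers less familiar with the SRM, it is conventionally defined as the mean change score divided by the standard deviation of the change scores. The following minimal sketch shows the calculation. The function names and the data are hypothetical, and the cut-offs of 0.2, 0.5 and 0.8 for small, moderate and large effects are an assumed convention borrowed from effect size interpretation, not necessarily the one used in our analysis.

```python
from statistics import mean, stdev


def standardized_response_mean(baseline, follow_up):
    """SRM = mean(change) / SD(change) for paired scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


def interpret(srm):
    """Label the absolute SRM using assumed cut-offs (0.2 / 0.5 / 0.8)."""
    size = abs(srm)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "negligible"


# Hypothetical scores of four patients at two time points.
baseline = [10, 20, 30, 40]
follow_up = [12, 25, 31, 46]
srm = standardized_response_mean(baseline, follow_up)
print(f"SRM = {srm:.2f} ({interpret(srm)})")
```

Under these assumed cut-offs, an SRM of 0.35 would fall into the small range. Because the denominator is the variability of change within the sample at hand, the same instrument can yield different SRM values in different samples.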
To our knowledge, this is the first study that uses CFA to evaluate the fit of the factor structure of the PEmb-QoL questionnaire. All models showed significant chi-square test statistics with p < 0.001, which would indicate poor fit. However, for some models, the ratio χ²/df was < 2.0, which is considered a good fit. It has also been discussed that models with robust estimation tend to be over-rejected by corrected chi-square test statistics [40].
While the four-factor structure showed good fit indices, the original six-factor structure could not be fitted to the data. We found very high correlations between intensity and frequency of complaints, which may suggest that the two dimensions actually represent one factor for severity of symptoms. This assumption was already made in the very first validation study by Klok et al. [9]. For models 4 and 5, some re-specifications were applied. Error terms should not be allowed to co-vary merely to improve model fit; such re-specifications must be supported by a theoretical rationale [41]. Since the re-specifications are theoretically reasonable and in each case within the same factor, we considered them justifiable.
The model fit and high factor loadings of the hierarchical model (model 5) support the appropriateness of an overall summary score, but it has to be considered that some items of the original version (1h, 6, 8, 9h and 9i) are omitted in this model. If the summary score is to be used to compare PEmb-QoL results across different languages, changing the number of items should be treated with caution. Another aspect of the summary score is the potential loss of information about PE-related quality of life; as Tavoly et al. already pointed out, it is generally seen as a multidimensional construct [11]. Multidimensionality is supported by the structure of model 5, in which the general factor explains a high percentage (82–83%) of the variance in the dimensions limitations in ADL and work-related problems, whereas for emotional complaints and symptoms about half of the variance is specific to the dimension. Furthermore, model 5 did not outperform model 4: both showed similar fit indices, and the scaled chi-squared difference test suggested model 4 to be the better model. However, with this test even trivial differences may become significant, whereas overlapping RMSEA confidence intervals indicate no statistically significant difference. We considered both models to show an adequate fit to the data, but model 5 may be favoured because it accounts for the high correlation between the four factors. Because CFA loses its confirmatory character once modifications are applied, the models should be validated in a different sample.
Since the six existing dimensions of the questionnaire are debatable in light of our results and those of previous studies, the often recommended PEmb-QoL summary score may be seen as a good option for interpreting and comparing results from different analyses. The two dimensions frequency and intensity of complaints could be interpreted as one dimension in future analyses. Furthermore, the social limitations dimension should be interpreted with caution, because it includes only one question and its psychometric properties were not good. Hence, practitioners should collect additional data about PE-related effects on social activity if this information is of interest. Additionally, if studies identify items that do not contribute to the measurement of the relevant concept, this may help in developing short forms of the questionnaire, which are in high demand among clinical practitioners.
Our study has several limitations. While the respondents filled in the first questionnaire by themselves in written form at home, we conducted the retest by telephone. Therefore, our results for test–retest reliability are only comparable to a limited extent and have to be interpreted with caution. Furthermore, the sample may be rather small for CFA given the complex measurement model with 38 manifest variables. For assessing responsiveness, we lacked a clinical assessment or judgement by medical experts as an external criterion for change in PE-related health status. The generalizability of our findings is also limited because our sample comprises a cohort from a single university hospital in southern Germany. Patients admitted to university hospitals may differ from patients in non-university hospitals in terms of disease characteristics and treatment.