Background

According to recent studies in cancer epidemiology and health outcomes research, fatigue is one of the most frequent symptoms in oncology patients and one with a major impact on them [1–3]. Cancer-related fatigue (CRF), defined as a “distressing persistent subjective sense of physical, emotional, and/or cognitive tiredness” [4], is highly prevalent and can affect more than three quarters of oncology patients [5–8]. Fatigue is undertreated [5, 6], prevents patients from leading a “normal” life [9], and may have a greater impact on quality of life than pain or depression, symptoms that are also frequent in cancer patients [5]. Several excellent reviews of fatigue in cancer patients and its associated problems have been published recently [10–12].

The need to better understand the impact of fatigue interventions on outcomes such as quality of life has recently been recognised [13, 14], and meeting it requires properly developed and validated instruments. This again underscores the importance of integrating clinical and outcomes research into daily clinical practice [15]. A new instrument to measure patients’ perceptions of fatigue in cancer and its treatment has been developed: the Perform Questionnaire (PQ) [16]. It was originally developed for Spanish-speaking patients, with the intention of providing a feasible and valid tool for evaluating, from the patient’s perspective, the perceptions associated with fatigue in usual clinical practice. Initial development focused on item generation and item reduction and on exploring the structural validity and internal consistency of the instrument [16], while keeping formal characteristics such as length and scoring system within limits compatible with routine use; as shown in other health areas, such characteristics can jeopardise the feasibility of a tool in usual clinical practice [17].

Subsequently, the psychometric properties of the new tool were assessed, an indispensable step before it can be used in the target population. The aim of this paper was to report the findings of the validation study of the Perform Questionnaire.

Patients and methods

To assess the reliability, validity, and sensitivity to change of the new PQ, we performed a prospective, observational study between November 2005 and September 2006 in the oncology and palliative care departments of 50 Spanish public hospitals. Each centre consecutively included patients who were (a) ambulatory and over 18 years of age; (b) diagnosed with cancer (any site and any disease duration, provided they were capable of completing the study questionnaires); (c) reporting a fatigue intensity ≥30 mm on a 100-mm visual analogue scale (VAS) at the time of the study visit [4]; and (d) expected to live more than 6 months. All patients provided informed consent to participate. The study was approved by the Ethics Committee of the Hospital Clínic i Provincial in Barcelona and was performed in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki.

Patient assessments were performed at the time of inclusion in the study and 3 months later to assess test–retest reliability and sensitivity to change. We collected data regarding sociodemographic factors (i.e. sex, age, level of education, and level of family care required) and clinical characteristics (including cancer location, extent of the disease, haemoglobin level, cancer treatment, time since diagnosis, Karnofsky index on inclusion, and intensity of fatigue measured on a 100 mm horizontal VAS).

The PQ [16] is a new clinician-developed questionnaire that assesses, from the patient’s perspective, the perception of fatigue in cancer. It consists of 12 items answered on a five-point ordinal scale and grouped into three dimensions: “Physical Limitations”, “Activities of Daily Living”, and “Beliefs and Attitudes”. The questionnaire yields an overall score and three dimension scores; lower scores indicate a worse patient perception of fatigue.

The new questionnaire was self-administered together with the Functional Assessment of Cancer Therapy–Fatigue (FACT-F) [18, 19]. The FACT-F consists of 13 fatigue-related items, each answered on a five-point ordinal scale; overall scores range from 13 (no impairment) to 65 (greatest impairment). The new questionnaire was also self-administered together with the short version of the Nottingham Health Profile (NHP) [20, 21]. The NHP-22 is a generic health-related quality-of-life (HRQOL) measure with two summary scores, for the physical and psychological dimensions [21]. Each dimension score ranges from 0 (minimum impact on HRQOL) to 100 (maximum impact on HRQOL).

Patients were also asked to rate the new questionnaire’s ease of use on a scale ranging from “Very difficult to complete” to “Very easy to complete”, and the time taken to complete the questionnaire was recorded.

A “health status transition” item was self-administered at the second visit to assess changes in perceived health status since the first visit. Patients answered on a Likert-type ordinal scale with 13 response options ranging from “have greatly improved” to “have greatly worsened”. The results were used in the analysis of sensitivity to change.

Statistical analyses

Feasibility was assessed from responses to the ease-of-use question and the time taken to complete the questionnaire. Missing data were assessed by calculating the completion rate (the percentage of respondents with no missing answer in any of the 12 items) and the range of missing answers (the maximum number of missing responses per item).
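
The two missing-data indicators can be computed as in the following minimal Python sketch. It assumes that responses are stored as one list per patient, with `None` marking an unanswered item; the function names are illustrative only (the original analysis was run in SAS). Missing responses per item are expressed as percentages, as in the Results.

```python
from typing import Optional, Sequence

Response = Sequence[Optional[int]]  # one patient's item answers; None = missing (assumed encoding)


def completion_rate(responses: Sequence[Response]) -> float:
    """Percentage of respondents who answered every item."""
    complete = sum(1 for row in responses if all(v is not None for v in row))
    return 100.0 * complete / len(responses)


def missing_by_item(responses: Sequence[Response]) -> list[float]:
    """Percentage of respondents who left each item unanswered."""
    n_items = len(responses[0])
    return [
        100.0 * sum(1 for row in responses if row[j] is None) / len(responses)
        for j in range(n_items)
    ]


# Toy data: 4 patients, 3 items (the PQ itself has 12 items).
data = [[1, 2, 3], [None, 2, 3], [1, 2, None], [1, 2, None]]
print(completion_rate(data))       # 25.0
print(max(missing_by_item(data)))  # 50.0, the item with most missing answers
```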

The distribution of the overall and dimension scores was analysed by calculating mean scores, standard deviations (SDs), observed score ranges, and floor and ceiling effects (the proportion of patients with the worst and the best possible scores, respectively) for the overall score and for each dimension of the PQ.
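
A minimal Python sketch of these distribution statistics follows. The possible score range of the PQ is not stated in this section, so the worst and best possible scores are passed in as parameters; the 12–60 range in the example (12 items scored 1–5 and summed) is a hypothetical assumption used only for illustration.

```python
import statistics
from typing import Sequence


def score_distribution(scores: Sequence[float], worst: float, best: float) -> dict:
    """Mean, SD, observed range, and floor/ceiling effects (% at worst/best score)."""
    n = len(scores)
    return {
        "mean": statistics.mean(scores),
        "sd": statistics.stdev(scores),  # sample SD (n - 1 denominator)
        "range": (min(scores), max(scores)),
        "floor_pct": 100.0 * sum(s == worst for s in scores) / n,
        "ceiling_pct": 100.0 * sum(s == best for s in scores) / n,
    }


# Hypothetical 12-60 range. For the PQ, low scores mean a worse perception,
# so the floor corresponds to the minimum possible score.
summary = score_distribution([12, 30, 45, 60, 12], worst=12, best=60)
print(summary["floor_pct"], summary["ceiling_pct"])  # 40.0 20.0
```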

Internal consistency was assessed by estimating Cronbach’s alpha (CA) [22] for each dimension and for the overall score at baseline, with a target value above 0.70, as recommended in the literature [23, 24]. Three-month test–retest reliability was assessed by calculating the intraclass correlation coefficient (ICC) between visits among patients who reported no significant change (<5 mm) in VAS fatigue intensity between study visits.
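
Both reliability coefficients can be sketched in Python as follows. Cronbach’s alpha uses the standard formula α = k/(k − 1) × (1 − Σ item variances / total-score variance). The section does not state which ICC form was used; the two-way random-effects, absolute-agreement, single-measure ICC(2,1) of Shrout and Fleiss is shown here as an assumption, and other ICC forms would give somewhat different values.

```python
import statistics
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; rows are respondents, columns are items."""
    k = len(items[0])
    item_vars = [statistics.variance([row[j] for row in items]) for j in range(k)]
    total_var = statistics.variance([sum(row) for row in items])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def icc_2_1(pairs: Sequence[tuple[float, float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    Each pair holds one stable patient's (baseline, 3-month) score.
    """
    n, k = len(pairs), 2
    grand = sum(a + b for a, b in pairs) / (n * k)
    row_means = [(a + b) / 2 for a, b in pairs]             # per patient
    col_means = [sum(p[j] for p in pairs) / n for j in range(k)]  # per visit
    ss_total = sum((x - grand) ** 2 for p in pairs for x in p)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


print(round(cronbach_alpha([[1, 2], [2, 3], [3, 5]]), 3))  # 0.947
print(round(icc_2_1([(1, 2), (3, 3), (5, 6)]), 3))         # 0.923
```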

Known-groups validity was tested by determining whether the instrument discriminated between patient groups likely to differ in fatigue perception, defined by fatigue intensity, presence of anaemia, haemoglobin (Hb) level, and need for a caregiver.

Convergent validity was tested by estimating Pearson’s correlation coefficients between the PQ scores and the FACT-F, NHP, and Karnofsky index scores [25]. We expected the PQ to correlate more strongly with the FACT-F, since both are fatigue-specific instruments, than with more general health measures such as the NHP and the Karnofsky index.
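
As a minimal sketch of the convergent-validity check, assuming plain Python lists of paired scores, the helpers below compute Pearson’s r and test the hypothesised ordering (a stronger correlation with the fatigue-specific FACT-F than with the generic NHP or the Karnofsky index). The function names are illustrative. Absolute values are compared because instruments may be scored in opposite directions.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two paired score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def convergence_pattern_holds(
    pq: Sequence[float],
    specific: Sequence[float],
    generic_measures: Sequence[Sequence[float]],
) -> bool:
    """True if |r| with the fatigue-specific measure exceeds |r| with every generic one."""
    r_specific = abs(pearson_r(pq, specific))
    return all(r_specific > abs(pearson_r(pq, g)) for g in generic_measures)


# Toy data only: r = 1.0 with the specific measure, r = 0.8 with the generic one.
print(convergence_pattern_holds([1, 2, 3, 4], [2, 4, 6, 8], [[1, 3, 2, 4]]))  # True
```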

The sensitivity to change was assessed by calculating the effect size (ES; i.e. the standardised mean score change) and standardised response mean (SRM) in the subgroup of patients who reported at least a “small improvement” on the health status transition item as well as in the subgroup of patients who reported at least a “small deterioration” on the aforementioned item. ES values of approximately 0.2 were considered as representing a small change, while values of approximately 0.5 indicated a moderate change, and values of approximately 0.8 or higher represented a large change in the attribute of interest [26]. The SRM was calculated by dividing the mean change in score by the SD of the change scores between the two study visits [24].
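
The ES and SRM definitions can be sketched in Python as follows. The section defines the SRM denominator (the SD of the change scores) but not the ES denominator; the SD of baseline scores is used here, as is common, and this is an assumption. The labelling function treats 0.2, 0.5, and 0.8 as lower bounds and calls values below 0.2 “trivial”, a simplification of the approximate values cited from Cohen [26].

```python
import statistics
from typing import Sequence


def effect_size(baseline: Sequence[float], follow_up: Sequence[float]) -> float:
    """ES: mean change divided by the SD of baseline scores (assumed denominator)."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return statistics.mean(changes) / statistics.stdev(baseline)


def standardised_response_mean(baseline: Sequence[float], follow_up: Sequence[float]) -> float:
    """SRM: mean change divided by the SD of the change scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return statistics.mean(changes) / statistics.stdev(changes)


def cohen_label(es: float) -> str:
    """Conventional reading of |ES| (cut-offs are a simplifying assumption)."""
    magnitude = abs(es)
    if magnitude < 0.2:
        return "trivial"
    if magnitude < 0.5:
        return "small"
    if magnitude < 0.8:
        return "moderate"
    return "large"


baseline = [10, 12, 14, 16]
follow_up = [12, 15, 15, 18]
print(round(effect_size(baseline, follow_up), 2))                 # 0.77
print(round(standardised_response_mean(baseline, follow_up), 2))  # 2.45
print(cohen_label(0.57), cohen_label(-1.0))                        # moderate large
```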

To aid interpretation of the observed numerical differences in the PQ, the minimally important difference (MID) corresponding to a clinically meaningful outcome was determined using the method described in previous studies [27]. An Hb increase of 1 g/dL was taken as the minimally important clinical change for interpreting fatigue results. “Improved” patients were those whose Hb increased by ≥1 g/dL, and “stable” patients were those whose Hb changed by between −1 g/dL and <1 g/dL. The difference in mean PQ change score between the improved and stable groups was taken as the MID of the measure.
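
The anchor-based MID calculation can be sketched in Python as below, assuming paired lists of Hb change (g/dL) and PQ overall score change per patient. Whether the −1 g/dL lower limit of the “stable” group is inclusive is not stated, so treating it as inclusive is an assumption; patients whose Hb fell further belong to neither group. The function name and toy data are hypothetical.

```python
import statistics
from typing import Sequence


def anchor_based_mid(hb_change: Sequence[float], pq_change: Sequence[float]) -> float:
    """MID = mean PQ change (Hb-improved) minus mean PQ change (Hb-stable)."""
    # Improved: Hb rise of at least 1 g/dL.
    improved = [p for h, p in zip(hb_change, pq_change) if h >= 1.0]
    # Stable: -1 <= change < 1 g/dL (inclusive lower bound is an assumption).
    stable = [p for h, p in zip(hb_change, pq_change) if -1.0 <= h < 1.0]
    return statistics.mean(improved) - statistics.mean(stable)


hb = [1.5, 1.0, 0.2, -0.5, -2.0]  # Hb change over 3 months (toy data)
pq = [4.0, 3.0, 0.5, -0.5, -6.0]  # PQ overall score change (toy data)
print(anchor_based_mid(hb, pq))   # 3.5
```

Applied to the group means reported in the Results (3.72 for Hb-improved and 0.03 for Hb-stable patients), the same subtraction gives 3.69.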

Statistical analyses were carried out using the SAS software [28].

Results

The study population consisted of 437 patients, whose characteristics are shown in Table 1. The mean age was 59.1 years (SD, 11.8), and 60.5% of patients were female. The most prevalent cancer type was breast cancer (33.6%), followed by lung (14.9%) and colon (11.4%) cancer. Time since diagnosis was 2.2 years, and the mean Karnofsky score at inclusion was 81.3 (SD, 11.6). Metastases were present in 54.7% of patients, and almost a third of patients had anaemia at inclusion. Only 10% of the sample was in follow-up, while the remaining 90% was undergoing cancer treatment. The most frequent treatment was chemotherapy (alone or combined with other therapies), followed by radiotherapy, which was used in only 14% of the sample.

Table 1 Baseline characteristics of the validation study sample (N = 437)

As Table 2 shows, more than 80% of patients rated the PQ very easy or easy to answer, and the mean administration time was under 9 min. Over 80% of patients answered all questionnaire items. In general, the proportion of missing answers was low for all items (<4%), except question 8 (“I’ve felt bad about feeling tired at work”), which was left unanswered by 16% of patients. Floor/ceiling effects were negligible (<2%) for the overall score and low (<9%) for all dimensions; the highest (8.6%) was found in the “Physical limitations” dimension. Cronbach’s alpha was 0.94 for the overall score and at least 0.80 for each dimension. In the subgroup of patients considered “stable”, ICC values were above 0.75 for the dimensions and 0.83 for the overall score.

Table 2 Feasibility, score distributions, and reliability of the Perform Questionnaire (N = 437)

Patients with the highest (severe) fatigue intensity obtained worse Perform overall and dimension scores than patients with moderate fatigue intensity (P < 0.0001; Table 3). Patients who needed a caregiver, anaemic patients, and patients with lower Hb levels obtained worse Perform overall (P = 0.0001 to 0.0006) and dimension scores (P = 0.0001 to 0.0036) than patients who did not need a caregiver, non-anaemic patients, and patients with higher Hb levels. The Pearson correlation coefficient between the Perform overall score and baseline Hb level was 0.18; for the dimension scores, it ranged from 0.14 (“Physical limitations”) to 0.20 (“Activities of daily living”).

Table 3 Mean (SD) Perform Questionnaire overall and dimension scores (low scores indicate worse patient perception of cancer-related fatigue), according to clinical baseline characteristics (N = 437)

We used a linear regression model to assess the multivariate association between the Perform overall score and several patient characteristics (Table 4). Among sociodemographic variables, educational level was significantly associated with the score, but most significant associations involved clinical characteristics: need for a caregiver, cancer location, fatigue intensity, Karnofsky score, presence or absence of cancer treatment, presence or absence of palliative treatment, and time since diagnosis. These characteristics were independently associated with the Perform overall score (P = 0.0001 to 0.03), and the model explained 31% of the variance.

Table 4 Variables associated with the overall scores of the Perform Questionnaire (linear regression model) (N = 298)

The comparison of PQ scores with the other health measures (i.e. FACT-F, NHP, and Karnofsky index) revealed stronger correlations between Perform scores and the FACT-F (overall = 0.80; dimensions = 0.68–0.75) than between Perform scores and the NHP dimensions (overall Perform with Physical NHP = 0.68; overall Perform with Psychological NHP = 0.56; Perform dimensions with Physical NHP = 0.57–0.67; Perform dimensions with Psychological NHP = 0.44–0.55) or between Perform scores and Karnofsky index (overall = 0.35; dimensions = 0.23–0.26; Fig. 1).

Fig. 1 Pearson correlation between the Perform Questionnaire and the FACT-F, NHP-22, and Karnofsky scores (N = 437)

Sensitivity to change (Table 5) was assessed in the subgroup of 208 patients who reported an improvement in their health status since their first visit 3 months earlier and in the subgroup of 84 patients who reported a deterioration. In general, ES and SRM values were larger in patients who worsened than in patients who improved. For instance, the ES for the Perform overall score indicated a moderate improvement among patients who improved (0.57) but a large deterioration among patients who worsened (−1). Perform dimension scores followed the same pattern (ES for improved patients = 0.5–0.6; ES for worsened patients = 0.73–0.83; Fig. 2).

Table 5 Sensitivity to change and clinical significance of the improvement of the Perform Questionnaire Score
Fig. 2 Effect sizes obtained for the Perform Questionnaire among patients who reported an improvement or a deterioration, respectively, on the “health status transition” item

Eighty-seven patients experienced an increase in Hb ≥1 g/dL, while 201 patients remained “stable” at 3 months (Table 5). The mean change in Perform overall score for “Hb-stable” patients was 0.03, and the mean Perform overall score change for “Hb-improved” patients was 3.72. The MID for the Perform overall score, which implied a clinically meaningful outcome, was a score change of 3.69 (i.e. the difference between 0.03 and 3.72).

Discussion

The PQ, like any questionnaire or health scale intended for clinical research or clinical practice, must be backed by rigorous development and validation procedures, as described in the literature [23, 24, 29]. Accordingly, the PQ was developed, modified, and validated using standardised test construction methods consistent with US Food and Drug Administration guidelines [29].

The PQ was originally developed with Spanish-speaking patients from Spain; its application to Spanish speakers in other Hispanic cultures should therefore be relatively straightforward after cultural adaptation according to the agreed-upon guidelines for this purpose [30]. Moreover, the PQ was created to be a feasible, valid, and useful tool for assessing, from the patient’s perspective, the perceptions associated with fatigue in cancer in usual clinical practice. In other words, it was developed to be a practical tool for the physicians and other health professionals who routinely manage and care for oncology patients. Its development process has therefore been lengthy and meticulous, and the validation reported in this article thoroughly tested several of the psychometric properties and characteristics most relevant to the use of health measures in clinical practice [30].

The PQ was well received by the study population. Its administration required less than 9 min, and >80% of patients rated it “very easy” or “easy” to answer. Importantly, the proportion of missing answers was low for all items except “I’ve felt bad about feeling tired at work,” which was left unanswered by 16% of patients. Because oncology patients are often not fit enough to work, and the questionnaire did not offer a “not applicable” or “does not apply” response, this 16% probably included patients who understood the item but could not answer it because none of the five response options applied to them. In further studies of the PQ, this will be corrected by adding a “not applicable” or “does not apply” option. Nevertheless, the completion rate (patients with all 12 items answered) was over 80%. These results are especially satisfactory in a sample of oncology patients who, by definition, were experiencing fatigue while completing the measure.

Few patients obtained the worst or best possible overall and dimension scores (floor and ceiling effects <2% for the overall score and <9% for the dimensions), which suggests that the questionnaire satisfactorily covers the range of fatigue perceptions in the target population [31]. Internal consistency was satisfactory. Cronbach’s alpha for the summary score (i.e. all 12 items) was 0.94 (Table 2), well above the usual standard of 0.70 [24] and close to the value that some authors [23] consider sufficient for administering and interpreting the questionnaire at the individual patient level. Cronbach’s alpha values for the three PQ dimensions were all at least 0.80, comfortably above the standard value. Test–retest reliability, another measure of reliability, was analysed in patients who reported no significant change (<5 mm) in VAS fatigue intensity between study visits. In this “stable” subgroup, the questionnaire showed satisfactory reproducibility (ICC > 0.70), within accepted standards [24], for both summary and dimension scores [32–34].

The validity analysis concentrated on the behaviour of PQ scores in patient profiles that, a priori, were expected to differ appreciably in fatigue perception. Consistent with expectations, PQ scores (i.e. the perception of fatigue) were worse in patients with greater fatigue intensity, in patients who needed a caregiver involved in their daily lives, and in patients with anaemia or lower Hb levels. These findings are consistent with previous results showing a low to moderate association between Hb levels and scores on other fatigue scales (mainly the FACT-F or FACT-An questionnaires; we found no similar data for other questionnaires) [18, 34–37]. However, we are not aware of previous studies that used a standardised scale to assess fatigue in relation to the need for a caregiver involved in the daily lives of cancer patients, so this finding provides further support for the validity of the PQ. In addition, the multivariate analysis showed that two of these variables (need for a caregiver and fatigue intensity) were independently associated with the overall PQ score.

The analysis of PQ score behaviour also assessed convergent validity with respect to a fatigue-specific questionnaire, the FACT-F, and divergent validity with respect to measures that are not fatigue-specific, the NHP and the Karnofsky index. As expected, the relationship between the Perform and FACT-F questionnaires was high to very high, as their content is closely related. The relationship between the PQ and the NHP was more moderate, and the correlation between the PQ and the Karnofsky index was low. This pattern (the greater the similarity in the assessed concepts, the stronger the relationship, and vice versa) was reproduced for both overall and dimension scores and is consistent with previously reported associations between performance status and various fatigue questionnaires [18, 35, 38–40].

Sensitivity to change in the Perform scores was analysed in two subgroups of patients: those who considered that their health status with respect to fatigue had “improved” since the previous visit and those who considered that it had “worsened”. In both groups, the questionnaire scores changed in the expected direction, improving or worsening, respectively, in a coherent and consistent way. According to Cohen’s interpretation of the ES [26], the change was small to moderate in patients who reported an improvement and moderate to large in patients who reported a worsening, indicating that the change had been greater among those who worsened than among those who improved. Sensitivity to change, the ability of a questionnaire to detect change (improvement or deterioration) in the health concept it assesses, is rarely tested when validating this kind of questionnaire [35, 41], although knowing the sensitivity of a measure of fatigue in cancer would allow suitable follow-up of the health status of oncology patients. In this study, the PQ was sensitive to both improvement and worsening of fatigue, which could be of great use in clinical practice and for research purposes.

Finally, following the recommendations of the most recent guidelines [29], we estimated the minimum change (improvement) in the overall PQ score required to identify a clinically relevant improvement. Using an adaptation of a previously described method [27], this minimum change was estimated at 3.69 points in the overall PQ score, the difference in mean score change between patients whose Hb improved by ≥1 g/dL and patients whose Hb remained stable.

In conclusion, the PQ is a questionnaire designed to assess attitudes and beliefs about fatigue in cancer and its treatment in clinical practice; it is feasible, reliable, valid, and sensitive to improvement or deterioration. Its characteristics, content, and demonstrated psychometric properties suggest that it will be highly applicable in clinical practice across different Spanish-speaking target populations and cultures.