Main

In the United States and Europe, kidney cancer is estimated to account for approximately 4% of all new cases of cancer (Jemal et al, 2008). The annual incidence of cancer of the kidney and renal pelvis in the United States in 2008 is estimated to be 54 390, with approximately 13 010 deaths per year (Jemal et al, 2008). In the 25 countries making up the European Union (EU), more than 63 000 new cases of renal cell carcinoma (RCC) were diagnosed in 2006, resulting in an estimated 26 000 deaths annually (Ferlay et al, 2007).

The prognosis for patients with metastatic RCC (mRCC) has been historically poor, with only 10% of patients surviving beyond 5 years (US National Institutes of Health, 2006). Patients living with mRCC can suffer significant symptoms, even with the resection of primary tumours and metastatic sites (Williams et al, 2004). Health-related quality of life (HRQoL) outcomes are an important assessment tool to measure disease- and treatment-related symptoms directly reported by patients.

A number of health status scales are available for the measurement of HRQoL and disease- and treatment-related symptoms in cancer patients (de Haes et al, 1990; Cella et al, 1993; Rabin and de Charro, 2001), including two validated scales that are specific to RCC: the Functional Assessment of Cancer Therapy–Kidney Symptom Index (FKSI) and the RCC Symptom Index (Cella et al, 2006; Harding et al, 2007). Factors such as age, race, marital status, education level, income, and employment have been shown to affect HRQoL in cancer patients (Movsas et al, 2006) and may influence outcomes assessed using these scales. Cultural differences may also affect patients’ responses to questionnaires (Wild et al, 2005). These variables should be considered when interpreting data and making regional comparisons from international multi-centre studies.

Sunitinib malate is an oral multi-targeted receptor tyrosine kinase inhibitor of vascular endothelial growth factor receptors 1, 2, and 3, platelet-derived growth factor receptors, stem cell factor receptor, colony-stimulating factor 1 receptor, glial cell-line-derived neurotrophic factor receptor (Rearranged during Transfection), and FMS-like tyrosine kinase 3, with anti-tumour and anti-angiogenic effects (Abrams et al, 2003; Mendel et al, 2003; Murray et al, 2003; O’Farrell et al, 2003; Kim et al, 2006; Pfizer Ltd, 2009). Sunitinib has been approved multinationally for the treatment of advanced RCC and for gastrointestinal stromal tumours after disease progression on or intolerance to imatinib mesylate therapy (Pfizer Ltd, 2009).

In an international randomised phase III trial, sunitinib showed statistically significant improvement in progression-free survival and objective response rate compared with interferon-α (IFN-α) as first-line therapy for mRCC (P<0.001), and longer median overall survival compared with IFN-α (Motzer et al, 2007, 2009). An interim analysis of the trial data also showed HRQoL benefits with sunitinib over IFN-α (Motzer et al, 2007; Cella et al, 2008). Here we report updated HRQoL results based on the final data for this trial, including an analysis of geographical differences.

Methods

Full details of the study design have been previously reported by Motzer et al (2007). This study was conducted in accordance with the Declaration of Helsinki Principles and Good Clinical Practice Guidelines. All patients provided written informed consent.

Patients and study design

Male and female patients aged 18 years or older with mRCC with a component of clear cell histology were eligible for enrolment in this study. Other eligibility criteria have been reported previously (Motzer et al, 2007). Patients were excluded if they had a severe acute or chronic medical or psychiatric condition, or a laboratory abnormality, that could increase the risk associated with study participation or study drug administration, could interfere with the interpretation of study results, or, in the judgment of the investigator, would make the patient inappropriate for entry.

In this phase III trial, patients were randomised to receive either sunitinib or IFN-α in repeated 6-week cycles. Oral sunitinib was started at 50 mg per day on a schedule of 4 weeks on treatment followed by 2 weeks off treatment (Schedule 4/2). IFN-α was administered by subcutaneous injection on 3 non-consecutive days per week at a dose of 3 million units (MU) in the first week, 6 MU in the second week, and 9 MU thereafter. Dose modifications were allowed in both arms to manage toxicity.

Health-related quality of life assessments

Health-related quality of life was assessed using three validated self-reported questionnaires: the FKSI-15 (Cella et al, 2006), the Functional Assessment of Cancer Therapy–General (FACT-G) (Cella et al, 1993), and EuroQoL Group's EQ-5D self-report questionnaire (Rabin and de Charro, 2001). Full details of these questionnaires and their application in this study were reported by Cella et al (2008).

The questionnaires were completed on days 1 and 28 of each 42-day treatment cycle, and at the end of treatment or on study withdrawal. Non-English speakers were provided with questionnaires in their preferred language.
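The assessment timetable can be made concrete with a short sketch. It assumes, hypothetically, that cycle 1, day 1 is study day 1 and that every cycle lasts exactly 42 days with no treatment delays; the function name `assessment_study_days` is illustrative only, and end-of-treatment or withdrawal assessments are not modelled.

```python
# Sketch of the HRQoL assessment timetable described above.
# Assumptions (hypothetical): cycle 1, day 1 is study day 1, and every
# cycle lasts exactly 42 days with no treatment delays.

CYCLE_LENGTH_DAYS = 42
ASSESSMENT_DAYS_IN_CYCLE = (1, 28)  # questionnaires on days 1 and 28


def assessment_study_days(n_cycles: int) -> list[tuple[int, int, int]]:
    """Return (cycle, day_in_cycle, study_day) for every scheduled assessment."""
    schedule = []
    for cycle in range(1, n_cycles + 1):
        offset = (cycle - 1) * CYCLE_LENGTH_DAYS  # days elapsed before this cycle
        for day in ASSESSMENT_DAYS_IN_CYCLE:
            schedule.append((cycle, day, offset + day))
    return schedule


for cycle, day, study_day in assessment_study_days(3):
    print(f"cycle {cycle}, day {day}: study day {study_day}")
```

Under these assumptions, the first three cycles give assessments on study days 1, 28, 43, 70, 85, and 112.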

Nine HRQoL end points were derived from the three questionnaires: (1) the FKSI-15 total score; (2) the FKSI-Disease-Related Symptoms (FKSI-DRS) subscale of the FKSI-15 (Cella et al, 2007); (3) the FACT-G total score; (4–7) the four FACT-G subscales of physical, social/family, emotional, and functional well-being (PWB, SFWB, EWB, and FWB, respectively); and (8–9) the EQ-5D Index (EuroQol, 1990; Rabin and de Charro, 2001) and the EQ-5D visual analogue scale (EQ-VAS) (de Boer et al, 2004). The FKSI-DRS subscale score was prospectively specified as the primary HRQoL end point.

All HRQoL end points were reported for the total sample (US, EU, Australia, Brazil, Canada, and Russia). In addition, these end points were examined for differences between treatment arms, and treatment differences were compared between the US and European (EU; France, Germany, Italy, Poland, Spain, United Kingdom) groups.

Statistical analyses

All analyses were conducted on the intention-to-treat population. Patient demographics and characteristics were described using frequency distributions, means, and standard deviations. Completion of the questionnaires was defined as responses to more than 80% of items in the overall FACT-G and more than 50% of items in the FKSI-15, FKSI-DRS, and FACT-G subscales. All patients with a baseline assessment and at least one post-baseline measurement were included in the analysis. Patients (n=25) who crossed over from IFN-α to sunitinib treatment were included in the analyses with the original randomisation assignment. Patients with no post-baseline assessment were excluded.
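The completion rule above can be expressed as a small helper. This is a sketch: the 0 to 4 item coding and the prorating of partially answered scales follow the usual FACIT scoring convention and are assumptions here rather than details reported in this paper; the function names are hypothetical.

```python
from typing import Optional, Sequence

# One entry per item: a 0-4 response (assumed already reverse-scored where
# the instrument requires it), or None if the item was left blank.
Responses = Sequence[Optional[int]]


def answered_fraction(items: Responses) -> float:
    """Fraction of items with a non-missing response."""
    return sum(r is not None for r in items) / len(items)


def is_complete(items: Responses, threshold: float) -> bool:
    """Completion rule: strictly more than `threshold` of the items answered."""
    return answered_fraction(items) > threshold


def prorated_score(items: Responses, threshold: float = 0.5) -> Optional[float]:
    """Prorated scale score, or None if the completion rule is not met.

    Prorating (sum of answered items * number of items / number answered)
    is assumed here, following the usual FACIT scoring convention.
    """
    if not is_complete(items, threshold):
        return None
    answered = [r for r in items if r is not None]
    return sum(answered) * len(items) / len(answered)


# Hypothetical 9-item FKSI-DRS response with two items skipped:
# 7 of 9 answered (> 50%), so a prorated score of 27.0 is returned.
drs = [3, 4, 2, None, 3, 4, None, 2, 3]
print(prorated_score(drs))

# FACT-G total (27 items): at least 22 answered are needed (> 80%).
print(is_complete([2] * 22 + [None] * 5, threshold=0.8))
```

The strict inequality matters at the boundary: a scale with exactly half of its items answered does not meet the "more than 50%" rule.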

For each end point and treatment arm, estimated (predicted) means were obtained from a repeated-measures mixed-effects model (MM) that controlled for time, treatment, country, the treatment-by-time and treatment-by-country interactions, and the baseline (cycle 1, day 1) score (Fairclough, 2002; Singer and Willett, 2003; Fitzmaurice et al, 2004). Means within each treatment group, and differences in means between treatment groups, were estimated across the entire post-baseline period using all available observations.
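For concreteness, the repeated-measures mixed-effects model can be written as below. This is a sketch under stated assumptions, not the exact specification fitted in the study: time enters as a categorical effect of each post-baseline assessment $t$, within-patient correlation is captured by a random intercept $b_i$ (the covariance structure actually used is not reported here), and the overall difference $\Delta$ weights the $T$ post-baseline assessments equally. Here $y_{it}$ is patient $i$'s score at assessment $t$, $\mathrm{Trt}_i$ is 1 for sunitinib and 0 for IFN-α, $c(i)$ is the patient's country, $y_{i0}$ is the baseline score, and $w_c$ are country weights summing to 1 ($w_c = 1/C$ for a balanced population of $C$ countries). The usual reference-level constraints on $\gamma$, $\delta$, and the interaction terms are implied.

```latex
\begin{aligned}
y_{it} &= \beta_0 + \beta_1\,\mathrm{Trt}_i + \gamma_t + \delta_{c(i)}
        + (\beta\gamma)_t\,\mathrm{Trt}_i + (\beta\delta)_{c(i)}\,\mathrm{Trt}_i
        + \beta_2\,y_{i0} + b_i + \varepsilon_{it},\\
b_i &\sim N(0,\sigma_b^2), \qquad \varepsilon_{it} \sim N(0,\sigma^2),\\
\mu_{\mathrm{sun}}(t,c) - \mu_{\mathrm{IFN}}(t,c) &= \beta_1 + (\beta\gamma)_t + (\beta\delta)_c,\\
\Delta &= \beta_1 + \frac{1}{T}\sum_{t=1}^{T} (\beta\gamma)_t + \sum_{c} w_c\,(\beta\delta)_c,
\qquad \sum_{c} w_c = 1.
\end{aligned}
```

Because the baseline term $\beta_2\,y_{i0}$ is common to both arms, it cancels from the treatment difference but still absorbs between-patient variation, which is how baseline adjustment improves precision.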

With the exception of the individual FKSI-15 items, the MM analyses of HRQoL total and subscale scores were checked in sensitivity analyses using pattern-mixture models (PMM) (Little, 1994; Hedeker and Gibbons, 1997), which help to interpret results when data are not missing at random. If the MM and PMM results are similar, the conclusions do not depend heavily on the nature of the missing data. The PMM differs from the MM by the addition of a new variable, ‘Pattern’, together with interaction terms between ‘Pattern’ and all other predictors except ‘Baseline’. Patterns were defined according to the dynamics of the attrition process.

In applying the pattern-mixture methodology, we had to choose the number of patterns and how they were distributed over the study population. We therefore plotted the percentage of patients with data up to each cycle. Three distinct phases emerged from this plot: an exponential decrease in the number of patients from cycle 1 to cycle 10; a modest linear decrease from cycle 11 to cycle 21; and a more pronounced linear decrease after cycle 21. On the basis of these observations, we selected three patterns, with each cycle belonging to one of the patterns.
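The pattern-mixture idea can be sketched in two steps: assign each patient to a dropout pattern, then average pattern-specific treatment differences, weighting each pattern by the proportion of patients in it (in the spirit of Hedeker and Gibbons, 1997). In this sketch, the rule that a patient's pattern is set by the last cycle with HRQoL data, the function names, and all numbers are hypothetical; the study's PMM estimated the pattern terms within a single model rather than from separate per-pattern inputs, and only a point estimate is shown here.

```python
from collections import Counter


def dropout_pattern(last_cycle: int) -> int:
    """Assign a missing-data pattern from the last cycle with HRQoL data.

    Hypothetical rule mirroring the three attrition phases described in the
    text: up to cycle 10, cycles 11-21, and after cycle 21.
    """
    if last_cycle <= 10:
        return 1
    if last_cycle <= 21:
        return 2
    return 3


def pooled_difference(pattern_differences: dict[int, float],
                      last_cycles: list[int]) -> float:
    """Average pattern-specific treatment differences, weighting each
    pattern by the proportion of patients who fall into it."""
    counts = Counter(dropout_pattern(c) for c in last_cycles)
    n = len(last_cycles)
    return sum(pattern_differences[p] * k / n for p, k in counts.items())


# Hypothetical inputs: last cycle observed for eight patients, and
# sunitinib-minus-IFN-alpha differences estimated within each pattern.
last_cycles = [2, 5, 9, 12, 18, 25, 30, 4]
pattern_differences = {1: 1.0, 2: 2.0, 3: 3.0}
print(pooled_difference(pattern_differences, last_cycles))
```

With four, two, and two patients in patterns 1, 2, and 3, the pooled difference is 1.0 × 4/8 + 2.0 × 2/8 + 3.0 × 2/8 = 1.75. If this pooled value is close to the MM estimate, the conclusions are not sensitive to the dropout pattern.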

Data were analysed using SAS 8.2 (SAS Institute, Cary, NC, USA). Statistical significance for between-treatment differences was set at P<0.05. No adjustments were made for multiple comparisons in this supplemental analysis.

Estimated means for the US and EU groups were calculated for each end point and each treatment arm using the model described above. Estimates for the EU group were computed over a balanced population, that is, as if every country in the EU group had enrolled the same number of patients.
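The balanced-population approach amounts to weighting each EU country equally rather than by enrolment. A minimal sketch, assuming the balanced estimate is a simple average of country-level model estimates; the per-country differences below are invented for illustration, and only the enrolment counts come from this paper's Results.

```python
# EU enrolment by country, as reported in the Results section.
EU_COUNTS = {"France": 82, "Germany": 17, "Italy": 24,
             "Poland": 103, "Spain": 27, "UK": 21}


def patient_weighted_mean(estimates: dict[str, float],
                          counts: dict[str, int]) -> float:
    """Average of country estimates weighted by number of patients."""
    total = sum(counts[c] for c in estimates)
    return sum(estimates[c] * counts[c] for c in estimates) / total


def balanced_mean(estimates: dict[str, float]) -> float:
    """Average as if every country had enrolled the same number of patients."""
    return sum(estimates.values()) / len(estimates)


# Hypothetical country-level treatment differences (not study results).
diffs = {"France": 2.0, "Germany": 4.0, "Italy": 4.0,
         "Poland": 1.0, "Spain": 4.0, "UK": 3.0}
print(f"balanced: {balanced_mean(diffs):.2f}")
print(f"patient-weighted: {patient_weighted_mean(diffs, EU_COUNTS):.2f}")
```

Because France and Poland contribute 185 of the 274 EU patients, a patient-weighted average would be pulled towards their estimates (2.20 here, versus 3.00 for the balanced average); equal weighting prevents the two largest enrolling countries from dominating the EU estimate.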

Results

Patient baseline characteristics

A total of 750 patients were randomly assigned to receive sunitinib (n=375) or IFN-α (n=375). The US group consisted of 347 patients. The EU group (274 patients) comprised patients enrolled in France (n=82), Germany (n=17), Italy (n=24), Poland (n=103), Spain (n=27), and the UK (n=21). Patients were evenly distributed between the two treatment arms within each geographical group, and there were no significant differences in baseline characteristics between treatment groups (Table 1). Patients had received up to 30 cycles of treatment at the time of the final analysis.

Table 1 Baseline patient characteristics

Questionnaire completion rates

A total of 692 patients (92%) had at least one post-baseline observation for each of the FACT-G, FKSI-15, and EQ-5D questionnaires. In the US group, an equal number of patients, 320 (92%) each, completed the FACT-G and FKSI-15 questionnaires for at least one treatment cycle; 319 patients (92%) completed the EQ-5D questionnaire. In the EU group 252 (92%), 252 (92%), and 253 (92%) patients completed the FACT-G, FKSI-15, and EQ-5D questionnaires respectively.

In both geographical groups, completion rates were lower in the IFN-α arm than in the sunitinib arm: 96.6% (173 of 179) for sunitinib vs 87.5% (147 of 168) for IFN-α in the US group, and 96.3% (130 of 135) vs 87.8% (122 of 139) in the EU group.
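The per-arm rates, and the group totals of 320 US and 252 EU completers reported above, can be reproduced from the per-arm counts. The short script below is only an arithmetic check, rounding to one decimal place as in the text; the dictionary layout is an illustrative choice.

```python
# Questionnaire completers and patients randomised, by group and arm,
# as reported in the text above.
COMPLETION = {
    ("US", "sunitinib"): (173, 179),
    ("US", "IFN-alpha"): (147, 168),
    ("EU", "sunitinib"): (130, 135),
    ("EU", "IFN-alpha"): (122, 139),
}


def rate_pct(completed: int, randomised: int) -> float:
    """Completion rate as a percentage, rounded to one decimal place."""
    return round(100 * completed / randomised, 1)


for (group, arm), (k, n) in COMPLETION.items():
    print(f"{group} {arm}: {k}/{n} = {rate_pct(k, n)}%")

# Pooling the two arms recovers the group-level counts.
for group in ("US", "EU"):
    k = sum(c for (g, _), (c, _) in COMPLETION.items() if g == group)
    n = sum(r for (g, _), (_, r) in COMPLETION.items() if g == group)
    print(f"{group} overall: {k}/{n} = {rate_pct(k, n)}%")
```

Pooled over arms, the groups give 320 of 347 (92.2%) and 252 of 274 (92.0%) completers, matching the 92% figures above.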

Questionnaire assessments

Primary HRQoL end point: FKSI-DRS

For the primary end point, FKSI-DRS, differences in estimated means significantly favoured sunitinib over IFN-α in the total sample and in both the US and EU groups (all P<0.05; Table 2).

Table 2 Model means of HRQoL end points across all available post-baseline observations (mixed-effects model)

Patients receiving sunitinib reported higher FKSI-DRS scores than those receiving IFN-α, with a significant difference in the overall means (2.36, P<0.0001; MM; Table 2). Of the nine FKSI-DRS items (Table 3, bold items), differences in means significantly favoured sunitinib (P<0.05) for six: lack of energy, fatigue, coughing, breathlessness, weight loss, and fever.

Table 3 Model means of FKSI-15 item scores across all available post-baseline observations (mixed-effects model)

Secondary HRQoL end points

As with the primary end point (FKSI-DRS), differences in estimated means for the FKSI-15 total score, the FACT-G total score and all its subscales, the EQ-5D Index, and the EQ-VAS all significantly favoured sunitinib over IFN-α in the total sample (all P<0.05; Table 2).

In the US group, all end points except the EQ-5D Index were significantly better with sunitinib than with IFN-α. In the EU group, between-treatment differences significantly favoured sunitinib over IFN-α for five of the nine end points (Table 2). For none of these total and subscale scores did the between-treatment difference differ significantly between the US and EU groups (Table 2).

FKSI-15

Higher (more favourable) FKSI-15 scores were observed at each cycle with sunitinib than with IFN-α in the total sample (Tables 2 and 3). Patients receiving sunitinib reported higher FKSI-15 scores than those receiving IFN-α, with a significant difference in the overall means (4.06, P<0.0001; MM; Table 2). The difference in means significantly favoured sunitinib over IFN-α (P<0.05) for 10 of the 15 FKSI items in the total sample (Table 3), and IFN-α was not superior to sunitinib on any item. Between-treatment differences did not differ significantly between the US and EU groups for any item, with the exception of ‘I am bothered by side effects of treatment’ (P=0.0209; Table 3).

FACT-G assessments

The differences in FACT-G total score between patients receiving sunitinib and those receiving IFN-α were statistically significant in the total sample and in both the US and EU groups (Table 2). Patients receiving sunitinib reported higher FACT-G scores than those receiving IFN-α, with significant differences in the overall means in the total sample (6.62, P<0.0001), the US group (6.08, P<0.0001), and the EU group (4.83, P=0.0036) (MM).

Similarly, for the FACT-G subscales, differences in scores between the two treatment groups significantly favoured sunitinib over IFN-α in the total sample and in the US group. In the EU group, the differences between the two treatments were not significant for three of the four subscales (Table 2).

EuroQoL assessments

The overall post-baseline mean treatment difference for the EQ-5D Index in the total sample was estimated to be 0.05 points in favour of sunitinib (P=0.0078; Table 2). The overall mean treatment difference for EQ-VAS was estimated to be 7.70 in favour of sunitinib (P<0.0001; Table 2).

In both the US and EU groups, the differences between the two treatment arms were not significant for the EQ-5D Index but were significant for the EQ-VAS (P=0.0076 and P=0.0177, respectively).

Sensitivity analyses

In general, the PMM results were consistent with the MM results (Figure 1). As with the MM, the PMM results for the HRQoL total and subscale scores showed no statistically significant difference between the treatment differences in the US and EU groups (Figure 1).

Figure 1

(A–C) Comparison of results between the MM and PMM analyses across all available post-baseline observations. MM, mixed-effects model difference between sunitinib and IFN-α; PMM, pattern-mixture model difference between sunitinib and IFN-α; FKSI-15, 15-item FACT–Kidney Symptom Index; FKSI-DRS, FKSI Disease-Related Symptoms subscale; FACT-G, Functional Assessment of Cancer Therapy–General; EQ-5D Index, EuroQoL health-utility index; EQ-VAS, EQ visual analogue scale.

Discussion

In this phase III trial, sunitinib was associated with superior HRQoL compared with IFN-α in patients with mRCC, as measured by overall health status (EQ-5D Index and EQ-VAS) and cancer-specific HRQoL (FACT-G and its subscales) (P<0.01), and by kidney cancer-related symptoms (FKSI-15 and FKSI-DRS) (P<0.0001) (Cella et al, 2008). These results reflect between-treatment differences rather than within-treatment improvement from baseline.

Patients were aware of their assigned treatment, which could potentially have biased their responses to the HRQoL questionnaires in favour of sunitinib. Several factors substantially mitigated this risk. Assessments were conducted and measured uniformly in both treatment groups, and, by controlling for baseline HRQoL and using a random-effects model, the analysis accounted for each patient's propensity to respond in a certain way.

Overall, FKSI-DRS scores, the primary HRQoL end point of this study, showed that patients receiving sunitinib had less severe disease-related symptoms (lack of energy, fatigue, coughing, breathlessness, weight loss, and fever) than patients receiving IFN-α. Patients receiving sunitinib also reported better scores for the secondary HRQoL end points: FKSI-15, FACT-G, EQ-5D Index, and EQ-VAS. Although it cannot be ruled out, it is unlikely that these post-treatment differences were due to unobserved pre-treatment differences in comorbid conditions (or other factors), because randomising a large number of patients to each treatment would be expected to balance the groups on both known and unknown characteristics. Any appreciable post-treatment difference is therefore most reasonably attributed to the intervention, which was controlled by random assignment. Moreover, if baseline comorbid conditions (or other factors) were related to the HRQoL outcome score, they would probably also be related to the corresponding baseline HRQoL score, which was adjusted for in the model (thereby increasing the precision of the estimated treatment effects).

The treatment differences for FKSI-DRS, FACT-G, EQ-5D Index, and EQ-VAS did not differ significantly between the US and EU groups, implying that the effect of sunitinib vs IFN-α on HRQoL outcomes was similar in the two subpopulations. Treatment differences on the FKSI-15 items also did not differ significantly between the US and EU subgroups, with the exception of ‘I am bothered by side effects of treatment’.

For the item ‘I am bothered by side effects of treatment’, the geographical variation between the US and EU subgroups may reflect many factors, including chance, genuine pharmacogenetic variation, cultural differences in attitudes to illness, differences in health-care delivery or patients’ experiences, or differences in the scoring and reporting of HRQoL outcomes. More likely, however, this variation is a trivial anomaly, because the treatment difference within each subgroup was not statistically significant (P>0.05).

Several other observations are worth noting. Three FACT-G subscales (physical, social/family, and emotional well-being) and the FKSI-15 items ‘I feel fatigued’, ‘I have been coughing’, ‘I am able to enjoy life’, and ‘I worry that my condition will get worse’ differed significantly by treatment group in the US subgroup but not in the EU subgroup. In addition, responses to the FKSI item ‘I am bothered by side effects of treatment’ did not differ significantly between the sunitinib and IFN-α arms, whether analysed in the total sample or in the US or EU subgroup. Further, only four of the 15 FKSI items significantly differentiated the treatment groups in both the US and EU populations: ‘I have a lack of energy’, ‘I have been short of breath’, ‘I am able to work’, and ‘I am bothered by fevers’. The remaining FKSI items did not consistently distinguish the sunitinib and IFN-α arms in both the US and EU groups.

Responses in the EU group may be more variable than those in the US group. This would not necessarily indicate cross-cultural issues, especially as the translated questionnaires were validated in European patients. Further research would be informative, including psychometric testing of the FKSI in diverse European samples, as the instrument was developed and psychometrically validated only in English-speaking patients in the United States.

Individual items have more random variability (measurement error) than multiple-item subscales, which tend to be more reliable and accurate (Sloan et al, 2002). Therefore, it is not surprising that several items on the FKSI scale did not distinguish HRQoL in the IFN-α and sunitinib treatment arms in both the US and the EU groups. What is important, though, is the direction of the effect: the estimated treatment effect was in the same direction in the two geographical groups for 14 of 15 FKSI items.
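One standard way to see why multi-item scales are more reliable than single items is the Spearman–Brown prophecy formula, which predicts the reliability of a sum of k parallel items from the reliability of one item. This is an illustration under the idealised assumption of parallel items; the single-item reliability of 0.4 is hypothetical, and the FKSI items are not claimed to satisfy this assumption.

```python
def spearman_brown(item_reliability: float, n_items: int) -> float:
    """Spearman-Brown prophecy: reliability of a sum of n parallel items."""
    r = item_reliability
    return n_items * r / (1 + (n_items - 1) * r)


# Hypothetical single-item reliability of 0.4.
for k in (1, 9, 15):
    print(f"{k:>2} items: reliability = {spearman_brown(0.4, k):.2f}")
```

Under these assumptions, reliability rises from 0.40 for a single item to about 0.86 for a 9-item scale (the length of the FKSI-DRS) and about 0.91 for 15 items, so a true treatment effect is more likely to reach significance on a subscale than on any one item.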

It should be noted that the primary HRQoL end point, FKSI-DRS, showed significant treatment differences within both the US group and the EU group, and that these treatment differences did not differ between the two groups (P=0.9645). The other end points are secondary outcomes in this study.

The main purpose of the geographical analysis was to determine whether the treatment effect within the US group differed from that within the EU group. Although the effect was significant (P<0.05) for some subscales and items in the US group but not in the EU group, the treatment effect within the US group did not differ significantly from that within the EU group. Such occurrences are not uncommon. By analogy, an active intervention group may show a significant change from baseline while the control group does not; if the mean changes do not differ significantly between the two groups, we cannot conclude that there is a treatment effect.
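The argument above, that a significant effect in one subgroup and a non-significant effect in another does not by itself show that the effects differ, can be illustrated with Wald tests. The sketch is a simplification: it treats the US and EU estimates as independent, normally distributed estimates (the study's comparisons were made within the mixed model), and all numbers and function names are hypothetical.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def within_group_p(diff: float, se: float) -> float:
    """Wald test of a single subgroup's treatment difference against zero."""
    return two_sided_p(diff / se)


def between_group_p(diff_a: float, se_a: float,
                    diff_b: float, se_b: float) -> float:
    """Wald test of whether two subgroup treatment differences differ,
    assuming the two estimates are statistically independent."""
    return two_sided_p((diff_a - diff_b) / math.sqrt(se_a**2 + se_b**2))


# Hypothetical subgroup estimates (not values from this study).
us_diff, us_se = 3.0, 1.2
eu_diff, eu_se = 2.0, 1.3
print(f"US:      p = {within_group_p(us_diff, us_se):.3f}")
print(f"EU:      p = {within_group_p(eu_diff, eu_se):.3f}")
print(f"US - EU: p = {between_group_p(us_diff, us_se, eu_diff, eu_se):.3f}")
```

With these inputs, the US difference is significant (p of about 0.012) and the EU difference is not (p of about 0.12), yet the test comparing the two differences is far from significant (p of about 0.57). Only the last test addresses whether the treatment effect differs by region.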

The MM used in these analyses reduced the potential for bias arising from different numbers of patients in the two treatment arms leaving the study over time because of differing efficacy. The PMM results, which allowed the treatments to be compared according to the pattern of missing data, supported these findings. Together, these results indicate that the HRQoL benefit of sunitinib over IFN-α is robust and consistent across the patient populations studied, irrespective of country, cultural, or regional treatment differences.

All patients, including those who crossed over from IFN-α to sunitinib, were analysed according to their original randomised assignment. If anything, this approach would be expected to make the estimated benefit of sunitinib more conservative. Even with the inclusion of these 25 crossover patients, sunitinib showed HRQoL benefits over IFN-α.

A recent geographical analysis of interim data from this study, which included a European-only sample and data from only six treatment cycles, reported similar results and conclusions overall (Castellano et al, 2009). Some variations in results between these two analyses probably stem from their use of different models, different sets of data, and different objectives and hypotheses.

The results of our final analyses of HRQoL outcomes are consistent with the previously reported interim results from the overall sample (Cella et al, 2008), which showed superior HRQoL outcomes with sunitinib over IFN-α. In addition, the similarity of findings in the geographical subsamples suggests that regional variations in treatment experience, or underlying cultural differences in HRQoL reporting, are minimal. Although some demographic variables differed significantly between the EU and US groups, these differences reached statistical significance mainly because of the relatively large sample sizes and were generally of little practical importance.

Conclusions

In this study, patients treated with sunitinib had improved HRQoL compared with patients treated with IFN-α. In general, treatment differences within the US cohort did not differ from treatment differences within the EU cohort.