Introduction
Treatment for cancer can affect patient quality of life [
1,
2]. Therefore, health-related quality of life (HRQOL) measures are frequently included in randomized clinical trials in oncology. The interpretation of results for HRQOL within these randomized trials can be seriously hampered by drop-out due to relapse and death, especially because these events do affect HRQOL.
In longitudinal studies, observations of patients can be missed at a certain time point because they miss visits or do not fill in certain questionnaires, although they still participate in the study. Consequently, these patients have responses on a subset of outcome measures at different time points. Alternatively patients can drop-out, i.e., not participating in the study after a certain moment. These patients are lost to follow-up, and no information is available. In our study, we define drop-out as not filling in the HRQOL questionnaire after a certain time point, while there could be information about the survival status after that time.
There exist different reasons for drop-out. Except for drop-out caused by relapse or death, drop-out occurs due to censoring, administrative, or person-related reasons. The reasons for drop-out are often referred to as the drop-out mechanism [
3]. It is important to distinguish different types of drop-out mechanisms, because each drop-out mechanism requires different analyses to get valid results. A common classification of drop-out mechanisms has been introduced by Rubin and can be summarized as follows: Missing Completely At Random (MCAR), Missing At Random (MAR), and Missing Not At Random (MNAR) [
4].
Drop-out can be considered MCAR if the probability that a patient drops out is independent of the observed (past) response measurements and of current and future response measurements. Or, to put in other words: patients drop-out due to completely unrelated reasons. For example, when HRQOL data are lost due to administrative failure, the drop-out can be considered MCAR: the drop-out is unrelated to HRQOL scores, and therefore does not depend on the observed nor on unobserved HRQOL scores. Under the assumption of MCAR, complete case analysis (including only those patients with complete data) will give valid results.
Drop-out can be considered MAR, if the probability of drop-out depends on previously observed response measures, but not on the current and future measures. However, given the observed response measures, drop-out is assumed to be independent of the unobserved (current and future) response measures. For example, when a patient drops out because the HRQOL score is lower at earlier measurements, the drop-out can be considered MAR. In this situation, one is able to predict the measurement after drop-out based on past measurements, and random effects models (mixed models) would provide unbiased results if the model is correctly specified. In many situations, this drop-out mechanism will be present in the data.
Drop-out is missing not at random (MNAR) if the probability of drop-out is also dependent on the unobserved (current and future) response measures. In this situation, the missing measurement itself is informative. For example, when a patient has a sudden decrease in HRQOL and decides not to fill in the questionnaire, the drop-out is MNAR. In this situation, we can not predict the missing HRQOL of this patient without making further (untestable) assumptions. The reason of a sudden decrease could be the occurrence of tumor relapse, or the fact that a patient is dying. So, drop-out due to tumor relapse and death might be informative.
In trials, it is quite common to analyze longitudinal data with random effects models. This type of models can handle missing data and give valid results if MAR can be assumed. In these models, the patients’ courses over time are specified using regression equations per patient, taking the correlation between the different measurements within patients into account. Therefore, a random effects model helps understanding about how individual patients change across time [
5]. However, without modeling, the drop-out mechanism informative drop-out is not taken into account. In the analysis of a trial, this might lead to serious bias, and therefore to wrong conclusions about HRQOL. To understand this, suppose that in a clinical trial, the experimental arm reduces or postpones an event as tumor relapse. Suppose, additionally, that the experimental arm has a larger (negative) impact on HRQOL than the other arm. If patients are less inclined to fill in HRQOL questionnaires after a relapse, then patients in the experimental arm are more likely to fill in the HRQOL questionnaires than patients in the control arm. So, more patients with lower HRQOL will remain in the experimental arm compared to the control arm. In this case, complete case analysis will yield biased results in favor of the standard arm, because patients with missing observations after relapse are not taken into account. A similar argument holds when using last observation carried forward, since the high HRQOL values before relapse are imputed for the missing HRQOL values after death and relapse. It is therefore important to model implicitly or explicitly the drop-out mechanism.
There are several approaches for the analysis of informative drop-out. They are rather complicated. Therefore, their complexity may be the reason why these approaches are not routinely employed in the analysis of randomized clinical trials.
Historically, there are two different likelihood-based approaches for analyses of informative drop-out that differ with respect to the way the joint distribution of responses and drop-out process is factorized (Curran et al. [
6]; Michiels et al. [
7]). These approaches are the pattern mixture model approach (Hedeker and Gibbons [
8], Pauler et al. [
9]) and the selection model approach (Diggle and Kenward [
10], Little [
11], and Curran et al. [
12]). In the present paper, we will focus on the pattern mixture approach, because this is a relatively simple extension of the commonly used random effects model assuming MAR.
Another class of models is based on the joint modeling of response measures (e.g., HRQOL) and times to events (e.g., relapse and death). These models estimate the HRQOL measures given that a person is in a certain state at a certain time point. Examples are the models of Kurland and Heagerty [
13], and of Diggle et al. [
14]. These models take the ordering of event times explicitly into account and regard drop-out as a consequence of events in the past. In this sense, these models can be used for prediction. This is in contrast to the pattern mixture approach where the drop-out process is reflected in patterns and where these patterns are determined retrospectively. This approach is therefore primary appropriate for adjusting the estimates for the missing data mechanism.
In this paper, we will use data of a Dutch multi-center randomized clinical trial in patients with breast cancer as illustration for the use of the pattern mixture approach. In this trial, the long-term impact of two different chemotherapy schedules (high dose vs. standard dose) on HRQOL was compared. The results of the study with respect to the primary outcomes, relapse-free survival, and overall survival have been published by Rodenhuis et al. [
15,
16]. Buijs et al. [
17] analyzed the HRQOL data comparing both treatment arms assuming that patients remained disease free.
We aimed in this paper to correct the HRQOL estimation for informative drop-out in order to get a valid comparison between schedules, taking events as death and relapse into account. Since we expect that drop-out due to relapse and death are the two main sources of informative drop-out, we propose a pattern mixture approach in which patterns are defined based on the different disease states a person can have at the end of the study period. The different states we use for the patterns are quality of life related such as ‘death’, ‘alive with tumor relapse’, and ‘disease free’. We start considering the ‘usual’ random effects model analysis assuming MAR. Then, we will analyse the data with the pattern mixture approach with HRQOL-related patterns. Thereafter, outcomes of both approaches will be compared.
Methods
Dutch multi-center randomized clinical breast cancer trial
From August 1993 to July 1999, 804 patients from 10 Dutch centers were randomized in the HRQL part of the trial comparing two different chemotherapy schedules. The standard chemotherapy arm consisted of five cycles of 5-fluorouracil (500 mg/m2), epirubicin (90 mg/m2), and cyclophosphamide (500 mg/m2) (FEC). The high-dose chemotherapy arm consisted of 4 identical cycles of FEC followed by one cycle of high-dose chemotherapy comprising cyclophosphamide (6 g/m2), thiotepa (480 mg/m2), and carboplatin (1600 mg/m2) administered over 4 days followed by peripheral blood progenitor cell reinfusion on day 7. There were 885 patients randomized; 804 of them participated in the HRQOL part of the trial of which 405 patients were randomly assigned to the standard therapy and 399 patients to the experimental (high dose) therapy. The main study was a survival study. The primary outcomes were the overall survival and the disease-free survival. Secondary outcome measures were health-related quality of life measures. The Medical Ethical Committees of all participating centers approved the study. All patients gave informed consent before study entry.
Subgroup of patients
At the time of the analysis in 2003, there appeared to be no significant differences following intention to treat analyses with respect to overall and disease-free survival (Rodenhuis et al. [
15]). However, a subgroup of patients with normal HER2 expression did benefit of high-dose chemotherapy as far as disease free (
P = 0.002) and overall survival (
P = 0.02) were concerned (Rodenhuis et al. [
16]). In this paper, we focus therefore on this subgroup. Only patients participating in the HRQOL part of the trial are considered, of which 273 patients were randomized to standard dose and 288 to high-dose chemotherapy.
Quality of life assessment
The pattern mixture model is illustrated for one of the subscales of the SF-36 namely the physical functioning (PF) subscale, consisting of items which indicate the ability to perform physical activities as walking, carrying shopping, and climbing stairs. The scores range from 0 to 100, with higher scores representing higher level of functioning. The choice of this subscale was mainly based on illustrative reasons: preliminary analyses on this subscale demonstrated significant and relative large differences in favor of the standard arm. Other subscales demonstrated smaller (and/or not significant) differences.
The questionnaires were sent by mail before randomization (baseline), after chemotherapy (about 3 months), after radiotherapy (about 6 months), and thereafter every 6 months.
Due to the variation in entry date, not all patients could be studied for the maximum period of 5 years and 12 measurements points. Each patient had a study period follow-up of at least 3 years and eight measurement points. We defined patients who were alive but did not have the maximum study period follow-up as censored.
Statistical analysis
The analyses were performed with SPSS (version 16), MLWin (version 2.02), and R (version 2.6.2).
Descriptive
Analyses were based on intention to treat. A cross-sectional comparison of PF between the two doses was performed at each measurement point, presenting the observed mean and standard error per measurement point. In this analysis, differences in PF per time point between both doses were based on different sets of patients. Visual inspection of the course of the means over time gave suggestions for modeling the course over time in the random effect model and pattern mixture model.
Drop-out
A large part of drop-out was due to death and censoring (e.g., patients who did not have the maximum study period follow-up). Patients able to fill in the questionnaires at a certain time point (the patients at risk) are the patients being alive and being not censored at that time point. Percentages of patients with observed HRQOL measures (i.e., responding patients) are based on these patients at risk, and are called response rates. They were calculated for each time point and for each treatment.
A distinction in drop-out due to relapse and due to death must be made. In the case of relapse, we can assume that PF exists and is affected by a relapse, but that it is not measured. The definition of PF after death is not directly obvious. Therefore, three patterns were defined: A pattern consisting of patients who died during the study period (deceased patients), a pattern with patients with relapse who were alive at the end of the study period, and a pattern with patients who remained disease free. A cross-sectional comparison of the PF assessments of these patterns was performed to analyze descriptively whether the course over time was different for the patterns or not, and whether both doses had different responses over time per pattern.
Random effects model (assuming MAR)
A random effects model was fitted on the data with ‘course over time’, ‘treatment’ and ‘interaction course over time by treatment’ as fixed explanatory variables for the PF assuming MAR. In the following, this is called the “final model”. Since we assumed that each patient had her own course over time, time effects were also considered to be random. The ‘course over time’ was specified based on the findings of the visual inspection of the descriptive analysis. P-values smaller than 5% were considered to be significant.
We started with fitting the empty model (Snijders and Bosker) [
18], using maximum likelihood. This was a model without explanatory variables; only a fixed intercept was included (parameter for the overall mean) with two random effects. This model estimated the overall mean PF of all patients and time points and estimates the variability of the PF within patients and between patients. Then, we fitted the time model using maximum likelihood. This model included only the fixed and random intercept and fixed and random time effects. This model estimated the mean PF of all patients over time. The difference in deviance (i.e., -2 times the value of the likelihood) between the empty model and time model is an indication for the effect of time on PF, and the deviances can be compared using the likelihood ratio test. Then, the final model ‘interaction course over time by treatment’ was fitted by maximum likelihood. This model estimated the mean PF over time for the different treatment arms. The performance of the final model was evaluated by the deviance compared to the deviance of the empty model and to the deviance of the time model. The difference between the final model and the time model is an indication of the effect of treatment over time on PF. The comparisons of different deviances between different nested models can be regarded as a relative for the fit of the model for the data.
These models assumed only MAR implying that drop-out dependent on the missing PF score itself (for example because of an unpredictable decrease caused by relapse) was not taken into account. The empty model, the time model, and final model are explained and specified in the appendix.
Pattern mixture approach (correction for informative drop-out)
In the pattern mixture approach, the drop-out process and the outcome measures conditional on the drop-out pattern are jointly modeled. The drop-out process was modeled by the probability to belong to a specific drop-out pattern for each dose separately, and estimated by the sample proportion. Patterns were based on the relapse and survival state of the patient at the end of the study period and defined as before: ‘deceased’, ‘alive with relapse’, and ‘disease free’. The outcome conditional on the drop-out pattern was modeled by the random effects model with additional explanatory variables indicating different follow-up patterns of patients. In the following, we call this model, the pattern mixture model. We implicitly assumed that each pattern has its own missing data process and own course over time with respect to PF. Moreover, we assumed that within each pattern, the drop-out is MAR. This means that conditional on being in a certain pattern, the missing PF can be predicted based on observed measurements of patients within this pattern. The pattern mixture model was specified with ‘course over time’, ‘treatment’, and ‘interaction course over time by treatment’ as explanatory variables (as before), and also with the variables ‘pattern’, and all possible interactions with pattern. This model was fitted using maximum likelihood. See the
Appendix for the explanation and specification of the model.
Based on the difference in deviances (likelihood ratio test), it was decided which additional explanatory terms could be excluded from the model. This difference can be regarded as a relative measure for the fit of the model on the data.
For each pattern, the course over time per treatment was estimated. In order to compare both treatments for all patterns together, the effects of both doses were estimated by a weighted average over all patterns weighted by their sample proportions. An illustration of this is given in the appendix. With the delta-method (Bisshop et al. [
19]), standard errors of the differences between the two treatments were estimated to determine the significance of the effects.
Comparison of the results of the random effects model and pattern mixture approach
The fit of the random effects model and the random effects part of the pattern mixture approach (i.e., the pattern mixture model) were compared based on the difference in deviance (likelihood ratio test). The results of the pattern mixture approach (weighting over all patterns) in terms of estimated differences in PF scores between both treatments arms per time point were compared to those of the random effects model assuming MAR.
Sensitivity analysis
Since in our data, not all patients had the maximum study period of 5 years, the patterns ‘alive with relapse’ and ‘disease free’ consisted of patients for which the disease state after 5 years was not known yet. Therefore, we implicitly assumed that censored patients and patients with maximum study period could be combined within each pattern and that this did not have consequences for the results. To check whether the definition of patterns for all patients (with complete and incomplete follow-up) leads to different results than the definition of patterns for only patients with maximum study period follow-up, we performed a sensitivity analysis by also modeling the data using only patients with complete 5-year follow-up and comparing the results.
Discussion
In this paper, the pattern mixture approach as a method for correction for informative drop-out was studied. We illustrated this on data from a large multi-center randomized trial comparing the long-term impact on HRQOL of two different doses chemotherapy for patients with breast cancer. We only considered one dimension of the HRQOL, namely PF. We focused on a subgroup of patients with normal HER2 expression in their tumor for whom it was shown that the (disease free) survival was better for the high dose than for the standard-dose arm (Rodenhuis et al. [
16]). We compared the results of the pattern mixture approach to the results of the commonly used random effects model for the subscale PF.
In this particular example, the pattern mixture approach leads to differences with respect to the estimation and significance of the difference in decline of PF measures over time between both arms after 1 year: The estimated difference in decline is doubled from 0.10 (P = .055) in the random effects model to 0.21 (P = 0.001) in the pattern mixture approach. However, the absolute estimated differences in PF between both chemotherapy doses after 1 year are similar in both approaches, and therefore the pattern mixture approach does not lead to other conclusions than the commonly used random effects model.
In studies with a lot of missing values due to drop-out for different reasons, it is important to explore whether the drop-out is informative or not. It is possible that for some outcomes the drop-out is informative, while for other outcomes this is not. An example for this is the case where the mean blood pressure is the main outcome in a study concerning a clinical patient population with high blood pressure, and where patients are repeatedly measured over time until their blood pressure is sufficiently reduced. So, patients are not measured any more (drop-out of the study) when they have “good” blood pressure values. In this case, the drop-out is caused by the previous measurement, and therefore not informative. So if one is planning a study with different sort of outcome measures, one must consider whether the drop-out is informative for each different outcome measure. In our data, we only considered drop-out by death or relapse as the two main sources of informative drop-out.
The crucial (and maybe the hardest) point in applying the pattern mixture approach is the choice of the patterns. The validity of the model estimates is determined by the implicit assumption that within each pattern, the missing data process is MAR. This is an untestable assumption; testing MCAR vs. MAR is possible, but testing MAR vs. MNAR is not (Molenberghs et al. [
20]). The choice of the patterns must be made in such a way that this assumption is plausible. The number of patterns could be extended, based on the exact survival time or time of relapse, for example to account for the fact that shortly before death, the PF often drops and the amount of missing responses increases. In our data, we defined the patterns based on HRQOL-related events as death and relapse.
In our model, we had some censored patients, and we used their last observed event state for grouping them into patterns. Compared to the patients with maximum study period, relapse-free patients who are censored could still die or get tumor relapse within 5 years, implying that they belong to the pattern ‘deceased’ or ‘alive with relapse’. Analogously, patients with tumor relapse could still die within 5 years and therefore belongs to the pattern ‘deceased’. The implication of the different length of study period follow-up might be that the assumption of MAR within each pattern is violated. To check this, we also modeled the data for only patients with complete follow-up. Comparing these results with the results of the analysis of all patients, we found similar results with somewhat larger standard errors. This means that for our data, there is no large problem here on this point. But for the application on other data with patterns based on disease states for patients with different lengths of follow-up, one must carefully consider the point whether the data are MAR within the patterns, and check this by sensitivity analysis, if possible.
The patterns chosen here do have clinical relevance. It is clinically interesting to estimate the courses over time for relapse-free patients. This could be interesting in terms of prediction: What is the HRQOL during the next 5 years, assuming that patients remain disease free? However, our model is less appropriate for the prediction for patients alive with relapse, because the pattern is based on the state at 5 year and the estimated PF measures over time are based on PF measures for patients with relapse and patients who did not yet have a relapse at that time point. The pattern “deceased” needs also special attention. Death is different from other reasons of missing, where we can assume that the quality of life exists but is not measured. If and how quality of life after death is defined is a topic of much discussion. The random effects models used in this paper implicitly impute missing PF scores, also after a patient has died. This is done by extrapolating the patient’s individual trajectory before death. One could question this approach. Therefore, one might want to limit the analysis for patients being alive. Pauler et al. [
9] proposed to use the pattern mixture model estimates conditional on not being in pattern “deceased”. Diggle et al. [
14] discussed extensively the validity of different models accounting for informative drop-out, including the relatively new approaches where measures (as HRQOL) and times to events (as relapse and death) are jointly modeled. Applying the latter models to these particularly data will be a topic for new research.
In general, it is recommended in data with informative drop-out to do sensitivity analysis by using different approaches. The pattern mixture approach we propose here is rather a simple extension of the commonly used random effects models in longitudinal data, and might therefore be routinely employed in the analysis of randomized clinical trials.