Introduction
Implication 1: definition
Revision of the Vanier et al. [7] definition
“Response shift is an effect on observed change that cannot be attributed to target change because of a change in the meaning of the subjective evaluation of the target construct.”
• Sprangers and Schwartz [11] proposed the following working definition of response shift: “a change in the meaning of one's self-evaluation of a target construct as a result of: (a) a change in the respondent's internal standards of measurement (scale recalibration, in psychometric terms); (b) a change in the respondent's values (i.e. the importance of component domains constituting the target construct); or (c) a redefinition of the target construct (i.e. reconceptualization)” [11, p. 1508].
• Rapkin and Schwartz [3, 4] defined response shift as the residual change score, or the discrepancy between expected and observed change, that can be explained by changes in appraisal, after taking into account standard influences (i.e. demographic and clinical characteristics generally considered important to quality of life (QOL)) [4, p. 4].
• Following Oort [12, 13], Vanier et al. [7] consider response shift to be “a special case of violation of the principle of conditional independence (PCI) when observed change is not fully explained by target change”. They added that they assumed response shift to be the consequence of “a change in the meaning of one’s self-evaluation of a target construct” [7, p. 8].
• In the current paper, we define response shift as an effect on observed change that cannot be attributed to target change because of a change in the meaning of the subjective evaluation of the target construct. This definition can be operationalized as a special case of the violation of the principle of conditional independence (PCI): response shift is present when there is a discrepancy between (a) observed change conditioned only on change in the target construct (i.e. observed change directly reflects target change) and (b) observed change conditioned on the target construct and other variables that explain variation in observed change (e.g. adaptation to a new health state) [7]. This definition refers to the special case in which there is response shift only if the violation of the PCI is caused by a change in the meaning of the subjective evaluation of the target construct.
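This operationalization can be illustrated with a small numerical simulation (a hypothetical sketch, not part of the original definition; all variable names and effect sizes are invented). When a recalibration response shift accompanies adaptation, observed change conditioned only on target change leaves a systematic residual that is explained by the adaptation variable, i.e. the PCI is violated:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# True (target) change in the construct, e.g. physical function
target_change = rng.normal(0.5, 1.0, n)

# Hypothetical adaptation indicator: adapted respondents recalibrate
# their internal standards (invented effect size, for illustration only)
adapted = rng.binomial(1, 0.5, n)
recalibration = 0.8 * adapted  # uniform recalibration shifts scores upward

# Observed change = target change + recalibration + measurement noise
observed_change = target_change + recalibration + rng.normal(0, 0.3, n)

# (a) Condition observed change only on target change: a residual remains
resid = observed_change - target_change

# (b) PCI check: the residual is systematically related to adaptation,
# so observed change is not independent of adaptation given target change
print(resid[adapted == 1].mean() - resid[adapted == 0].mean())  # ≈ 0.8
```

In this sketch, the nonzero group difference in residual change recovers the simulated recalibration effect; whether such a discrepancy reflects a change in meaning of the subjective evaluation, rather than some other cause, is exactly what the methods discussed below must establish.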
Agreements and differences among the definitions
Implication 2: theory
Recalibration, reprioritization, and reconceptualization
Adaptation and response shift
Implication 3: methods
Change in meaning of subjective evaluation
Methods | Alternative explanations based on [10] | Exploring and making alternative explanations less plausible
---|---|---
Design-based methods^a | |
Then-test [15] | – Differences between pretest and then-test scores can also be due to response biases such as effort justification and social desirability. | – The more respondents invest in an intervention, or the more desirable a certain outcome is, the more likely this is to occur. – If this is the case, one might include a social desirability assessment questionnaire, preferably situation-specific, and examine whether pretest–then-test differences are larger (or smaller) in patients reporting higher levels of socially desirable responding than in those reporting lower levels. – In comparative studies, employ a (placebo) control condition, if possible.
 | – Given the need for retrospection, this method is also prone to recall bias and implicit theories of change.^b | – Ensure the baseline event is a salient moment for patients, to increase the likelihood that they will remember it, e.g. by involving patients in the design of the study. – Compare the then-test results with those of an independent approach that cannot be affected by recall bias, e.g. statistical methods; see further [38]. – An earlier suggested method to explore recall bias was to administer the same outcome measure at follow-up, but with the instruction to recall the baseline scores (rather than to provide a re-evaluation of one’s functioning at baseline). However, this may not be a sound approach, given that response shift itself may induce recall bias: given the new standard, respondents may not even be able to recall the responses they gave from their previous standard.
 | – Without further data, it is uncertain whether the results can be attributed to a change in meaning of the subjective evaluation. | – Conduct qualitative research to understand response processes. This may include cognitive interviews with (a subsample of) respondents, inviting them to reflect on their responses, starting with open, non-leading questions. For example: “If you compare the answers you have just provided (i.e. then-test) with those you gave … weeks ago (i.e. pretest), can you tell me something about it? Are the answers you have just provided generally in agreement with the answers you gave … weeks ago? Are there answers that are in disagreement? Can you amplify your answer? Can you give an example?” [39, p. 712]. Think-aloud interviews with verbal probing techniques may also be conducted at different assessments to enable comparisons of respondents’ cognitive processes over time [22, 30, 31].
Appraisal method [3] | – The extant appraisal measures (e.g. the Brief Appraisal Inventory, assessing health worries, concerns, goals, mood, and spirituality) do not distinguish among appraisal of QOL, QOL itself, adaptation, and response shift. | – While this questionnaire may generate new information, one should be aware that the responses will not give information on how respondents appraised a completed questionnaire, but rather on how they rate different aspects of their lives. However, change in people’s lives is not equivalent to change in response processes. Other measures may need to be devised, or interviews may need to be held.
 | – Given the need to retrospect on the way respondents completed questionnaire items, this method is prone to response biases such as recall bias and social desirability bias. | – Provided appraisal can be measured validly with a questionnaire [20], administer such a measure after completion of a limited number of related items, e.g. one questionnaire or one domain within a questionnaire. This procedure would also allow the assessment of variability of appraisal across items/domains, as previously documented [6].
Semi-structured interview [21] | – Recall bias and implicit theories of change^b can be introduced if interview questions ask respondents to reflect on the past. | – Ruling out alternative explanations would need to be done in a narrative way. One may probe patients’ explanations of perceived change or stability. Be alert for signs of recall bias (e.g. confusion about timing or inconsistencies in placing events in time). Recall bias can be examined when a second interview is conducted, as one may compare the recalled answers with the previously given answers where they should be similar (e.g. events).
 | – Respondents may indicate change that could be interpreted as response shift but that is in fact induced by the interview context (e.g. response biases such as demand characteristics and socially desirable responding). | – Be alert to the influence of the interview context. The key is to be as open and non-judgemental as possible and, again, to ask for explanations of respondents’ answers. Also be aware of inconsistencies, as they can be meaningful and revealing in addition to being indicative of noise.
 | – Response shift may remain undetected when respondents are not capable of reflection or verbalization. | – The interviewer may need to make a subjective judgement about whether the respondent is capable of such reflection or verbalization. This judgement may affect the conclusions, which needs to be discussed.
Vignettes [40] | – If vignettes describe health states outside respondents’ experience and knowledge, change in ratings over time may be caused by factors that are irrelevant to the vignettes. | – Make sure that the vignettes are relevant to the patient population prior to their use. The construction and validation of the vignettes may need to involve interviews or focus groups with patients from the target group.
 | – Without further data, it is uncertain whether the results can be attributed to a change in meaning of the subjective evaluation. | – An interview would need to be conducted with (a subsample of) respondents. The first part of the interview needs to be aimed at ascertaining whether (lack of) changes in responses over time reflects target change or rather noise or other causes. Respondents can then be invited to reflect on their responses to the vignettes in an open, non-leading way, and to explain why they gave those responses. Previous responses to the vignettes may need to be provided to enable respondents to make such comparisons.
Individualized methods | |
Schedule for the Evaluation of Individual Quality of Life (SEIQoL) [41]; Patient Generated Index (PGI) [42] | – Change in weights (reprioritization) may be an artefact of the calculation method, as the weights need to add up to 100 (or to 12 or 14 imaginary points for the PGI): a decrease in the relative importance of one cue implies increases in the relative importance of other cues. | – Given this operationalization, be aware that absolute and relative changes in weights need to be interpreted with caution.
 | – Change in domain content (reconceptualization) may be caused by forgetting to nominate a domain previously mentioned (recall bias), not listing a domain that has improved, mentioning a different domain due to an implicit theory of change,^b or mentioning a similar domain at a different level of abstraction. | – All these causes will make it difficult to determine whether the domains mentioned over time are similar or dissimilar. Conducting an interview will reveal what patients intended with their responses. An alternative is to present the domains mentioned at baseline at follow-up and let respondents indicate whether the previously mentioned domains still hold or need to be changed. The possible, unintended influence of this approach on the results needs to be discussed.
 | – If used at the individual level, changes in the ranking or content of domains may be attributable to chance fluctuations, such as changes in mood, or simply to measurement error. | – Again, only with an interview may one be able to distinguish target change from noise.
Latent variable models | |
Structural Equation Models (SEM) [12] | – Misspecification of the measurement model (e.g. ignoring multidimensionality or response dependence over time^c). | – Make sure the model is consistent with the data by examining model fit using several fit indices and by examining residual dependencies.
 | – Inter-relations between the different forms of response shift: reprioritization may in fact reflect non-uniform recalibration, and vice versa. | – Use substantive arguments (e.g. theoretical notions, clinical insight, and/or common sense) to decide which type of response shift is most likely.
 | – Change in residual variances (non-uniform recalibration) can also be due to changes in intercepts (uniform recalibration) or in factor loadings (reprioritization) going in different directions. | – Change in residual variances may be caused by heterogeneity in a sample. To examine heterogeneity, covariates can be incorporated, based on theory and clinical insight, to test whether changes in intercepts or factor loadings differ in direction and/or magnitude across subgroups.
 | – Induced violations of the PCI may occur due to missing data. | – Sensitivity analyses (e.g. multiple imputation with explicitly specified imputation models) can be used to assess the robustness of the results to missing data. Careful examination of missing-data patterns is essential (e.g. missingness over time may not be random and could be correlated with a response shift effect) [43]. Different approaches to handling missing data can be used, e.g. robust full-information maximum likelihood estimation. However, to date there is no consensus regarding the best method for handling missing data [44].
 | – Without further data, it is uncertain whether the results can be attributed to a change in meaning of the subjective evaluation. | – Qualitative research on response processes, which may include interviews with (a subgroup of) respondents, is needed to understand whether the changes in responses to PROMs are attributable to a change in meaning of the subjective evaluation, or to something else. One way is to conduct interviews at each measurement occasion about the response processes (e.g. using think-aloud interviews or verbal probing techniques conducted after completion of a questionnaire, aimed at understanding why respondents chose the answers they gave) [22, 30, 31]. Another way is to conduct interviews only at follow-up, give back the answers respondents provided at baseline, and invite them to reflect on the comparison of these responses (see also the interviews for the then-test and vignettes). Other, circumstantial evidence may also be sought.
Item Response Theory (IRT) and Rasch Measurement Theory (RMT) models | – Misspecification of the measurement model (e.g. ignoring multidimensionality or item response dependence over time^c). |
 | – Inter-relations between the different forms of response shift: reprioritization may in fact reflect non-uniform recalibration, and vice versa (IRT only, not RMT). | – Use substantive evidence (e.g. theory, clinical insight, common sense) to decide which type of response shift is most likely.
 | – Differential change in difficulty parameters (non-uniform recalibration) can also be due to uniform recalibration (or, for IRT, reprioritization) response shifts going in different directions. | – Again, this confusion may be caused by heterogeneity in a sample. Incorporate covariates, based on theory and clinical insight, to test whether changes in item parameters differ in direction and/or magnitude across subgroups.
 | – Induced violations of the PCI may occur due to missing data. | – Sensitivity analyses (e.g. multiple imputation with explicitly specified imputation models) can be used to assess the robustness of the results to missing data. Careful examination of missing-data patterns is essential (e.g. missingness over time may not be random and could be correlated with a response shift effect) [43]. The robustness of IRT/RMT models to missing data still needs to be assessed for response shift detection.
 | – Without further data, it is uncertain whether the results can be attributed to a change in meaning of the subjective evaluation. | – Conduct qualitative interviews with (a subgroup of) respondents to understand whether the changes in responses to PROMs are attributable to a change in meaning of the subjective evaluation, or to something else (see further under SEM).
Regression methods without classification | |
Relative Importance Analysis [49] | – The relative importance of component domains is sensitive to non-normal data distributions and to multicollinearity when the analysis is conducted using discriminant analysis and logistic regression, respectively, leading to false rank ordering of the domains and false detection of reprioritization response shift. | – Start by checking the distributional assumptions for the domains and, where appropriate, transform the data to normality when using discriminant analysis. Inspect multicollinearity, for example using the variance inflation factor (VIF), when using logistic regression.
 | – Change in relative importance weights or ranks may be due to the existence of more than two observed subgroups (i.e. heterogeneity due to the presence of latent groups). | – As this method requires the a priori identification of two independent groups, seek the strongest evidence possible to form the two mutually exclusive subgroups, using theory, clinical knowledge, and common sense.
 | – Artificial rank ordering of the domains and false detection of reprioritization response shift may be caused by missing data and the mechanisms of missingness. | – Sensitivity analyses (e.g. multiple imputation with explicitly specified imputation models) can be used to assess the robustness of the results to missing data. Careful examination of missing-data patterns is essential (e.g. missingness over time may not be random and could be correlated with a response shift effect) [43].
 | – Without further data, it is uncertain whether the results can be attributed to a change in meaning of the subjective evaluation. | – Conduct qualitative interviews with (a subgroup of) respondents to understand whether the changes in responses to PROMs are attributable to a change in meaning of the subjective evaluation, or to something else (see further under SEM).
Regression methods with classification | |
Classification and Regression Tree (CART) [50] | – This method may be prone to model overfitting, leading to false detection of response shift. | – Tree pruning or cross-validation should be used to avoid overfitting.
 | – If the classification variable (e.g. clinical status) is measured with error, or if it is unrelated to the PROM, spuriously inconsistent changes in PROM scores and in the classification variable may occur, leading to false detection of recalibration response shift. | – Employ valid and reliable instruments to measure the classification variable. Use theory and clinical knowledge to specify a priori hypothesized relationships between changes in PROM scores and changes in the classification variable.
 | – A selection bias towards covariates with many possible splits (i.e. with large numbers of values or categories) may lead to false detection of reprioritization response shift. | – Use theory, clinical insight, and common sense when interpreting the results of the partitioning. Conditional inference trees can be used to minimize this bias [51].
 | – Without further data, it is uncertain whether the results can be attributed to a change in meaning of the subjective evaluation. | – Conduct qualitative interviews with (a subgroup of) respondents to understand whether the changes in responses to PROMs are attributable to a change in meaning of the subjective evaluation, or to something else (see further under SEM).
Random Forest Regression [52] | – The choice of the average variable importance metric can affect the rank ordering of the component domains at each occasion and lead to false response shift detection. | – An alternative implementation of random forests, based on a conditional inference framework, can be applied [53] to obtain reliable variable importance measures.
 | – When autocorrelation within each domain is ignored, the estimated importance of each domain (i.e. average variable importance), and possibly the detection of response shift, may be affected. | – Recent developments in repeated measures random forests [54] may be used to handle correlated repeated measures.
 | – Missing data may result in biased estimates of variable importance. | – A decision tree algorithm named Branch-Exclusive Splits Trees (BEST), which handles MCAR, MAR, and MNAR missing data, can be used in case of missing data in predictors [55].
 | – Without further data, it is uncertain whether the results can be attributed to a change in meaning of the subjective evaluation. | – Conduct qualitative interviews with (a subgroup of) respondents to understand whether the changes in responses to PROMs are attributable to a change in meaning of the subjective evaluation, or to something else (see further under SEM).
Mixed Models and Growth Mixture Models [56] | – Misspecification of the mixed model for predictions (e.g. misspecified predictors, interactions, or covariance structure) may lead to inaccurate trajectories of the residuals from which response shift is deduced. | – Make sure clinical knowledge is incorporated when constructing the mixed model, e.g. using directed acyclic graphs. Test for misspecification of the random-effects distribution and structure to improve the model, if appropriate.
 | – Non-monotonic trajectory patterns of residuals suggesting response shift may be attributable to other phenomena, such as cognitive impairment. | – Use theory and clinical knowledge to assess the possible relationships between clinical characteristics and the trajectory of residuals. Provide sensitivity analyses with and without the patients having these clinical characteristics to assess the robustness of the results.
 | – Biased trajectories of residuals suggesting response shift may occur due to missing data. | – Sensitivity analyses (e.g. available-case analysis, multiple imputation with explicitly specified imputation models, or a shared parameter mixture model [57]) can be used to assess the robustness of the results to missing data.
 | – Without further data, it is uncertain whether the results can be attributed to a change in meaning of the subjective evaluation. | – Conduct qualitative interviews with (a subgroup of) respondents to understand whether the changes in responses to PROMs are attributable to a change in meaning of the subjective evaluation, or to something else (see further under SEM).
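The then-test logic discussed above reduces to simple score arithmetic: the then-test minus the pretest estimates recalibration, and the posttest minus the then-test gives change evaluated on the follow-up standard. The sketch below uses invented scores on a hypothetical 0–100 PROM; it illustrates only the computation, not the biases (recall bias, social desirability) the table warns about.

```python
import numpy as np

# Hypothetical scores for five patients (invented for illustration)
pretest   = np.array([60.0, 55.0, 70.0, 50.0, 65.0])  # baseline self-report
posttest  = np.array([62.0, 58.0, 71.0, 55.0, 66.0])  # follow-up self-report
then_test = np.array([50.0, 48.0, 65.0, 45.0, 60.0])  # baseline re-rated at follow-up

conventional_change = posttest - pretest    # ignores possible response shift
recalibration       = then_test - pretest   # then-test estimate of recalibration
adjusted_change     = posttest - then_test  # change on the follow-up standard

print(conventional_change.mean())  # 2.4
print(recalibration.mean())        # -6.4
print(adjusted_change.mean())      # 8.8
```

In this invented sample, the conventional change score (2.4) understates the adjusted change (8.8) because a downward recalibration (-6.4) masks part of the improvement; whether such a pattern truly reflects recalibration, rather than recall bias or social desirability, requires the additional checks listed in the table.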
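For the multicollinearity check suggested under Relative Importance Analysis, the variance inflation factor can be computed directly from the design matrix as VIF_j = 1/(1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors. Below is a minimal NumPy sketch; the `vif` helper and the simulated data are our own illustration (in practice one might use `statsmodels`' `variance_inflation_factor`).

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of design matrix X
    (minimal sketch: regress each column on the others via least squares)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])       # add intercept
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)    # OLS fit
        resid = y - A @ beta
        r2 = 1.0 - resid.var() / y.var()                # R^2 of column j
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)  # nearly collinear with x1
x3 = rng.normal(size=200)                  # independent predictor

v = vif(np.column_stack([x1, x2, x3]))
print(v)  # large VIFs for x1 and x2, VIF near 1 for x3
```

A common rule of thumb flags VIF values above 5 or 10 as problematic; here the near-collinear pair would be flagged, signalling that their relative importance ranks, and hence any apparent reprioritization, should not be trusted without remedial steps.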
Exploring alternative explanations
Different purposes
The columns cover two purposes: detection of response shift and explanation of response shift.

Methods | Observed change ≠ target change | Directly attributable to change in meaning | Effect size estimates^a | Classification of respondents^b | Adjusting for response shift | Recalibration | Reprioritization | Reconceptualization | Other changes in response processes | Inclusion of explanatory variables
---|---|---|---|---|---|---|---|---|---|---
Design-based methods | | | | | | | | | |
Then-test [15] | √ | – | √ | – | √ | √ | – | – | – | √
Appraisal, using QOLAP or BAI^c [3] | √ | – | –^c | – | – | – | – | – | √^c | –
Semi-structured interview [21] | √ | √ | – | √ | – | √ | √ | √ | √^d | √
Vignettes [40] | – | – | √^e | – | – | – | √ | – | – | √
Individualized methods | | | | | | | | | |
Schedule for the Evaluation of Individual Quality of Life (SEIQoL) [41]; Patient Generated Index (PGI) [42] | – | – | √^f | √^f | – | – | √ | √ | – | √
Latent variable methods | | | | | | | | | |
Structural Equation Modelling [12] | √ | – | √ | – | √ | √ | √ | √^g | – | √
Item Response Theory (IRT) and Rasch Measurement Theory (RMT) models | √ | – | √ | – | √ | √ | √ (IRT), – (RMT) | √^g | – | √
Regression methods without classification | | | | | | | | | |
Relative Importance Analysis [49] | – | – | – | – | – | – | √ | – | – | √^h
Regression methods with classification | | | | | | | | | |
Classification and Regression Tree (CART) [50] | – | – | – | √ | – | √ | √ | – | – | √^h
Random forest regression [52] | – | – | – | – | – | – | √ | – | – | √^h
Mixed models and growth mixture models [56] | √ | – | √^i | √ | √^i | – | √^i | – | √^i | √