Introduction
Major depressive disorder (MDD) and anxiety disorders are the most prevalent, often co-occurring, emotional disorders found in the western world. The disorders are frequently associated with functional impairment and carry high socio-economic costs (Whiteford et al. 2013; Wittchen et al. 2011). Although cognitive behaviour therapy (CBT) and other evidence-based treatments exist for these disorders and generally show good effects, they do not work for every individual (Dunlop et al. 2017; Hofmann et al. 2012; Loerinc et al. 2015; Springer et al. 2018).
National treatment guidelines recommend CBT as one of the treatments for anxiety and for depression (NICE 2009, 2011, 2013; Danish Health Authority 2007a, b). CBT manuals traditionally address a single disorder by targeting specific psychological mechanisms (e.g., cognitive distortions) that are believed to maintain a particular disorder. Within the CBT framework, however, the frequent occurrence of co-morbidity among emotional disorders has spurred the development of transdiagnostic CBT, i.e., treatments that apply a unified set of interventions to address several anxiety disorders, depression, and other emotional disorders (Barlow et al. 2016; McEvoy et al. 2009; Reinholt and Krogh 2014; Reinholt et al. 2017). The Unified Protocol for Transdiagnostic Treatment of Emotional Disorders (UP, Barlow et al. 2011a, b; Barlow et al. 2018) targets negative affectivity by shifting emotion regulation strategies away from avoidance and towards more adaptive strategies (Barlow et al. 2014; Sauer-Zavala et al. 2012). This builds on empirical evidence which suggests that negative affectivity is an important transdiagnostic process underlying all emotional disorders (Brown et al. 1998; Clark and Watson 1991; Krueger et al. 2005). In practice, the UP integrates standard cognitive and behavioral techniques with mindfulness-based techniques, with the main focus on learning to accept emotional experiences and manage difficult situations, despite strong emotions.
In a recent randomized controlled trial for anxiety disorders, the UP and diagnosis-specific CBT protocols showed comparable symptom reductions (Barlow et al. 2017). Other studies furthermore suggest that transdiagnostic CBT (e.g., the UP) delivered in groups for patients with anxiety disorders and/or MDD can be effective in reducing anxiety and depressive symptoms (Bullis et al. 2015; de Ornelas Maia et al. 2015; Laposa et al. 2017; Norton and Hope 2005; Norton and Barrera 2012; Osma et al. 2015; Reinholt et al. 2017; Zemestani et al. 2017). However, no randomized controlled trials comparing diagnosis-specific CBT with the UP for groups including patients with MDD have been published (Arnfred et al. 2017; Reinholt et al. 2017).
While the reported average effects of the UP versus diagnosis-specific CBT are comparable, it is possible that some patients would benefit more from the broader focus on emotion regulation applied in the UP, while others would benefit more from the specific symptom-focused approach applied in diagnosis-specific CBT. Clinical practice would therefore benefit from knowledge that helps match individuals with the optimal treatment (personalized approaches to clinical decision making).
Studying individual pre-treatment characteristics that reliably predict (differential) treatment outcomes is a possible approach to the identification of patients who would benefit most from transdiagnostic and/or diagnosis-specific CBT. Such individual characteristics may be either prescriptive or prognostic. Prescriptive variables (i.e., moderators) can help identify for whom or under what conditions a treatment has a certain causal effect on outcome. Moderators are thus useful in stratifying a population into subgroups: those who would experience greater improvement from the transdiagnostic CBT and those for whom diagnosis-specific CBT would be best (Kraemer 2013). Prognostic variables (i.e., predictors), by contrast, predict treatment outcomes irrespective of treatment type. Researchers have recently developed multivariable models or algorithms to predict (differential) outcomes by integrating multiple predictors and/or moderators (Cohen and DeRubeis 2018). Introduced by DeRubeis and colleagues, the Personalized Advantage Index (PAI, DeRubeis et al. 2014a, b) is a promising approach to the prediction of differential treatment effects, which has been replicated by others (Huibers et al. 2015; Keefe et al. 2018; Vittengl et al. 2017; Webb et al. 2019; Zilcha-Mano et al. 2016). This two-step approach first selects relevant predictors and moderators of treatment outcomes, then uses the variables to construct a PAI to recommend a specific treatment for each individual. The recommendation is based on quantitative estimates of the predicted advantage of the optimal treatment over the non-optimal treatment. Further examples of multivariable prediction models to guide treatment selection are provided by the “matching factor” (Barber and Muenz 1996), the “nearest-neighbours” approach (Lutz et al. 2006), and the M* approach (Kraemer 2013; Niles et al. 2017a, b; Smagula et al. 2016; Wallace et al. 2013). A number of multivariable prediction models containing only prognostic information have also been developed in recent years. Studies using the Prognostic Index (PI) are other promising attempts in which a quantified estimate of the individual's prognosis can help determine the needed level of care (Lorenzo-Luaces et al. 2017; van Bronswijk et al. 2019a). Such studies would be particularly relevant with regard to group CBT, since the treatment of patients with anxiety and depression is often fraught with uncertainty regarding treatment outcomes. The evidence concerning predictors of CBT effects is mainly derived from individual treatment studies (Marker et al. 2019); whether the same predictors are relevant for group CBT is not known. However, there is evidence to suggest that:
(a) co-morbid depression in group CBT for anxiety is a poor predictor of treatment effects (Talkovsky et al. 2016)
(b) interpersonal problems (measured before treatment) predict lower effect of group CBT for depression (McEvoy et al. 2013)
(c) motivation predicts the effect of group CBT for anxiety disorders (Marker et al. 2019)
(d) baseline severity and level of depression moderate the differential effect of group CBT and mindfulness-based stress reduction for anxiety disorders (Arch and Ayers 2013).
As the evidence cited above derives from single studies with relatively modest sample sizes, our knowledge is limited as to what can help us predict who will improve during any type of group CBT and who will improve more from the UP in group format than from diagnosis-specific group CBT.
The aim of the present paper was to examine moderators of treatment effects of the UP and the diagnosis-specific group CBT, using the PAI approach in a multi-site randomized controlled non-inferiority trial. The sampled patients suffered from relatively severe and chronic symptoms, and the majority had failed to respond to either psychotherapy or medication. As this is typical of patients in Danish psychiatric outpatient settings, the search for moderators and predictors is all the more relevant.
If we failed to identify any moderators, we planned to continue with an examination of general predictors to develop a PI for personalized prediction of outcome, irrespective of the treatment received. We likewise intended to gauge the site specificity of the final model to test the generalizability of the results to other contexts. We are not aware of any other studies that have examined moderators of treatment outcome in transdiagnostic versus diagnosis-specific CBT. Furthermore, as the knowledge of predictors of group CBT treatment outcomes is very limited, we used a data-driven analysis strategy, rather than testing specific hypotheses. The study was thus exploratory in nature, with the aim of using the findings to design hypothesis-driven studies.
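The PAI logic described in this introduction can be illustrated with a minimal sketch. It assumes that one outcome model per treatment has already been fitted and that higher predicted values mean a better outcome. The coefficient values, the variable name `age`, and the labels `UP` and `dCBT` are hypothetical placeholders chosen for illustration, not estimates from any trial.

```python
# Minimal PAI sketch. All coefficients below are invented for illustration.
ILLUSTRATIVE_COEFS = {
    "UP":   {"intercept": 0.5, "age": 0.01},
    "dCBT": {"intercept": 0.8, "age": 0.0},
}

def predict_outcome(patient, treatment, coefs=ILLUSTRATIVE_COEFS):
    """Predicted outcome for one patient under one treatment (linear model)."""
    c = coefs[treatment]
    return c["intercept"] + sum(c[name] * value for name, value in patient.items())

def personalized_advantage(patient, a="UP", b="dCBT", coefs=ILLUSTRATIVE_COEFS):
    """Return (PAI, recommended treatment); a PAI above zero favours treatment `a`."""
    pai = predict_outcome(patient, a, coefs) - predict_outcome(patient, b, coefs)
    return pai, (a if pai > 0 else b)

# Two hypothetical patients who differ only in the moderator.
pai_young, rec_young = personalized_advantage({"age": 20})
pai_old, rec_old = personalized_advantage({"age": 50})
print(rec_young, round(pai_young, 2))
print(rec_old, round(pai_old, 2))
```

If the two models differ only in their intercepts, that is, if there are no moderators, the PAI is the same constant for every patient and gives no individual recommendation.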
Results
Variable Description and Missing Data
The PI analyses included 291 randomized patients (UP: n = 148; diagnosis-specific CBT: n = 143). The means and frequencies of the clinical characteristics of the patients are listed in Table 1. Online Appendix B lists demographic and background information. The median number of completed sessions in both groups was 10. Four or more group sessions were completed by 228 patients (UP: n = 110; diagnosis-specific CBT: n = 118), who were included in the PAI analyses.
There were no significant differences between the two treatment groups regarding demographic or clinical characteristics except that the UP group had higher levels of anxiety symptoms on HAM-A.
A total of 14.5% of the observations were missing. We tested the imputation method by artificially introducing missing values into the data of the 81 patients who had neither missing pre-treatment data nor missing outcome data before Session 5. With an error of 0.33 for the continuous variables (normalized root mean squared error, NRMSE) and 0.41 for the discrete variables (proportion of falsely classified entries, PFC), the imputation method appeared satisfactory.
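The two error measures can be made concrete with a short sketch. It assumes the common definitions in which NRMSE is the square root of the mean squared imputation error divided by the variance of the true values, and PFC is the share of categorical entries imputed with the wrong category. The function names and toy values are illustrative and are not taken from the study.

```python
from math import sqrt
from statistics import fmean, variance

def nrmse(true_values, imputed_values):
    """Normalized RMSE over artificially removed continuous entries (0 = perfect)."""
    mse = fmean((t - i) ** 2 for t, i in zip(true_values, imputed_values))
    return sqrt(mse / variance(true_values))  # sample variance: an assumption

def pfc(true_labels, imputed_labels):
    """Proportion of falsely classified categorical entries (0 = perfect)."""
    wrong = sum(t != i for t, i in zip(true_labels, imputed_labels))
    return wrong / len(true_labels)

# Toy check: values that were masked on purpose and then imputed.
masked_true = [12.0, 20.0, 16.0, 8.0]
masked_imputed = [14.0, 18.0, 16.0, 10.0]
print(nrmse(masked_true, masked_imputed))
print(pfc(["yes", "no", "no", "yes"], ["yes", "no", "yes", "yes"]))
```

Both measures are computed only on entries whose true value is known, which is why the check was run on the patients with complete data.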
Variable Selection for Personalized Advantage Index
Age, positive affect (PANAS), level of the detachment personality trait, duration of disorder, and depression severity (BDI-II) were identified as candidate moderators by running the model-based recursive partitioning method, with age having the largest variable importance score. None of these variables, however, was selected as a moderator in at least 60% of the bootstrap samples using the backwards elimination technique, indicating that no robust moderators were available for building a PAI.
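The 60% bootstrap rule can be sketched as follows. This is a simplified illustration, not the study's procedure: the selector below is a hypothetical stand-in (an absolute-correlation cutoff) for backwards elimination, and the data, variable names `x1` and `x2`, and cutoff are invented.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equally long numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def toy_selector(rows, names, cutoff=0.3):
    """Hypothetical stand-in for backwards elimination: keep a variable
    when its absolute correlation with the outcome exceeds `cutoff`."""
    ys = [r["y"] for r in rows]
    return {n for n in names if abs(pearson([r[n] for r in rows], ys)) > cutoff}

def selection_frequency(rows, names, selector, n_boot=200, seed=0):
    """Share of bootstrap samples in which each variable is selected."""
    rng = random.Random(seed)
    counts = {name: 0 for name in names}
    for _ in range(n_boot):
        sample = [rng.choice(rows) for _ in rows]  # resample with replacement
        for name in selector(sample, names):
            counts[name] += 1
    return {name: counts[name] / n_boot for name in names}

# Synthetic data: x1 drives the outcome, x2 is pure noise.
data_rng = random.Random(1)
rows = []
for _ in range(100):
    x1, x2 = data_rng.gauss(0, 1), data_rng.gauss(0, 1)
    rows.append({"x1": x1, "x2": x2, "y": 2 * x1 + data_rng.gauss(0, 1)})

freq = selection_frequency(rows, ["x1", "x2"], toy_selector)
robust = [name for name, f in freq.items() if f >= 0.60]  # the 60% rule
print(freq, robust)
```

A variable that is selected only occasionally across resamples is treated as unstable and is not carried forward, which is the sense in which no robust moderators were found.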
Variable Selection for PI
Thirteen predictors of outcome were identified by running the model-based recursive partitioning method in the search for predictors. They are listed in Table 2.
Table 2
Predictors selected with model-based recursive partitioning technique (N = 291)
No. | Predictor | Selected in bootstrap samples (%) | Positive coefficient (%) | Negative coefficient (%) |
1 | Depression severity (BDI-II) | 37.8 | 9.52 | 90.48 |
2 | Detachment (PID-5) | 78.1 | 0.13 | 99.87 |
3 | Level of personality functioning (LPFS-BF) | 28.0 | 56.43 | 43.57 |
4 | Positive affect (PANAS) | 93.0 | 0.00 | 100.00 |
5 | Duration of disorder | 87.1 | 0.00 | 100.00 |
6 | Perseverative thinking (PTQ) | 43.9 | 99.77 | 0.23 |
7 | Cognitive reappraisal (ERQ) | 62.0 | 0.48 | 99.52 |
8 | Expressive suppression (ERQ) | 36.8 | 0.82 | 99.18 |
9 | Emotion regulation (ERSQ) | 23.1 | 55.41 | 44.59 |
10 | Psychoticism (PID-5) | 40.1 | 95.26 | 4.74 |
11 | Certainty about mental states (RFQ) | 49.6 | 99.60 | 0.40 |
12 | Anxiety severity (HAM-A6) | 26.6 | 8.27 | 91.73 |
13 | Disinhibition (PID-5) | 32.0 | 3.44 | 96.56 |
Only level of positive affect (measured by the PANAS), duration of disorder, level of detachment personality trait, and cognitive reappraisal were selected in at least 60% of the bootstrap samples (Table 2). For these four predictors, more than 99% of the bootstrap samples showed negative coefficients, indicating that longer duration of disorder and higher levels of positive affect, detachment, and cognitive reappraisal were associated with less improvement during treatment.
Estimating PI Scores Using Fivefold Cross Validation
The four selected predictors were combined into the following multiple linear regression model: Slope(WHO-5) = β0 + β1*log(positive affect) + β2*log(duration of disorder) + β3*(detachment) + β4*(cognitive reappraisal). Comparing the actual "observed" slope with the predicted slope (for each patient, based on weights estimated in the other four of the five folds), we found a mean difference of 0.01, with a 95% CI between −0.10 and 0.13. The correlation (Pearson's r) was 0.25 (p < 0.001).
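The cross-validation step can be sketched in plain Python. The sketch assumes an ordinary least squares fit and out-of-fold prediction; the data are synthetic, the coefficients are invented, and the helper names (`fit_ols`, `cross_val_predict`) are illustrative rather than the study's code.

```python
import random
from math import log, sqrt

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_ols(X, y):
    """Least squares weights via the normal equations; rows of X start with 1."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

def cross_val_predict(X, y, k=5, seed=0):
    """Predict each case from weights estimated on the other k-1 folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    preds = [0.0] * len(y)
    for fold in (idx[i::k] for i in range(k)):
        held = set(fold)
        train = [i for i in idx if i not in held]
        beta = fit_ols([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            preds[i] = sum(b * x for b, x in zip(beta, X[i]))
    return preds

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (v - my) for x, v in zip(xs, ys))
    return sxy / sqrt(sum((x - mx) ** 2 for x in xs) * sum((v - my) ** 2 for v in ys))

# Synthetic patients; every number here is invented for illustration.
rng = random.Random(42)
X, y = [], []
for _ in range(200):
    pa, dur = rng.uniform(10, 50), rng.uniform(1, 20)
    det, reap = rng.gauss(0, 1), rng.gauss(0, 1)
    X.append([1.0, log(pa), log(dur), det, reap])
    y.append(3.5 - 0.8 * log(pa) - 0.5 * log(dur) - 0.3 * det - 0.3 * reap
             + rng.gauss(0, 0.5))

predicted = cross_val_predict(X, y, k=5)
mean_diff = sum(o - p for o, p in zip(y, predicted)) / len(y)
r = pearson(y, predicted)
print(round(mean_diff, 3), round(r, 3))
```

Because every prediction comes from weights that never saw the patient in question, the mean difference and correlation describe out-of-sample rather than in-sample accuracy.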
Generalizability of Variable Selection and Model Fitting
When we conducted the two variable selection steps separately by treatment site (Site 2 versus Sites 1 and 3), we were unable to identify any predictors that were selected both in the primary PI and in the PIs for the two separate samples (see Online Appendix C).
Sensitivity Analyses
No considerable change in results was observed when the analyses were repeated without imputing the outcome variable except that only two variables were selected in both steps: positive affect and duration of disorder. Changing the parameters in the model-based recursive partitioning method also failed to modify the results (details in Online Appendix C).
Discussion
We were unable to identify any robust moderators of differential treatment outcome in this study. This suggests that the pre-treatment patient variables examined here are of no use in predicting whether patients with social anxiety disorder, agoraphobia/panic disorder, and/or MDD will improve more or less from the UP compared with diagnosis-specific group CBT. We identified four predictors of outcome: level of positive affect, duration of disorder, the detachment personality trait, and the cognitive reappraisal coping strategy. All four predictors were negatively associated with outcome. Higher pre-treatment levels of positive affect, i.e., feeling enthusiastic, active, and alert, thus predicted less improvement during treatment. This may be surprising but could result from the fact that positive affect and well-being are highly correlated and that a high level of well-being at baseline allows less room for improvement and thus a lower slope. This finding corresponds with those of a study of social phobia in which high positive affect before treatment was found to predict less improvement in quality of life during treatment (Sewart et al. 2019). Cognitive reappraisal as an emotion regulation strategy is thought to intervene early in the emotion-generative process and typically leads to more positive emotions and fewer negative emotions (Gross and John 2003). Our finding that this coping strategy predicts less improvement may therefore be surprising, whereas it may be less surprising that a longer duration of disorder and the detachment personality trait (the tendency to avoid socioemotional experience) predicted less improvement. According to the sensitivity analysis concerning site specificity, however, the PI model was not reliable across treatment sites. This indicates that further study is needed to improve the PI. Variables such as motivation and expectancy as well as interpersonal problems could be relevant for inclusion in future studies. Measures of stability/strain in everyday life should likewise be considered, since these aspects of the patient’s life may interfere with homework engagement, which has been shown to mediate the effect of CBT in some studies (Cammin-Nowak et al. 2013; Westra et al. 2007).
Our results provide no evidence to support a preference for either UP or diagnosis-specific group CBT for a given patient. Accordingly, treatment selection may just as well be based on patient preferences or logistics as on the variables examined. However, other variables not addressed in the present study (e.g., interpersonal problems or prior experience with CBT) should be investigated in future studies before any conclusions can be drawn as to the existence of moderators. In some treatment settings, a logistic advantage may be obtained by using transdiagnostic group CBT rather than diagnosis-specific group CBT, since patients would not have to wait until a sufficient number of patients with the same diagnosis were ready to enter their designated group. Moreover, shifting from diagnosis-specific to transdiagnostic CBT may potentially reduce the costs of training therapists, as they would need training in only one manual.
Several aspects may be considered when attempting to explain the lack of identified moderators in this study. Despite differences in the treatment targets in the UP and diagnosis-specific CBT and the heterogeneity of the UP group (only this group had patients with different diagnoses), a likely explanation is that the two therapy formats were too similar in nature and used similar or overlapping interventions. For instance, although the therapies were delivered with different rationales and goals, exposure was an important intervention in both arms of the study. This may have rendered the search for moderators less relevant here than in previous PAI studies, where psychotherapy was compared with medications (e.g., DeRubeis et al. 2014a, b; Vittengl et al. 2017) or CBT was compared with either interpersonal therapy (Huibers et al. 2015; van Bronswijk et al. 2019a, b) or psychodynamic therapy (Cohen et al. 2020).
Another important question concerns the quality of treatment. Both treatment formats led to limited improvement, as a WHO-5 slope of some 0.6 (95% CI 0.43–0.77) corresponds to an 8.4-point change (6.02–10.78) over the 14 weeks. The mean change obtained in our study thus failed to reach the clinically significant level, which was defined as a 10-point improvement on the WHO-5 scale (Arnfred et al. 2017). However, as the current study was set in secondary services, the included patients had previously failed to respond to medication and/or psychotherapy offered by primary services (typically delivered by a general practitioner or a private practice psychiatrist or psychologist). The patients may therefore be classified as treatment resistant, with a very low level of functioning in daily life (e.g., less than 10% worked full-time at time of study enrolment; 22% were classified as students). In addition, the use of the well-being index as an outcome measure may have diminished the obtained treatment effect, since positive variables tend to be less responsive as treatment targets (Sewart et al. 2019). However, changes in well-being and symptoms were similar in magnitude, which suggests that this explanation is less plausible here (Reinholt et al., under review). Taking into account the severity and chronicity of the sample as well as the relatively short duration of the treatment, the positive nature of the outcome measure, and the fact that the rating of treatment quality and adherence was high, we find it unlikely that the modest treatment effect reflects poor treatment quality. It cannot be ruled out, however, that moderators might have been identified if we had used a less chronic sample, longer duration of treatment, or another outcome measure, thereby increasing the potential for change during treatment. The homogeneity of the study population is another possible explanation for the failure to detect differences in outcome, since the participants were all non-responders to first-line treatment. Likewise, a high proportion of the group may have been what DeRubeis and colleagues have termed intractable, i.e., patients who would experience no improvement no matter the quality of therapy provided (DeRubeis et al. 2014a, b). However, the highly varied response renders this explanation less likely. The slope range of −3.56 to 6.24 corresponds to a change in WHO-5 ranging from −49.84 to 87.36 points.
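The conversions between WHO-5 slope and total point change in this paragraph follow from multiplying the slope by the 14 treatment weeks. The short sketch below reproduces that arithmetic, assuming the slope is a linear change per week; the function name is illustrative.

```python
WEEKS = 14  # treatment period stated in the text

def slope_to_points(weekly_slope, weeks=WEEKS):
    """Convert a weekly WHO-5 slope into total change in WHO-5 points."""
    return weekly_slope * weeks

for label, slope in [("mean", 0.6), ("CI lower", 0.43), ("CI upper", 0.77),
                     ("minimum", -3.56), ("maximum", 6.24)]:
    print(f"{label}: {slope_to_points(slope):.2f} points")
```

On this scale, the 10-point threshold for clinically significant change corresponds to a weekly slope of about 0.71, which lies inside the reported confidence interval but above the mean slope.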
It is also possible that the samples from the three treatment sites differed since it was not possible to identify the same predictors in the three sites when conducting the analyses separately. However, treatment site was not selected as an important variable in the first variable selection step, at which the samples were pooled.
It may be speculated that we overestimated the capacity of multivariable prediction models at this stage of model development. Perhaps the difficulty of building prediction algorithms is greater than expected. Other factors may have played a role, such as the numerous interactions between variables, which may have rendered the models too complex, combined with the possibility that pre-treatment variables were less important than processes during or outside treatment (e.g., non-specific treatment factors such as alliance and group cohesion, or unexpected events in the family or at work; Kazantzis et al. 2018; Zilcha-Mano 2017).
Proponents of the complex network approach to psychopathology argue that symptoms are not a reflection of an underlying disease or dimension (e.g., neuroticism), but rather that they constitute the disease itself in the form of a complex network of elements interacting in ways that tend to maintain themselves (Hofmann et al. 2016). In this perspective, response to treatment is caused by critical transitions in network states; hence, pre-treatment factors do not necessarily offer important information on the likelihood of change or the resilience of a given network, which may explain the scarcity of predictors of treatment effect found in the current study. Furthermore, it is possible that the similarities between the two treatment conditions provided a similar push to the complex networks, which may explain the failure to find any moderator of treatment effect. In a complex network perspective, the current understanding of moderation as the effect of baseline characteristics may be too narrow, inasmuch as repeated assessment of the same patient may be needed to capture the dynamics of a complex idiographic network (Hofmann and Hayes 2019).
Variable selection methods provide a further point of discussion. Such approaches may be biased towards multi-category variables, which are more likely to be selected in the model (Kim and Loh 2001; Strobl et al. 2007). To reduce bias in the current study, predictors were standardized before running the model-based recursive partitioning model. We cannot rule out, however, a bias in the variable selection and the estimation of variable importance scores. The effect may be that less optimal predictors are carried forward to the next step. Additionally, the sample size of our study may have been too small. Power calculation is no simple task in analyses such as these, but a newly published simulation study by Luedtke and colleagues concludes that a sample size of 300 per treatment arm is required for sufficient statistical power to use multivariable prediction models with four predictors for comparison of two or more treatments (Luedtke et al. 2019). On the other hand, the current study has a larger sample size than most of the previous PAI studies.
Some strengths of the study should also be noted. To the best of our knowledge, this is the first study to examine moderators of differential treatment outcome in UP and diagnosis-specific CBT. In contrast to previous studies comparing UP and diagnosis-specific CBT, our sample of 291 patients included a subgroup with a primary diagnosis of MDD, thus allowing us to examine predictors and moderators in both anxiety disorders and MDD. We used a two-step machine learning approach to identify relevant predictors and moderators of treatment outcome rather than basing PAI treatment selection on simple linear regression models, as in earlier approaches. Our approach thus incorporated internal validation techniques such as bootstrapping to maximize the stability and generalizability of these models. The random forest approach is a comparatively stable model when n is relatively low and the number of predictors high (Bureau et al. 2005; Heidema et al. 2006). Without compromising the stability of the model, this approach also allows for the inclusion of many predictors and prevents weaker predictors from being dominated by stronger ones. In addition, the approach is capable of estimating linear and non-linear associations (Strobl et al. 2008).
Acknowledgements
The study was supported by TRYGfonden (ID 114241), the Capital Region Mental Health Services, Region Zealand Mental Health Services, the Jascha Foundation, the Psychiatric Research Foundation of Central Denmark Region, the Ivan Nielsen Foundation for Individuals with Particular Mental Disorders, and the Foundation for Research on Mental Disorders. The funding sources were not involved in the study design, collection, analysis, data interpretation, writing of the paper, or the decision to submit the article for publication. We are grateful to the UP Institute, Center for Anxiety and Related Disorders (CARD), Boston University for their invaluable advice concerning the manual and group delivery and to all the participating therapists and patients.