Introduction
Measurement-based care (MBC) uses patient-reported rating scales in conjunction with evidence-based clinical practice guidelines to provide an objective assessment of patient progress over time, in order to guide a more precise plan of care [1], prevent treatment failure [2], and collect data for quality management [3]. Through the use of patient-reported outcomes (PROs) [4], the patient’s voice is heard, quantified, and compared to normative data across a wide variety of domains [5]. This is in line with international trends, in which more emphasis is placed on the value of health care in terms of outcome [6] and patients are granted a more prominent role [7].
Current definitions of health involve at least three domains (physical, mental, and social health), all of which should be prioritized in delivering health care [8, 9]. Ideally, health outcomes should cover all three domains of health across a full cycle of care [10]. In somatic care (e.g., oncology), the use of MBC has become largely routine, and several health domains are usually measured (e.g., physical symptoms, functioning, anxiety, depression) [1]. In psychiatric care, on the other hand, MBC is less standard practice because of several barriers, such as lack of agreement on key outcome domains and lack of empirical data on outcome measures [3]. Measurement often focuses mainly on mental health without including other domains, such as functioning or wellbeing [11, 12].
In mental health care research, outcome is commonly assessed by comparing the severity of psychopathology before and after treatment with generic or disorder-specific instruments. A widely used instrument in this context is the Brief Symptom Inventory (BSI) [13, 14], whose total score (the Global Severity Index) provides information on the severity of general psychopathology, while its subscales provide information on specific symptoms, such as depression and anxiety. In assessing the severity of psychopathology, these instruments measure signs and symptoms of the disorder (e.g., those listed in the prevailing taxonomy of mental disorders, the DSM-5).
However, operationalizing outcome in mental health care by signs and symptoms has been criticized as too narrow and too focused on deficits [10, 15]. Health is more than the absence of signs and symptoms. Patients’ views on their health-related quality of life (HRQOL) offer a broader conceptualization and may yield a useful additional indicator of treatment outcome. HRQOL is defined as the quality of life relative to one’s health or disease status, and it is commonly conceived as dynamic, subjective, and multidimensional [16]. This shift in emphasis is also reflected in the emergence of the recovery movement in psychiatry, with its distinction between clinical and personal recovery [17], and in the positive psychology [18, 19] and positive psychiatry [20] movements.
An instrument for the assessment of HRQOL is the Short Form-36 (SF-36) [21, 22], which is widely used in both health care and mental health care. The existing literature on HRQOL in mental health care research is predominantly concerned with severe mental disorders, such as psychotic disorders [23–25], for which the measurement of HRQOL is seen as a necessary addition to other outcome domains such as psychopathology. For common mental disorders, such as mood and anxiety disorders, the value of adding HRQOL to the assessment of treatment outcome has been less well investigated. Assessment of HRQOL in mood disorders has been recommended [26], but little research comparing measures head-to-head has been done [27]. In a meta-analysis of anxiety disorders, Olatunji et al. [28] reported that post-traumatic stress disorder (PTSD) in particular was associated with decreased HRQOL. This finding was confirmed in a recent study of PTSD patients [29], which also demonstrated a strong association between change in depression symptoms and change in HRQOL; this could be expected, as depression symptoms are incorporated in HRQOL measures.
Thus, there is a strong plea for a broader assessment of the benefits of mental health care than mere symptom relief [30]. Adding an outcome domain to signs and symptoms is especially valuable when a decrease in signs and symptoms correlates only moderately with an increase in health-related quality of life. This may be the case when change over time on the two constructs does not occur in synchrony. Many patients first show improvement in symptomatology, followed later by an increase in health-related quality of life. However, the precise association between symptomatology and quality of life in mental health care is still poorly understood [31].
The present study used routine outcome monitoring (ROM) data of outpatients with common mental disorders to investigate and compare both outcome domains. Longitudinal BSI and SF-36 data were compared, their correlation was assessed, and, more importantly, the concordance between a decrease in BSI score and an increase in SF-36 score over time was established. We investigated whether the overall magnitude and the pace of change over time were similar in both domains. After all, a common hypothesis is that therapeutic change first becomes manifest in symptoms (in this case measured by the BSI), followed later by improved functioning or HRQOL [32, 33]. We analyzed this issue using a subset of the sample with four repeated assessments per patient, which enabled us to test this hypothesis regarding the dyssynchrony of response on the two outcome domains of psychopathology and HRQOL.
In sum, the main objective of the study was to investigate and compare the responsiveness of two outcome measures: a symptom checklist (BSI) and an HRQOL measure (SF-36). An asynchronous response pattern on the two constructs was hypothesized: symptoms decrease first, followed later by an increase in quality of life.
Discussion
The main findings of this study are as follows. There is a high correlation between the BSI total score and the component and scale scores of the SF-36, especially for the MH scale. This holds even though the BSI and the SF-36 use different time frames in their instructions: 1 week vs. 4 weeks, respectively. The substantial concordance between the BSI and the MH scale of the SF-36 is not surprising, given the content of the relevant SF-36 items (“Did you feel nervous?,” “Did you feel downhearted and blue?”), which are very similar to BSI items (“Nervousness or shakiness inside,” “Feeling blue”). The MH scale of the SF-36 demonstrated somewhat larger changes than the BSI total scale. A likely explanation for this finding is that the BSI total scale contains a substantial number of items of low relevance for patients with mood or anxiety disorders (e.g., “The idea that you should be punished for your sins,” “Feeling that you are watched or talked about by others”). Apparently, for common mental disorders, the generic applicability of the BSI is offset by a somewhat diminished ability to demonstrate change.
The hypothesis of a delayed response on the SF-36 was not supported by the BSI and SF-36 scores. The data showed a similar linear pattern of change over time for the BSI total score and the mental component of the SF-36. We found a diminished, but not a delayed, response on the physical component score. Generally, patients’ scale scores changed over time in a similar way, but to a lesser extent on the physical component scales than on the mental component scales and the BSI scales. These results could have been affected by selective loss to follow-up, as four assessments were available for only 25% of the sample with at least two assessments. We compared the pretest scores of the sample with two assessments and the sample with four assessments, and these were very similar. Furthermore, we inspected the course of scores over time in the samples with three (n = 2946) or two assessments (n = 5786); for these larger samples, the profiles were very similar to those in Figs. 1 and 2. These results do not suggest that selective data loss explains the findings of the study.
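One simple descriptive way to look at whether two measures change at the same pace is to express each measure's mean change from baseline in baseline-SD units (a standardized effect size) and then ask what fraction of the change seen at the last assessment has already been reached at each earlier assessment; a delayed SF-36 response would show smaller early fractions than the BSI. The sketch below is a swapped-in illustration, not necessarily the analysis used in this study. The function names and the toy data are hypothetical, and it assumes the same patients are scored at every assessment.

```python
from statistics import mean, stdev


def standardized_change(scores_by_wave, higher_is_better):
    """Mean change from baseline at each assessment, in baseline-SD units.

    scores_by_wave: one list of scores per assessment, same patients in each.
    For symptom scales such as the BSI (higher = worse), change is computed as
    baseline minus follow-up, so positive values always mean improvement.
    """
    baseline = scores_by_wave[0]
    mean_0 = mean(baseline)
    sd_0 = stdev(baseline)
    changes = []
    for wave in scores_by_wave:
        delta = mean(wave) - mean_0 if higher_is_better else mean_0 - mean(wave)
        changes.append(delta / sd_0)
    return changes


def share_of_final_change(profile):
    """Fraction of the final-assessment change already reached at each wave."""
    final = profile[-1]
    if final == 0:
        raise ValueError("no change at the final assessment")
    return [value / final for value in profile]


# Hypothetical toy data: three patients, four assessments (NOT study data).
bsi = [[2.0, 3.0, 4.0], [1.5, 2.5, 3.5], [1.0, 2.0, 3.0], [0.5, 1.5, 2.5]]
mcs = [[30.0, 40.0, 50.0], [35.0, 45.0, 55.0], [40.0, 50.0, 60.0], [45.0, 55.0, 65.0]]

bsi_share = share_of_final_change(standardized_change(bsi, higher_is_better=False))
mcs_share = share_of_final_change(standardized_change(mcs, higher_is_better=True))
print([round(x, 2) for x in bsi_share])  # [0.0, 0.33, 0.67, 1.0]
print([round(x, 2) for x in mcs_share])  # [0.0, 0.33, 0.67, 1.0]
```

In this toy example both profiles coincide, which is what a synchronous response looks like; under the delayed-response hypothesis the SF-36 fractions at the second and third assessments would lag behind those of the BSI.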
When the BSI is compared with the SF-36 component scores, the physical component score changed little in this patient sample, whereas the BSI and the mental component score showed roughly equal amounts of change. This pattern of SF-36 scores is, of course, specific to patients treated for mental health problems. Among patients treated for somatic diseases, the largest change would likely occur on other SF-36 scales. For instance, Garratt et al. [44] compared change in score on the SF-36 scales over time for four groups of patients with somatic diseases (low back pain, menorrhagia, suspected peptic ulcer, and varicose veins). They found that the BP and RP scales revealed the largest changes. Likewise, ten Klooster et al. [45] demonstrated in patients with rheumatoid arthritis that the BP scale and the PCS showed the largest change over a retest period of six months. Finally, Frendl and Ware [46] reported on a meta-analysis of 185 drug trials in which they examined change in the component scores across fourteen different somatic conditions; overall, the PCS showed slightly more change than the MCS. The PCS showed the largest change in psoriatic arthritis and rheumatoid arthritis, whereas the MCS showed larger changes in depression and psoriasis. Nevertheless, when treating psychiatric patients, the physical component score of the SF-36 is still informative, as somatic symptoms are important in their own right and can be an important cause of psychological distress [47, 48].
Further research aimed at broadening the scope of treatment outcome in mental health care research is needed and should focus on other potentially relevant concepts, such as the recovery concept [49]. Alternatively, in the direction of greater specificity, disorder-specific measurement instruments may yield more precise information on treatment gains [50]. The present findings of greater changes on the BSI depression and BSI anxiety subscales, in spite of their brevity, lend support to this suggestion. Finally, the more recent development of item response theory-based computerized adaptive tests (CAT), such as the PROMIS assessment battery [51], may prove fruitful for outcomes research, as these tests allow more efficient assessment without the loss of reliability usually associated with brief questionnaires.
A strength of the present study is its use of longitudinal data collected in everyday clinical practice, which enhances the generalizability of the findings. The size of the dataset provides ample statistical power to detect differences between the outcome domains. On the other hand, the use of data collected under real-life circumstances allows less experimenter control, resulting in varied assessment intervals and substantial loss of data over time. Consequently, treatment outcomes in the present study are likely somewhat inflated by selective loss of retest data, as patients whose treatment ended unsuccessfully may decline to be reassessed. However, for a head-to-head comparison of outcome measures, the present data are very suitable, especially as the availability of lengthy assessment trajectories (four repeated assessments for a substantial number of patients) allowed us to investigate the synchrony of change across outcome domains.
Regarding the concordance between the BSI and the SF-36, it should be noted that the correlation coefficients presented in Table 3 may be a conservative estimate of the actual concordance of the concepts underlying the instruments. The reliability of the BSI and the SF-36 scales determines the upper limit of their correlation according to the formula \(r_{\mathrm{max}} = \sqrt{r_{xx} r_{yy}}\) [52]. The correlation between two scales can be corrected for their unreliability [53] with the formula \(r^{*} = \frac{r_{xy}}{\sqrt{r_{xx} r_{yy}}}\), where \(r^{*}\) is the disattenuated (corrected) correlation, \(r_{xy}\) is the observed correlation between the scales, and \(r_{xx}\) and \(r_{yy}\) are the test–retest reliability coefficients of the scales. With r = .82 for the Dutch version of the BSI-TOT score [13] and r = .80 for the SF-36 MCS [54], the correlation between the BSI and the MCS would increase from r = .61 to \(r^{*}\) = .75, and that for EWB from r = .75 to \(r^{*}\) = .93, indicating that the measured concepts are even more concordant than the uncorrected correlation coefficients in Table 3 reveal.
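The two formulas above (the reliability ceiling and the correction for attenuation, widely known as Spearman's correction) can be checked with a few lines of arithmetic. The sketch below plugs in the BSI-TOT and SF-36 MCS values quoted in the text; the function names are hypothetical, and the EWB case is left out because its reliability coefficient is not stated in this section.

```python
import math


def max_correlation(r_xx: float, r_yy: float) -> float:
    """Upper bound on an observed correlation: sqrt(r_xx * r_yy)."""
    return math.sqrt(r_xx * r_yy)


def disattenuated_correlation(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Correlation corrected for unreliability: r* = r_xy / sqrt(r_xx * r_yy)."""
    if not (0 < r_xx <= 1 and 0 < r_yy <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / max_correlation(r_xx, r_yy)


# Values quoted in the text: reliability .82 (BSI-TOT) and .80 (SF-36 MCS),
# observed BSI-MCS correlation .61.
r_max = max_correlation(0.82, 0.80)
r_star = disattenuated_correlation(0.61, 0.82, 0.80)
print(f"r_max = {r_max:.2f}, r* = {r_star:.2f}")  # r_max = 0.81, r* = 0.75
```

The ceiling of about .81 also shows why an observed BSI-MCS correlation of .61 is already a substantial share of the maximum attainable value.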
Finally, the present study focused on whether both instruments assessed change of similar size and pace. Although the change appears to be of similar size and to follow a synchronous course, this head-to-head comparison leaves open whether the instruments assess highly similar latent variables or dimensions. Further research is therefore needed to reveal for which population groups and in which situations one instrument is more advantageous than the other [55].
Conclusion
We found correspondence but also significant differences between the BSI and the SF-36: change according to the BSI was similar to change on the mental component score (and its scales) of the SF-36, but patients changed less on the physical component score and its scales than on the mental component scores. Generally, the BSI and the SF-36 demonstrated a comparable degree of change in groups of patients, and this change was of similar size and pace. However, the profile of scores yielded by the SF-36 offers a more complete and more detailed clinical picture of the problems of individual patients, owing to the additional domain of physical health.
Thus, the findings illustrate that there is considerable overlap between what is measured with the BSI and the SF-36, but also that each instrument contributes specific information regarding the benefits of treatment. The BSI and the mental component of the SF-36 offer similar information on symptom reduction or mental health gains. However, the SF-36 clearly measures a broader construct, and change on its physical component (and its scales) diverges from change on the SF-36 mental component as well as from change on the BSI. Finally, if the current finding of a substantial correlation between the mental component score and the BSI were replicated in patients treated for somatic problems, the mental component score of the SF-36 could be used to capture concurrent changes in psychological health.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.