Introduction

Mental health is a crucial capital in society [54]. However, major depression has become one of the leading causes of disease burden worldwide and its role is expected to grow [47]. In general, mood disorders may start early and last for decades with recurrent episodes [11, 24, 31, 32, 50, 56]. Initial treatment, if received [55, 66], may be delayed by years or recovery may be only partial [51, 62, 63]. This can lead to a chronic course [25], which may also occur with subthreshold depression [13, 16, 19, 25, 64, 65]. Indeed, epidemiological studies support the view of depression as a dynamic state evolving on a continuous scale [30]. Symptoms change over time, fulfilling the criteria for different types of depression as well as subsyndromal states [23].

Individual and societal stress especially strike the mentally ill, aggravating their disorder, hindering relief and restitution and inducing relapses and a chronic course [23, 54]. The growing discrepancy between the demands of and resources for mental health care requires both effective and adequate treatment interventions as well as easily applied tools to monitor recovery and mental health repeatedly with long follow-ups [10]. In addition, positive mental health and the availability of mental resources in an individual should be focused on [22, 60].

According to Vaillant [60], mental health can be conceptualized as maturity, emotional or social intelligence, successful adaptation, and subjective well-being, i.e. happiness and life satisfaction. Furthermore, subjective well-being is not only a desired state of mind but it is related to many other beneficial factors [12, 45]. In long follow-ups, those satisfied with their life have been shown to be better off in terms of longevity [35–37], work ability [40], and depression [38], while the dissatisfied are faced with cumulative health hazards [6, 26, 33].

Thus, one of the key issues of mental health services is whether they actually improve the subjective well-being of the patients for whom psychopathology, especially depression and anxiety, plays an important role [14, 18, 41]. Life satisfaction ratings, measures of subjective well-being, have been criticized as being a new label on the old bottle “depression” [3, 27]. However, they can be concise and valid tools for screening and monitoring mental health [10, 37–40, 46], thus increasing possibilities for the early detection and prevention of mental disorders. They also target the positive pole of mental health [10], which has largely been neglected. However, their relationships with general psychopathology, functional capacity, symptoms and recovery need further research. In addition, the evaluation and improvement of the mental health and well-being of patients, i.e. the outcome of treatment, requires assessments by both patients and clinicians. Nevertheless, prospective studies using and comparing both of these for the benefit of the patient and clinician are sparse [34, 43, 44, 52].

This 6-year follow-up study investigated the improvement in and relationships between mental health and well-being among patients referred to psychiatric care due to depression using psychometric scales—assessed by both patients and clinicians—for symptoms, general psychopathology, functional capacity and life satisfaction.

Methods

The Kuopio Depression Study (KUDEP) involved consecutive patients (N = 203) referred for outpatient psychiatric care to Kuopio University Hospital (KUH) due to suspected depression [34]. They were included in the study if they were diagnosed as suffering from at least one specific mood disorder (ICD-10: F32–34, F41.1–2). Exclusion criteria were a central nervous system (CNS) disease, other severe somatic disease (such as recent myocardial infarction, sequelae of stroke), alcohol or drug dependence, a marked deficiency in cognitive capacity or other serious mental disorders such as schizophrenia or other psychoses. The baseline sample consisted of 185 subjects. Participation in this natural follow-up study did not affect their standard psychiatric treatment. The Ethics Committee of KUH and the University of Kuopio approved the study protocol. Data were collected from 1/1996 to 1/2004.

Data collection took place at baseline (T1, N = 185) and on follow-up after 0.5 years (T2, n = 168, i.e. 91%), 1 year (T3, n = 161, i.e. 87%), 2 years (T4, n = 148, i.e. 80%) and 6 years (T5, n = 121, i.e. 65%). Thus, the final number of drop-outs was 64 (35%).
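The retention figures above follow directly from the reported counts. The short Python sketch below is only an illustrative check of that arithmetic; it is not part of the study's analysis (which was run in SPSS), and the names `BASELINE_N`, `followup_n` and `retention_percent` are hypothetical.

```python
# Illustrative check of the reported retention percentages (not the study's own code).
BASELINE_N = 185  # subjects at baseline (T1)
followup_n = {"T2": 168, "T3": 161, "T4": 148, "T5": 121}  # counts reported in the text


def retention_percent(n, baseline=BASELINE_N):
    """Share of the baseline sample still participating, rounded to a whole percent."""
    return round(100 * n / baseline)


retention = {wave: retention_percent(n) for wave, n in followup_n.items()}
dropouts = BASELINE_N - followup_n["T5"]
dropout_percent = round(100 * dropouts / BASELINE_N)
```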

At baseline, all the study subjects completed questionnaires assessing their sociodemographic background, health behavior, treatment history and performance on several psychometric scales. Data on treatment were also obtained from patients’ case records and interviews during the follow-up. At the T4 interview, the patients (n = 148) reported whether they had received any interpersonal treatment contacts due to depression in either psychiatric or primary care settings during the periods of 0–0.5 year (D1), 0.5–1 year (D2) and 1–2 years (D3). At T5 (n = 121), such contacts during 2–6 years (D4) and current treatment were also reported. Adequate drug therapy was a daily dosage of 150 mg TCA, 20 mg fluoxetine/citalopram, 50 mg sertraline or 75 mg venlafaxine for at least 3 months [57]. The duration of interpersonal treatment was categorized into three classes starting from the baseline and ending during: 1) D1 or D2; 2) D3; 3) D4 or still continuing at T5. Re-entering treatment meant skipping at least one treatment period, but re-entering in a later period.

The Structured Clinical Interview for DSM-III-R (SCID-I) was conducted to obtain the psychiatric diagnosis [59] at baseline (T1) and on 2-year (T4) and 6-year follow-up (T5) by a trained nurse who achieved a total kappa of 0.78 against the trainer in SCID diagnosis [61].

At each data collection (T1–5) several psychometric scales were administered. Scales measuring depression included the self-administered 21-item Beck depression inventory (BDI-21, range 0–63; normal mood 0–9) [9]. The Hamilton depression rating scale (HDRS, range 0–52; normal mood 0–7) [17] was assessed by the same trained nurse during the whole follow-up. The Montgomery–Åsberg depression rating scale (MADRS, range 0–60; normal mood 0–6) [49] was administered only at baseline by the physician who referred the patient to the study.

General psychopathology was assessed with a psychiatric self-report inventory, the symptom checklist (SCL-90). This consists of 90 items with nine subscale scores: somatization (12 items), obsessive-compulsive (10 items), interpersonal sensitivity (9 items), depression (13 items), anxiety (10 items), hostility (6 items), phobic anxiety (7 items), psychoticism (10 items), and paranoid ideation (6 items). It also includes seven additional items, primarily on sleep and eating disturbances [15, 20]. The instrument’s global index of distress, the global severity index (GSI), is the mean value of all the items (range 1–5).
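As a minimal sketch of the GSI described above, the following Python snippet takes the mean of 90 item scores on the 1–5 range given in the text. The item counts come from the paragraph above; the function name, the validation rules and the requirement of complete data are assumptions made for illustration, since the text does not state how missing SCL-90 items were handled.

```python
from statistics import mean

# Item counts per subscale as listed in the text.
SUBSCALE_ITEMS = {
    "somatization": 12,
    "obsessive-compulsive": 10,
    "interpersonal sensitivity": 9,
    "depression": 13,
    "anxiety": 10,
    "hostility": 6,
    "phobic anxiety": 7,
    "psychoticism": 10,
    "paranoid ideation": 6,
}
ADDITIONAL_ITEMS = 7  # mainly sleep and eating disturbances
TOTAL_ITEMS = sum(SUBSCALE_ITEMS.values()) + ADDITIONAL_ITEMS


def global_severity_index(item_scores):
    """Mean of all item scores (hypothetical helper; assumes complete data)."""
    if len(item_scores) != TOTAL_ITEMS:
        raise ValueError(f"expected {TOTAL_ITEMS} item scores")
    if any(not 1 <= score <= 5 for score in item_scores):
        raise ValueError("item scores must lie in the range 1-5")
    return mean(item_scores)
```

For example, a respondent answering 1 on every item would obtain a GSI of 1, the lower end of the stated range.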

The 4-item life satisfaction scale (LS, range 4–20; dissatisfaction 12–20) was originally modified from quality of life studies [2, 4, 42]. Study subjects assessed their interest and happiness in life, ease of living and loneliness with the following responses: very interesting/happy/easy/not at all lonely = 1; fairly interesting/happy/easy = 2; cannot say/missing data = 3; fairly boring/unhappy/hard/lonely = 4; very boring/unhappy/hard/lonely = 5 [36, 37].
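The scoring rule described above can be sketched in Python as follows. The coding (1–5 per item, with "cannot say" and missing data both scored 3) and the dissatisfaction range (12–20) are taken from the text; the item keys and function names are hypothetical and serve only as an illustration.

```python
# Hypothetical item keys for the four LS questions named in the text.
LS_ITEMS = ("interest", "happiness", "ease", "loneliness")
NEUTRAL_SCORE = 3            # "cannot say" and missing data are both coded 3
DISSATISFACTION_CUTOFF = 12  # sum scores of 12-20 indicate dissatisfaction


def life_satisfaction_score(responses):
    """Sum the four item scores (range 4-20); absent items are coded as 3."""
    total = 0
    for item in LS_ITEMS:
        value = responses.get(item)
        if value is None:
            value = NEUTRAL_SCORE
        if value not in (1, 2, 3, 4, 5):
            raise ValueError(f"invalid response for {item}: {value!r}")
        total += value
    return total


def is_dissatisfied(score):
    """True when the sum score falls in the dissatisfaction range."""
    return score >= DISSATISFACTION_CUTOFF
```

A subject answering "fairly" positively on interest and happiness (2 each) and leaving the other two items blank would score 2 + 2 + 3 + 3 = 10, which is below the dissatisfaction range.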

The global assessment of functioning (GAF, range 1–100) was performed [58] at baseline by the physician referring the patients to the study and later by the same trained nurse. It covered functioning with the severity of symptoms included. The social and occupational functioning assessment scale (SOFAS, range 0–100) focuses on the level of functioning while excluding the severity of symptoms [5]. It was assessed by the same trained nurse throughout the follow-up. In both scales, a score >80 indicates good functioning and >60 adequate functioning with at most slight deficiencies.
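The two cut-offs shared by the GAF and SOFAS can be expressed as a small Python helper. This is a sketch only: the thresholds (>80 and >60) come from the text, whereas the function name and the label "below adequate" for scores of 60 or less are assumptions added for illustration.

```python
def functioning_level(score):
    """Classify a GAF or SOFAS score using the cut-offs stated in the text."""
    if not 0 <= score <= 100:
        raise ValueError("score must lie between 0 and 100")
    if score > 80:
        return "good"
    if score > 60:
        return "adequate"
    return "below adequate"  # label assumed here; the text names no category for <= 60
```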

Statistical methods

Data analysis was carried out with SPSS (version 13.0). Correlations were assessed with Pearson’s correlation coefficient. The statistical significance of differences was examined with the Pearson χ² test for categorical variables. For continuous variables the t test for independent samples was used, or, in the case of variables not following a normal distribution, the non-parametric Mann–Whitney U-test. To determine the statistical significance of changes, the paired t test for dependent samples as well as analysis of variance were used, or in the case of non-normally distributed variables, their non-parametric alternatives, Wilcoxon’s test and the Kruskal–Wallis test. When repeated measures analysis of variance was applied, the normality of the distributions of variables and residuals (standardized/unstandardized) was examined. The improvement effect was defined as the effect size (Cohen’s d), i.e. the standardized difference (divided by pooled standard deviation) between two means [28].
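To make the effect size definition concrete, the following Python sketch computes Cohen's d as the difference between two means divided by a pooled standard deviation. It is an illustration and not the study's SPSS procedure; in particular, the sample-size-weighted form of the pooled standard deviation used here is an assumption, because the text does not specify which pooling formula was applied.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(first, second):
    """Standardized difference between two means using a pooled SD.

    Assumption: the pooled variance is the sample-size-weighted average of
    the two sample variances; the source does not state its exact formula.
    """
    n1, n2 = len(first), len(second)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    pooled_variance = (
        (n1 - 1) * variance(first) + (n2 - 1) * variance(second)
    ) / (n1 + n2 - 2)
    if pooled_variance == 0:
        raise ValueError("pooled standard deviation is zero")
    return (mean(first) - mean(second)) / sqrt(pooled_variance)
```

With the toy inputs `[2, 4, 6]` and `[1, 3, 5]`, the means differ by 1 and the pooled standard deviation is 2, giving d = 0.5.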

Results

The final study sample (n = 121) did not differ significantly from the drop-outs (n = 64) (Table 1) with respect to age, gender, education, financial situation, work status, subjective health or subjective work ability, the use of alcohol, the delay in receiving psychiatric treatment, psychometric scale sum scores or the proportion of those suffering from major depression. However, a significantly greater proportion of the drop-outs were non-cohabiting and current smokers compared to the final study sample.

Table 1 Baseline characteristics of the final study sample and dropouts

At baseline, all 185 patients were diagnosed as suffering from depression according to the ICD-10. According to the SCID-I for DSM-III-R, 135 (73%) had major depressive disorder and 25 (14%) another type of depression. In the final sample (n = 121), these respective proportions were 72 and 12% at baseline, 25 and 5% after 2 years, and 16 and 0% after 6 years.

Among all 185 patients at baseline, the self-reported BDI, SCL and LS had the strongest intercorrelations (r = 0.57–0.75), exceeding those with or between clinician-rated depression scales (HDRS, MADRS). The correlation between the HDRS and MADRS was 0.48. The GAF score had low intercorrelations with all the other scales including the SOFAS (r = 0.26), another scale for functional capacity. These two figures were even lower for the final sample at baseline (0.38 and 0.14) (Table 2). From 6 months onwards, all the intercorrelations strengthened. The GAF and SOFAS had the strongest correlation at T5 (r = 0.93) followed by those between BDI, SCL and LS, but the objectively-assessed HDRS grouped with them (Table 2). The same trends were seen with SCL subscales. At T5 the strongest correlations with SCL subscales of depression and anxiety were recorded with the SCL (r = 0.91/0.94), BDI (r = 0.84/0.80) and LS (r = 0.83/0.71), followed by the HDRS (r = 0.55/0.47). All scales had the lowest correlations with the eating disturbance subscale, mainly together with the sleep subscale.

Table 2 Correlations between psychometric scales among study subjects

All subjects in the final sample (n = 121) had had interpersonal treatment contacts due to depression in either psychiatric or primary care settings during D1 (0–6 months). This figure was 90 (74%) in D2 (0.5–1 year), 84 (69%) in D3 (1–2 years) and 56 (46%) in D4 (2–6 years). The mean (SD) number of monthly contacts, based on the total number of months in each period, was 2.2 (1.8) in D1, 2.9 (2.3) in D2, 2.5 (2.0) in D3 and 1.5 (1.5) in D4. At T5, 24 subjects (20%) were receiving treatment contacts. Adequate antidepressant medication with/without interpersonal treatment contacts was verified for 47%/- of subjects during D1, 48%/17% during D2, 40%/11% during D3, 66%/19% during D4, and 58%/17% at T5, respectively. At T5, subjects who had interpersonal treatment contacts (n = 24) (with/without adequate medication) or only adequate antidepressant drug therapy (n = 16) comprised 33% of the final sample. During the follow-up, only 15 subjects re-entered treatment after having had no treatment contacts during one entire interval.

In the final sample, repeated measures analysis of variance revealed a significant (P < 0.001) difference between data collection times in all the psychometric scales. When each follow-up level was compared to all the following levels together (Helmert contrast), the BDI (P < 0.001/0.01/ns/ns) and GAF (P < 0.001/0.001/ns/ns) showed significant recovery in the first two levels, the SCL (P < 0.001/0.001/0.05/ns) and SOFAS (P < 0.001/0.001/0.001/ns) in the first three, and the LS (P < 0.001/0.01/0.01/0.05) and HDRS (P < 0.001/0.001/0.001/0.01) in all four levels. At T1, 69.4% of the sample had at least moderate depression as indicated by BDI ≥ 30 and 73.6% as indicated by HDRS ≥ 13. At T5 these proportions were 17.4 and 18.4%, respectively. At T5, most of the subjects had a normal mood (57% had BDI ≤ 9 and 66% HDRS ≤ 7) and at least an adequate functional capacity (82% had SOFAS > 60 and 80% GAF > 60), and were not dissatisfied with their life (78% had LS < 12, with an LS mean of 7.69).

Recovery was also examined in the final sample in terms of three treatment duration groups (Table 3). Those re-entering interpersonal treatment during the follow-up (n = 15) and one subject without complete treatment data were excluded, leaving 105 subjects for this analysis. A significant interaction between time and treatment duration group was found. Recovery differed according to these groups, regardless of the psychometric scale (Fig. 1). The quicker and better the recovery was, the earlier interpersonal treatment was terminated. Those with a slow recovery and long treatment improved during the follow-up, with significant improvement not only in D1 but also in D4, eventually reaching a level of mental health not significantly different from those with more rapid recovery and a shorter treatment, even though there were significant differences between the groups during T2–T4. All the treatment groups were in a significantly better state of mental health at T5 compared to T1, regardless of the psychometric scale. The SOFAS and SCL showed the lowest improvement effect, followed by the BDI and LS, while the clinician-rated GAF and HDRS showed the highest effect size (Table 3). In general, the improvement effect was lowest for the most rapid recovery group, which also had the best baseline scores, but with the SOFAS and GAF the group with the longest treatment had the lowest effect size.

Table 3 The effect size* and means (95% CI) of psychometric scales according to treatment duration** among the final sample (n = 105) during the 6-year follow-up
Fig. 1 Mean scores of psychometric scales during the 6-year follow-up of three treatment duration groups

Discussion

Subjective well-being, normal mood and at least adequate functional capacity could be reached among most of the subjects referred to psychiatric care due to depression. However, drug therapy was not fully utilized during the follow-up. The most pronounced recovery took place in the first 6 months [34], but improvement of those with slower recovery continued towards the end of the follow-up with interpersonal treatment contacts due to depression. It eventually reached the levels of mental well-being of those recovering most rapidly. The lowest improvement took place in functional capacity. Throughout the follow-up, recovery was similarly and consistently shown with the self-reported BDI-21, SCL-90 and LS-4, while intercorrelations between clinician ratings were low at baseline.

Depression can be a chronic illness or have recurrent episodes. In this study sample, the wide range of the depression continuum, i.e. from major to subthreshold depression, was represented at baseline according to DSM-III-R (SCID-I). Only a few subjects re-entered treatment based on the assessed intervals. In general, depression has been shown to have high relapse rates [29, 31, 50, 53], but in this study the relapses taking place during each interval (having different time frames) or among the dropouts could not be monitored. There may also be other subgroups behaving differently [61]. However, the main focus of this study was not on determining the precise rate of recovery from depression during the follow-up or the detailed impact of treatment, but on long-term mental health and global well-being and their relationships during the recovery process as assessed by patients and clinicians.

In terms of subjective well-being, psychiatric in-patients have previously shown lower life satisfaction than any other in-patient group [33, 34]. Here, it was shown that with standard psychiatric treatment, outpatients with depression can even approach the levels of life satisfaction in the adult general population (LS mean = 8.8 for all; 8.4 for the healthy; 9.4 for the sick), although this may take time [33, 34]. The health care system appears able to initiate an improvement in global well-being in psychiatric patients, but the utilization of and compliance with drug therapy needs more attention. Recovery may also continue with less frequent interpersonal treatment contacts, but according to our results, even those with pronounced recovery and the earliest treatment termination may need subsequent mental health check-ups. Further research should evaluate the most effective treatment methods to not only reduce the pathological symptoms but also to promote the subjective well-being and mental resources of patients, and what additional benefits might be gained by this approach.

The clinical status of the patients was measured with several psychometric scales and interviews. The self-reported BDI, SCL and LS were highly correlated from the very beginning, unlike the clinician-rated scales. The clinician-rated, widely-used depression scales HDRS and MADRS had a low baseline intercorrelation. Two functional capacity scales were assessed by different clinicians at baseline with some time lag between the assessments, but this may only partially explain their poor baseline intercorrelation. They also had a weak relationship at baseline with depression scales and SCL depression/anxiety subscales, which was not later apparent. The relationship between the HDRS and subjective scales also improved during the follow-up. Thus, some caution may be warranted in the use and interpretation of clinician-rated scales, at least in depressive patients just beginning treatment. The variability of scale scores increases along with recovery in scores, but this affects both subjective and objective scales. Thus, an established patient-doctor relationship may count for more.

Previous studies have shown difficulties in improvement in functional capacity after depression [1, 48]. In this study, especially those with slow recovery and long treatment showed the lowest improvement in functional ability. In general, clinicians rated recovery more positively than patients, but SOFAS showed the lowest effect size for the whole final sample.

The 4-item LS predicts health according to longitudinal studies. It is also linked with personality factors and interpersonal abilities [6, 42], but especially with depression, both in patients and among general populations [34, 37–39, 41]. Its high validity in detecting depressiveness in the general population [38] in parallel with screening instruments [7, 8, 21] and its ease of administration are benefits. According to this study on depressive patients, it is also strongly linked with general psychopathology (SCL) from the very start and with clinician-rated functional capacity (SOFAS, GAF) after the baseline. Furthermore, the three subjective scales (SCL-90, BDI-21, LS-4) seemed to have something shared and essential to all of them throughout the follow-up, but the considerable difference in their number of items favors the LS. Advantageously, the LS also enables the monitoring of subjective well-being after normality has been reached in symptom scales, when well-being may not yet be optimal [10]. Thus, the 4-item LS appears to work well among depressive patients as a global well-being indicator and in assessing their recovery process.

Conclusions

Adequate mental health and global well-being can be reached among depressive patients, but it may take time in treatment. Subjective assessments are reliable in monitoring recovery from depression. The 4-item life satisfaction scale is a global well-being indicator and a valid treatment outcome measure.