Introduction
Research on treatment of depression has increasingly focused on a symptom-specific approach, often targeting residual symptoms that persist after other symptoms have improved [1–6]. Residual symptoms have been shown to predict relapse of depressive episodes [4, 7–10], and they contribute to functional impairment even after other depressive symptoms have improved following pharmacological or psychological treatment [8, 9, 11–13]. Much of the research on residual symptoms has focused on fatigue, which is one of the most common symptoms of major depressive disorder [14–16]. Several studies suggest that fatigue is frequently a residual symptom, persisting in roughly 20 to 38% of patients who have remitted following pharmacological treatment or psychotherapy [9, 17, 18]. There is a substantial and growing body of research focusing on fatigue associated with depression because of its prevalence, its resistance to treatment, and its association with impairment in social and work functioning [19–23].
Despite the clinical importance of fatigue associated with depression, there was no available patient-reported outcome (PRO) instrument designed specifically to assess fatigue and its impact among patients with depression [24]. Therefore, the Fatigue Associated with Depression Questionnaire (FAsD) was recently developed to address this gap in assessment tools for patients with depression [25]. Depression symptom measures often include an item assessing fatigue [26–28], but they do not provide a thorough multidimensional assessment of this construct, and they are therefore unlikely to adequately capture fatigue and its impact. In focus groups conducted when drafting the FAsD, patients reported a range of fatigue-related experiences and impacts. The 13 items of the FAsD were designed to capture a more thorough spectrum of fatigue experience and impact that is important to patients with depression.
Generic instruments, designed to be completed by respondents regardless of medical or psychiatric condition, are available for a more detailed assessment of fatigue [29, 30]. However, there is growing awareness that PRO instruments must demonstrate content validity and good measurement properties in the specific target population in order to be appropriate for assessment of treatment outcomes [31, 32], and the generic fatigue measures do not meet these standards for patients with depression. For example, although the FAsD has been shown to correlate strongly with the commonly used generic Brief Fatigue Inventory (BFI), there are important differences between the two measures in content validity. Whereas the BFI was designed for use in cancer patients [29], the FAsD was developed based on direct input of patients with depression as well as clinicians who treat depression [25]. As a result of this careful approach to establishing content validity, the FAsD items assess the specific types of fatigue and its impact that are likely to be experienced by patients with depression, and the items use words shared by patients during qualitative research. Therefore, unlike the generic instruments, the FAsD has established content validity in the target population, and the appropriate wording and content of the items for this specific population may lead to greater measurement precision. For example, the specific relevance of the items to this population led to the clear two-factor model based on a factor analysis conducted to derive FAsD subscales [25]. In this analysis, there was a clear distinction between items assessing experience and items assessing impact. Therefore, the FAsD allows for specific assessment of the impact of fatigue, in contrast to the BFI, which has been shown to yield only a global score supported by a strong single-factor model fit [29]. In sum, although there is clearly some overlap between the FAsD and generic instruments assessing fatigue, no other instrument has demonstrated content validity for the detailed assessment of this clinically important symptom and its impact among patients with depression [25].
The FAsD was developed following recommendations in the Food and Drug Administration PRO Guidance Document [31]. The items were initially drafted and refined based on literature review and qualitative research with clinicians and patients diagnosed with depression. Then, a psychometric validation study was conducted to identify subscales and examine reliability and validity of the measure. In this validation study, the FAsD demonstrated good factor structure, internal consistency reliability, test–retest reliability, and construct validity [25]. The purpose of the current study is to examine the questionnaire’s responsiveness to change and identify a responder definition that will assist with interpretation of treatment-related change.
Responsiveness is the extent to which a health status measure accurately detects change in a patient’s condition over time [32–34]. Demonstration of this measurement property is necessary for a PRO measure to be considered fit for the purpose of “identifying differences in scores over time in both individuals and groups who have changes with respect to the measured concept” [31]. Tests for responsiveness typically include effect size statistics as well as correlations of change scores with change in previously validated measures or indicators of the concept of interest. Responsiveness testing may also include comparison of change scores among patient subgroups categorized by an indicator of change in the relevant concept, such as patients’ or clinicians’ perceptions of change.
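The responsiveness statistics named above can be sketched in a few lines of Python. This is a minimal illustration and not the analysis code used in this study. The function names, the use of the baseline standard deviation as the effect size denominator, and the example scores are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def change_scores(baseline, followup):
    """Per-patient change (follow-up minus baseline).

    Negative values mean improvement when lower scores indicate less fatigue.
    """
    return [f - b for b, f in zip(baseline, followup)]


def effect_size(baseline, followup):
    """Mean change divided by the SD of baseline scores (one common convention)."""
    return mean(change_scores(baseline, followup)) / stdev(baseline)


def standardized_response_mean(baseline, followup):
    """Mean change divided by the SD of the change scores."""
    changes = change_scores(baseline, followup)
    return mean(changes) / stdev(changes)


def pearson_r(x, y):
    """Pearson correlation, e.g. between change on a new measure and change
    on a previously validated measure."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


# Hypothetical scores for three patients (illustrative only, not study data).
baseline = [3.0, 4.0, 5.0]
followup = [2.0, 3.5, 3.0]
print(effect_size(baseline, followup))
print(standardized_response_mean(baseline, followup))
```

Both statistics scale the same mean change by a different standard deviation, so they can diverge when change is much more or much less variable than baseline scores.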
Once responsiveness has been demonstrated, establishing guidelines for the interpretation of PRO change scores can assist in recognizing when an important shift in patients’ health status has occurred. This step of instrument development has often been characterized as identifying the minimally important difference (MID). However, the 2009 Food and Drug Administration (FDA) PRO Guidance eliminated the term MID from its directives for PRO development.
Instead of the MID, the FDA now requests a responder definition that is “the individual patient PRO score change over a predetermined time period that should be interpreted as a treatment benefit” when a PRO instrument is used in clinical trials [31]. The FDA recommends that the responder definition should be determined empirically through anchor-based methods using data from the target population, with supportive evidence from distribution-based statistics. The anchors, which should be easier to interpret than the PRO measure, may be clinical indicators, patient ratings of change, or clinician ratings of change. Once a responder definition is ascertained, the percentage of responders achieving change at or beyond this threshold in each treatment arm of a clinical trial can be compared to facilitate the evaluation and communication of PRO results to patients, physicians, and providers.
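The comparison of responder percentages across treatment arms can be illustrated with a short sketch. Everything here is hypothetical: the arm names, the change scores, and the threshold of 0.5 are placeholders and are not drawn from any trial. The sketch assumes that lower scores indicate improvement, so a responder is a patient whose score decreased by at least the threshold.

```python
def is_responder(change, threshold):
    """True when the score decreased by at least `threshold`
    (change = follow-up minus baseline, so improvement is negative)."""
    return change <= -threshold


def responder_rate(changes, threshold):
    """Proportion of patients whose change meets the responder definition."""
    if not changes:
        raise ValueError("at least one change score is required")
    return sum(is_responder(c, threshold) for c in changes) / len(changes)


# Hypothetical change scores per arm (illustrative only).
arms = {
    "treatment": [-1.0, -0.7, -0.2, -0.9],
    "comparator": [-0.1, -0.8, 0.0, -0.3],
}
threshold = 0.5  # placeholder responder definition

for arm, changes in arms.items():
    print(arm, responder_rate(changes, threshold))
# treatment 0.75
# comparator 0.25
```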
Discussion
Results of all analyses indicate that the FAsD was responsive to change. The total score and both subscales demonstrated statistically significant improvement from Visit 1 to Visit 2, with effect sizes suggesting that these changes were in the moderate to large range. In addition, FAsD change scores discriminated among groups of patients who differed by degree of improvement in patient- and clinician-reported fatigue and depression symptom severity.
Current results also provide an initial indication of a responder definition that may be used when interpreting treatment-related change in the FAsD. Using the 20 patients with ratings of a small but important improvement in fatigue after six weeks of treatment, the mean change scores in this study were −0.66, −0.51, and −0.59, respectively, for the FAsD experience subscale, impact subscale, and total score. These mean change scores are likely to be conservative estimates of a responder definition, because they exceed the magnitude of values provided by supportive analyses, including mean change scores among 17 patients viewed by clinicians as having a small but important change in fatigue, as well as the half standard deviation and the SEM of these scales. Based on these results and the magnitude of a state change in the FAsD experience subscale, a responder definition of 0.67 is recommended for this subscale (i.e., a score decrease of at least 0.67 on this subscale). This threshold corresponds to a shift of four response options across the six items in the subscale. Similarly, the responder definition for the FAsD impact subscale was identified as 0.57, which corresponds to a shift of four response options across the seven subscale items. Finally, the responder definition for the FAsD total score was identified as 0.62, which corresponds to a shift of eight response options across the 13 items.
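The arithmetic linking response-option shifts to these thresholds can be checked with a short sketch. The conversion from shifts to a mean-score threshold (shifts divided by number of items) follows directly from the text. The second helper, which picks the smallest whole number of shifts that covers the observed mean change, is only one possible reading of how the thresholds relate to the mean change scores. It happens to reproduce the reported values but is an assumption, not the authors' stated procedure. Function names are hypothetical.

```python
from math import ceil


def threshold_from_shifts(n_shifts, n_items):
    """Mean-score change produced by shifting `n_shifts` response options
    in total across a scale scored as the mean of `n_items` items."""
    return n_shifts / n_items


def smallest_shift_covering(mean_change, n_items):
    """Smallest whole number of response-option shifts whose mean-score
    change is at least as large as the observed mean change (assumed rule)."""
    return ceil(abs(mean_change) * n_items)


# (observed mean change, number of items) as reported in the text.
scales = {
    "experience": (-0.66, 6),
    "impact": (-0.51, 7),
    "total": (-0.59, 13),
}

for name, (mean_change, n_items) in scales.items():
    shifts = smallest_shift_covering(mean_change, n_items)
    print(name, shifts, round(threshold_from_shifts(shifts, n_items), 2))
# experience 4 0.67
# impact 4 0.57
# total 8 0.62
```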
The strong correlations between FAsD change scores and BFI change scores suggest that these two questionnaires capture change in similar aspects of fatigue. However, there are two key differences between the questionnaires. First, the FAsD was developed and validated specifically for patients with depression, suggesting that it may be uniquely fit for use in this target population. In contrast, the BFI was designed for use in cancer patients, with a general structure derived from the Brief Pain Inventory [29]. Second, the FAsD subscales provide separate assessments of fatigue experience and impact, whereas the BFI yields only a global score [29]. In qualitative research conducted when developing the FAsD, patients with depression have reported that fatigue has a powerful impact on multiple aspects of their lives [25], and the FAsD impact scale was designed to quantify this impact. Therefore, the FAsD has advantages over the BFI for studies examining change in fatigue among patients with depression. Furthermore, although correlations with the BFI are strong (0.73 ≤ r ≤ 0.80), these coefficients suggest that the BFI explains only 53% of the variation in the FAsD subscales and 64% of the variation in the FAsD total score. Therefore, the FAsD captures unique aspects of fatigue that are not captured by the BFI in this population.
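The percentages of explained variation quoted above are the squared correlation coefficients. A minimal check, using a hypothetical helper name:

```python
def shared_variance_percent(r):
    """Percentage of variance shared by two measures with Pearson correlation r."""
    return 100 * r ** 2


print(round(shared_variance_percent(0.73)))  # 53
print(round(shared_variance_percent(0.80)))  # 64
```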
One limitation of the current study is that patients received treatment in naturalistic clinical settings, rather than in a controlled clinical trial context. Although all patients were required to receive a new antidepressant treatment within 7 days of study enrollment, it is likely that many aspects of the treatment experience varied among the seven clinical sites, as well as among clinicians at each site. Therefore, the generalizability of the current results to the clinical trial context is not known. Another limitation is that the current sample size is not large enough to examine FAsD measurement properties within subgroups of patients categorized based on their specific pharmacological treatment. Patients in the current study received a wide range of pharmacological treatments. Some of these medications may have the potential to exacerbate fatigue, while others may have the potential to reduce fatigue, and it is possible that FAsD scores were influenced by these treatments. Nonetheless, these results support the use of the FAsD in studies examining change in fatigue, and the FAsD may be even more responsive to change in a controlled trial with a standardized treatment approach.
Another factor that could have affected the results is the missing data at Visit 2. Of the 119 patients who were enrolled in the study, 23 were excluded from the analyses either because they did not attend Visit 2 (n = 18) or they attended Visit 2 outside the required window of 42 ± 7 days after Visit 1 (n = 5). Because the goal of this analysis was to examine change in instrument scores over time, it was essential to have data at a minimum of two time points. Therefore, no data were imputed for the missing Visit 2 values. It is possible that the 23 excluded patients could have had more severe symptoms or less improvement on average than the 96 included patients. However, although this potential difference between included and excluded patients could affect the evaluation of treatment outcomes, it is unlikely to have a substantial impact on the current analysis that focused on longitudinal instrument performance and ascertaining the responder definition to identify individuals with a treatment benefit. Because the 96 included patients demonstrated improvement in depression and fatigue, their data are likely to be sufficient for evaluation of FAsD responsiveness and responder definition.
In the current study, the responder definition was based primarily on patients who reported a small but important change. However, other methodological approaches are possible. For example, there may be situations when it is preferable or necessary to use clinicians’ ratings, rather than patients’ ratings, as the primary anchor of change [45]. Current results indicate that clinicians and patients may have different perspectives on meaningful change. In the current sample, 20 patients reported “small but important” change, compared with only 17 patients who had this rating of change from clinicians. The mean FAsD change scores were lower for the 17 patients classified to this change group by clinicians than for the 20 self-classified patients. These findings suggest that a clinician-based approach could yield a different responder definition than a patient-based approach. In addition, “small but important” may not be the optimal degree or description of change to select a responder. For some PRO instruments, perhaps a patient-reported “moderate improvement” response would be a more appropriate criterion for determining the responder definition.
Almost all results of the current analysis followed logical and expected patterns, with the exception of some FAsD change scores presented in Table 4. However, the unexpected results only occurred in the smaller groups who reported becoming worse during the study (n ≤ 6) and are likely to be a function of the small group sizes. All groups of larger size (i.e., 16–30 patients categorized based on patient perception; 15–24 patients categorized based on clinician perception) followed logical patterns, with FAsD change scores of direction and magnitude that were entirely consistent with patient-reported and clinician-reported perception of change in fatigue. Future research with larger sample sizes may provide stronger support for the use of the FAsD to assess change over time.
When considered along with previous analyses demonstrating factor structure, reliability, and validity of the FAsD, current findings suggest that the FAsD is a useful measure of fatigue for studies focusing on treatment of depression. Measures such as the FAsD that allow for a detailed assessment of individual depressive symptoms are essential tools for developing a symptom-specific approach to treatment [3, 4]. By administering symptom-specific PRO measures, researchers may examine the effects of medications and other interventions on individual symptoms that are particularly relevant for some patients.