Introduction

Depression and anxiety disorders and/or symptoms are commonly reported after spinal cord injury (SCI). Despite a conceptual distinction between depression and anxiety, clinically differentiating the two constructs has proven difficult, as people who experience anxiety are often depressed as well.1, 2 In a sample of 394 primary care patients, Mergl et al. (2007) found that depression without comorbidity occurred significantly less than expected by chance.2 Further, a high comorbidity odds ratio (6.25) between depressive and anxiety disorders was found, leading to the conclusion that depression and anxiety comorbidity occurs more often than expected.2 For this reason, it is important to assess both depression and anxiety, disorders or symptoms, in tandem.

Diagnosis of depression and anxiety disorders is typically conducted through structured interviews based on the Diagnostic and Statistical Manual of Mental Disorders, 4th Edition (DSM-IV).3 A diagnosis of a depressive disorder according to the DSM-IV requires a minimum number of symptoms presenting for at least 2 weeks. Major depressive disorder (MDD), for example, is diagnosed based on the presence of a depressed mood and/or a loss of interest, in addition to three or four of the following: significant weight loss, insomnia or hypersomnia, psychomotor agitation or retardation, fatigue, feelings of worthlessness, lack of ability to think or concentrate, and recurrent thoughts of death or suicide.3 Similarly, anxiety disorders are diagnosed based on a constellation of symptoms with a common feature of inappropriate anxiety. Anxiety disorders may include symptoms such as increased heart rate, tensed muscles, fear of dying, inability to relax, irritability and trouble concentrating. To be diagnosed with either a depression or anxiety disorder, the symptoms must be independent of other medical conditions.3

As the DSM-IV diagnostic process is time consuming for both clinicians and patients, self-report instruments are frequently used as devices to identify the possible presence of a depression or anxiety disorder or to assess the severity of symptoms. Importantly, such instruments serve to alert the professional to the need for further clinical evaluation or treatment. Accordingly, both clinicians and researchers are advantaged by instruments that can offer an efficient and effective means to screen for depression and anxiety disorders or determine levels of symptom severity.

Depression and anxiety measurement issues in SCI

Although many self-report instruments are available, the application of generic instruments to populations with SCI is not without concern. The use of generic instruments that are neither reliable nor valid among populations with SCI is a likely source of bias. As an example, in a review of 300 randomized controlled trials, Marshall et al. (2000) reported that treatment for schizophrenia was 36% more likely to be found effective if an unpublished scale was used.4 The authors also determined that one-third of the claims of non-drug treatment being preferred over the control would not have been made if published, reliable and valid scales had been used. Although these findings are specific to schizophrenia scales, they exemplify the need for reliable and valid scales specific to certain populations, such as people with SCI.

Another issue with the use of generic instruments is that people with disabilities who are undergoing rehabilitation have different needs and problems from those of the general population. Without accounting for such differences, measurement problems will persist. For example, many measures include somatic symptoms of depression and anxiety disorders; however, symptoms of weight and energy loss, loss of appetite and disruptions in sleep cycles are also commonly reported after SCI. Anxiety symptoms such as increased blood pressure, sweating and rapid heart rate are also present during episodes of autonomic dysreflexia among individuals with SCI. Simply excluding somatic questions from generic instruments is problematic, potentially altering the properties of the questionnaire, and possibly removing true indicators of depression or anxiety disorders. Alternatively, if somatic criteria are maintained in the scales, there is the potential for these measures to overestimate the prevalence of depression and anxiety.

Finally, another concern with the instruments currently in use revolves around the differing screening criteria and timing of assessments. Such differences have raised questions as to the definitions used to assess depression and anxiety, and the concern that not all instruments may screen for or measure the same construct. Such concerns limit the comparison of results, and lead to variability in the estimates of depression and anxiety. Nonetheless, a recent review of psychological morbidity and SCI5 found that 20% (ref. 6) to 43% (ref. 7) of people with SCI are at risk of having a depressive disorder during rehabilitation, and 11% (ref. 8) to 60% (ref. 9) are at risk of having raised depressive symptoms when living in the community. The same review also established that anxiety disorders are more prevalent in populations with SCI, with estimates of raised anxiety symptoms ranging from 13% (ref. 10) to 44% (ref. 11). Although the studies in this review commonly reported higher rates of depression and anxiety among populations with SCI than in the general population, the comparison of findings is limited by the use of different instruments.

Establishing accurate measurement of depression and anxiety with SCI

Establishing the psychometric properties of validity and reliability of generic depression and anxiety instruments within populations with SCI before their use is critically important. Such evidence will determine their utility and help to avoid possible sources of bias. Systematic measurement also provides important information that enables clinicians to identify individuals who may require further evaluation or benefit from certain therapies, to evaluate whether treatments are effective, and to monitor progress. Such analyses of the various instruments used among populations with SCI will either justify the need for the development of a new instrument, should existing ones have poor reliability and validity, or identify instruments that are working properly and as intended. Agreement on the use of common instruments is beneficial, as it will increase the generalizability of findings and allow for the comparison of outcomes, which has both clinical and research implications.

In a recent review of depression instruments by Kalpakjian et al. (2009),12 24 studies were found that reported psychometric data for seven screening and/or symptom-severity measures. A range of studies that provided psychometric data were included and classified into five levels. The levels ranged from level 1 studies, being the most important with primary purposes of evaluating the psychometric properties of depression measures, to level 3 studies with depression as a secondary outcome and not the primary focus, to level 5 studies estimating the prevalence in an SCI sample.12 Findings were that reliability was good to excellent, validity was limited to concurrent, construct and clinical utility, and that the instruments were comparable in terms of internal consistency, factor structure and clinical utility.12 It was concluded that there is insufficient evidence to recommend the use of one instrument over another. Although the study is comprehensive, the criteria used to evaluate and compare the psychometric properties of reliability and validity are unclear. It is therefore difficult to identify and differentiate each instrument's strength of psychometric evidence.

As systematic reviews conducted independently of each other on a similar topic often have methodological differences, inconsistent findings may reveal areas where further study is needed to resolve differences, whereas similar findings will provide valuable information to both clinicians and researchers.13 Therefore, the purpose of this independent review is to identify the depression and anxiety screening devices and symptom-severity scales that have had their psychometric properties assessed among populations with SCI, and to systematically evaluate the properties according to pre-established evaluation criteria.

Methods

Search strategy

The PubMed, CINAHL, Embase, Medline, HaPI, Psycinfo and Sportdiscus e-databases were searched for papers published between 1949 and July 2008, reporting on depression and anxiety instruments specific to populations with SCI. Additional searching was conducted by reviewing the references of papers obtained from the electronic search. The keyword spinal cord injury and its related terms, paraplegia, quadriplegia or tetraplegia, were used in conjunction with the psychometric terms, validity, reliability, responsiveness, reproducibility of results and data collection. The search was completed by combining these terms with the names and abbreviations of familiar instruments used to screen for depression and anxiety along with the key words depression, anxiety, depression measures and anxiety measures.

Inclusion criteria

To be included in this review, papers had to satisfy several requirements: (i) be a depression and/or anxiety paper in which evaluation of the psychometric properties was the primary purpose (that is, level 1 papers based on the classification of Kalpakjian et al. (2009)); (ii) involve a population with SCI (⩾18 years of age); (iii) report SCI-specific data; (iv) be published in a peer-reviewed journal; and (v) be written in English.

Selection process

The selection of articles used a multi-step process to ensure the inclusion of all relevant articles. First, the titles and abstracts of articles found through the electronic search were reviewed. Any study that referred to SCI, depression and/or anxiety in the title or abstract was imported to the online reference database manager, RefWorks.14 Second, after deleting nonrelevant and duplicate papers, a research assistant and the primary author reviewed the titles and abstracts of all articles. Third, resulting articles were printed and re-reviewed to ensure that the paper was a psychometric paper on depression and/or anxiety instruments, and evaluated among individuals with SCI. Finally, discrepancies in the retrieved studies were resolved through discussion with another author.

Data extraction and analysis

Consistent with the Spinal Cord Injury Rehabilitation Evidence15 process, data extraction methods and standards for this review were based on the work of Fitzpatrick et al. (1998)16 and Andresen (2000).17 Fitzpatrick et al. (1998) provided the methods and standards for data extraction and the extraction form. Specifically, extracted data included reliability, validity, responsiveness, advantages and limitations of the instrument, interpretability of the scores, acceptability in terms of respondent burden, and feasibility in terms of administrative burden. The standards for summarizing the quality of the instruments were adapted from Andresen's (2000) overview of criteria for assessing instruments, and can be found in Table 1 along with the criteria used to assess rigor. In using these criteria, the psychometric properties were assigned a strength of evidence of either ‘excellent’, ‘adequate’ or ‘poor’, or ‘NA’ if there was insufficient information. For instruments with more than one report for a specific property, a range of evidence is given. Finally, rigor, or the thoroughness in the evaluation of the psychometric properties, was rated. If at least two studies corroborate each other's findings, rigor is ‘excellent’, regardless of the strength of evidence. A rating of ‘adequate’ is given if a single study has ‘adequate’ to ‘excellent’ strengths of evidence, whereas rigor is considered ‘poor’ if only a single study with a ‘poor’ strength of evidence is available.
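The rigor rule described above can be sketched as a small function. This is only an illustrative reading of the text, not the procedure actually used in the review. The function name `rate_rigor` and the `corroborated` flag are assumptions; whether studies agree is a judgment that the text does not formalize. The ‘NA’ and ‘unspecified’ fall-backs cover cases the text does not address and are also assumptions.

```python
from typing import List


def rate_rigor(strengths: List[str], corroborated: bool = False) -> str:
    """Rate rigor for one psychometric property of one instrument.

    strengths: one strength-of-evidence rating per study
               ('excellent', 'adequate' or 'poor').
    corroborated: True if two or more studies support each other's
                  findings (a reviewer judgment, passed in by the caller).
    """
    if not strengths:
        # Assumption: no study available, so no rigor rating can be given.
        return "NA"
    if len(strengths) >= 2 and corroborated:
        # Two or more agreeing studies: excellent regardless of strength.
        return "excellent"
    if len(strengths) == 1:
        # Single study: poor evidence gives poor rigor, otherwise adequate.
        return "poor" if strengths[0] == "poor" else "adequate"
    # Assumption: several studies that do not corroborate each other
    # are not covered by the rule as stated in the text.
    return "unspecified"


print(rate_rigor(["poor", "poor"], corroborated=True))
print(rate_rigor(["excellent"]))
print(rate_rigor(["poor"]))
```

Passing the corroboration judgment in as an argument keeps the sketch from inventing a definition of agreement that the source does not give.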

Table 1 Criteria for rating psychometric properties and clinical utility17

For purposes of this review, we include instruments that screen for disorders and/or assess symptom severity. Instruments are defined as symptom-severity scales if the response scale asks ‘how much’ or the frequency of symptoms, and screening instruments are defined as those that screen for a specific disorder and/or if cutoff scores are provided to indicate the need for further evaluation.

Results

In our literature search of seven electronic databases, 577 articles met the search criteria, and 13 papers reporting on 13 instruments were found that met the inclusion criteria. The instruments included in this paper are the Beck Depression Inventory (BDI),18 the Brief Symptom Inventory (BSI),19 the Center for Epidemiological Studies Depression Scale (CESD-20 and CESD-10),20, 21 the Depression, Anxiety, Stress Scales 21 (DASS-21),22 the General Health Questionnaire (GHQ-28),23 the Hospital Anxiety and Depression Scale (HADS),24 the Ilfeld Psychiatric Symptom Inventory (Ilfeld-PSI),25 the Medical Emotional Distress Scale (MEDS),26 the Patient Health Questionnaire (PHQ-9 and PHQ-9-Short),27, 28 the Symptoms Checklist-90-Revised (SCL-90-R) Research Subscales29, 30 and the Zung Self-Rating Depression Scale (SRS).31 Table 2 provides a brief description of the instruments and Table 3 describes the studies.

Table 2 Description of instruments
Table 3 Included studies

Instruments specific to depression

The PHQ-9 and the PHQ-9-Short are the two instruments that screen and assess symptom severity for MDD. The PHQ-9 is the only instrument that parallels the DSM-IV criteria on which clinical diagnoses are based; responses to the questions represent frequency of experiences over the last 2 weeks, whereas all other instruments refer only to the last week. Reliability of the PHQ-9 is excellent (α=0.87).32 Table 4 summarizes the reliability data for the PHQ-9 in addition to other instruments.
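For readers less familiar with the α values reported in this section, internal consistency is conventionally summarized with Cronbach's alpha, which for k items equals k/(k−1) multiplied by (1 − sum of item variances / variance of the total scores). The following is a minimal sketch only: the function name and the response data are hypothetical and are not drawn from any study in this review.

```python
from statistics import variance
from typing import List


def cronbach_alpha(responses: List[List[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix.

    responses: one inner list per respondent, one score per item.
    Uses sample variances (n - 1 denominator) throughout.
    """
    k = len(responses[0])  # number of items
    # Variance of each item across respondents.
    item_variances = [variance([row[j] for row in responses]) for j in range(k)]
    # Variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: five respondents answering three items scored 0 to 3.
example = [
    [0, 1, 0],
    [1, 1, 2],
    [2, 3, 2],
    [3, 2, 3],
    [1, 0, 1],
]
print(round(cronbach_alpha(example), 2))
```

Alpha rises as items covary more strongly relative to their individual variances, which is why it is read as a measure of how consistently the items tap the same construct.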

Table 4 Depression and anxiety instruments—reliability in SCI

Evidence in support of the PHQ-9's validity ranges from adequate (r=−0.50 and −0.51) to excellent (r=0.62), when compared with the SF-36 subjective health question, the Satisfaction With Life Scale, and the greater difficulty with daily role functioning component of the DSM-IV.32 Sensitivity and specificity data for the PHQ-9 are for individual items on the tool. Depressed mood, anhedonia, and feelings of failure, in addition to two somatic symptoms, disturbed sleep and decreased energy, are highly sensitive indicators of MDD.32 These findings suggest that both psychological and somatic symptoms are indicative of MDD among individuals with SCI.

The PHQ-9-Short comprises three items from the PHQ-9: little interest or pleasure in doing things; feeling down, depressed, or hopeless; and feeling bad about yourself, or that you are a failure or have let yourself or your family down. These three items have a relative efficiency of 0.66 compared with the PHQ-9.28 When using a cutoff score of 3, specificity was 93% and sensitivity was 87%.28 A cutoff score of 4 yielded a specificity of 95% and sensitivity of 82%.28 The PHQ-9-Short excludes questions pertaining to somatic symptoms of depression. Table 5 presents the validity data for all instruments.
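The trade-off between sensitivity and specificity as the cutoff score changes can be illustrated with a short sketch. The scores and diagnoses below are invented for illustration and do not come from the cited studies. The function name and the convention that a score at or above the cutoff counts as a positive screen are assumptions.

```python
from typing import List, Tuple


def sensitivity_specificity(
    scores: List[int], has_disorder: List[bool], cutoff: int
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for a given cutoff.

    Assumption: a score >= cutoff is treated as a positive screen.
    has_disorder holds the reference diagnosis for each respondent.
    """
    pairs = list(zip(scores, has_disorder))
    true_pos = sum(1 for s, d in pairs if d and s >= cutoff)
    false_neg = sum(1 for s, d in pairs if d and s < cutoff)
    true_neg = sum(1 for s, d in pairs if not d and s < cutoff)
    false_pos = sum(1 for s, d in pairs if not d and s >= cutoff)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical screening scores and reference diagnoses for ten people.
scores = [0, 1, 2, 2, 3, 3, 4, 5, 6, 7]
diagnosed = [False, False, False, False, False, True, True, True, True, True]

for cutoff in (3, 4):
    sens, spec = sensitivity_specificity(scores, diagnosed, cutoff)
    print(cutoff, sens, spec)
```

In this made-up sample, raising the cutoff from 3 to 4 lowers sensitivity and raises specificity, the same direction of change reported above for the PHQ-9-Short.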

Table 5 Depression and anxiety instruments—validity in SCI

The BDI, CESD-20, CESD-10 and the SRS are instruments that assess the severity of depressive symptoms and have cutoff scores that may be used to screen for depressed moods. Four of the 20 questions on the CESD-20 are reverse scored and 10 of the 20 questions on the SRS are worded negatively. Somatic symptoms of depression are included in these instruments. The reliability data for these instruments are excellent (α=0.89, 0.91, 0.86 and 0.81, respectively).33, 34, 35 Only the CESD-20 and CESD-10 have 2-week test–retest reliability data (ICC=0.87 and 0.85, respectively).34

The validity of the CESD-20 and CESD-10 was assessed against eight scales on the SF-36 and the Visual Analog Scale-Fatigue (VAS-F). Results range from poor (r=−0.27) to excellent (r=−0.75) for the CESD-20, and from adequate (r=−0.38) to excellent (r=−0.71) for the CESD-10.34 Validity evidence for the SRS ranges from adequate to excellent (r=0.52, 0.71 and 0.78) when compared with the BSI, MEDS and Ilfeld-PSI depression subscale, respectively.26, 35, 36
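The way a correlation coefficient maps onto a strength-of-evidence rating can be sketched as follows. Table 1 is not reproduced in this text, so the thresholds used here (absolute r of 0.60 or more as excellent, above 0.30 as adequate, otherwise poor) are assumptions; they were chosen only because they are consistent with the ratings reported in this section. The function name is hypothetical.

```python
def rate_validity(r: float) -> str:
    """Map a validity correlation to a strength-of-evidence label.

    Assumed thresholds (not taken from Table 1, which is not shown):
    |r| >= 0.60 excellent, |r| > 0.30 adequate, otherwise poor.
    The sign of r is ignored because it only reflects scale direction.
    """
    size = abs(r)
    if size >= 0.60:
        return "excellent"
    if size > 0.30:
        return "adequate"
    return "poor"


# Correlations reported above for the CESD-20 and CESD-10.
for r in (-0.27, -0.75, -0.38, -0.71):
    print(r, rate_validity(r))
```

Taking the absolute value reflects that a strong negative correlation, for example between a depression score and a health-status score, is as informative as a strong positive one.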

Elevated cutoff scores have been identified for the BDI and CESD-20 to account for the inclusion of somatic symptoms and the possibility of overestimating depressed moods among populations with SCI. The BDI has a sensitivity of 50% and specificity of 100% when using a cutoff score of 27, and a sensitivity of 83.3% and specificity of 90.8% when using a cutoff score of 18.33 A CESD-20 (Thai version) cutoff score of 19, when used with a Thai population with SCI, was found to have sensitivity of 80% and specificity of 69.8%.37 The SRS has a sensitivity of 86% and specificity of 61% when using a cutoff score of 55.35

The MEDS is a measure that is used to assess the severity of depressive symptoms and has excellent internal consistency (α=0.92).26 The validity evidence in support of the MEDS is excellent (r=0.65, 0.71, −0.75 and 0.77) when correlated with the Hopelessness Scale, SRS, Rosenberg Self-Esteem Scale and SCL-90-R depression subscale, respectively.26

Instruments with depression and anxiety subscales

The BSI, DASS-21, HADS and Ilfeld-PSI are instruments that have subscales to screen for both depression and anxiety symptoms, and to assess symptom severity. Although the GHQ-28 has depression and anxiety subscales to assess symptom severity, the total score is used to screen for psychiatric disorders. The SCL-90-R Research Subscales assess symptoms of both depression and anxiety, but psychometric data are limited to the depression subscales, with poor reliability (α=0.62) for the somatic symptoms subscale and excellent reliability for the cognitive symptom subscale (α=0.89).38 The DASS-21 is the only instrument with subscales excluding reference to somatic symptoms.

The internal consistency evidence for both the BSI depression and anxiety subscales is excellent (α=0.87 and 0.88 for the depression subscale, and 0.85 for the anxiety subscale).35, 39 The anxiety subscale on the HADS also has excellent internal consistency (α=0.85), whereas evidence for the depression subscale is adequate (α=0.79).39 The GHQ-28 has excellent internal consistency (α=0.82).40

The BSI depression subscale's validity ranges from adequate to excellent, when compared with the SRS (r=0.52)35 and DASS-21 (r=0.70).41 Similarly, its anxiety subscale has excellent results when assessed against the DASS-21's anxiety subscale (r=0.61), and adequate results when compared with the SRS (r=0.38).35, 41 The HADS anxiety subscale has adequate validity, whereas that of the depression subscale is excellent when correlated with the Life Satisfaction Questionnaire (r=−0.42 (A) and −0.66 (D)).42 The Ilfeld-PSI depression subscale has excellent validity evidence when compared with the SRS (r=0.78), and adequate evidence for the anxiety subscale (r=0.59).36 When compared with the Clinical Interview Schedule, the GHQ-28 has excellent validity (r=0.83).40

Two studies report the sensitivity and specificity data for the BSI depression subscale and one for the anxiety subscale. When using an elevated t-score cutoff of 65, Tate et al. (1993) found the depression subscale to have a sensitivity of 57% and specificity of 87%.35 When using the cutoff scores recommended for SCI,43 Mitchell et al.41 found a sensitivity of 14% and specificity of 97%, and sensitivity of 57% and specificity of 82% when using the traditional cutoff scores. The BSI anxiety subscale has a sensitivity and specificity of 86 and 88% when using the traditional cutoff scores, and 43 and 100% when using the elevated cutoff scores.41 The DASS-21 depression subscale has a sensitivity of 57% and specificity of 76%, and that of the anxiety subscale is 86 and 64%.41 The GHQ-28 has a sensitivity and specificity of 81 and 82% when using a cutoff score of 4.40

In terms of administrative and respondent burden, the number of items on each instrument ranges from 3 (PHQ-9-Short) to 60 (MEDS), and most of them have reported completion times of ⩽10 min. The MEDS may take up to 45 min depending on the level of distress of the individual. The Ilfeld-PSI has a more complex scoring system, and the interpretation of scores on several instruments varies from comparing raw scores with cutoff scores to converting raw scores to t-scores (BSI) or percentile scores (DASS-21). Table 3 provides an overview of the instruments along with the scoring system, and Table 6 provides a summary of all ratings, including rigor.

Table 6 Summary of ratings

Discussion

Depression and anxiety disorders and the effects of severe symptoms can be debilitating for individuals with recent or long-standing SCIs. As diagnoses of depression and anxiety disorders are time consuming and costly, access to quick and inexpensive instruments to screen for disorders or assess the severity of symptoms to determine the need for additional evaluation is invaluable. However, the use of such instruments is predicated upon the establishment of an instrument’s psychometric properties within specific populations. The purpose of this review was therefore to identify depression and anxiety screening and symptom-severity measures in current use among populations with SCI, according to our inclusion criteria, and consider their psychometric properties according to pre-established criteria.

This review identified 13 papers with a specific focus on assessing the psychometric properties of 13 depression and anxiety instruments that have been used with populations with SCI. Reliability data were available for 10 instruments, and validity results were available for 12 instruments. Depending on the instrument, evidence spanned the spectrum of evaluation criteria varying from poor to excellent. Responsiveness data were not reported in any of the studies.

The variability of diagnostic criteria or symptoms and time periods utilized in the instruments is an issue. Such variability results in difficulties in applying the findings to established classification systems of mood or anxiety disorders (for example, DSM-IV or International Classification of Diseases, ICD).32, 44 The PHQ-9 is the only instrument designed to parallel the DSM-IV symptom and duration criteria for diagnosing MDD. All other instruments assess symptom severity over the last week and use cutoff scores to determine the need for further evaluation.

Both the PHQ-9 and PHQ-9-Short screen for MDD and assess symptom severity. In the study by Bombardier et al. (2004), the sensitivity and specificity results for individual items predicting MDD indicate that somatic symptoms of depression, such as appetite change, sleep disturbance and poor energy, are predictive of MDD and should be included in depression scales.32 In a subsequent study, Krause et al. (2008) found evidence in support of an underlying somatic factor, in addition to a general factor, with the PHQ-9.45 The somatic factor comprised three items: appetite change, sleep disturbance and poor energy. These results suggest that symptoms of depression may not be different in the population with SCI than in the general population, and that issues related to overestimating the prevalence of depression are due to SCI sequelae.

Overestimating the prevalence of symptoms is an issue that has affected the results of several instruments included in this review. Radnitz et al. (1997)33 found three items on the BDI to be poor discriminators of depression in an SCI sample, resulting in artificially inflated scores. They therefore recommend higher cutoff scores to correct for the similarities between somatic symptoms and SCI sequelae. Similarly, Kuptniratsaikul et al. (2002)37 recommended a cutoff score of 19, versus the traditional cutoff score of 16, when the CESD-20 is used with populations with SCI. However, as the Thai version of the CESD-20 was used in this study, it is unclear whether the results are applicable to the original CESD-20. The BSI depression and anxiety subscales have also been found to overestimate the prevalence of depression and anxiety symptoms.43 Heinrich et al. (1994)43 developed cutoff scores specific for use with populations with SCI. In a subsequent study, Heinrich and Tate (1996)39 developed depression and anxiety subscales specific for use with individuals with SCI. They recommend the SCI-specific subscales for use in clinical settings, or using the SCI-specific cutoff scores when administering the traditional subscales.

In considering direct comparisons between instruments, Mitchell et al. (2008) used both the BSI traditional and SCI-specific cutoff scores, and compared them with the DASS-21 depression and anxiety subscales. The prevalence of depression and anxiety was higher on the DASS-21, despite the DASS-21 excluding many somatic symptoms not relevant to populations with SCI.41 The DASS-21 was as sensitive as the BSI but had lower specificity to identify either depression or anxiety.41 In another direct comparison, between the BSI depression scale and the SRS, the SRS was found to be superior, due to a higher level of sensitivity, in identifying people with SCI who were at risk of being depressed.35 Campagnolo et al. (2002)36 compared the efficacy of the Ilfeld-PSI with that of the SRS in individuals with SCI. They found that the Ilfeld-PSI poorly discriminates somatic symptoms unrelated to depression, and therefore is not an adequate instrument for use in populations with SCI. The SRS, however, showed acceptable discriminative ability (AUC=0.76), warranting its use among populations with SCI.36
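As a reminder of what an AUC value such as the one above expresses, the area under the ROC curve equals the probability that a randomly chosen person with the condition scores higher than a randomly chosen person without it, with ties counted as half. The sketch below computes it by direct pairwise comparison. The scores and the function name are hypothetical, and this is not a description of how the cited study computed its AUC.

```python
from typing import List


def auc_pairwise(case_scores: List[float], noncase_scores: List[float]) -> float:
    """Area under the ROC curve by comparing every case with every non-case.

    A pair counts 1 if the case scores higher, 0.5 if tied, 0 otherwise.
    """
    wins = 0.0
    for case in case_scores:
        for noncase in noncase_scores:
            if case > noncase:
                wins += 1.0
            elif case == noncase:
                wins += 0.5
    return wins / (len(case_scores) * len(noncase_scores))


# Hypothetical scale scores for people with and without a depression diagnosis.
depressed = [60, 55, 48, 70]
not_depressed = [40, 50, 58, 45]
print(auc_pairwise(depressed, not_depressed))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.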

The instrument that is closest to being specific for SCI is the MEDS. It is designed to assess the severity of depressive symptoms for use in populations with physical disability or illness. It excludes somatic symptoms that could be the result of a condition such as SCI. Although the MEDS has excellent reliability and validity, according to our criteria, it carries the greatest administrative burden because it has the most items and can take as long as 45 min to complete, depending on the amount of distress experienced by the patient. At present, there is only one psychometric-specific study that has assessed its properties among populations with SCI, and its use among these populations is very limited.

Limitations

As studies were excluded if they were not published in English or did not have a specific focus to assess the psychometric properties of depression and anxiety instruments used with populations with SCI, evidence is potentially missing that could influence both the psychometric ratings of reliability and validity, and rigor results.

Conclusion

The current reliability and validity findings of depression and anxiety instruments range for the most part from adequate to excellent. Given the effort to develop cutoff scores specific to SCI populations, there does not appear to be a clear indication for the development of SCI-specific instruments at this time. At the same time, however, as the psychometric properties of one instrument do not clearly stand above the others, it is difficult to recommend the use of one over another. We can, however, mention that as the Ilfeld-PSI has poor discriminative ability, it should not be used with populations with SCI until further evaluation proves it useful. If we consider the few direct comparisons between some of the measures, the SRS was found to be superior to both the Ilfeld-PSI and the BSI, and the BSI was found to be as sensitive as the DASS-21, but with higher specificity to detect both depression and anxiety. The administrative and respondent burdens are similar for most instruments except for the MEDS, which comprises more items and has the potential to take 45 min to complete. As a result, when selecting an instrument, the clinician must consider the specific purpose, in addition to other clinical considerations within the context of their practice and intended use (for example, hospital, outpatient clinic, drug/alcohol treatment programs).

In this review, the SRS and BSI were the only instruments with more than one paper reporting psychometric evidence. Therefore, all instruments require additional investigation in SCI samples for reliability and validity purposes, and if they are to be used to evaluate outcomes related to treatment or change over time, responsiveness data should be investigated as well. Longitudinal studies and/or appropriately configured clinical trials are required. Administering the instruments in tandem with each other and with clinical interviews for diagnostic purposes would provide valuable information, as would comparison of results to normative data specific for individuals with SCI.

More data on the psychometric properties of the depression and anxiety instruments used with populations with SCI are important. Further evidence will lead to either determining the need for the development of new instruments specific for populations with SCI, or identifying and resolving the problems in the instruments currently used, potentially leading to agreement on the use of common instruments with populations with SCI.