Introduction
The Depression Anxiety and Stress Scales (DASS) is a 42-item self-report instrument that was developed to improve the discrimination between depression and anxiety (Lovibond and Lovibond 1995a, b). The DASS is widely used, and several psychometric studies have shown its internal consistency, convergent/divergent validity and factorial structure to be satisfactory (Lovibond and Lovibond 1995a; Brown et al. 1997; Page et al. 2007). Despite this work, the dimensionality, discriminatory ability and item functioning of the DASS remain incompletely understood. Previous work has employed only classical psychometric analyses, which provide limited information about these important measurement characteristics (e.g., Sijtsma 2009). Fortunately, these aspects can be effectively investigated with modern psychometric methods based on item response theory (IRT; Embretson and Reise 2000). In addition to providing information about the functioning of scales and items within an instrument, IRT allows for deeper investigation of the relationship between scores on one measurement scale and scores on another. IRT-based linking, for instance, can be used to map scores on two separate scales that are designed to measure similar constructs (e.g., depression) onto a common underlying severity dimension (Kolen and Brennan 2004; Orlando et al. 2000; Wahl et al. 2014). This mapping provides valuable information about the two scales’ relatedness in terms of measurement range and discriminative properties, and can be used to evaluate whether a measure has the properties needed for administration in specific target groups. Unfortunately, IRT work has so far been conducted only with the shorter DASS-21 (Shea et al. 2009; Parkitny et al. 2012) and, to our knowledge, not with either a paper-and-pencil or an internet-administered version of the full-length DASS.
Another subject that has received relatively little attention in the literature is the psychometric quality and measurement characteristics of the DASS when administered via the internet. Classical psychometric work conducted with an internet-administered version of the full-length DASS (Zlomke 2009) showed that the scales had good internal consistency (alpha = 0.93–0.95). However, modern psychometric (i.e. Rasch) analyses have only been conducted with the internet-administered DASS-21 (Shea et al. 2009). An extensive IRT-based study of the full-length internet-administered DASS could give more insight into the potential usefulness and added value of the instrument for large-scale and low-cost mental health research (e.g., Coles et al. 2007; Naglieri et al. 2004; Gosling and Mason 2015). Advantages of online mental health assessments include (a) lower rates of socially desirable responding and decreased social anxiety (e.g., Joinson 1999), (b) the possibility of including those otherwise unable or unwilling to visit a research site (internet samples tend to be substantially more diverse than conventional samples), and (c) the potential for using (computerized) adaptive testing to shorten assessment time and personalize measurements (Gibbons et al. 2008; Buchanan 2002; Gosling and Mason 2015). Ideally, the psychometric characteristics of internet-administered versions of instruments should be investigated in dedicated studies, as findings for paper-and-pencil versions of questionnaires do not necessarily generalize to internet-administered versions (Buchanan 2002).
The current study addresses the issues described above by evaluating the classical and modern psychometric properties of an internet-administered Dutch version of the DASS in a group of population-dwelling Dutch adults (N = 7972). First, preliminary classical psychometric analyses were conducted (internal consistency and convergent/divergent validity). Next, IRT was used to investigate each scale’s measurement properties (i.e. discriminative ability; range of measurement). To gain more insight into the meaning and functioning of the DASS depression scale in the context of more broadly defined clinical depression, IRT-based linking was used to place the DASS depression scores onto a common scale with scores on the Quick Inventory of Depressive Symptomatology (QIDS), which is conceptually closer to the clinical definition of major depressive disorder (MDD; Diagnostic and Statistical Manual fifth edition) and includes a broader set of clinically relevant criterion symptoms (i.e. several somatic/vegetative symptoms and suicidality). These analyses were used to gain insight into the extent to which DASS depression-scale scores actually capture variations in clinically defined depression severity.
Discussion
This paper presented an investigation of the psychometric properties of an internet-administered version of the DASS in a sample of Dutch adults. Previous work showed high internal consistency for the DASS scales, while associations with other instruments indicated good convergent/divergent validity, especially for the depression scale. In line with these previous findings, the current results show that the scales of the internet-administered version also have good classical psychometric properties. Additional modern psychometric analyses showed that the items within each DASS scale showed varying severity and discrimination parameters, although some overlap in item-functioning was observed in the depression and anxiety scales. The measurement information provided by items along the underlying severity dimension also varied within each scale and showed most variation in the anxiety and stress scales. Linking the DASS depression scale items to the items of the QIDS showed that, within the context of a more heterogeneous, clinically defined depression severity spectrum, the DASS items mostly measure in the mild-moderate range of depression severity.
The high alpha coefficients (0.94–0.98) indicated very good internal consistency for the DASS scales. However, together with the high average inter-item correlations (0.55–0.74), these coefficients also suggested that the DASS scales were quite homogeneous in their coverage, especially the DASS depression scale. This is probably because this scale includes overlapping items that measure quite narrow concepts (i.e. depressive cognitions and mood), resulting in a scale that measures a narrow construct (Clark and Watson 1995). Consistent with this, a direct comparison of the DASS-21 depression scale and the QIDS in a clinical sample showed higher internal consistency for the DASS-21 depression scale, which the authors explained by the fact that the DASS-21 scale is rather homogeneous (mainly cognitive and emotional symptoms) compared to the more comprehensive QIDS, which covers all clinical criteria for a major depressive disorder, including sleeping problems, appetite/weight change, energy loss and psychomotor retardation/agitation (Weiss et al. 2015). Deeper investigation of the depression scale with IRT analyses likewise showed strong overlap in item functioning between items with similar content. For instance, items that all assessed cognitions of worthlessness (items 17, 21, 34, 37 and 38) and items that all assessed lack of positive emotions (items 24 and 31) showed strong overlap. From a theoretical perspective, the fact that many items function in the same way implies that the severity dimension indexed by the complete scale score has a restricted range: clusters of similarly functioning items provide a lot of information about a rather small severity interval. Accordingly, when mapped on a common severity scale, the DASS items provided most measurement information at the lower end of the overall depression severity spectrum, whereas typical criterion symptoms of clinical depressive episodes that are included in the QIDS but not in the DASS depression scale (i.e. psychomotor symptoms, appetite/weight change and hypo/hypersomnia) were endorsed at higher severity levels. Importantly, this indicates that the DASS depression scale cannot provide meaningful information along the whole spectrum of depression severity, which could result in ceiling effects when the scale is used in more severely depressed populations.
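The point that clusters of similarly functioning items concentrate measurement information in a small severity interval can be illustrated with a minimal sketch. It assumes a graded response model, which is not confirmed here as the model used in the analyses, and every item parameter below is hypothetical rather than an estimate from the present data.

```python
import math

def grm_item_information(theta, a, thresholds):
    """Fisher information of one graded-response item at severity theta.

    a          -- discrimination parameter (hypothetical value)
    thresholds -- ordered category thresholds (hypothetical values)
    """
    # Cumulative probabilities P(X >= k), padded with 1 and 0 at the ends.
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    info = 0.0
    for upper, lower in zip(cum, cum[1:]):
        p = upper - lower  # probability of responding in this category
        # Derivative of the category probability with respect to theta.
        dp = a * (upper * (1.0 - upper) - lower * (1.0 - lower))
        if p > 0:
            info += dp * dp / p
    return info

# Hypothetical items: two that overlap strongly and one located higher on the scale.
overlap_a = (3.0, [0.5, 1.0, 1.5])
overlap_b = (2.9, [0.55, 1.05, 1.5])
distinct = (2.0, [1.8, 2.4, 3.0])

for theta in (0.0, 1.0, 2.4):
    values = [grm_item_information(theta, *item) for item in (overlap_a, overlap_b, distinct)]
    print(theta, [round(v, 2) for v in values])
```

In this sketch the two overlapping items contribute almost identical information around the same severity level and little elsewhere, whereas the third item adds information in a region the other two barely cover.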
Note that the inclusion of items with rather similar content in the DASS was not an oversight: the original authors aimed to divide each scale into even more specific ‘subscales’ of 2–5 items (Lovibond and Lovibond 1995a, b). For instance, the depression scale was meant to assess the following domains: ‘dysphoria’, ‘hopelessness’, ‘devaluation of life’, ‘self-deprecation’, ‘lack of interest/involvement’, ‘anhedonia’ and ‘inertia’. However, our results suggest that items of self-deprecation (item 21), devaluation of life (item 38) and hopelessness (item 37) functioned very similarly, indicating limited differentiation between these subdomains.
As stated above, the results show that the DASS depression scale is most useful for differentiating between mild-moderate severity levels. The finding of potentially redundant items suggests that the depression scale, and possibly the other scales as well, could be shortened without compromising their differentiating ability within this range. Indeed, the short DASS-21 (Lovibond and Lovibond 1995b) includes only seven items per scale and has been quite thoroughly investigated using classical (e.g. Antony et al. 1998; Clara et al. 2001; Sinclair et al. 2012; Osman et al. 2012; Gomez et al. 2014) and modern (Shea et al. 2009; Parkitny et al. 2012) psychometric techniques. However, the depression scale of the DASS-21 still includes sets of items that were found to overlap in this study (DASS-21 items 17 and 21 [worthlessness/meaninglessness] and items 3 and 16 [lack of positive feelings/enthusiasm]). Based on the present findings, further shortening of the DASS scales could be considered. For instance, calculations in the current dataset showed that shortening the DASS depression scale to 5 items would still result in a scale with good internal consistency (alpha = 0.92; with DASS-21 item 5 [‘I found it difficult to work up the initiative to do things’] and item 21 [‘I felt that life was meaningless’] removed). Although this observation was based on data collected with the full-length DASS, it is in line with previous Rasch analyses (Shea et al. 2009), which suggested that the depression scale could be improved by removing item 5 (‘I found it difficult to work up the initiative to do things’). Alternatively, the DASS depression scale could be extended with a range of more diverse symptoms (e.g. vegetative symptoms) to increase the heterogeneity of the covered domains and the scale’s measurement range.
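Why a homogeneous scale can lose many items and still retain a high alpha follows from the standardized-alpha identity, alpha = k·r / (1 + (k − 1)·r), where k is the number of items and r the average inter-item correlation. The sketch below is illustrative only: it assumes equal item variances, and it takes r = 0.74 (the upper end of the range reported above) as a stand-in for the depression scale. It is not the calculation performed on the study data, so its values need not reproduce the reported coefficients.

```python
def standardized_alpha(n_items, mean_r):
    """Standardized Cronbach's alpha from item count and mean inter-item correlation."""
    return n_items * mean_r / (1.0 + (n_items - 1) * mean_r)

# Assumed mean inter-item correlation (illustrative; see the caveats above).
mean_r = 0.74

# 14 items per full-length DASS scale, 7 per DASS-21 scale, 5 for a further shortened scale.
for n_items in (14, 7, 5):
    print(n_items, round(standardized_alpha(n_items, mean_r), 3))
```

Under these assumptions alpha declines only slowly as items are removed, because each remaining item still correlates strongly with the others. In practice, removing specific items also changes the average inter-item correlation itself.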
Although the properties of the anxiety scale could not be investigated in as much detail because secondary measures of anxiety were not administered, its average inter-item correlation (0.55) was considerably lower than that of the depression scale. This indicates that scale homogeneity was less marked. Nevertheless, some overlap in item functioning was observed in the IRT results, with four items that cover ‘situational anxiety’ (items 9 and 40) and ‘subjective experiences of anxious affect’ (items 20 and 28) providing most of their measurement information at the same severity level. Additionally, information at the mild-moderate end of the anxiety spectrum was mostly provided by items covering situational and subjective anxiety (i.e. panic, feeling scared), whereas information at the moderate-severe end of the spectrum was provided by items covering symptoms of autonomic/somatic arousal (i.e. trembling, perspiring, difficulty swallowing).
Within the stress scale, the average inter-item correlation (0.56) was also lower than for the depression scale, but was still high enough to indicate some item redundancy. Although inspection of the IRT parameters of the stress scale showed no sets or clusters of items with strongly overlapping functioning, most items were located relatively close together on the latent dimension (as indicated by their averaged item thresholds). This suggests that there is also room for improvement in the stress scale.
The current study had several strengths, including the large sample size, which made it possible to investigate the DASS’s psychometric properties in different demographic groups. Additional strengths were the use of modern psychometric techniques and the linking of DASS depression scores with scores on the QIDS. However, some study limitations should be kept in mind. First, the data were collected from volunteers through an internet platform, which attracted respondents who were relatively highly educated and often female. Consequently, the generalizability of the results to the general population, or to subpopulations that are not covered by the current study, requires further investigation. Second, the full version of the DASS was used instead of the shorter and often used DASS-21. The generalizability of the psychometric results from the current study to the short-form version needs further evaluation. Third, convergent validity could not be investigated in depth for the DASS anxiety and stress scales, because more specialized anxiety and stress measures were not administered. Consequently, the linking analyses could only be performed for the DASS depression scale. Finally, the sample was recruited from the general population and no information was available about formal (DSM-5) anxiety/depressive disorder diagnoses, limiting the possibilities to test the scales’ relationships with diagnosed clinical psychopathology.
A promising direction for further research in the context of online-administered depression and anxiety instruments, including the DASS, is the implementation of computerized adaptive testing. The current results already provide some insight into how the scales’ items are distributed along their respective underlying severity spectra (Wahl et al. 2014). Such information is a good starting point for the development of algorithms that can quickly and effectively zero in on a person’s severity level by strategically adapting each next administered item to the responses given on the previous items. Such algorithms could save administration time and would make measurement more personal (e.g., less administration of items that do not apply to the respondent) while increasing precision.
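The item-selection step of such an adaptive algorithm can be sketched as follows. This is a minimal illustration rather than a proposed implementation: it assumes a graded response model, a maximum-information selection rule (one common choice among several), and a hypothetical three-item bank whose names and parameters are invented for the example. Updating the severity estimate after each response is deliberately left out.

```python
import math

def item_information(theta, a, thresholds):
    """Graded-response-model item information at severity theta."""
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    total = 0.0
    for upper, lower in zip(cum, cum[1:]):
        p = upper - lower  # category probability
        dp = a * (upper * (1.0 - upper) - lower * (1.0 - lower))  # its derivative
        if p > 0:
            total += dp * dp / p
    return total

def select_next_item(theta_hat, item_bank, administered):
    """Return the not-yet-administered item that is most informative at theta_hat."""
    candidates = [name for name in item_bank if name not in administered]
    return max(candidates, key=lambda name: item_information(theta_hat, *item_bank[name]))

# Hypothetical item bank: name -> (discrimination, ordered thresholds).
item_bank = {
    "mild": (2.0, [-1.0, -0.3, 0.4]),
    "moderate": (2.0, [0.5, 1.2, 1.9]),
    "severe": (2.0, [2.0, 2.7, 3.4]),
}

# Start at an average severity estimate, then suppose the estimate has moved upward.
first = select_next_item(0.0, item_bank, administered=set())
second = select_next_item(2.5, item_bank, administered={first})
print(first, second)
```

With these hypothetical parameters, the rule starts with the item located nearest to the initial estimate and moves to the item located at higher severity once the estimate has risen, which is the sense in which each next item is adapted to the previous responses.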
In conclusion, the present classical and modern psychometric investigation showed the internet-administered version of the DASS to (a) have good classical psychometric properties, (b) contain sets of items with similar item-functioning, and (c) be most suitable to measure dimensional depression severity variations in population samples (mild-moderate severity levels).