Adaptive screening for depression — Recalibration of an item bank for the assessment of depression in persons with mental and somatic diseases and evaluation in a simulated computer-adaptive test environment
Introduction
Self-report instruments, also referred to as patient-reported outcomes (PROs), are a common means of identifying depression in routine clinical practice and research. Many such questionnaires have been developed, and convincing psychometric characteristics have been reported for them based on Classical Test Theory (CTT) assumptions [1], [2].
However, in recent years it has been demonstrated that PROs can benefit substantially from modern approaches such as item response theory (IRT) [3]. Generally, applying IRT models can provide additional perspectives on instruments used for depression diagnostics, such as the detection of item bias across subgroups [4], [5], [6], [7], violations of unidimensionality [8], [9], or redundancies in the item sets [10]. Consequently, there is evident potential for further improvement of depression-specific PROs. Because of its desirable properties, such as parsimony and equal discrimination across items, the one-parameter Rasch model, a member of the family of IRT models, was used in the present study [11], [12].
A recent and probably the most appealing perspective offered by IRT is the implementation of Computer-Adaptive Testing (CAT). CAT selects targeted items from a calibrated item bank and presents them to the respondent, thereby minimizing the standard error of measurement (SEM) and reducing test length [13], [14]. Simulation studies demonstrate that CAT may measure with sufficient precision using approximately six items [15], [16].
The foundation of every unidimensional CAT is a calibrated item bank [17]. This is a set of items that has been shown to be unidimensional in measuring the latent variable and whose item difficulties cover a wide range of this dimension. Items are calibrated, i.e., estimates of item parameters (such as the item difficulty) are provided for each item.
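The relationship between item difficulty, measurement precision, and the SEM described above can be illustrated with a minimal sketch. It assumes dichotomous (yes/no) items under the Rasch model, which the text does not confirm for the ADIB, and the function names are hypothetical. Under this assumption, an item is most informative when its difficulty equals the respondent's θ, which is why a CAT selects items close to the current estimate.

```python
import math


def rasch_probability(theta, difficulty):
    """Probability of endorsing a dichotomous item under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def item_information(theta, difficulty):
    """Fisher information of one Rasch item at theta: p * (1 - p)."""
    p = rasch_probability(theta, difficulty)
    return p * (1.0 - p)


def standard_error(theta, difficulties):
    """SEM at theta for a set of administered items (assumed dichotomous)."""
    total_information = sum(item_information(theta, b) for b in difficulties)
    return 1.0 / math.sqrt(total_information)


# Four items located exactly at theta = 0 each contribute information 0.25,
# so the total information is 1.0 and the SEM is 1.0.
print(standard_error(0.0, [0.0, 0.0, 0.0, 0.0]))
```

Items far from θ contribute little information, so a bank whose difficulties cover a wide range of the latent dimension can keep the SEM low for respondents at very different severity levels.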
Forkmann and colleagues [18] developed the Aachen Depression Item Bank (ADIB), which was calibrated on a mixed sample of persons with primarily mental illnesses (depression) and persons with primarily somatic illnesses (cardiac or otorhinolaryngologic diseases). Using the software WINSTEPS 3.60.1, Forkmann et al. [18] showed that the ADIB is essentially unidimensional, fits the Rasch model, and captures a wide range of the latent continuum. The ADIB proved useful for deriving high-quality static short scales for the assessment of depression, which supports its general psychometric quality [19], [20], [21], [22], [23]. However, Forkmann et al. [18] further reported small signs of a potential secondary dimension constituted by items about suicidal ideation and behavior. This finding might be interpreted in line with the assumption that suicidal ideation and behavior may have to be considered a nosological entity in itself [24]. Signs of multidimensionality were only minor. Nevertheless, strict, as opposed to essential, unidimensionality is necessary for bias-free estimates in CAT procedures, which requires a more rigorous statistical approach ([25], [26]; see methods section). Furthermore, local independence was assessed using a less rigid criterion than is necessary if the item bank is to be used for computerized adaptive testing, so that the recalibration reported in the present study appeared necessary.
The current study had three aims. The first aim was to recalibrate the ADIB through a secondary analysis of data from the study of Forkmann et al. [18], using stricter criteria in order to improve unidimensionality and local independence and to reduce differential item functioning (DIF). Based on a thoroughly calibrated item bank, a CAT program could be built, which is the ultimate goal of the item bank development. A CAT that accesses an item bank calibrated on patient samples with mental and somatic diseases would help to reduce time and test burden, enhance precision of measurement, and allow for bias-free estimates of depression severity independent of somatic diseases. Based on more economical, precise, and bias-free depression measurements, it is conceivable that therapeutic interventions could be targeted more purposefully to the patient.
The second aim was to cross-validate the new calibration of the ADIB on an independently drawn sample of patients undergoing rehabilitation for cardiac diseases. The third aim was to conduct a preliminary simulation study in order to test the item bank's performance in a simulated CAT environment with regard to its precision, economy, and the validity of the interpretation of θ estimates based on the CAT. In a real CAT, each patient answers adaptively presented items on a computer. By contrast, in a simulated CAT, paper-and-pencil data on the items of the bank are treated as if they had been collected adaptively. That is, the algorithm chooses a first item of medium difficulty and then selects the next item based on the real answer given by the patient. Before a CAT is applied in practice, it is usually evaluated in simulation studies to see whether further improvement is necessary.
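The simulated CAT procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: it assumes dichotomous Rasch items, estimates θ with an expected a posteriori (EAP) estimator under a standard normal prior, and stops at a hypothetical SEM target or item limit. All names (`simulate_cat`, `eap_estimate`, `sem_target`, `max_items`) and all item difficulties are invented for the example.

```python
import math

# Grid of theta values from -4.0 to 4.0 used for numerical integration.
GRID = [-4.0 + 0.1 * k for k in range(81)]


def rasch_probability(theta, difficulty):
    """Probability of endorsing a dichotomous item under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def eap_estimate(responses, difficulties):
    """Posterior mean and SD of theta (standard normal prior is an assumption)."""
    weights = []
    for theta in GRID:
        weight = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) prior
        for item, answer in responses.items():
            p = rasch_probability(theta, difficulties[item])
            weight *= p if answer == 1 else 1.0 - p
        weights.append(weight)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    variance = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(variance)


def simulate_cat(recorded, difficulties, sem_target=0.5, max_items=10):
    """Replay recorded paper-and-pencil answers as if collected adaptively."""
    administered = {}
    theta, sem = 0.0, float("inf")  # start at medium severity
    limit = min(max_items, len(difficulties))
    while len(administered) < limit and sem > sem_target:
        remaining = [i for i in difficulties if i not in administered]
        # Under the Rasch model, the most informative item is the one whose
        # difficulty is closest to the current theta estimate.
        next_item = min(remaining, key=lambda i: abs(difficulties[i] - theta))
        administered[next_item] = recorded[next_item]  # the patient's real answer
        theta, sem = eap_estimate(administered, difficulties)
    return theta, sem, list(administered)


# Hypothetical five-item bank and one patient's recorded answers.
difficulties = {"a": -2.0, "b": -1.0, "c": 0.0, "d": 1.0, "e": 2.0}
recorded = {"a": 1, "b": 1, "c": 1, "d": 0, "e": 0}
theta, sem, order = simulate_cat(recorded, difficulties, max_items=3)
print(order)  # ['c', 'd', 'b']
```

Because the first estimate is θ = 0, the first item chosen is the one of medium difficulty, and each later choice depends on the answers replayed so far. Comparing the resulting θ and SEM with the full-bank estimate is one way to judge precision and economy in such a simulation.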
Section snippets
Samples
The recalibration of the item bank (step I) was conducted through a secondary analysis of the data reported in Forkmann et al. [18], which were collected from patients recruited at a German university hospital and a community psychiatric clinic (sample I; N = 367: 161 patients treated for a depressive syndrome (DP), 103 patients from cardiology (CP), and 103 patients from otorhinolaryngology (OP)). Participants' mean age was 44.1 years (SD = 14.0), and 44.7% were female (see [18] for details).
Cross-validation (step II) was
Step I) Initial evaluation of unidimensionality of the ADIB
The exploratory factor analysis (EFA) revealed a three-dimensional solution for the ADIB (RMSEA = 0.087). The dominant factor (“cognitive/emotional depression”) contained 48 items (e.g., “…could nothing spark your interest?”). The second factor (“self-esteem”) comprised 17 items (e.g., “…have you felt worthless?”). The third factor (“suicidal ideation”) consisted of 9 items (e.g., “…have you thought about taking your life?”). Five items were removed due to cross-loadings (Table 1). The dominant factor was chosen as starting
Discussion
This study aimed to recalibrate the Aachen Depression Item Bank (ADIB) [18] and to cross-validate it in a newly drawn sample. Moreover, the performance of the ADIB was tested in a simulated CAT environment with regard to its precision, economy, and validity.
Generally, item bank characteristics in the recalibration were very good, as indicated by the absence of DIF, evidence of strict unidimensionality, and good overall and individual item fit. The cross-validation of the recalibrated item bank
Conclusions
Taken together, the data of the current study yielded a recalibration of the ADIB, and the simulation studies suggest good psychometric properties of the item bank when integrated in a simulated adaptive test environment. The item bank is now ready for use. It reasonably extends the range of available depression item banks, since it was developed to explicitly target depression measurement in patients with mental disorders as well as patients with somatic diseases and shows good
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
TF contributed to conception and design of the study, collected the data, conducted the statistical analysis, interpreted the analysis and wrote the manuscript. UK designed the statistical analyses and supervised the technical procedures. MB participated in the design of the study and interpretation of the analysis. CN and MW contributed to the design of the study and the statistical analysis. HB and SG coordinated the data acquisition, contributed to the statistical analysis and revised the
Acknowledgments
This research project was supported by the START-program of the Faculty of Medicine, RWTH Aachen and the German Research Foundation (DFG, WI3210/2-1). The data for the cross-validation of the ADIB stem from the project “Development and validation of a computer adaptive test (CAT) for cardiac patients undergoing rehabilitation: RehaCAT-Cardio”, funded by the Illa and Werner Zarnekow-Foundation (T225-18.152).
References (63)
- et al. Comparative validity of three screening questionnaires for DSM-IV depressive disorders and physicians' diagnoses. J Affect Disord (2004)
- et al. Performance characteristics of depression screening instruments in survivors of acute myocardial infarction: review of the evidence. Psychosomatics (2007)
- et al. Assessment of late life depression. Biol Psychiatry (2002)
- et al. Development and validation of the Rasch-based depression screening (DESC) using Rasch analysis and structural equation modelling. J Behav Ther Exp Psychiatry (2009)
- et al. Psychometric evaluation of the Rasch-based depression screening in patients with neurologic disorders. Arch Phys Med Rehabil (2010)
- et al. The Patient-Reported Outcomes Measurement Information System (PROMIS) developed and tested its first wave of adult self-reported health outcome item banks: 2005–2008. J Clin Epidemiol (2010)
- et al. Even minimal symptoms of depression increase mortality risk after acute myocardial infarction. Am J Cardiol (2001)
- et al. Depression following acute coronary syndromes: a comparison between the Cardiac Depression Scale and the Beck Depression Inventory II. J Psychosom Res (2006)
- et al. Simulated computerized adaptive test for patients with lumbar spine impairments was efficient and produced valid measures of function. J Clin Epidemiol (2006)
- et al. Evaluation of a preliminary physical function item bank supported the expected advantages of the Patient-Reported Outcomes Measurement Information System (PROMIS). J Clin Epidemiol (2008)
- Item response theory for psychologists
- Körperliche Beschwerden und deren Einfluss auf die Erfassung depressiver Störungen bei jüngeren und älteren Menschen. Z Gerontopsychol Gerontopsychiatr
- Rasch analysis of the Beck Depression Inventory-II in a neurological rehabilitation sample. Disabil Rehabil
- Rasch analysis of the hospital anxiety and depression scale (HADS) for use in motor neurone disease. Health Qual Life Outcomes
- Validation of the Bech-Rafaelsen Melancholia Scale and the Hamilton Depression Scale in patients with major depression; is the total score a valid measure of illness severity? Acta Psychiatr Scand
- Outcome measures of antidepressive therapy. Acta Psychiatr Scand
- The Geriatric Depression Scale should be shortened: results of Rasch analysis. Int J Geriatr Psychiatry
- Applying the Rasch model: fundamental measurement in the human sciences
- The Rasch measurement model in rheumatology: what is it and why use it? When should it be applied, and what should one look for in a Rasch paper? Arthritis Rheum
- Computer adaptive testing. J Appl Meas
- Was ist adaptives testen? [What is adaptive testing?] Psychother Psychosom Med Psychol
- Development of a computer-adaptive test for depression (D-CAT). Qual Life Res
- Computerized adaptive measurement of depression: a simulation study. BMC Psychiatry
- Item banks: what, why, how. J Educ Meas
- Development of an item bank for the assessment of depression in persons with mental illnesses and physical diseases using Rasch analysis. Rehabil Psychol
- Validation of the Rasch-based depression screening in a large scale German general population sample. Health Qual Life Outcomes
- Das Rasch-basierte Depressionsscreening. Manual
- Cross-sectional validation of the Rasch-based Depression Screening (DESC) in a mixed sample of patients with mental and somatic diseases. Compr Psychiatry
- Suicidal disorders: a nosological entity per se? Am J Med Genet C, Semin Med Genet
- A new item response theory modelling approach with applications to unidimensionality assessment and ability estimation. Psychometrika
- An initial application of computerized adaptive testing (CAT) for measuring disability in patients with low back pain. BMC Musculoskelet Disord