Adaptive screening for depression — Recalibration of an item bank for the assessment of depression in persons with mental and somatic diseases and evaluation in a simulated computer-adaptive test environment

https://doi.org/10.1016/j.jpsychores.2013.08.022

Abstract

Objective

This study ran a computer-adaptive testing (CAT) simulation based on the Aachen Depression Item Bank (ADIB), which was developed for the assessment of depression in persons with somatic diseases. Prior to the CAT simulation, the ADIB was recalibrated.

Methods

Recalibration was performed in a sample of 161 patients treated for a depressive syndrome, 103 patients from cardiology, and 103 patients from otorhinolaryngology (mean age 44.1, SD = 14.0; 44.7% female) and was cross-validated in a sample of 117 patients undergoing rehabilitation for cardiac diseases (mean age 58.4, SD = 10.5; 24.8% female). Unidimensionality of the item bank was checked, and a Rasch analysis was performed that evaluated local dependency (LD), differential item functioning (DIF), item fit and reliability. The CAT simulation was conducted with the total sample and additional simulated data.

Results

Recalibration resulted in a strictly unidimensional item bank with 36 items, showing good Rasch model fit (item fit residuals < |2.5|) and no DIF or LD. CAT simulation revealed that 13 items on average were necessary to estimate depression in the range of −2 to +2 logits when terminating at SE ≤ 0.32 and 4 items if using SE ≤ 0.50. Receiver Operating Characteristics analysis showed that θ estimates based on the CAT algorithm have good criterion validity with regard to depression diagnoses (Area Under the Curve ≥ .78 for all cut-off criteria).

Conclusion

The recalibration of the ADIB succeeded and the simulation studies conducted suggest that it has good screening performance in the samples investigated and that it may reasonably add to the improvement of depression assessment.

Introduction

Self-report instruments, also referred to as patient-reported outcomes (PRO), are a common means of identifying depression in routine clinical practice and research. Many such questionnaires have been developed and persuasive psychometric characteristics have been reported for these instruments based upon Classical Test Theory (CTT) assumptions [1], [2].

However, in recent years it has been demonstrated that PROs can benefit substantially from modern approaches such as item response theory (IRT) [3]. Generally, applying IRT models can provide additional perspectives on instruments used for depression diagnostics, such as revealing item bias across subgroups [4], [5], [6], [7], violations of unidimensionality [8], [9], or redundancies in the item sets [10]. Consequently, there is evident potential for further improvement of depression-specific PROs. Because of its particularly desirable properties, such as parsimony and similar differentiation of items, the one-parameter Rasch model, a member of the group of IRT models, was used for the present study [11], [12].
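For orientation, the dichotomous form of the Rasch model can be written as follows. This is a sketch in standard notation, not necessarily the exact parameterization used in the study: the symbols θ_n (depression level of person n) and b_i (difficulty of item i) are assumed here, and items with more than two response categories would require a polytomous extension of the model.

```latex
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}
```

All items share the same slope in this formulation, which corresponds to the similar differentiation of items mentioned above; only the location b_i differs from item to item.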

A recent and probably the most appealing perspective offered by IRT is the implementation of Computer-Adaptive Testing (CAT). CAT selects targeted items from a calibrated item bank and presents them to the respondent, thereby minimizing the standard error of measurement (SEM) and reducing test length [13], [14]. Simulation studies demonstrate that CAT may measure with sufficient precision using approximately six items [15], [16].
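The link between targeted item selection and the SEM can be made explicit for the dichotomous Rasch case. The following is a sketch under that simplifying assumption; I_i, P_i and the set A of administered items are notation introduced here, not taken from the study.

```latex
I_i(\theta) = P_i(\theta)\,\bigl(1 - P_i(\theta)\bigr), \qquad
\mathrm{SE}(\hat{\theta}) \approx \frac{1}{\sqrt{\sum_{i \in A} I_i(\hat{\theta})}}
```

Because I_i reaches its maximum of 0.25 where b_i = θ, items whose difficulty lies close to the current estimate reduce the SE fastest. If θ were additionally scaled to unit variance (an assumption; Rasch estimates in logits generally are not), reliability could be approximated as 1 − SE², so that the stopping thresholds of SE 0.32 and SE 0.50 reported in the abstract would correspond to roughly

```latex
1 - 0.32^2 = 0.8976 \approx 0.90, \qquad 1 - 0.50^2 = 0.75
```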

The central foundation stone of each unidimensional CAT is a calibrated item bank [17]. This is a set of items with proven unidimensionality for measuring the latent variable and with item difficulties capturing a wide range of this dimension. Items are calibrated, i.e., estimates of item parameters (such as the item difficulty) are provided for each item.

Forkmann and colleagues [18] developed the Aachen Depression Item Bank (ADIB), which was calibrated on a mixed sample of persons with primarily mental illnesses (depression) and persons with primarily somatic illnesses (cardiac or otorhinolaryngologic diseases). Using the software WINSTEPS 3.60.1, Forkmann et al. [18] showed that the ADIB is essentially unidimensional, fits the Rasch model and captures a wide range of the latent continuum. The ADIB proved useful for deriving high-quality static short scales for the assessment of depression, which supports its general psychometric quality [19], [20], [21], [22], [23]. However, Forkmann et al. [18] further reported small signs of a potential secondary dimension constituted by items about suicidal ideation and behavior. This finding might be interpreted in line with the assumption that suicidal ideation and behavior may have to be considered a nosological entity in itself [24]. Signs of multidimensionality were only minor. Nevertheless, strict – as opposed to essential – unidimensionality is necessary for bias-free estimates in CAT procedures, which requires a more rigorous statistical approach ([25], [26]; see Methods section). Furthermore, local independence was assessed using a less rigid criterion than is necessary if the item bank is to be used for computerized adaptive testing, so that the recalibration reported in the present study appeared inevitable.

The current study had three aims. The first aim was to conduct a recalibration of the ADIB through secondary analysis of data from the study of Forkmann et al. [18] using stricter criteria in order to improve unidimensionality and local independence and to reduce DIF. Based on a thoroughly calibrated item bank, a CAT program could be built, which is the final aim of the item bank development. A CAT that accesses an item bank calibrated on patient samples with mental and somatic diseases would help to reduce time and test burden, enhance precision of measurement and allow for bias-free estimates of depression severity independent of somatic diseases. Based on more economic, precise and bias-free depression measurements, it is conceivable that therapeutic interventions could be targeted more purposefully to the patient.

The second aim was to cross-validate the new calibration of the ADIB on an independently drawn sample of patients undergoing rehabilitation for cardiac diseases. The third aim was to conduct a preliminary simulation study in order to test the item bank's performance in a simulated CAT environment with regard to its precision, economy, and the validity of the interpretation of θ estimates based on the CAT. In a real CAT each patient fills in adaptively presented items at the computer. By contrast, in a simulated CAT, paper and pencil data on the items of the bank are treated as if they had been collected adaptively. That means that the algorithm chooses a first item of medium difficulty and then, based on the real answer given by the patient, the next item is chosen. Before real CAT application, CAT is usually used in simulation studies to see whether further improvement is necessary.
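The simulated CAT procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the algorithm used in the study: it assumes dichotomous Rasch items, a MAP estimate of θ with a standard normal prior, selection of the unused item whose difficulty is closest to the current estimate, and a hypothetical bank of 36 evenly spaced difficulties. The names `rasch_prob`, `estimate_theta`, `simulate_cat`, `bank` and `answers` are invented for this example.

```python
import math


def rasch_prob(theta, b):
    """Probability of endorsing an item under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_theta(responses, difficulties, prior_sd=1.0, iters=25):
    """MAP estimate of theta (assumed normal prior) via damped Newton steps.

    Returns the estimate and its posterior standard error.
    """
    theta = 0.0
    prior_info = 1.0 / prior_sd ** 2
    for _ in range(iters):
        ps = [rasch_prob(theta, b) for b in difficulties]
        grad = sum(x - p for x, p in zip(responses, ps)) - theta * prior_info
        info = sum(p * (1.0 - p) for p in ps) + prior_info
        # Clamp the step to keep the iteration stable for extreme patterns.
        theta += max(-1.0, min(1.0, grad / info))
    ps = [rasch_prob(theta, b) for b in difficulties]
    info = sum(p * (1.0 - p) for p in ps) + prior_info
    return theta, 1.0 / math.sqrt(info)


def simulate_cat(answers, difficulties, se_stop=0.50):
    """Replay stored paper-and-pencil answers as if collected adaptively."""
    remaining = set(range(len(difficulties)))
    used, given = [], []
    theta, se = 0.0, float("inf")
    while remaining and se > se_stop:
        # Rasch information peaks where item difficulty equals theta, so the
        # closest unused item is the most informative one. With theta = 0 at
        # the start, the first item is one of medium difficulty.
        nxt = min(remaining, key=lambda i: (abs(difficulties[i] - theta), i))
        remaining.remove(nxt)
        used.append(nxt)
        given.append(answers[nxt])  # the answer the patient actually gave
        theta, se = estimate_theta(given, [difficulties[i] for i in used])
    return theta, se, used


# Hypothetical bank: 36 difficulties evenly spaced between -2 and +2 logits.
bank = [-2.0 + 4.0 * i / 35 for i in range(36)]
# Hypothetical patient who endorses every item easier than 0.5 logits.
answers = [1 if b < 0.5 else 0 for b in bank]

theta, se, used = simulate_cat(answers, bank, se_stop=0.50)
print(f"items used: {len(used)}, theta = {theta:.2f}, SE = {se:.2f}")
```

With dichotomous items each response contributes at most 0.25 units of information, so this sketch needs more items to reach a given SE than a bank of polytomous items would; the item counts it produces are therefore not comparable to those reported for the ADIB.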

Samples

The recalibration of the item bank (step I) was conducted through a secondary analysis of data reported in Forkmann et al. [18]. Participants were recruited from a German university hospital and a community psychiatric clinic (sample I; N = 367): 161 patients treated for a depressive syndrome (DP), 103 patients from cardiology (CP), and 103 patients from otorhinolaryngology (OP). Participants' average age was 44.1 (SD = 14.0) and 44.7% were female (see [18] for details).

Cross-validation (step II) was

Step I) Initial evaluation of unidimensionality of the ADIB

The EFA revealed a three-dimensional solution for the ADIB (RMSEA = 0.087). The dominant factor (“cognitive/emotional depression”) contained 48 items (e.g., “…could nothing spark your interest?”). The second factor (“self-esteem”) comprised 17 items (e.g., “…have you felt worthless?”). The third factor (“suicidal ideation”) consisted of 9 items (e.g., “…have you thought about taking your life?”). Five items were removed due to cross-loadings (Table 1). The dominant factor was chosen as starting

Discussion

This study aimed to recalibrate the Aachen Depression Item Bank (ADIB) [18] and to cross-validate it in a newly drawn sample. Moreover, the performance of the ADIB was tested in a simulated CAT environment with regard to its precision, economy, and validity.

Generally, item bank characteristics in the recalibration were very good, indicated by absence of DIF, evidence for strict unidimensionality and good overall and individual item fit. The cross-validation of the recalibrated item bank

Conclusions

Taken together, based on the data of the current study, the ADIB was successfully recalibrated, and the simulation studies conducted suggest good psychometric properties of the item bank when integrated in a simulated adaptive test environment. The item bank is now ready to use. It reasonably extends the range of depression item banks available since it was developed to explicitly target depression measurement in both patients with mental disorders and somatic diseases and shows good

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

TF contributed to conception and design of the study, collected the data, conducted the statistical analysis, interpreted the analysis and wrote the manuscript. UK designed the statistical analyses and supervised the technical procedures. MB participated in the design of the study and interpretation of the analysis. CN and MW contributed to the design of the study and the statistical analysis. HB and SG coordinated the data acquisition, contributed to the statistical analysis and revised the

Acknowledgments

This research project was supported by the START-program of the Faculty of Medicine, RWTH Aachen and the German Research Foundation (DFG, WI3210/2-1). The data for the cross-validation of the ADIB stem from the project “Development and validation of a computer adaptive test (CAT) for cardiac patients undergoing rehabilitation: RehaCAT-Cardio”, funded by the Illa and Werner Zarnekow-Foundation (T225-18.152).

References (63)

  • SE Embretson et al. Item response theory for psychologists (2000)
  • S Gauggel et al. Körperliche Beschwerden und deren Einfluss auf die Erfassung depressiver Störungen bei jüngeren und älteren Menschen [Somatic complaints and their influence on the assessment of depressive disorders in younger and older people]. Z Gerontopsychol Gerontopsychiatr (1994)
  • RJ Siegert et al. Rasch analysis of the Beck Depression Inventory-II in a neurological rehabilitation sample. Disabil Rehabil (2010)
  • CJ Gibbons et al. Rasch analysis of the hospital anxiety and depression scale (HADS) for use in motor neurone disease. Health Qual Life Outcomes (2011)
  • RW Licht et al. Validation of the Bech-Rafaelsen Melancholia Scale and the Hamilton Depression Scale in patients with major depression; is the total score a valid measure of illness severity? Acta Psychiatr Scand (2005)
  • R Rosenberg. Outcome measures of antidepressive therapy. Acta Psychiatr Scand (2000)
  • WK Tang et al. The Geriatric Depression Scale should be shortened: results of Rasch analysis. Int J Geriatr Psychiatry (2005)
  • TG Bond et al. Applying the Rasch model: fundamental measurement in the human sciences (2001)
  • A Tennant et al. The Rasch measurement model in rheumatology: what is it and why use it? When should it be applied, and what should one look for in a Rasch paper? Arthritis Rheum (2007)
  • RC Gershon. Computer adaptive testing. J Appl Meas (2005)
  • T Forkmann. Was ist adaptives testen? [What is adaptive testing?]. Psychother Psychosom Med Psychol (2011)
  • H Fliege et al. Development of a computer-adaptive test for depression (D-CAT). Qual Life Res (2005)
  • W Gardner et al. Computerized adaptive measurement of depression: a simulation study. BMC Psychiatry (2004)
  • BD Wright et al. Item banks: what, why, how. J Educ Meas (1984)
  • T Forkmann et al. Development of an item bank for the assessment of depression in persons with mental illnesses and physical diseases using Rasch analysis. Rehabil Psychol (2009)
  • T Forkmann et al. Validation of the Rasch-based depression screening in a large scale German general population sample. Health Qual Life Outcomes (2010)
  • T Forkmann et al. Das Rasch-basierte Depressionsscreening. Manual [The Rasch-based depression screening. Manual]
  • T Vehren et al. Cross-sectional validation of the Rasch-based Depression Screening (DESC) in a mixed sample of patients with mental and somatic diseases. Compr Psychiatry (2013)
  • M Leboyer et al. Suicidal disorders: a nosological entity per se? Am J Med Genet C, Semin Med Genet (2005)
  • WF Stout. A new item response theory modelling approach with applications to unidimensionality assessment and ability estimation. Psychometrika (1990)
  • AH Elhan et al. An initial application of computerized adaptive testing (CAT) for measuring disability in patients with low back pain. BMC Musculoskelet Disord (2008)