Research paper
A computerized version of the Patient Health Questionnaire-4 as an ultra-brief screening tool to detect emotional disorders in primary care

https://doi.org/10.1016/j.jad.2018.01.030

Highlights

  • Emotional disorders are highly prevalent and comorbid but misdiagnosed in primary care centres.

  • The aim of the study was to determine the psychometric properties of a computerized version of the PHQ-4 questionnaire.

  • A two-factor structure and invariant model was found when the PHQ-4 was studied in a sample of 1052 primary care patients.

  • The PHQ-4 performed very well as a screening instrument with a cut-off value of 3 for both factors, the PHQ-2 for depression and the GAD-2 for anxiety.

Abstract

Background

The Patient Health Questionnaire-4 (PHQ-4) is an ultra-brief self-report consisting of a 2-item depression scale (PHQ-2) and a 2-item anxiety scale (GAD-2). The aim of the present study is to determine the psychometric properties of a computerized version of the PHQ-4 used to detect emotional disorders (anxiety and depression) in the primary care setting.

Method

A total of 1052 patients with suspected anxiety, depression, or somatic symptoms were recruited from 28 primary care centres participating in the PsicAP trial and completed the full version of the computerized PHQ. In addition, 178 of these patients also underwent clinical interviews, which served as the gold standard.

Results

Confirmatory factor analyses showed very good fit indices for a two-factor solution. This model was structurally invariant across age and gender groups, and internal consistency was acceptable (PHQ-4: α = .83; PHQ-2: α = .86; GAD-2: α = .76). The best cut-off point to obtain high sensitivity was 3 on both the PHQ-2 (major depressive disorder) and the GAD-2 (generalized anxiety disorder). At this cut-off, sensitivity and specificity were .90 and .61 for the PHQ-2, and .88 and .61 for the GAD-2.

Limitations

The study was not designed as a prevalence study. Therefore, it does not contain information on patients whose general practitioners do not consider them to suffer from emotional disorders.

Conclusion

This is the first study to provide evidence for the reliability and validity of a computerized version of the PHQ-4. This computerized tool can be used to detect depression and anxiety in a primary care setting.

Introduction

Major depressive disorder (MDD) and generalized anxiety disorder (GAD) are the most common emotional disorders (EDs) (Remes et al., 2016, World Health Organization, 2017). Both are highly comorbid (Kelly and Mezuk, 2017, Roca et al., 2009) and have a large, negative impact on patient functioning and quality of life (Gili et al., 2010, Whiteford et al., 2015). These disorders are generally first identified and treated in primary care (PC) settings (Haro et al., 2006, Serrano-Blanco et al., 2010), where the time available for consultation is limited, with physicians having only 5–10 min to diagnose, treat and refer patients for additional treatment (Bowers, 1993). Moreover, the high prevalence of EDs in the PC setting can overburden the daily practice of general practitioners (GPs), which explains why detection and treatment rates remain very low, with several reports indicating that fewer than half of depressive episodes (Mitchell et al., 2009) and anxiety disorders (Parmentier et al., 2013) are correctly diagnosed. Numerous studies have reported the tendency for EDs to be underdiagnosed in the PC setting (Bowers, 1993; Castro-Rodríguez et al., 2015; Fernández et al., 2010; Nuyen et al., 2005) although, paradoxically, other authors have found that these conditions may also be overdiagnosed (Aragones et al., 2006, Schumann et al., 2012). Fernández et al. (2010) reported that MDD and GAD were misdiagnosed (i.e., over- or under-diagnosed) by GPs in Spain in 78% and 71% of cases, respectively. This is a serious problem given that a diagnostic error (e.g., an incorrect or missed diagnosis) reduces the likelihood that the patient will receive optimal treatment for MDD or GAD.

The use of screening tools has been shown to improve the detection and diagnosis of EDs in PC centres (Ansseau et al., 2004, Fernández et al., 2012, Gabarron Hortal et al., 2002, Olariu et al., 2015, Patel et al., 2008). Unfortunately, time constraints in the PC setting, together with the high prevalence of repeat patients with mild and/or subthreshold conditions mixed with somatic complaints (Bowers, 1993), necessarily reduce the use of screening tools, which in turn negatively impacts the quality of health care (Zimmerman and Mattia, 2001). To address this problem, several authors have proposed the use of ultra-short screening tools (Goldberg et al., 2017, Kroenke et al., 2009, Mitchell and Coyne, 2007), which could help to improve clinical outcomes, primarily by reducing misdiagnosis rates in the PC setting (Aragones et al., 2006, Arroll et al., 2010, Gilbody et al., 2008, Schumann et al., 2012).

One validated ultra-brief screening tool used in some PC settings is the Patient Health Questionnaire-4 (PHQ-4) (Kroenke et al., 2009). This instrument combines the two short versions (the PHQ-2 and the GAD-2) of the PHQ-9 and GAD-7, the specific depression and anxiety modules of the Patient Health Questionnaire (PHQ; Spitzer et al., 1999). The PHQ, based on the DSM-IV criteria for mental disorders, is the most widely used screening tool in PC settings (Arroll et al., 2010, Gilbody et al., 2008, Manea et al., 2015, Manea et al., 2016, Plummer et al., 2016). The PHQ-4 has a two-factor structure (one factor containing the two GAD-2 items and the other containing the two PHQ-2 items) that explains 84% of the total variance (Kroenke et al., 2009). This bidimensional structure has been supported in recent studies (Kocalevent et al., 2014; Löwe et al., 2010). The PHQ-2 contains only the two core items of the PHQ depression module (PHQ-9: Kroenke et al., 2001) and the GAD-2 comprises the two core items of the GAD-7 (Spitzer et al., 2006). Importantly, despite their brevity, these instruments nonetheless present adequate psychometric properties (Hinz et al., 2017, Löwe et al., 2010, Manea et al., 2016, Plummer et al., 2016). For instance, in a meta-analysis, Manea et al. (2016) reported that the PHQ-2 shows pooled sensitivity and specificity, respectively, of .76 and .87 at a cut-off point of 3, both of which were lower than those found in the original study of the PHQ-2 (sensitivity = .83 and specificity = .90 at a cut-off point of 3; Kroenke et al., 2003). By contrast, in another meta-analysis, Mitchell et al. (2016) reported sensitivity and specificity that were similar to the original (.89 and .76, respectively). A recent meta-analysis of the GAD-2 found pooled sensitivity and specificity values (at a cut-off point of 3) of .76 and .81, respectively (Plummer et al., 2016). Despite some differences when pooling these test characteristics, some authors have argued that cut-offs should be adjusted to the target population, thus supporting some degree of flexibility in selecting optimal cut points (Kroenke et al., 2010).
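To make the trade-off behind cut-off selection concrete, the following minimal Python sketch computes sensitivity and specificity for a subscale score at several candidate cut-offs. It is an illustration only, not the analysis used in the study: the function name, the assumption that a score at or above the cut-off counts as a positive screen, and the example data are all hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    scores: Sequence[int], has_disorder: Sequence[bool], cutoff: int
) -> Tuple[float, float]:
    """Return (sensitivity, specificity), treating score >= cutoff as a positive screen."""
    if len(scores) != len(has_disorder):
        raise ValueError("scores and has_disorder must have the same length")
    tp = fn = tn = fp = 0
    for score, diseased in zip(scores, has_disorder):
        positive = score >= cutoff
        if diseased and positive:
            tp += 1  # case correctly flagged
        elif diseased:
            fn += 1  # case missed by the screener
        elif positive:
            fp += 1  # non-case incorrectly flagged
        else:
            tn += 1  # non-case correctly cleared
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one non-case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical subscale scores and reference-standard diagnoses (NOT study data).
scores = [0, 1, 2, 3, 4, 5, 6, 2, 3, 1]
has_disorder = [False, False, False, True, True, True, True, True, False, False]

for cutoff in (2, 3, 4):
    sens, spec = sensitivity_specificity(scores, has_disorder, cutoff)
    print(f"cut-off {cutoff}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy data set, lowering the cut-off raises sensitivity at the expense of specificity, which is the reason a screening-oriented study may prefer a cut-off that favours sensitivity.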

Several authors have proposed internet-based testing to avoid some of the limitations of paper-and-pencil tests or face-to-face interviews (e.g., sameness through stigma or social desirability) and to improve the cost-effectiveness of screening (Aboraya et al., 2005, Barak and English, 2002, Luxton et al., 2014). Web-based screening questionnaires have also been proposed to screen for mental disorders, with several studies showing that such tests are both reliable and feasible, with the benefit of facilitating online data collection (Donker et al., 2010, Donker et al., 2011, Lin et al., 2007, Nguyen et al., 2015, van Ballegooijen et al., 2012). Online screening questionnaires have also shown good psychometric properties in PC settings to identify EDs (Farvolden et al., 2003, Muñoz-Navarro et al., 2017a, Muñoz-Navarro et al., 2017b, Muñoz-Navarro et al., 2016). For instance, Farvolden et al. (2003) recruited 193 participants who completed a two-step web-based depression and anxiety test. The first step involved 11 preliminary items (based on DSM-IV criteria) to detect MDD and other anxiety disorders. In the second step, users were asked to complete additional questions specific to the DSM-IV disorder of interest. Results were sent to a PC health care professional, who then conducted a standardized clinical interview to evaluate the criterion validity. Results for the major diagnostic categories were good (sensitivity = .71–.95; specificity = .87–.97), with the exception of GAD, which presented a somewhat lower but still acceptable sensitivity (.63). Other studies have investigated the value of a web-based version of the GAD-2 (the most widely used ultra-short questionnaire to detect GAD), reporting a sensitivity of .83 and a specificity of .61 at a cut point of 4 (Donker et al., 2011). The criterion validity of the PHQ-2 has also been evaluated against the PHQ-9 in a study in which patients completed questionnaires on a touch-screen computer while waiting to see their GP in the PC centre (Carey et al., 2016); the sensitivity of the PHQ-2 in that study was .90, with a specificity of .78 at a cut-off score of 3. However, to our knowledge, no studies have yet evaluated the utility of a computerized version of the PHQ-4 (GAD-2 and PHQ-2) in PC settings.

In this context, the aim of the present study was:

  • 1) to evaluate the psychometric properties (internal consistency and factorial validity, including tests of measurement invariance across gender and age) of computerized versions of the PHQ-4, PHQ-2, and GAD-2; and

  • 2) to assess the criterion validity of these ultra-short questionnaires to determine the optimal cut-off scores for use in Spanish PC centres. The Structured Clinical Interview for DSM-IV Axis I Disorders (SCID-I) and the Composite International Diagnostic Interview (CIDI) were used as reference standards.
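Internal consistency, the first property listed above, is commonly summarized with Cronbach's α. The sketch below shows one way to compute it from item-level responses using only the Python standard library. It is a generic illustration under stated assumptions, not the procedure used in the study: the function name and the example responses are hypothetical, and sample variances (n − 1 denominator) are assumed.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha, where items[i][j] is participant j's response to item i."""
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    n = len(items[0])
    if any(len(item) != n for item in items):
        raise ValueError("every item needs one response per participant")
    # Total score per participant (sum across items).
    totals = [sum(responses) for responses in zip(*items)]
    # alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses of four participants to a two-item scale (NOT study data).
item_1 = [1, 2, 3, 4]
item_2 = [2, 1, 4, 3]
print(f"alpha = {cronbach_alpha([item_1, item_2]):.2f}")
```

For a two-item scale such as the PHQ-2 or GAD-2, α depends only on the two item variances and their covariance, so it should be interpreted with that limitation in mind.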

Sample

This study was conducted at 28 different PC centres included in the PsicAP study (Cano-Vindel et al., 2016) between the months of January 2014 and July 2017 (inclusive). The Ethics Committee at each centre, the National Ethics Committee, and the Spanish Agency of Medicines and Medical Devices (AEMPS), all approved the study protocol (code: ISRCTN58437086). Participants received a study information sheet, which provided full details about the purpose and structure of the study. All participants

Item characteristics, data examination and internal consistency

Descriptive characteristics for the items and subscales of the PHQ-4 are displayed in Table 2. The mean (standard deviation; SD) scores for the PHQ-4, PHQ-2, and GAD-2 were 11.33 (3.20), 5.51 (1.87), and 5.83 (1.75), respectively. Mean item scores ranged from 2.70 to 2.96. Skewness and kurtosis indices were within ± 1, denoting compliance with univariate normality (George and Mallery, 2003). Multivariate normality was also corroborated by the Mardia index (< 70). No univariate outliers were detected,

Discussion

To our knowledge, this is the first study to provide evidence for the reliability and validity of a computerized version of the PHQ-4. No previous studies have evaluated an online version of the PHQ-4 and only a few studies have evaluated web-based versions of the PHQ-2 (Carey et al., 2016) and the GAD-2 (Donker et al., 2011). Our results indicate that this version is valid for detecting MDD and GAD in PC patients with a suspected ED.

Our findings are consistent with previous research with

Conclusion

This is the first study to provide evidence for the reliability and validity of a computerized version of the PHQ-4. Our results indicate that this instrument is useful to detect MDD and GAD in PC patients whose GP suspects an ED. However, as several meta-analyses (Manea et al., 2016, Mitchell and Coyne, 2007, Plummer et al., 2016) have suggested, while ultra-short questionnaires are useful as a screening tool to efficiently detect a possible mental disorder, more in-depth tests should be

Acknowledgments

We thank all the PsicAP Research Group who kindly participated in this large project.

Role of the funding sources

We thank the Ministerio de Economía y Competitividad, Psicofundación (Fundación española para la promoción y el desarrollo científico y profesional de la Psicología) and Fundación Mutua Madrileña, who kindly supported this project with funding.

References (73)

  • J.M. Haro et al.

    Prevalence of mental disorders and associated factors: results from the ESEMeD-Spain study

    Med. Clín.

    (2006)
  • A. Hinz et al.

    Psychometric evaluation of the Generalized Anxiety Disorder Screener GAD-7, based on a large German general population sample

    J. Affect. Disord.

    (2017)
  • K.M. Kelly et al.

    Predictors of remission from generalized anxiety disorder and major depressive disorder

    J. Affect. Disord.

    (2017)
  • K. Kroenke et al.

    An ultra-brief screening scale for anxiety and depression: the PHQ–4

    Psychosomatics

    (2009)
  • K. Kroenke et al.

    The patient health questionnaire somatic, anxiety, and depressive symptom scales: a systematic review

    Gen. Hosp. Psychiatry

    (2010)
  • B. Löwe et al.

    A 4-item measure of depression and anxiety: validation and standardization of the patient health questionnaire-4 (PHQ-4) in the general population

    J. Affect. Disord.

    (2010)
  • L. Manea et al.

    A diagnostic meta-analysis of the patient health questionnaire-9 (PHQ-9) algorithm scoring method as a screen for depression

    Gen. Hosp. Psychiatry

    (2015)
  • L. Manea et al.

    Identifying depression with the PHQ-2: a diagnostic meta-analysis

    J. Affect. Disord.

    (2016)
  • A.J. Mitchell et al.

    Clinical diagnosis of depression in primary care: a meta-analysis

    Lancet

    (2009)
  • F. Plummer et al.

    Screening for anxiety disorders with the GAD-7 and GAD-2: a systematic review and diagnostic metaanalysis

    Gen. Hosp. Psychiatry

    (2016)
  • M. Roca et al.

    Prevalence and comorbidity of common mental disorders in primary care

    J. Affect. Disord.

    (2009)
  • J.H. Steiger

    Understanding the limitations of global fit assessment in structural equation modeling

    Personal. Individ. Differ.

    (2007)
  • A. Aboraya et al.

    The validity of psychiatric diagnosis revisited: the clinician's guide to improve the validity of psychiatric diagnosis

    Psychiatry

    (2005)
  • E. Aragones et al.

    The overdiagnosis of depression in non-depressed patients in primary care

    Fam. Pract.

    (2006)
  • J.L. Arbuckle

    IBM SPSS AMOS (Version 21.0) [Computer Program]

    (2012)
  • B. Arroll et al.

    Validation of PHQ-2 and PHQ-9 to screen for major depression in the primary care population

    Ann. Fam. Med.

    (2010)
  • A. Barak et al.

    Prospects and limitations of psychological testing on the internet

    J. Technol. Hum. Serv.

    (2002)
  • P.J. Bowers

    Selections from current literature: Psychiatric disorders in primary care

    Fam. Pract.

    (1993)
  • T.A. Brown

    Confirmatory Factor Analysis for Applied Research

    (2014)
  • T.A. Brown et al.

    A proposal for a dimensional classification system based on the shared features of the DSM-IV anxiety and mood disorders: implications for assessment and treatment

    Psychol. Assess.

    (2009)
  • B.M. Byrne

    Structural Equation Modeling with AMOS: Basic Concepts, Applications, and Programming

    (2001)
  • B.M. Byrne

    Structural Equation Modeling with Mplus. 2012. Structural Equation Modeling with AMOS

    (2016)
  • A. Cano-Vindel et al.

    Transdiagnostic cognitive behavioral therapy versus treatment as usual in adult patients with emotional disorders in the primary care setting (PsicAP study): protocol for a randomized controlled trial

    JMIR Res. Protoc.

    (2016)
  • M. Carey et al.

    Validation of the PHQ-2 against the PHQ-9 for detecting depression in a large sample of Australian general practice patients

    Aust. J. Prim. Health

    (2016)
  • G.W. Cheung et al.

    Evaluating goodness-of-fit indexes for testing measurement invariance

    Struct. Equ. Model.

    (2002)
  • R. Eisinga et al.

    The reliability of a two-item scale: Pearson, Cronbach, or Spearman-brown?

    Int. J. Public Health

    (2013)