The Clinical Global Impressions scale: errors in understanding and use

doi:10.1016/j.comppsych.2008.08.005

Comprehensive Psychiatry

Volume 50, Issue 3, May–June 2009, Pages 257-262

https://doi.org/10.1016/j.comppsych.2008.08.005

Abstract

Objective

The Clinical Global Impressions Severity and Improvement scales (CGI-S and CGI-I) are widely included as efficacy data in psychopharmacology new drug application submissions. This study was conducted to determine the extent to which clinical trials investigators included information unrelated to efficacy in their CGI ratings.

Method

Forty-five principal investigators provided CGI-S and CGI-I ratings of narratives of patients with major depressive disorder or generalized anxiety disorder. Investigators were blindly randomized to receive narratives that either did (experimental) or did not (control) contain indication-unrelated medical or psychiatric adverse events. Investigators then completed a survey assessing CGI-S and CGI-I rating patterns.

Results

CGI-S and CGI-I ratings were significantly more severe and less improved when the narratives contained medical and psychiatric adverse events unrelated to the diseases under study (major depressive disorder and generalized anxiety disorder) than when the narratives did not (Ps < .04). In response to the survey, 46% and 56% of investigators reported that a psychiatric adverse event unrelated to the disease under study would not affect their CGI-S and CGI-I ratings, respectively. Although 87% of investigators reported that their CGI-S and CGI-I ratings would not be affected by a medical adverse event, actual CGI-S ratings were significantly more severe when an unrelated medical adverse event was described as occurring than when it was not (P < .03).

Conclusion

Clinical trials investigators' inclusion of indication-irrelevant adverse events threatens the validity of the CGI as an efficacy measure and may contribute to failure to detect efficacy signals in psychopharmacology clinical trials.

Introduction

To gain Food and Drug Administration (FDA) approval to market a new drug for a given disease indication, pharmaceutical companies are required to submit both data that demonstrate safety of the drug and data that demonstrate efficacy of the drug for the indication under consideration [1]. Pharmaceutical sponsors identify in their study protocols the measures by which safety will be demonstrated and the separate measures by which efficacy will be demonstrated. Statistical analysis plans for the analysis of safety data and the analysis of efficacy data are submitted and approved.

Two of the more widely used efficacy scales in central nervous system (CNS) trials are the Clinical Global Impressions Severity and Improvement scales (CGI-S and CGI-I) [2]. The CGI-S and CGI-I were first published as part of an assessment packet promulgated by the US government for the study of psychotropic drugs [3]. They were designed to provide a basis, independent of ratings on a questionnaire, for the study clinician to make a global assessment of a study patient's condition before and then after the initiation of a study medication. In this manner, they provided a means of determining whether, in the view of an experienced clinician, the condition under study had improved, worsened, or stayed the same.

For the CGI-S item, researchers conducting psychopharmacology trials of a pharmacologic agent for the treatment of a defined condition were asked to evaluate the patient's condition before the initiation of the studied medication (ie, at baseline): “Considering your total clinical experience with this particular population, how mentally ill is the patient at this time?” An illness severity rating was then made on a scale of 1 to 7, with 1 being “normal, not at all mentally ill” and 7 being “among the most extremely ill patients.” Subsequently, the patient's condition on the study drug (or placebo) was to be compared to the patient's condition before the initiation of the study drug (or placebo) (baseline) via additional CGI-S ratings or the CGI-I item. For the CGI-I, the investigator assessed whether the patient's condition was improved, worse, or the same; the scale ranged from 1 “very much improved” to 7 “very much worse,” with 4 denoting “no change.”

In 1985, an NIMH publication on assessments reminded raters that the CGI-S in its very early renditions (dates not given) used to read, “Considering your total clinical experience, how mentally ill is the patient at this time?” whereas the current version reads, “Considering your total clinical experience with this particular population, how mentally ill is the patient at this time?” As explained in the 1985 NIMH publication, this change was made to make it very clear that the rating was designed to pertain only to the disease under study [4].

A third item, CGI-“Therapeutic Efficacy,” rarely used, was presented in the 1970 and 1976 manuals [1], [2]; this item, which consisted of a 4-by-4 matrix, specifically directed the researcher to plot drug-related improvement or worsening of the condition under study on the y-axis and drug-related adverse events or side effects on the x-axis. The intersecting point was interpreted as the “risk/benefit ratio” of efficacy to safety. This third measure was explicitly different from the CGI-S and CGI-I, in which improvement or worsening in condition was considered irrespective of whether the investigator believed it was drug-related and in which adverse events or side effects were not considered.
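The structure of the two items described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `CGI_S_ANCHORS`, `CGI_I_ANCHORS`, `validate_cgi`, and `cgi_i_direction` are hypothetical, and only the anchor labels quoted in the text are encoded; the intermediate scale points are left unlabeled rather than guessed.

```python
# Illustrative sketch only; names are hypothetical and not part of any CGI manual.
# Only the anchor labels quoted in the text are included.
CGI_S_ANCHORS = {
    1: "normal, not at all mentally ill",
    7: "among the most extremely ill patients",
}
CGI_I_ANCHORS = {
    1: "very much improved",
    4: "no change",
    7: "very much worse",
}


def validate_cgi(score: int) -> int:
    """Return the score if it is an integer from 1 to 7, else raise ValueError."""
    if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 7:
        raise ValueError(f"CGI ratings must be integers from 1 to 7, got {score!r}")
    return score


def cgi_i_direction(score: int) -> str:
    """Classify a CGI-I rating relative to baseline: below 4 is improvement,
    4 is no change, above 4 is worsening."""
    validate_cgi(score)
    if score < 4:
        return "improved"
    if score == 4:
        return "no change"
    return "worse"
```

Note that nothing in this representation carries information about adverse events or comorbid conditions: as described above, the rating is a single number intended to pertain only to the condition under study.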

Thirty years after their publication, despite a variety of proposed revisions and modifications [5], [6], [7], [8], the 1976 CGI-S and CGI-I continue to be widely included as efficacy data in FDA submissions. A March 2008 search of ClinicalTrials.gov, the online listing of clinical trials provided by the National Institutes of Health, identified 626 currently enrolling or recently completed studies that listed the CGI as an efficacy measure. In current usage, the CGI-S is often administered throughout the study, not just at baseline; furthermore, studies have expanded the importance of the CGI-S by requiring that a minimum CGI-S score (of 4, “moderate,” for example) be present as a criterion for study entry. Thus, not only is the CGI currently used as an efficacy measure, it is also used to define the population in whom drug efficacy will be studied [9], [10], [11], [12].

The importance of the CGI-S and CGI-I is not limited to the trials and approval process. Should a sponsor be allowed to market a drug for a specific indication, the CGI-S and CGI-I data that helped form the basis of approval are often then included as part of FDA-governed labeling claims for efficacy. Not infrequently, FDA-governed package inserts describe drug efficacy in terms of the percent of subjects on drug who were assigned a CGI-I rating of 1 “very much improved” or 2 “much improved.” At present, CGI-S and CGI-I data are part of the package insert of all major classes of marketed psychotropics [13].

CGI-S and CGI-I data are also relied upon by the scientific community at large. An influential article in the Journal of the American Medical Association that examined published antidepressant drug and placebo response rates in publications across 2 decades characterized response as either 50% baseline to end point reduction on the Hamilton Depression Rating Scale or end point CGI-I ratings of 1 “very much improved” or 2 “much improved” [14]; this article further noted that such CGI-I classifications were routinely used to characterize treatment response.
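The two response definitions mentioned above can be written out as a minimal Python sketch. The function names are hypothetical, reading "50% reduction" as "at least 50%" is an assumption, and the example scores are invented for illustration and are not data from any study.

```python
# Illustrative sketch only; function names and example values are hypothetical.

def responder_by_hamd(baseline: float, endpoint: float) -> bool:
    """Response as a reduction of at least 50% in Hamilton Depression Rating
    Scale score from baseline to end point (the 'at least' is an assumption)."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive to compute a percent reduction")
    return (baseline - endpoint) / baseline >= 0.5


def responder_by_cgi_i(cgi_i_endpoint: int) -> bool:
    """Response as an end point CGI-I rating of 1 ('very much improved')
    or 2 ('much improved')."""
    if not 1 <= cgi_i_endpoint <= 7:
        raise ValueError("CGI-I ratings range from 1 to 7")
    return cgi_i_endpoint <= 2


# Hypothetical patient: the two definitions need not agree.
hamd_response = responder_by_hamd(baseline=24, endpoint=14)  # 10/24 reduction, below 50%
cgi_response = responder_by_cgi_i(2)                         # rated "much improved"
```

Because the CGI-I criterion depends on a single clinician rating, any information unrelated to the indication that shifts that rating by one point (from 2 to 3, say) changes whether the patient counts as a responder.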

Given the widespread use of the CGI-S and CGI-I as measures of efficacy of investigational agents for particular indications, as well as their apparent simplicity, we were often surprised to learn anecdotally that many active investigators were unclear as to whether they were to include in their efficacy ratings safety information and/or information concerning efficacy for other conditions.

Based on discussions with many investigators, it seemed to us that these investigators did not understand the CGI and did not understand its role as an efficacy assessment. Instead they seemed to be focusing on the term global and interpreting it to mean that all aspects of the subject's condition were to be considered in the rating, including those unrelated to efficacy of the drug for the condition under study. Thus, for example, in a study of an investigational medication for the treatment of major depressive disorder, these investigators might assign a subject with a drug-related or drug-independent “adverse event” (side effect or physical illness, for example, upset stomach) a lower CGI-I rating than they would a subject with identical improvement in major depressive disorder who did not experience an adverse event. This is a misapplication of the efficacy measure and confounds efficacy with safety. In this example, the FDA and, if marketed, the prescribing physician, would have an inaccurate picture of the actual efficacy of the agent for major depressive disorder. Furthermore, we suspected that many investigators included improvement or worsening of comorbid illnesses—that is, illnesses not under study—in their “global” ratings. For example, in a study to determine whether a given agent is efficacious for the indication, generalized anxiety disorder, such investigators might assign a subject with improvement in a comorbid condition, such as major depressive disorder, a higher generalized anxiety disorder CGI-I rating than they would for a subject with identical generalized anxiety disorder improvement who did not have a comorbid illness that improved. Again, this would be a contamination of the process by which efficacy for the drug for the indication under study is determined. The problem did not seem to be limited to junior investigators. 
Even very senior CNS “key opinion leaders” with whom we spoke seemed to have widely discrepant views as to whether adverse events and non-indication illnesses belonged in CGI ratings.

Given the high rate of failure of many CNS investigational drugs to separate from placebo [15], [16], [17], [18], the reduction of error in the measure of efficacy is a critical clinical trials priority.

The present study was designed (a) to explore empirically whether experienced trials investigators were unclear about the information they were to consider when performing a CGI rating and (b) to explore empirically whether actual CGI ratings are affected by the presence of information unrelated to efficacy of the drug for the disease under study.

Section snippets

Subjects

Potential subjects were 167 principal investigators actively engaged in industry-sponsored CNS clinical trials who had been trained on anxiety or depression efficacy scales by United BioSource Corporation (UBC, Wayne, Pa; formerly PharmaStar—a rater training company) within the past 4 years.

Procedure

Potential principal investigator subjects (PI subjects) were solicited by email for interest in participating in a CGI ratings project for which they would receive compensation. Consenting PI subjects were

Results

Forty-five PI subjects (24 experimental and 21 control) returned the CGI narratives by the assigned deadline; 39 (87%) of the 45 then returned the follow-up survey. Principal investigator subjects were highly experienced CGI raters (overall self-reported mean years' experience conducting anxiety or depression CGI ratings = 11.59 years [SD = 6.67 years], overall self-reported mean number of anxiety or depression CGIs rated in the past 12 months = 349.74 ratings [SD = 521.68]). Mean years'

Discussion

The results of this study suggest that active clinical trials principal investigators are unclear about whether to include in CGI ratings safety information and efficacy information unrelated to the indication under study. Only about half of PI subjects felt certain that they would not include an unrelated psychiatric condition in their CGI-S and CGI-I ratings. Furthermore, the results suggest that the problem may actually change CGI ratings. In our analogue of the actual trials situation,

References (19)

Spearing M.K. et al.
Modification of the Clinical Global Impressions scale for use in bipolar illness (BP): the CGI-BP
Psychiatry Res
(1997)

Green A.I. et al.
First episode schizophrenia-related psychosis and substance use disorders: acute response to olanzapine and haloperidol
Schizophr Res
(2004)

Khan A. et al.
Relationship between depression severity entry criteria and antidepressant clinical trial outcomes
Biol Psychiat
(2007)

Food and Drug Administration
Guidance for industry: E9 statistical principles for clinical trials
(1998)

Guy W.
Clinical global impressions

Guy W. et al.
Clinical global impression

Rapoport J. et al.
Clinical global impressions

Beneke M. et al.
Clinical global impressions (ECDEU): some critical comments
Pharmacopsychiatry
(1992)

Haro J.M. et al.
The clinical global impression-schizophrenia scale: a simple instrument to measure the diversity of symptoms present in schizophrenia
Acta Psychiatr Scand
(2003)

Portions of the data were previously presented as posters at the 47th Annual Meeting of the NCDEU, Boca Raton, FL, June 11 to 14, 2007, and the 20th Congress of the European College of Neuropsychopharmacology, October 13 to 17, 2007, Vienna, Austria.

The authors are all affiliated with United BioSource Corporation (Wayne, PA), which provides rater training services. Dr Targum is an equity holder of United BioSource Corporation.
