Elsevier

Comprehensive Psychiatry

Volume 50, Issue 3, May–June 2009, Pages 257-262
The Clinical Global Impressions scale: errors in understanding and use

https://doi.org/10.1016/j.comppsych.2008.08.005

Abstract

Objective

The Clinical Global Impressions Severity and Improvement scales (CGI-S and CGI-I) are widely included as efficacy data in psychopharmacology new drug application submissions. This study was conducted to determine the extent to which clinical trials investigators included information unrelated to efficacy in their CGI ratings.

Method

Forty-five principal investigators provided CGI-S and CGI-I ratings of narratives of patients with major depressive disorder or generalized anxiety disorder. Investigators were blindly randomized to receive narratives that either did (experimental) or did not (control) contain indication-unrelated medical or psychiatric adverse events. Investigators then completed a survey assessing CGI-S and CGI-I rating patterns.

Results

CGI-S and CGI-I ratings were significantly more severe and less improved when the narratives contained medical and psychiatric adverse events unrelated to the diseases under study (major depressive disorder and generalized anxiety disorder) than when the narratives did not (Ps < .04). In response to the survey, 46% and 56% of investigators reported that a psychiatric adverse event unrelated to the disease under study would not affect their CGI-S and CGI-I ratings, respectively. Although 87% of investigators reported that their CGI-S and CGI-I ratings would not be affected by a medical adverse event, actual CGI-S ratings were significantly more severe when an unrelated medical adverse event was described as occurring than when it was not (P < .03).

Conclusion

Clinical trials investigators' inclusion of indication-irrelevant adverse events threatens the validity of the CGI as an efficacy measure and may contribute to failure to detect efficacy signals in psychopharmacology clinical trials.

Introduction

To gain Food and Drug Administration (FDA) approval to market a new drug for a given disease indication, pharmaceutical companies are required to submit both data that demonstrate safety of the drug and data that demonstrate efficacy of the drug for the indication under consideration [1]. Pharmaceutical sponsors identify in their study protocols the measures by which safety will be demonstrated and the separate measures by which efficacy will be demonstrated. Statistical analysis plans for the analysis of safety data and the analysis of efficacy data are submitted and approved.

Two of the more widely used efficacy scales in central nervous system (CNS) trials are the Clinical Global Impressions Severity and Improvement scales (CGI-S and CGI-I) [2]. The CGI-S and CGI-I were first published as part of an assessment packet promulgated by the US government for the study of psychotropic drugs [3]. The scales were designed to provide a basis, independent of ratings on a questionnaire, for the study clinician to make a global assessment of a study patient's condition before and then after the initiation of a study medication. In this manner, they provided a means of determining whether, in the view of an experienced clinician, the condition under study had improved, worsened, or stayed the same.

For the CGI-S item, researchers conducting psychopharmacology trials of a pharmacologic agent for the treatment of a defined condition were asked to evaluate the patient's condition before the initiation of the studied medication (ie, at baseline): “Considering your total clinical experience with this particular population, how mentally ill is the patient at this time?” An illness severity rating was then made on a scale of 1 to 7, with 1 being “normal, not at all mentally ill” and 7 being “among the most extremely ill patients.” Subsequently, the patient's condition on the study drug (or placebo) was to be compared to the patient's condition before the initiation of the study drug (or placebo) (baseline) via additional CGI-S ratings or the CGI-I item. For the CGI-I, the investigator assessed whether the patient's condition was improved, worse, or the same; the scale ranged from 1 “very much improved” to 7 “very much worse,” with 4 denoting “no change.” In 1985, an NIMH publication on assessments reminded raters that the CGI-S in its very early renditions (dates not given) used to read, “Considering your total clinical experience, how mentally ill is the patient at this time?” whereas the current version reads, “Considering your total clinical experience with this particular population, how mentally ill is the patient at this time?” As explained in the 1985 NIMH publication, this was done to make it very clear that the rating was designed to pertain only to the disease under study [4]. A third item, CGI-“Therapeutic Efficacy,” rarely used, was presented in the 1970 and 1976 manuals [1], [2]; this item, which consists of a 4-by-4 matrix, specifically directed the researcher to plot drug-related improvement or worsening of the condition under study on the y-axis and drug-related adverse events or side effects on the x-axis. The intersecting point was interpreted as the “risk/benefit ratio” of efficacy to safety. This third measure was explicitly different from the CGI-S and CGI-I, in which improvement or worsening in condition was considered irrespective of whether the investigator believed it was drug-related and in which adverse events or side effects were not considered.
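The scoring rules described above can be illustrated with a minimal sketch. It assumes integer scores from 1 to 7 and labels only the anchors quoted in the text; the names CGI_S_ANCHORS, CGI_I_ANCHORS, validate_cgi_score, and cgi_i_direction are hypothetical and do not come from the CGI manuals.

```python
# Illustrative sketch only: all names here are hypothetical.
# Only anchors quoted in the article are labeled; other scale points are left out.
CGI_S_ANCHORS = {
    1: "normal, not at all mentally ill",
    7: "among the most extremely ill patients",
}
CGI_I_ANCHORS = {
    1: "very much improved",
    4: "no change",
    7: "very much worse",
}


def validate_cgi_score(score: int) -> int:
    """Return the score unchanged if it is a whole number from 1 to 7."""
    if isinstance(score, bool) or not isinstance(score, int):
        raise TypeError("CGI scores are whole numbers from 1 to 7")
    if not 1 <= score <= 7:
        raise ValueError(f"CGI score out of range: {score}")
    return score


def cgi_i_direction(score: int) -> str:
    """Map a CGI-I score to the direction of change relative to baseline.

    Assumes scores below the 'no change' midpoint (4) mean improvement
    and scores above it mean worsening, as the 1-to-7 anchors imply.
    """
    score = validate_cgi_score(score)
    if score < 4:
        return "improved"
    if score > 4:
        return "worse"
    return "no change"
```

Note the direction of the CGI-I: a numerically lower score means more improvement, so a rating of 2 is better than a rating of 5.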

Thirty years after their publication, despite a variety of proposed revisions and modifications [5], [6], [7], [8], the 1976 CGI-S and CGI-I continue to be widely included as efficacy data in FDA submissions. A March 2008 search of ClinicalTrials.gov, the online listing of clinical trials provided by the National Institutes of Health, identified 626 currently enrolling or recently completed studies that listed the CGI as an efficacy measure. In current usage, the CGI-S is often administered throughout the study, not just at baseline; furthermore, studies have expanded the importance of the CGI-S by requiring that a minimum CGI-S score (of 4, “moderate,” for example) be present as a criterion for study entry. Thus, not only is the CGI currently used as an efficacy measure, it is also used to define the population in whom drug efficacy will be studied [9], [10], [11], [12].

The importance of the CGI-S and CGI-I is not limited to the trials and approval process. Should a sponsor be allowed to market a drug for a specific indication, the CGI-S and CGI-I data that helped form the basis of approval are often then included as part of FDA-governed labeling claims for efficacy. Not infrequently, FDA-governed package inserts describe drug efficacy in terms of the percent of subjects on drug who were assigned a CGI-I rating of 1 “very much improved” or 2 “much improved.” At present, CGI-S and CGI-I data are part of the package insert of all major classes of marketed psychotropics [13].

CGI-S and CGI-I data are also relied upon by the scientific community at large. An influential article in the Journal of the American Medical Association that examined published antidepressant drug and placebo response rates in publications across 2 decades characterized response as either a 50% baseline to end point reduction on the Hamilton Depression Rating Scale or end point CGI-I ratings of 1 “very much improved” or 2 “much improved” [14]; this article further noted that such CGI-I classifications were routinely used to characterize treatment response.
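The two response definitions mentioned above, and the percent-responder figure that package inserts report, can be sketched as follows. This is a hedged illustration rather than the cited article's method: it assumes "50% reduction" means a reduction of at least 50% from baseline, and the function names hamd_percent_reduction, responder_by_hamd, responder_by_cgi_i, and response_rate are hypothetical.

```python
from typing import Iterable


def hamd_percent_reduction(baseline: float, endpoint: float) -> float:
    """Percent reduction in Hamilton Depression Rating Scale total score."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return 100.0 * (baseline - endpoint) / baseline


def responder_by_hamd(baseline: float, endpoint: float,
                      threshold_pct: float = 50.0) -> bool:
    """Responder if the reduction is at least threshold_pct (assumed >= 50%)."""
    return hamd_percent_reduction(baseline, endpoint) >= threshold_pct


def responder_by_cgi_i(cgi_i: int) -> bool:
    """Responder if the end point CGI-I is 1 (very much improved) or 2 (much improved)."""
    if cgi_i not in range(1, 8):
        raise ValueError(f"CGI-I score out of range: {cgi_i}")
    return cgi_i <= 2


def response_rate(cgi_i_scores: Iterable[int]) -> float:
    """Percent of subjects whose end point CGI-I rating is 1 or 2."""
    scores = list(cgi_i_scores)
    if not scores:
        raise ValueError("no scores supplied")
    return 100.0 * sum(responder_by_cgi_i(s) for s in scores) / len(scores)
```

Because the CGI-I criterion is a hard cutoff between 2 and 3, a one-point shift in a rating (for example, from counting an unrelated adverse event) is enough to move a subject from responder to nonresponder.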

Given the widespread use of the CGI-S and CGI-I as measures of efficacy of investigational agents for particular indications, as well as their apparent simplicity, we were often surprised to learn anecdotally that many active investigators were unclear as to whether their efficacy ratings should include safety information and/or information concerning efficacy for other conditions.

Based on discussions with many investigators, it seemed to us that these investigators did not understand the CGI and did not understand its role as an efficacy assessment. Instead, they seemed to be focusing on the term global and interpreting it to mean that all aspects of the subject's condition were to be considered in the rating, including those unrelated to efficacy of the drug for the condition under study. Thus, for example, in a study of an investigational medication for the treatment of major depressive disorder, these investigators might assign a subject with a drug-related or drug-independent “adverse event” (a side effect or physical illness, for example, upset stomach) a less favorable CGI-I rating than they would a subject with identical improvement in major depressive disorder who did not experience an adverse event. This is a misapplication of the efficacy measure and confounds efficacy with safety. In this example, the FDA and, if the drug is marketed, the prescribing physician would have an inaccurate picture of the actual efficacy of the agent for major depressive disorder. Furthermore, we suspected that many investigators included improvement or worsening of comorbid illnesses (that is, illnesses not under study) in their “global” ratings. For example, in a study to determine whether a given agent is efficacious for the indication of generalized anxiety disorder, such investigators might assign a subject with improvement in a comorbid condition, such as major depressive disorder, a more favorable generalized anxiety disorder CGI-I rating than they would a subject with identical generalized anxiety disorder improvement who did not have a comorbid illness that improved. Again, this would contaminate the process by which efficacy of the drug for the indication under study is determined. The problem did not seem to be limited to junior investigators. Even very senior CNS “key opinion leaders” with whom we spoke seemed to have widely discrepant views as to whether adverse events and non-indication illnesses belonged in CGI ratings.

Given the high rate of failure of many CNS investigational drugs to separate from placebo [15], [16], [17], [18], the reduction of error in the measure of efficacy is a critical clinical trials priority.

The present study was designed (a) to explore empirically whether experienced trials investigators were unclear about the information they were to consider when performing a CGI rating and (b) to explore empirically whether actual CGI ratings are affected by the presence of information unrelated to efficacy of the drug for the disease under study.

Subjects

Potential subjects were 167 principal investigators actively engaged in industry-sponsored CNS clinical trials who had been trained on anxiety or depression efficacy scales by United BioSource Corporation (UBC, Wayne, Pa; formerly PharmaStar—a rater training company) within the past 4 years.

Procedure

Potential principal investigator subjects (PI subjects) were solicited by email for interest in participating in a CGI ratings project for which they would receive compensation. Consenting PI subjects were

Results

Forty-five PI subjects (24 experimental and 21 control) returned the CGI narratives by the assigned deadline; 39 (87%) of the 45 then returned the follow-up survey. Principal investigator subjects were highly experienced CGI raters (overall self-reported mean years' experience conducting anxiety or depression CGI ratings = 11.59 years [SD = 6.67 years], overall self-reported mean number of anxiety or depression CGIs rated in the past 12 months = 349.74 ratings [SD = 521.68]). Mean years'

Discussion

The results of this study suggest that active clinical trials principal investigators are unclear about whether to include in CGI ratings safety information and efficacy information unrelated to the indication under study. Only about half of PI subjects felt certain that they would not include an unrelated psychiatric condition in their CGI-S and CGI-I ratings. Furthermore, the results suggest that the problem may actually change CGI ratings. In our analogue of the actual trials situation,

Portions of the data were previously presented as posters at the 47th Annual Meeting of the NCDEU, Boca Raton, FL, June 11 to 14, 2007, and the 20th Congress of the European College of Neuropsychopharmacology, October 13 to 17, 2007, Vienna, Austria.

The authors are all affiliated with United BioSource Corporation (Wayne, PA), which provides rater training services. Dr Targum is an equity holder of United BioSource Corporation.
