Introduction

Objective monitoring of consciousness and responsiveness in sedated and critically ill patients in the intensive care unit (ICU) setting is conducted primarily via clinical bedside instruments such as sedation scales [1, 2]. While the use of appropriately validated scales is recommended in guidelines published by the Society of Critical Care Medicine [3], they are not routinely applied [4, 5, 6] and provide only an intermittent assessment of critically ill patients’ brain function. Bispectral index (BIS) represents an emerging technology that could be useful for both intermittent and continuous monitoring of ICU patients. BIS electroencephalography (EEG) monitoring uses a single, easy to apply, foam sensor containing several electrodes that are arranged in a frontal-temporal montage on the patient’s forehead. The sensor connects to a digital signal converter and then to a monitor, which displays both a raw EEG waveform and a numerical BIS value ranging from 0 (deeply unconscious with isoelectric EEG) to 100 (alert).

Processing the raw EEG data into a BIS value could theoretically provide the clinician with an objective assessment of the patient’s consciousness via a continuous display similar to that provided by pulse oximetry for oxygen saturation. While the BIS was originally developed and validated for use in surgical patients to ensure adequate depth of anesthesia-induced hypnosis (i.e., unconsciousness), it is now being studied in some ICU settings to assess “consciousness.” In fact, others have shown variable degrees of correlation between BIS values and different sedation scales in multiple different settings and patient types [7, 8, 9, 10, 11]. Consciousness, however, is not solely defined by level of sedation or arousal/wakefulness. Plum and Posner [12] define consciousness as having another key component—content, which can be partially ascertained by the presence or absence of delirium. A delirious patient’s raw EEG has long been known to demonstrate classic findings of alpha slowing and delta and theta wave intrusion [13, 14]. These raw EEG findings have been shown to be correlated with clinical severity of brain dysfunction regardless of the underlying medical condition [15].

We hypothesized that advances in the BIS algorithm would improve correlation to clinical instruments that have been validated to measure the two primary components of consciousness (clinical level of arousal and delirium). The purpose of this investigation was to determine which components of consciousness are assessed by BIS monitoring, and to compare the performance of the newer and older algorithms (BIS-XP and BIS 3.4, respectively) using the Richmond Agitation-Sedation Scale (RASS) to measure level of arousal and the Confusion Assessment Method for the ICU (CAM-ICU) to assess delirium.

Materials and methods

The study population included 124 mechanically ventilated, adult, medical, and coronary ICU patients admitted to Vanderbilt University’s 641-bed medical center (mean age 56±16 years; Table 1). The institutional review board approved the study, and informed consent was obtained from the patient or surrogate decision maker. During the study period (7 July 2000–5 March 2001) 390 mechanically ventilated patients were admitted to the ICU, of whom 125 (32%) were enrolled and 265 (68%) were excluded. Reasons for exclusion were a history of psychosis or neurological disease (e.g., cerebrovascular accident; n=62, 23%), inability to communicate with assessors (i.e., did not speak or understand English or were deaf; n=9, 3%), extubation before the study nurses’ assessment (n=34, 13%), previous enrollment in the study (n=22, 8%), patient or family refusal to participate (n=35, 13%), death before the study nurses’ assessments (n=39, 15%), and admission to the ICU after a predefined cap of six study patients per day had been reached because of research staffing limitations (n=64, 24%). In addition, one enrolled patient was excluded from this report because no BIS data were collected. The population of mechanically ventilated patients in this study had a very high baseline severity of illness, as measured by a mean Acute Physiology and Chronic Health Evaluation II [16] score of 26.9±8.8.

Table 1 Patient characteristics (n=124) (APACHE II Acute Physiology and Chronic Health Evaluation II, RASS Richmond Agitation-Sedation Scale)

While the correlative data between BIS-XP and RASS have been published [17], neither the comparative data between BIS 3.4 and BIS-XP presented in this manuscript nor their relationships to either RASS or CAM-ICU assessments have been previously published. Other demographic data from this cohort of patients (not central to the scientific message of this report) have been published, as would be expected from prospective cohort investigations that have the capacity to address different issues [17, 18, 19].

Study design, BIS monitoring, and data collection

Aspect Medical Systems employees helped with the study’s conceptual design (D.J.M.) and statistical analysis (J.C.S.) as well as with writing of the manuscript. All of the BIS monitoring, data collection, and data processing were conducted by Vanderbilt University faculty and critical care research nurses without the presence of any Aspect Medical Systems employees. One research nurse recorded events and marked the timing of the clinical assessments while managing a laptop computer to collect the raw EEG and the processed BIS data coming from the A1050 monitor. Another research nurse, who was blinded to BIS values throughout the complete study procedure on each patient, performed the clinical assessments of consciousness as described below. After a brief skin preparation with dry gauze and isopropyl alcohol, the BIS sensor was applied, resulting in a two-channel ipsilateral referential frontal-temporal montage. The sensor was connected to the portable BIS A1050 EEG monitor (Aspect Medical Systems). Impedances were measured to ensure they were 5 kΩ or less. After the target impedance values were reached, both study nurses were blinded to the BIS values by covering the screen with an opaque shield. Raw EEG data were sampled at 128 samples/s and recorded continuously in real time; processed variables were downloaded live and recorded to the computer every 5 s. All BIS data were reviewed and analyzed off-line and not shared with the study team until the completion of patient enrollment.

BIS monitoring was divided into three periods: (a) prestimulation, (b) stimulation (during clinical assessment), and (c) poststimulation. Data recorded during the patients’ prestimulation period were chosen a priori for the study analysis and this report because this period of BIS monitoring represents a standardized and stable period of EEG recording that is the most free of artifact. This prestimulation period was the 2-min period immediately preceding the stimulation of clinical assessment but following sensor application and having allowed BIS to acclimate back to a stable baseline. The stimulation period was the entire RASS and CAM-ICU clinical assessment time, which lasted a median of 55 s. The poststimulation period was the 5 min immediately following the completion of the clinical assessment, during which the patient may or may not have returned to the original prestimulation state. Only one patient assessment was performed per patient per day. Sedative and/or analgesic medications were given to 98% of the patients within 24 h of BIS testing.
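The selection of the prestimulation window can be illustrated with a minimal sketch. It assumes that processed values arrive as (time in seconds, BIS value) records every 5 s, as described above; the record layout, the function name, and the synthetic values are hypothetical and are not taken from the study software.

```python
# Minimal sketch: pick the processed records that fall in the 2-min
# prestimulation window. The (time_s, bis) tuple layout is an assumption.

RECORD_INTERVAL_S = 5    # processed variables were recorded every 5 s
PRESTIM_WINDOW_S = 120   # 2-min period immediately preceding stimulation


def prestimulation_records(records, stim_start_s):
    """Return the records in the 120 s immediately before stimulation starts."""
    window_start = stim_start_s - PRESTIM_WINDOW_S
    return [(t, bis) for t, bis in records if window_start <= t < stim_start_s]


# Synthetic example: 10 min of records, stimulation marked at t = 300 s.
records = [(t, 60 + (t // 5) % 3) for t in range(0, 600, RECORD_INTERVAL_S)]
window = prestimulation_records(records, stim_start_s=300)
n_expected = PRESTIM_WINDOW_S // RECORD_INTERVAL_S  # 24 records per window
```

With one record every 5 s, a complete 2-min window holds 24 processed records; at 128 samples/s the same window corresponds to 15,360 raw EEG samples per channel.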

Differences in the BIS 3.4 and BIS-XP software versions

The comparator algorithms for this investigation were BIS 3.4 and the newer BIS-XP (also called BIS 4.0). BIS-XP was designed in an attempt to provide improved identification and filtering of electro-oculographic (EOG) and electromyographic (EMG) artifact and anomalous EEG patterns such as near-suppression and delta waves during periods of significant non-EEG artifact. A standard three-electrode EEG sensor was used for patients 1–74, and the new Quatro XP sensor that provided a second EEG channel using a fourth (above-eye) electrode was used for patients 75–124. The A1050 monitors used in this study were able to calculate simultaneously both BIS 3.4 and BIS-XP algorithms and their respective BIS scores on all 124 patients with data collected through a single frontal-temporal montage sensor (as described above). In addition to BIS values, several other processed variables derived from the raw EEG signal were recorded. These include suppression ratio (SR), EMG, and signal quality index (SQI), which BIS monitors use to calculate and categorize the quality of individual BIS values (see Table 2 for definitions).

Table 2 Suppression ratio, electromyographic parameter, and signal quality index. All mean values were calculated by the BIS monitor in an identical manner and were therefore equivalent for BIS 3.4 and BIS-XP algorithms; the BIS 3.4 and BIS-XP algorithms process these values differently in calculating the ultimate BIS score. Suppression ratio, used in the algorithm for calculation of the BIS during very deep sedation, is the percentage of time over the previous 60 s that the EEG was suppressed (isoelectric) and is expressed as a percentage (e.g., 10% means that 6 s of the previous 60 s of EEG was suppressed). The electromyographic (EMG) parameter represents the total power measured in the 70–110 Hz frequency range (25–80 dB). EMG provided complementary information to the BIS regarding the degree of concomitant muscle movement that might falsely elevate the BIS. Signal quality index indicates the percentage of time within the previous 60 s that artifact-free raw EEG signal was detected (RASS Richmond Agitation-Sedation Scale)
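The two percentage-of-time definitions in Table 2 (suppression ratio and signal quality index) can be sketched as a trailing-window calculation. This is only an illustration under stated assumptions: the per-second boolean flags and the function name are hypothetical, and the way the monitor itself detects suppression or artifact is not described in this report.

```python
# Minimal sketch: percentage of the previous 60 s for which a condition held.
# One boolean flag per second is an assumed input format.

def percent_of_window(flags, window_s=60, epoch_s=1):
    """Percentage of the trailing window in which the flag was True."""
    n_epochs = window_s // epoch_s
    recent = flags[-n_epochs:]
    if not recent:
        raise ValueError("no epochs available")
    return 100.0 * sum(recent) / len(recent)


# Worked example from the table note: 6 s of suppressed EEG in the last 60 s.
suppressed = [False] * 54 + [True] * 6
suppression_ratio = percent_of_window(suppressed)   # 10.0

# The same helper gives an SQI-like value from artifact-free flags.
artifact_free = [True] * 45 + [False] * 15
signal_quality = percent_of_window(artifact_free)   # 75.0
```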

Clinical assessments of consciousness (RASS and CAM-ICU)

At the time of this investigation no formal protocol to guide analgesia, sedation, or neuromuscular blockade existed in our ICU, and no “target” levels of sedation were routinely identified according to disease state or ventilator settings. The clinical assessment indicators used by the study nurses were the RASS and CAM-ICU. The RASS is a standardized and reliable scale specifically validated in this patient population as a measure of arousal and as a clinical instrument for goal-directed titration of sedatives [17, 20]. This scale has ratings for agitated/combative behavior (RASS +1 to +4), a rating for spontaneously alert patients (RASS 0), ratings for those who respond to verbal stimulation (RASS −1 to −3), and ratings for those who either respond only to physical stimulation or have no response to any stimulus (RASS −4 or −5). The CAM-ICU is a delirium assessment instrument that takes 1 min on average to complete and that, when compared against reference standard geropsychiatric experts, demonstrated a sensitivity of 93–100%, a specificity of 98–100%, and very high interrater reliability (κ=0.96) [18, 21]. The CAM-ICU was positive if patients demonstrated an acute change or fluctuation in the course of mental status (as determined by abnormalities or fluctuations in RASS scores), plus inattention, and either disorganized thinking or an altered level of consciousness. By definition, patients were delirious if they were responsive to verbal stimulation with eye opening (RASS −3 or higher) and were CAM-ICU positive. At the time of BIS testing, 22.7% were completely unarousable (RASS −5), 53.7% were unable to make sustained eye contact (RASS −2 to −4), and 23.6% were alert or able to make sustained eye contact (RASS 0 or −1). The median RASS scores at the time of BIS testing are presented in Table 1. Much more detailed information regarding these instruments may be found at http://www.icudelirium.org.

Statistical analysis

BIS data were not normally distributed and were therefore quantified using medians and interquartile ranges (IQR, 25th–75th percentiles). Correlations between BIS and RASS were quantified using Spearman’s rank correlation coefficients; correlations were compared using Fisher’s transformation. The coefficients were squared to obtain R² values to allow for comparison with previously published studies of BIS and sedation scales [22, 23]. Generalized estimating equation (GEE) regression models [24] using a multinomial distribution for RASS were also used separately for BIS 3.4 and BIS-XP to adjust for repeated measures within a patient and to confirm the finding from the Spearman approach. Within-patient differences between BIS 3.4 and BIS-XP at delirium were compared using the sign test. To assess whether BIS 3.4 or BIS-XP is independently associated with delirium regardless of RASS level, differences in BIS 3.4 and BIS-XP values between delirious and nondelirious groups were evaluated with the Mann-Whitney U test at each level of RASS. GEE regression models using a binomial distribution for delirium (yes/no) were then used to fit BIS 3.4 and BIS-XP separately while controlling for RASS. Interactions between RASS and BIS 3.4 or BIS-XP were assessed by including the interaction term in each model. Statistical significance was defined as p<0.05. Statistical analyses were performed using SPSS statistical software (version 11.5.1, SPSS, Chicago, Ill., USA) and SAS software (version 8.2, SAS Institute, Cary, N.C., USA).
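The correlation steps described above can be sketched in a few lines. This is an illustration, not the SPSS or SAS procedure actually used: the helper names and toy data are hypothetical, and the comparison shown is the textbook Fisher z-test for two independent correlations, which is an assumption because the exact variant applied is not specified here and the two BIS algorithms were measured on the same assessments.

```python
import math

# Minimal sketch: Spearman's rank correlation (with tie handling) and a
# Fisher z comparison of two correlations. All names here are illustrative.


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def fisher_z_compare(r1, n1, r2, n2):
    """Two-sided z-test for r1 vs. r2, ASSUMING independent samples."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Toy data (not study data): six RASS scores and paired BIS values.
rass = [-5, -4, -3, -2, -1, 0]
bis = [40, 55, 50, 70, 85, 97]
rho = spearman(rass, bis)
r2 = rho ** 2  # squared coefficient, as reported in this paper
```

Because repeated assessments within a patient are not independent, a test of this kind is at best a first approximation, which is one reason the GEE models were used as a confirmatory analysis.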

Results

BIS 3.4 and BIS-XP monitoring vs. level of arousal

The correlation coefficients between BIS 3.4 and XP algorithms were similar using the three- and four-lead sensors; therefore, for ease of presentation, data are presented in aggregate for all 124 patients. Using 484 assessments from the 124 patients, both BIS 3.4 (R²=0.20, p<0.001) and BIS-XP values (R²=0.36, p<0.001) were correlated with RASS, and BIS-XP values demonstrated statistically greater correlation with RASS than did BIS 3.4 values (p<0.05; Fig. 1). The results were similar when GEE regression models were used to adjust for repeated measures. BIS 3.4 values were correlated with BIS-XP values with an R² of 0.81 (p<0.001). Consciousness levels were also grouped according to functional gradations provided by the RASS: (a) when unarousable to verbal stimulation (RASS −4 or −5), median BIS-XP values were 58 (IQR 46–72); (b) when responsive to verbal stimulation but unable to make sustained eye contact (RASS −3 or −2), median BIS-XP values were 72 (IQR 57–85); and (c) when alert and able to make sustained eye contact (RASS 0 or −1), median BIS-XP values were 97 (IQR 85–98).

Fig. 1

BIS 3.4 and BIS-XP vs. Richmond Agitation-Sedation Scale (RASS). Horizontal bar Median value; boxes interquartile range (25th–75th); open boxes BIS 3.4 values; gray boxes BIS-XP values; whiskers 5th and 95th percentile values. Spearman’s rank correlation coefficient for RASS vs. BIS 3.4 was R²=0.20 (p<0.001), and for RASS vs. BIS-XP was R²=0.36 (p<0.001). BIS-XP values demonstrated statistically greater correlation with RASS than did BIS 3.4 (p<0.05). A total of 12 observations in nine patients yielded “+” or agitated RASS scores. The median and interquartile ranges for BIS 3.4 and BIS-XP values in these nine patients, all of whom were delirious, were 97 (91–99) and 88 (80–97), respectively

BIS monitoring vs. delirium

In the 210 observations during delirium the BIS-XP values were significantly lower than BIS 3.4 by a median of 3.5 (0–15, p<0.001). Median BIS-XP values for delirious (n=210) and nondelirious (n=86) observations were 74 (IQR 57–91) and 96 (IQR 85–98, p<0.001), respectively, while BIS 3.4 values were 91 (IQR 66–97) and 96 (IQR 91–97, p<0.001), respectively. However, when analyzed by RASS levels, the median BIS levels did not differ between delirious and nondelirious observations (Fig. 2). GEE regression models were used to analyze the effect of BIS-XP and BIS 3.4 on delirium, controlling for RASS values, to determine whether BIS scores had an independent association with delirium or whether this relationship was dependent on the arousal component of consciousness (i.e., RASS levels). Using this approach, RASS remained highly significant (p<0.001) in both the BIS 3.4 and BIS-XP models, but neither BIS 3.4 (p=0.26) nor BIS-XP (p=0.35) was significantly associated with delirium after controlling for RASS value.
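The paired comparison of the two algorithms relied on the sign test, which can be sketched as follows. The function name and the toy values are hypothetical and are not study data; the sketch uses the exact two-sided binomial form and drops tied pairs, which is the conventional treatment but is an assumption about how the statistical package handled ties.

```python
from math import comb

# Minimal sketch of an exact two-sided sign test on paired values.


def sign_test(a, b):
    """Return (n_nonzero_pairs, n_positive, two_sided_p) for paired samples."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # ties are dropped
    n = len(diffs)
    k = sum(d > 0 for d in diffs)
    # Probability of a split at least this extreme under p = 0.5, doubled.
    tail = sum(comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return n, k, min(1.0, 2 * tail)


# Toy paired readings (not study data): older vs. newer algorithm.
bis_old = [91, 95, 88, 97, 90, 85, 93, 96]
bis_new = [74, 95, 80, 90, 85, 86, 70, 80]
n_pairs, n_positive, p_value = sign_test(bis_old, bis_new)
```

The sign test uses only the direction of each within-pair difference, so it makes no assumption about the distribution of the BIS values, which suits data that are not normally distributed.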

Fig. 2

BIS-XP vs. delirium plotted by arousal level using the Richmond Agitation-Sedation Scale (RASS). Horizontal bar Median value; boxes interquartile range (25th–75th); whiskers 5th and 95th percentile values. BIS-XP values are shown at each RASS level according to whether the patient was deemed delirious (gray boxes) or nondelirious (open boxes). There were no significant differences between groups within RASS levels (see text for data from GEE analysis). Coma patients, at RASS −4 or −5, were excluded because this state was not included in the definition of delirium. Note: BIS 3.4 data also showed no significant difference in delirium status according to RASS levels (data not shown)

Suppression ratio, electromyographic parameter, and signal quality index

To understand the relationships between levels of arousal and three other elements of the BIS processing algorithm (i.e., SR, EMG, and SQI) at various RASS scores, these data are shown in Table 2. The levels of suppressed (isoelectric) EEG were under 5% at all RASS levels other than the deepest level of RASS −5. The EMG parameter increased with progressively alert and more normal RASS scores, thus reflecting patient movement. The correlation (R²) between EMG and BIS 3.4 was 0.66, while it was 0.59 for BIS-XP (p<0.05 for comparison between BIS algorithms). The SQI was inversely proportional to the EMG (r=−0.53, R²=0.28, p<0.001), reflecting that the algorithm grades the quality of the BIS value as progressively lower at higher EMG activity.
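The relationship between the reported r and R² values is simple arithmetic and can be checked directly. The helper names below are illustrative; note that converting R² back to r recovers only the magnitude, since squaring discards the sign.

```python
import math

# Minimal sketch: converting between a correlation coefficient and R².


def r_squared(r):
    """Square a correlation coefficient."""
    return r * r


def r_magnitude(r2):
    """Magnitude of the coefficient implied by R² (sign is not recoverable)."""
    return math.sqrt(r2)


sqi_emg_r2 = round(r_squared(-0.53), 2)     # SQI vs. EMG: r = -0.53
emg_bis34_r = round(r_magnitude(0.66), 2)   # EMG vs. BIS 3.4: R² = 0.66
emg_bisxp_r = round(r_magnitude(0.59), 2)   # EMG vs. BIS-XP: R² = 0.59
```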

Discussion

This study compared BIS levels to a well validated clinical measure of arousal and is the first investigation to report BIS values compared to a validated measure of delirium. While it was our hypothesis that BIS monitoring might reflect both components of consciousness (arousal and delirium/content of consciousness), we found that BIS was unable to differentiate between delirium status after adjusting for arousal (RASS) scores. As compared to BIS 3.4, the BIS-XP algorithm did show significantly better distinction between BIS values at different RASS levels, although the clinical relevance of this improvement was not tested and is questionable.

Experience with BIS monitoring in the ICU has revealed several problems such as difficulty excluding artifact due to EMG or EOG interference [25, 26], which has precluded widespread application of BIS monitoring in the clinical setting. While BIS-XP and the newer four-lead sensor were developed to improve filtering of EMG, delta wave, and other sources of artifact [27], our data demonstrate a strong correlation with EMG and marked overlap across different RASS levels. These findings underscore the need to document the clinical utility of BIS monitoring through appropriately designed trials prior to application in clinical practice. The real issue is whether the use of BIS leads to better outcomes such as lower drug exposure, increased nursing satisfaction, fewer unplanned extubations or loss of vascular access, shorter or less costly ICU length of stay, less recall of discomfort, and more intact long-term neuropsychological function. Our study did not attempt to answer these important questions. While preliminary studies have used the BIS to help guide sedation [28], there remain no randomized, controlled trials documenting improved outcomes using such an approach.

Recent investigations using various study designs have explored the strengths and weaknesses of the BIS as a continuous monitor of brain function in critically ill patients. Billard et al. [29] concluded that the BIS can be used as a measure of sedative drugs’ effects on EEG. Two other recent studies concluded that BIS values are correlated with subjective sedation scales in patients receiving moderate to deep sedation [23, 26]. Another study indicated that the use of “BIS guidelines” for sedative titration in patients on neuromuscular blockade resulted in less sedative drug use, a reduction in average drug costs by US $150 per patient, and a four-fold lower incidence of unpleasant recall [30]. The use of BIS has also demonstrated value in optimizing comfort during terminal weaning support [31]. Lastly, BIS has been shown to aid in the assessment of brain death in severely comatose patients [32].

In contrast to these generally positive reports, others have suggested that BIS is not sufficiently reliable for general use [22]. DeDeyne et al. [25] found that at deep sedation patients had variable BIS values that ranged from 15 to 65, which the authors suggested could be due to limitations in either their sedation scale or the BIS itself. Most recently Vivien et al. [33] reported “overestimation” of BIS values due to EMG contamination. Using BIS software version 2.1 (n=45) and XP (n=16), they reported a strong correlation (R²=0.61, p<0.001) between BIS and EMG levels in deeply sedated patients (very similar to our data). In addition, they found that both BIS 2.1 and BIS-XP values decreased following administration of a muscle relaxant. Our data, which provide the largest study to date using the newest BIS-XP algorithm, confirm that increasing muscle movement continues to be correlated inversely with the quality of the EEG signal (i.e., SQI).

In addition to the previously mentioned limitations of this study, several other issues deserve consideration. Statistically speaking, there are limitations in the ability to assess the relationships that we have set out to measure and understand. Most importantly, in the absence of a comprehensive multi-lead EEG there is no “gold standard” against which we can measure the BIS. Because of the robust validity and reliability data in support of the CAM-ICU and scales such as the Sedation Agitation Scale [34, 35] and the RASS, we chose to use these functional measures of consciousness as the comparators for the BIS monitor in our study. This methodological design is certainly not without potential criticism, and one might argue that the BIS could actually be better than these clinical instruments at determining brain function. In addition, we have deliberately not taken into account the doses of sedatives and analgesics that these patients received because psychoactive drug administration would be collinear with BIS levels. Lastly, we did not evaluate the role of metabolic, structural, or degenerative abnormalities on BIS values and their correlation with measures of consciousness. For example, baseline dementia present in ICU patients [36, 37] could alter BIS values and the correlation between BIS and clinical monitoring instruments.

In conclusion, the BIS-XP algorithm demonstrated higher correlations with level of arousal than did BIS 3.4, yet clinically significant variation within RASS levels and overlap across RASS levels persisted using the BIS-XP algorithm. In addition, after controlling for level of arousal, neither BIS algorithm distinguished between the presence and absence of delirium. Considering that the outcome of brain function following critical illness is arguably the most important determinant of quality of life among survivors [38, 39], we must improve neurological monitoring for ICU patients. The BIS monitor is being improved via newer versions of its processing algorithm, yet further advances in screening artifact will be required to avoid overlap across clinical levels of arousal and to allow differentiation between delirious and nondelirious patients.