Pupil dilation (i.e., pupillometry) is commonly used as an index of cognitive load (e.g., Beatty, 1982; Kahneman & Beatty, 1966) and arousal (e.g., Bradley, Miccoli, Escrig, & Lang, 2008; Cohen, Moyal, & Henik, 2015; Murphy, Robertson, Balsters, & O’Connell, 2011). Common eyetracking devices provide a continuous measure of pupil size. These devices are highly accurate in measuring pupil size and provide an output consisting of pupil size at each point in time, depending on the sampling rate. Importantly, during blinks, the eyetracking device loses track of the pupil, resulting in missing values in the output file (Fig. 1a). The detection of blinks in pupillometry data is highly important because their occurrence is not random, but depends on the task characteristics (Stern, Walrath, & Goldstein, 1984), such as cognitive load (Fukuda, 2001). As a result, eye blinks that occur during pupillometry recordings could lead to artifacts, which, in turn, could lead to wrong conclusions. The present work offers a novel algorithm for blink detection that is based on noise produced by an error measurement that characterizes the commonly used eyetracking devices.

Fig. 1
figure 1

a The pattern of recorded pupil sizes during an eye blink: The beginning and ending of the missing-sample time window are characterized by sharp slopes (an artifact that is created by the closing and opening of the eyelid). b Typical linear interpolation of an eye blink. The red line represents the correction. In this approach, the last valid sample (before the start of the blink) and the next valid sample (after the end of the blink) are connected by a line

Common methods to deal with blinks

There are several common solutions for dealing with blinks in pupillometry data. The most stringent solution is to remove trials with blinks (Weiskrantz, Cowey, & Barbur, 1999). This solution results in relatively high data loss and therefore lowers statistical power. Most researchers, however, do not remove trials with missing samples, but instead replace the missing samples with other values, usually using interpolation algorithms (e.g., linear interpolation; Sirois & Jackson, 2012). For example, in linear interpolation, the last valid sample (before the first missing value) and the next valid sample (after the last missing value) are connected using a line (Fig. 1b). The assumption underlying this approach is that the appearance of the blinks is immediate. Namely, the first missing value pinpoints the moment in time in which the blink started, and the reoccurrence of values pinpoints the end of the blink. However, because eyelid movement is not instantaneous, blink-related data start a bit before the first missing sample and end a bit after the last missing sample. Accordingly, blinks are characterized by a sharp decrement in pupil size during the start of the blink (Caffier, Erdmann, & Ullsperger, 2003). Namely, although the eyelid covers the eyeball rapidly, the eyetracker’s output shows a reduction in pupil size when there is no real change in pupil size (but simply an artifact created by the closing of the eyelid). As the blink ends and the eyelid opens up, a larger surface of the pupil is uncovered. This, in turn, is recorded by the eyetracker as a rapid growing of pupil size (Fig. 1a represents the pattern of recorded pupil size during an eye blink). When using interpolation procedures on the missing samples, these decrements and increments of pupil size are not removed from the data, and thus could create artificial differences in pupil size between conditions.

Another common approach used to treat blinks is to take a wider window for the interpolation, such that the interpolation starts and ends around 100–200 ms before the first and after the last missing samples (de Gee, Knapen, & Donner, 2014; Nyström, Hansen, Andersson, & Hooge, 2016; Satterthwaite et al., 2007; van Rijn et al., 2012). By using this approach, researchers define an arbitrary window of samples for the interpolation that contains a constant number of samples both before and after the missing-sample window. This method is better than the aforementioned one, because it removes the artifactual samples created by the eyelid movement and thus prevents effects resulting from the eyelid signal. On the other hand, this approach causes a loss of samples that may contain valuable information. Biological changes in pupil size have a latency of no less than 50 ms (Charman & Heron, 1988; Gray, Winn, & Gilmartin, 1993). Hence, removing a 100- to 200-ms window of data from each side of the missing-sample time window (a total of 200–400 ms of data for each blink) could remove real effects that appear in the data. Furthermore, some of the missing-sample periods do not result from blinks but from an inability of the device to detect the pupil (e.g., due to head movements). These periods, which can be very short (a few milliseconds) or very long (if the subject completely moves his or her head and no longer looks at the screen) are not characterized by sharp slopes (because there is no closing of the eyelids); therefore, removing 100–200 ms before and after these missing-sample periods excludes valid data.

Recently, a third algorithm based on the sharp slopes before and after the blinks was developed in order to get more accurate detection of blink boundaries (Mathôt, 2013; Piquado, Isaacowitz, & Wingfield, 2010; Stampe, 1993). This approach focuses on the velocity of change in pupil dilation. Specifically, a standardized score of the velocity of the pupil change is calculated across all the recorded samples, and extreme samples (i.e., those that exceed a predetermined threshold) mark the onset and offset of the blink (for more details, see Mathôt, 2013). Although this approach makes an important step toward the accurate detection of blink onset and offset, it suffers from two major limitations. First, it assumes that all blinks are characterized by a similar pattern. However, eye blinks show different patterns even within the same data set (Caffier et al., 2003). Second, computation of the standardized velocity scores is done before the removal of blinks from the data, and thus is affected by the eyelid signal. The inclusion of the eyelid signal increases the measured variance, which in turn affects the threshold (i.e., a larger variance produces a more extreme threshold). Moreover, setting the threshold could be challenging, since it should minimize false detections, on the one hand, and misdetections, on the other. Figures 2a and b present a case in which the threshold is set relatively low (e.g., Z = 2.5). In this case, the chosen threshold could cause a false detection of blinks. In contrast, Fig. 2 c and d present a case in which the threshold is set relatively high (e.g., Z = 4), which may lead to misses.

Fig. 2
figure 2

a and b A false alarm that results from a low threshold (Z = 2.5) used in the velocity-based algorithm. a The blue curve represents pupil size as a function of time, and the red dot represents wrong detection of a blink onset. b The blue curve represents the difference score in pupil size (the difference between consecutive samples of pupil size) as a function of time, and the dashed lines represent the threshold (difference score = 18.7 arbitrary units). In this case, the threshold is too stringent, and valid samples (see arrow) are detected as blinks. c and d Misdetection of blink offset caused by the use of a relatively high threshold (Z = 4) in the velocity-based algorithm. c The blue curve represents pupil size as a function of time, the left red dot represents a right detection of blink onset, the right red dot represents a wrong detection of blink offset, and the green dot represents the real blink offset. d The blue curve represents the difference score in pupil size (the difference between consecutive samples of pupil size) as a function of time, and the dashed lines represent the threshold (difference score = 29.91 arbitrary units). In this case, the threshold is not stringent enough, and it prevents the required samples from being found (see the arrows)

The present study

In the present work, we present a new blink detection algorithm. Our algorithm is based on accurate detection of the sharp slope that precedes and follows the missing-sample time window, and thus avoids some of the limitations of the previously mentioned methods. The recorded pupillometry signal is characterized by small-amplitude, high-frequency fluctuations (above 20 Hz). These fluctuations do not result from physiological processes, which are characterized by relatively slow frequencies (Nowak, Hachoł, & Kasprzak, 2008), but from measurement error created by the eyetracking device. Specifically, these fluctuations result from noise that is inherent in the image-processing algorithms of commonly used eyetracking devices and the data from the camera sensor that they receive. This observational error (or measurement noise) is eliminated when the eyelids shut and reopen, due to the strong signal recorded during these rapid movements of the eyelid. By identifying the moments in time at which the measurement noise ends and reemerges, we can detect blink onset and offset (green dots in Fig. 3a) more accurately.

Fig. 3
figure 3

a Typical eye blink (sampling rate = 500 Hz). The green dots represent blink onset and offset as detected by our algorithm. The red dots represent the last valid sample (before the first missing value) and the next valid sample (after the last missing value). The shaded area represents samples without values. Detection of blink onset before the real blink onset, or detection of blink offset after the real blink offset, provides a positive value. Detection of blink onset after the real blink onset, or detection of blink offset before the real blink offset, provides a negative value. (b) Zoom in on a typical eye-blink onset. The blue curve represents the recorded pupil size (please note that missing values follow the end of the blue curve), the black curve represents the smoothed pupil size, and the green dot represents blink onset as detected by our algorithm. Smoothing of the data increases the difference between the measurement noise and the eyelid signal

Method

Noise-based algorithm

To detect blink onset and offset, our algorithm first detects missing values in the data (yellow area in Fig. 3a). We then define the candidate samples for blink onset and offset as the last valid sample before the start of the missing-sample time window and the first valid sample after the end of the missing-sample time window (red dots in Fig. 3a). Next, we smooth the pupillometry data using smoothing of a 10-ms window,Footnote 1 to increase the difference between the measurement noise and the eyelid signal (see Fig. 3b). Then we check the difference between each pair of consecutive samples, starting from the candidate sample (n) and moving backward (to the previous sample n – 1) for blink onset, and moving forward (to the following sample n + 1) for blink offset, until we detect a change in the monotonic pattern. These changes in the pattern reflect the end of the eyelid signal and the start of the measurement noise (Fig. 3b). See Table 1 for an example of data for one blink onset, and the Appendix for the algorithm’s pseudocode. In addition, the algorithm’s code and example files for processing multiple eye blinks are freely available for download (https://osf.io/jyz43).

Table 1 Example of blink onset detection with our algorithm

Edge cases

Two edge cases (i.e., problems occurring outside of normal operating conditions) should be taken into account when using this noise-based approach. One of them is the case in which the recording starts or ends with missing values (i.e., at the beginning or end of the recording, there is an eye blink or the eyetracker cannot detect the pupil; for an example, see Fig. 4a). In this case, blink onset will be defined as the first missing value after the eyetracker starts recording (when the recording starts with missing samples), and blink offset will be defined as the last missing sample (when the recording ends with missing samples). Another edge case concerns the situation in which several missing-sample windows appear consecutively (for an example, see Fig. 4b). In this case, we suggest concatenating the missing-sample windows. By using this approach, the blink will be defined by the first onset (of the first set of missing values) and by the last offset (of the last set of missing values). The maximal gap between two sequential sets of missing values can be defined by the user. We recommend using a gap of 100 ms, on the basis of previous studies (Slagter, Davidson, & Tomer, 2010; Slagter, Georgopoulou, & Frank, 2015) that have shown that the difference between two sequential eye blinks is usually longer than 100 ms.

Fig. 4
figure 4

Examples of edge cases. The shaded areas represent the time window that defines the blink by our algorithm, and the green dots represent blink onset or offset. a A case in which the recording starts with missing values. In this case, blink onset will be defined as the first missing value after the eyetracker starts recording. b A case in which several missing-sample windows appear consecutively. In this case, we concatenate the missing-sample windows. Namely, the blink will be defined by the first onset (of the first set of missing values) and by the last offset (of the second set of missing values)

Testing our algorithm

Our data were taken from Cohen et al.’s (2017) study. In their study, Cohen et al. used an MR-compatible eyetracker (EyeLink 1000, SR Research, Ontario, Canada) with a sampling rate of 500 Hz (2-ms intersampling time) to collect pupil size data. Stimulus presentation and data acquisition were controlled by EyeLink software (Experiment Builder). The data we analyzed are taken from one participant (male, 27 years old), who was requested to keep his eyes fixated on the center of the screen and to avoid eye movements for the entire task. The task consisted of the presentation of brief movie clips (for more details, see Cohen et al., 2017). Pupil area was determined using EyeLink’s “centroid” algorithm. The task included 160 trials, each lasting around 21 s (SD = 1.66). The data set included 300 eye blinks.

Data analysis

To test our algorithm, we compared its results, as well as those of the three other algorithms mentioned in the introduction, to the mean results across two independent human judges who manually marked blink onsets and offsets. Specifically, we compared human detection of blinks to four algorithms:

  1. 1.

    Our noise algorithm

  2. 2.

    A velocity algorithm

  3. 3.

    A naïve algorithm (that defines blink onset and offset as the last valid sample before/after the missing-sample time window)

  4. 4.

    A naïve ±200-ms algorithm (that increases the missing-sample time window by 200 ms at each side, resulting in the removal of 400 valid samples).

Instructions to human judges

The optimal onsets and offsets of blinks in the data were defined by two human judges. The two human judges received a figure depicting the recorded raw pupil time course across the entire task (with a sampling rate of 500 Hz). They were asked to mark blinks in the data according to examples given to them at the beginning of the task. Specifically, they were asked to mark the points on the graph at which the change in pupil size likely resulted from an eye blink.

Comparing the algorithms to human detection

Errors were measured as the difference in milliseconds between the algorithm’s onset/offset and the observer’s onset/offset. We computed the average error in milliseconds across blinks. Negative values represent detections after blink onset and before blink offset. Positive values represent detections before blink onset or after blink offset (Fig. 3a). To compare the effectiveness of the different algorithms, we used a Bayesian one-sample t test (we set the Cauchy prior width to its JASP (JASP Team, 2017) default r = .707). In our analysis, the null hypothesis meant that there was no difference between the tested algorithm and human judgment. The Bayesian analysis provided a Bayes factor (Kass & Raftery, 1995) for each algorithm (see Table 2). These Bayes factors quantify the evidence in favor of the alternative over the null hypothesis. In our case, these values helped determine which of the tested algorithms should be preferred.

Table 2 Descriptive statistics and Bayes factors for each algorithm

Data exclusion

We excluded from our analysis missing-sample periods that started immediately after another missing-sample period. We also excluded blinks in which the two human raters differed from one another by more than 4 ms regarding their detection of blink onset/offset. Our final sample included 148 blinks; mean blink duration = 247.66 ms, SD = 42.71. This is compatible with previous findings, in which blink length ranged from 100 to 400 ms (Schiffman & Richard, 1990).

Results

The average differences between the detections of our noise-based algorithm for blink onset and offset versus the detections of the human judges were 1.77 ± 5.96 and 3.36 ± 18.04 ms, respectively. This means that our mean algorithm error for an arbitrary blink was about 5 ms of excess evaluation. Namely, we removed five valid milliseconds that were not part of the blink. Comparing these results to the results of the other algorithms (Table 2) shows that our algorithm is more accurate than the others.

In addition to mean comparisons, we conducted a series of Bayesian one-sample t tests, comparing the onset and offset of each algorithm to zero. Bayes factors (BFs) were calculated using JASP with the default Cauchy prior. Table 2 presents the BFs for each algorithm tested. The Cohen’s ds of our algorithm for blink onset and offset were 0.3 and 0.19, respectively. The results show that all algorithms except noise-based offset detection were significantly different from the human judges’ detections. The Bayes factor for noise-based offset detection indicates that there is no evidence for a difference between the algorithm’s error and 0. These results suggest that our noise-based algorithm should be preferred over the other algorithms. In addition, comparisons of all algorithms show that our algorithm is better than the others by a factor of between 106 and 10154 for blink onset detection, and between 1085 and 10107 for blink offset detection.

Discussion

In the present article we have introduced a novel algorithm for blink detection. The algorithm is based on noise produced by error measurement that characterizes the commonly used eyetracking devices. Our algorithm successfully predicted blink onset and offset as assessed by comparison to human detection. Furthermore, the currently used algorithms resulted in either loss of data or the inclusion of artifactual signals (resulting from eyelid movement), whereas our algorithm provided more precise detections (see the example in the Appendix, supporting information).

We hope that the algorithm provided in the present article will be implemented in pupillometry studies and therefore help researchers avoid artifacts in their data. In addition, our algorithm could be used in studies of blinks, such as studies assessing startle response (e.g., Graham, 1975). Specifically, in addition to the detection of blink onset and offset, our algorithm can be used to assess blink length and eye-blink rate (Monster, Chan, & O’Connor, 1978), a measure known to be correlated with dopamine level (Jongkees & Colzato, 2016; Karson, 1983). Moreover, our algorithm can provide useful information for studies that wish to control for the appearance of blinks in their analysis (as is commonly done in electroencephalogram and functional magnetic resonance imaging studies; Merritt, Schnyders, Patel, Basner, & O’Neill, 2004; Satterthwaite et al., 2007). Our algorithm (see the Appendix) is very easy to implement in various programming languages (e.g., MATLAB, R, and Python), and by using it in pupillometry preprocessing or to detect blink-related information, researchers can gain a better understanding of their behavioral and physiological results.

Although our algorithm yielded impressive results as compared to the commonly used algorithms, it has several limitations. Our algorithm is based on the inherent noise that characterizes eyetracker devices. We have verified the appearance of this noise in two eyetrackers [EyeLink (SR Research, Ontario, Canada) and EyeTribe (The Eye Tribe Aps, Copenhagen, Denmark)]. Studies using a different eyetracker should take into account that a different noise pattern might be less optimal for our algorithm. However, due to the technical features that characterize the currently used eyetrackers (i.e., sensitivity of the camera sensor and the image-processing algorithms), we believe that most eyetrackers used these days will show a similar noise pattern. Future studies that characterize the pattern of this noise and its boundaries could improve the blink detection algorithm and make it compatible with any eyetracker.

To summarize, the present study presents a new method for blink detection in pupillometry data. This method, which uses the measurement noise that characterizes common eyetrackers, provides better detection of blink onset and offset than do the currently used methods. By using this algorithm, researchers can avoid blink-related artifacts in their data and thus improve their assessment of pupil size.

Supporting information

To illustrate the importance of accurate blink detection in the analysis of pupillometry data, we present the pupil size time course after preprocessing that included removal of blinks using the different algorithms. The pupil size data were taken from a study that included 19 participants (for more details, see Exp. 1 in Cohen et al., 2017). Figure 5 presents the results from the four detection approaches. The figure illustrates several important issues. First, Fig. 5a and b, which represent the naïve approach and the velocity approach, respectively, indicate a relatively sharp decrease in pupil size about 2 s after the beginning of the trial (shaded areas in Fig. 5). This reduction in pupil size may be interpreted as a light reflex for a naïve observer. Figures 5c and d, which represent the naïve ±200 ms approach and the noise approach, respectively, do not show a similar reduction in pupil size, meaning that this reduction probably resulted from a tendency to blink during this time period.

Fig. 5
figure 5

Influence of blink correction as function of the blink detection approach. (a) The naïve approach. (b) The velocity-based approach. (c) The naïve ±200-ms approach. (d) The noise-based approach. The two curves represent the relative pupil size as compared to the first sample of each trial, as a function of time; the blue curve represents the trials in which the participants were asked to look at a presented video, and the red curve represents the trials in which the participants were asked to look at and remember what they saw on the presented video. The shaded areas represent the region of interest of the mentioned difference between the algorithms