Fixation durations in scene viewing: Modeling the effects of local image features, oculomotor parameters, and task

Nuthmann, Antje

doi:10.3758/s13423-016-1124-4

Theoretical Review
Open access
Published: 01 August 2016

Volume 24, pages 370–392 (2017)

Psychonomic Bulletin & Review


Antje Nuthmann (ORCID: orcid.org/0000-0003-3338-3434)


Abstract

Scene perception requires the orchestration of image- and task-related processes with oculomotor constraints. The present study was designed to investigate how these factors influence how long the eyes remain fixated on a given location. Linear mixed models (LMMs) were used to test whether local image statistics (including luminance, luminance contrast, edge density, visual clutter, and the number of homogeneous segments), calculated for 1° circular regions around fixation locations, modulate fixation durations, and how these effects depend on task-related control. Fixation durations and locations were recorded from 72 participants, each viewing 135 scenes under three different viewing instructions (memorization, preference judgment, and search). Along with the image-related predictors, the LMMs simultaneously considered a number of oculomotor and spatiotemporal covariates, including the amplitudes of the previous and next saccades, and viewing time. As a key finding, the local image features around the current fixation predicted this fixation’s duration. For instance, greater luminance was associated with shorter fixation durations. Such immediacy effects were found for all three viewing tasks. Moreover, in the memorization and preference tasks, some evidence for successor effects emerged, such that some image characteristics of the upcoming location influenced how long the eyes stayed at the current location. In contrast, in the search task, scene processing was not distributed across fixation durations within the visual span. The LMM-based framework of analysis, applied to the control of fixation durations in scenes, suggests important constraints for models of scene perception and search, and for visual attention in general.


Introduction

Human vision during natural scene perception is an active process whereby observers selectively seek out information in the visual environment relevant to perceptual, cognitive, or behavioral goals (Findlay & Gilchrist, 2003). High-quality visual information is acquired only from the foveal region of the visual field (central ~2°). Therefore, we move our eyes about three times each second via rapid eye movements (saccades) to reorient the fovea around the scene. Between saccades, gaze position is relatively stable, and during these periods of fixation, visual information is acquired (for reviews, see Henderson, 2003; Rayner, 2009). During natural scene perception, the visuo-oculomotor system is required to make spatial decisions regarding the target location for the next saccade (i.e., the “where” decision), as well as temporal decisions regarding the time at which to terminate the current fixation (i.e., the “when” decision). The present article is concerned with the factors that influence the “when” decisions about fixation durations. Specifically, I introduce a linear mixed modeling (LMM) approach, which simultaneously considers various low-level, mid-level, and higher-level local image features, along with a number of oculomotor and spatiotemporal covariates that may affect fixation durations in real-world scene perception and search. As a second issue, I investigate how these influences depend on task-related control.

A majority of the research on eye movements during scene perception and search has focused on the “where” decision. The dominant theoretical and computational framework in the literature has been image salience, in which low-level image properties play a crucial role in guiding attention and the eyes (Borji & Itti, 2013; Tatler, Hayhoe, Land, & Ballard, 2011, for reviews). These models incorporate the concept of a bottom-up salience map (in differing implementations), with or without top-down control (e.g., Itti & Koch, 2000; Navalpakkam & Itti, 2005; Torralba, Oliva, Castelhano, & Henderson, 2006; Zelinsky, 2008). The scope of these models is to predict fixation locations (where), but not fixation durations (when). With regard to the “when” decision, the CRISP model is the first theoretical approach and computational model that was developed to account for variations in fixation durations during scene viewing (Nuthmann, Smith, Engbert, & Henderson, 2010). A key assumption of the CRISP model is that moment-to-moment difficulties in visual and cognitive processing can immediately inhibit (i.e., delay) saccade initiation, leading to longer fixation durations.

Empirical studies on the “where” decision have addressed the question of which image characteristics predict where people fixate when viewing natural images (e.g., Baddeley & Tatler, 2006; Mannan, Ruddock, & Wooding, 1996; Reinagel & Zador, 1999; Tatler, Baddeley, & Gilchrist, 2005). Nuthmann and Einhäuser (2015) combined a scene-patch analysis with generalized linear mixed models (GLMMs). Using this method, the authors estimated the unique contributions of various image features to fixation selection: luminance and luminance contrast (low-level features), edge density (a mid-level feature), and visual clutter and image segmentation, to approximate local object density in the scene (higher-level features). The GLMM results revealed that edge density, clutter, and the number of homogeneous segments in a patch can independently predict whether or not image patches are fixated. Importantly, neither luminance nor contrast had an independent effect above and beyond what could be accounted for by the other image features (Nuthmann & Einhäuser, 2015).

“When” decision about fixation duration

More recently, interest has been growing in the oculomotor decision of when to move the eyes during scene viewing (e.g., Glaholt & Reingold, 2012; Henderson & Pierce, 2008; Nuthmann et al., 2010; Pannasch, Schulz, & Velichkovsky, 2011). The underlying idea is that fixation durations in visual-cognitive tasks vary with processing difficulty (Rayner, 1998). In line with this general assumption, fixation durations during scene viewing have been shown to globally adjust to overall processing difficulty. Importantly for the present study, image-wide degradations of low-level features have been shown to prolong fixations. In one set of studies, image features were manipulated throughout the entire viewing period of the scene, and fixation durations were prolonged when the overall luminance of the scene was reduced (see below) or when color was removed (Ho-Phuoc, Guyader, Landragin, & Guerin-Dugue, 2012; Nuthmann & Malcolm, 2016). Fixation durations also increased when high-spatial-frequency information was removed through low-pass filtering (Mannan, Ruddock, & Wooding, 1995), or when higher-order scene statistics, including objects, were removed (Kaspar & König, 2011; Walshe & Nuthmann, 2015).

In addition, studies using gaze-contingent display-change paradigms have tested the direct-control hypothesis, which states that the processing of the scene stimulus currently in view produces an immediate fixation-by-fixation adjustment of the timing of the saccade that terminates the fixation (Rayner & Reingold, 2015, for a review focusing on reading). The scene-onset delay (SOD) paradigm (Henderson & Pierce, 2008; Henderson & Smith, 2009; Luke, Nuthmann, & Henderson, 2013; Shioiri, 1993) offers the most straightforward approach for demonstrating that the information extracted during a fixation impacts the timing of the saccade terminating that fixation. At the beginning of a critical fixation, a visual mask is presented, which delays the onset of the scene. The duration of the delay is varied. The scene is then presented normally until the observer looks at another scene region. The underlying rationale is that stimulus processing can only begin after the visual features of the stimulus have become available. Indeed, SOD studies have consistently revealed populations of fixations that increased in duration as the delay increased, suggesting that the durations were controlled directly and in real time by the current scene image. Simulations with the CRISP model substantiated that for these fixations, the initiation of a new saccade program is delayed due to the stimulus’s unavailability at the beginning of a fixation, resulting in an increase in fixation durations (Nuthmann & Henderson, 2012; Nuthmann et al., 2010). Further evidence in support of direct control has been provided by the fixation-contingent scene quality paradigm, in which the quality of the scene is manipulated during the entire duration of selected critical fixations (Glaholt, Rayner, & Reingold, 2013; Henderson, Nuthmann, & Luke, 2013; Henderson, Olejarczyk, Luke, & Schmidt, 2014; Walshe & Nuthmann, 2014a). 
In these studies, image-wide feature modifications have been used as a means to degrade or enhance the scene stimulus. The durations of the critical fixations were immediately affected by reductions in luminance (see below) or by filtering high or low spatial frequencies (Glaholt et al., 2013; Henderson et al., 2014). Collectively, these findings lend support to the notion that fixation durations are, at least partially, under the direct moment-to-moment control of the current visual stimulus.

All these experiments have in common that the entire scene was manipulated, to vary global scene processing difficulty. The present work extends this line of research by investigating local effects of image features on fixation durations under different task instructions. Specifically, the present study combines a corpus analysis approach with an experimental manipulation. The aim of the study was to collect a large corpus of eye movements from a large number of participants (N = 72) viewing a large number of scenes (N = 135). In addition, the observers’ viewing task (scene memorization, preference judgment, or scene search) was manipulated as part of the study design. This was done to investigate how the control of fixation durations depends on cognitive top-down influences in addition to a putative role of bottom-up image features.

With regard to local image features, the corpus analyses considered the sets of low-level, mid-level, and higher-level visual image features used in a related study on fixation selection in scenes (Nuthmann & Einhäuser, 2015). For a particular image and/or fixation location, different features tend to be correlated (Baddeley & Tatler, 2006). Although feature dependencies can be a consequence of the hierarchical definition of features, they oftentimes arise from the structural properties of natural scenes (Nuthmann & Einhäuser, 2015). To deal with feature dependencies, I used an LMM-based statistical control approach to assess each feature’s unique contribution to fixation duration. The main focus was on testing whether local image statistics exert immediacy effects on fixation durations in scene viewing. For example, does the luminance in a limited spatial region around the current fixation modulate this fixation’s duration? In addition, the analyses focused on whether scene processing is distributed across fixation durations within the visual span, an idea first proposed in research on eye movements in reading (e.g., Engbert, Nuthmann, Richter, & Kliegl, 2005; Kliegl, Nuthmann, & Engbert, 2006; Schad, Nuthmann, & Engbert, 2010). This approach implied testing whether the duration of the current fixation also reflected the processing demands of the previous and next fixation locations. Along with the image-related predictors, the LMMs simultaneously considered a number of oculomotor and spatiotemporal covariates. Separate models were built for the three different viewing tasks. In the remainder of this introduction, I will introduce the variables that are part of the analysis framework in more detail. Where relevant, the results from reading studies will be presented along with findings from scene-viewing studies.

Viewing task

Task effects have provided compelling demonstrations of the cognitive top-down influences on eye movements in scene viewing (Yarbus, 1967). On the basis of a subset of the present data (36 participants, two tasks), we previously reported longer fixation durations in a memorization task that probed scene memory, as compared with an object-in-scene search task (Nuthmann et al., 2010). This global effect of viewing task on fixation durations was modeled with the CRISP model (Nuthmann et al., 2010), with task-specific influences being realized by different parameter settings. Castelhano et al. (2009) compared a memorization task probing memory for objects in scenes with a search task in which participants were asked to locate a specified object in the scene. There were no differences in individual fixation durations between the two experimenter-directed task manipulations. However, longer gaze durations were observed on objects in the scenes during memorization than during search. In a study by Mills et al. (2011), participants completed one of four tasks (memory, pleasantness, search, or free view) under general viewing instructions that were participant-directed (i.e., the task instructions established general goals of viewing and left the participants free to translate them). The task set biased the timing of fixations, such that fixation durations were generally longer for free view and memory than for search and pleasantness judgment.

Image features

For every image location that observers sampled with their eye fixations, five local image-based indexes of processing difficulty were obtained. First, three common measures of local image statistics that characterize different properties of image luminance were examined: luminance, luminance contrast, and edge density. In addition, the effects of the processing load induced by the two more complex, higher-level image-based measures were evaluated. Specifically, the feature congestion measure of visual clutter (Rosenholtz, Li, & Nakano, 2007) was included as a surrogate measure for objects, and synergistic image segmentation (Christoudias, Georgescu, & Meer, 2002) as an approximation of local object density in the scene. A few studies have considered the association between fixation duration and individual measures, using experimental or correlational methods.

Luminance

It has been shown that reducing the luminance of the entire scene leads to longer fixation durations (Henderson et al., 2013; Loftus, 1985; Loftus, Kaufman, Nishimoto, & Ruthruff, 1992; Walshe & Nuthmann, 2014a). For example, in the Henderson et al. study, participants freely viewed scenes at three levels of luminance (100 %, 80 %, and 60 %) in preparation for a later memory test. In a first experiment, each scene was presented at one of the luminance levels for the entire trial, and fixation durations linearly increased as luminance decreased. Thus, fixation durations were globally slowed when scene processing became more difficult. In two additional experiments, scenes were reduced in luminance during saccades ending in critical fixations. The duration of these critical fixations was immediately affected by the reduction in scene luminance, with increasing durations for decreasing luminance. Walshe and Nuthmann (2014a) replicated and extended these results, and then modeled the key findings with a variant of the CRISP model (Walshe & Nuthmann, 2014b).

Luminance contrast

Einhäuser and König (2003) had five participants view outdoor scenes without any visible manmade objects; no task-specific instructions were given. The duration of fixations was correlated with neither contrast nor experimental contrast modifications.

Clutter

Clutter is an image-based feature of visual complexity, which has been studied mostly in the context of a search task. Rosenholtz et al. (2007) operationalized clutter using three image-based measures: feature congestion, sub-band entropy, and edge density (see below for details). With regard to fixation durations, it may be expected that a more cluttered scene would lead to longer average fixation durations. Henderson et al. (2009) tested this hypothesis by reanalyzing data from a difficult scene search task. Fixation durations were influenced by global scene clutter within the first second of search (significant correlations with all three measures of scene clutter), but not by the local clutter surrounding the current fixation location (square regions 1° or 3.3° in size), a counterintuitive finding.

Distributed-processing assumption: Lag and successor effects

Evidence that observers are able to process parafoveal information during scene viewing has been provided by visual-span studies. The visual span (also referred to as the perceptual span) is defined as the area of the visual field from which useful information can be acquired during a given eye fixation (see Rayner, 2009, 2014, for reviews). The size of the visual span can be measured using the gaze-contingent moving-window paradigm (McConkie & Rayner, 1975). The general logic is to reduce the size of the window to find the smallest window that still supports normal scene-viewing behaviors. The size of the visual span in scene viewing is large, encompassing up to half of the total scene (scene search: Nuthmann, 2013; scene memorization: Saida & Ikeda, 1979). For object-in-scene search,^{Footnote 1} the visual span corresponded to 8° in each direction from fixation (Nuthmann, 2013). When the radius of the high-resolution moving window was smaller than 5°–6° (fixation-duration-based visual span size), the fixation durations systematically increased. Conversely, we can infer from these findings that visual information within both foveal (~1° eccentricity) and parafoveal (~5° eccentricity) vision can influence fixation durations. This opens up the possibility that scene processing may be distributed across fixation durations within the visual span (distributed-processing assumption). Thus, the starting point for my investigation was that scene-level features can be processed across the visual field. I then tested the distributed-processing assumption in two steps. First, I tested whether there are immediacy effects of local image statistics on fixation durations in scene viewing. For example, does the luminance or clutter around fixation modulate fixation durations? 
Second, I tested whether the duration of the current fixation also reflects the processing demands of the previous fixation location (lag effect, spillover effect) or the next (successor effect, parafoveal-on-foveal effect).^{Footnote 2} Therefore, the analyses considered triplets of fixations—that is, sequences of three successive fixations (Fig. 1). The current fixation is referred to as fixation n, the preceding fixation as n – 1, and the next fixation as n + 1. The only dependent variable was the duration of fixation n. To test the local influence of visual image features, circular image patches with a radius of 1°, approximating foveal vision, were centered on each fixation point.

Lag effects refer to the influence of local image-based indexes of fixation n – 1 or the position of fixation n – 1 on the duration of fixation n. Corpus analyses of reading data have identified lag effects that are (a) due to incomplete processing of the previous word n – 1 and (b) due to the limits of visual acuity (Kliegl et al., 2006). The present analyses tested whether lag effects originating from these two sources also exist in scene viewing. First, if the processing of the scene region sampled with fixation n – 1 is not completed before the eyes move on to the next scene region, effects of image statistics at fixation n – 1 might spill over to the duration of the subsequent fixation n. Second, the distance between the locations of fixations n and n – 1—that is, the amplitude of the incoming saccade—might also influence the subsequent fixation duration. In reading, the finding that fixation durations increase with the amplitude of the incoming saccade is well-established (e.g., Kliegl et al., 2006; Schad et al., 2010; Vitu, McConkie, Kerr, & O’Regan, 2001). Likewise, in scene viewing we may observe long fixations after long saccades because the previous fixation n – 1 yielded less preview of the scene region sampled with the current fixation n than is true for fixations after short saccades. In free viewing, when there is no explicit task, the amplitude of the incoming (or last) saccade (Sac_n–1 in Fig. 1) has not predicted the duration of the following fixation (Tatler & Vincent, 2008).^{Footnote 3} To foreshadow the results, I found systematic effects of saccade amplitude on subsequent fixation durations across viewing tasks in the present data.

Successor effects refer to the possibility that processing of scene regions in parafoveal vision can influence foveal fixation durations during scene viewing. Parafoveal information is used to provide information as to where the eyes should move next (Nuthmann, 2013; Pajak & Nuthmann, 2013). Specifically, this information is used for selecting the next saccade target and determining the amplitude of the next saccade. However, it is currently unclear whether and to what extent such parafoveal processing modulates the duration of fixation n. Do successor effects generalize from reading (Kliegl et al., 2006; Schotter, Angele, & Rayner, 2012, for a review) to scene viewing? If so, is the parafoveal processing of upcoming fixation locations restricted to low-level properties related to image luminance, or does it also extend to higher-level image features that approximate the presence of objects in a scene? Finally, do successor effects depend on task-related control?

Oculomotor and spatiotemporal parameters

Along with the image-related predictors, the LMMs simultaneously assessed a number of oculomotor and spatiotemporal covariates, including the amplitude of the next saccade, the change in saccade direction, and viewing time.

Amplitude of the next saccade

The LMMs included the amplitude of the outgoing (or next) saccade. Tatler and Vincent (2008) found no systematic relationship between the current fixation duration and the amplitude of the outgoing saccade (Saccade n in Fig. 1) during free viewing of natural scenes. Reading studies have reported mixed results. In a number of studies, fixation durations were found to increase with the length of the outgoing saccade (e.g., Kliegl et al., 2006; Kuperman, Dambacher, Nuthmann, & Kliegl, 2010; Schad et al., 2010). However, corpus analyses by Angele et al. (2015, 2016) reported significant negative effects, with shorter single fixations and gaze durations when the next saccade was large.

Change in saccade direction

The change in saccade direction can be described as the angular difference between the last saccade n – 1 and the next saccade n (Δ in Fig. 1). An angle Δ = 0° is indicative of a saccade n that continues the trajectory of saccade n – 1, whereas Δ = 180° denotes a complete reversal of direction. A number of studies have observed an approximately linear increase in fixation duration and/or saccade latency as a function of the angular difference between the last and next saccades (Klein & MacInnes, 1999; MacInnes & Klein, 2003; Smith & Henderson, 2009, 2011; Tatler & Vincent, 2008; Wilming, Harst, Schmidt, & König, 2013). Fixation durations are shortest when saccade n continues the trajectory of saccade n – 1, whereas complete reversals in saccade direction are associated with the longest fixations. In the literature (see Klein & Hilchey, 2011, for a review), the effect has been associated with the temporal component of either (or both) of two biases: a bias away from previous fixations (i.e., oculomotor inhibition of return, O-IOR) or a bias for the eyes to continue moving in the same direction (i.e., saccadic momentum).
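The angular difference Δ described above can be illustrated with a minimal sketch. It assumes that fixation positions are given as (x, y) coordinates in a common unit and that Δ is folded into the range 0° to 180°, as the two reference cases in the text suggest. The function name and argument layout are hypothetical and are not taken from the original analysis code.

```python
import math

def saccade_direction_change(fix_prev, fix_curr, fix_next):
    """Unsigned angular difference (0-180 deg) between saccade n-1 and saccade n.

    Each argument is an (x, y) tuple; this layout is an assumption for
    illustration only.
    """
    # Incoming saccade n-1: from fixation n-1 to fixation n
    in_dx, in_dy = fix_curr[0] - fix_prev[0], fix_curr[1] - fix_prev[1]
    # Outgoing saccade n: from fixation n to fixation n+1
    out_dx, out_dy = fix_next[0] - fix_curr[0], fix_next[1] - fix_curr[1]

    angle_in = math.atan2(in_dy, in_dx)
    angle_out = math.atan2(out_dy, out_dx)

    # Wrap the signed difference into [-180, 180) and take its magnitude
    diff = math.degrees(angle_out - angle_in)
    diff = (diff + 180.0) % 360.0 - 180.0
    return abs(diff)
```

Under these assumptions, three fixations on a straight line yield Δ = 0°, and a return to the previous fixation location yields Δ = 180°.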

Viewing time

It is well established that fixation durations change over time. Several studies have reported that fixation durations increased during initial viewing periods and stabilized during later viewing (e.g., Antes, 1974; Mills et al., 2011; Pannasch, Helmert, Roth, Herbold, & Walter, 2008; Unema, Pannasch, Joos, & Velichkovsky, 2005; but see De Graef, Christiaens, & D’Ydewalle, 1990). The study by Mills et al. (2011) investigated how task set influences the rate of change in fixation durations over the course of viewing. As was described above, fixation durations were generally longer for free view and memory than for search and pleasantness rating. This effect was present primarily during early viewing (i.e., at 1 and 2 s); during later viewing (i.e., at 5 s), the only remaining difference was between the free-view and the search conditions (Mills et al., 2011). In contrast, in the study by Castelhano et al. (2009), no effect of task (memorization vs. search) was observed across the viewing period or during early viewing (the first five fixations).

Distance from scene center

Many studies have reported that observers fixate more often toward the center of the image than at the edges (e.g., Mannan et al., 1996; Tatler et al., 2005). This image-independent viewing bias (Tatler, 2007) is referred to as the central bias of fixation. In previous work, this bias has been quantified as a linear decrease in fixation probability as the distance from scene center increases (Nuthmann & Einhäuser, 2015). When the influence of image features was controlled for, the central bias was still a strong predictor of where observers fixated in a scene. To explore whether the central bias generalizes to fixation durations, the current fixation’s spatial distance from image center was considered as an additional input variable for analysis.

The present study

The present research aims to advance our knowledge about the factors that control fixation durations during scene viewing. This study is the first to present a statistical modeling framework to simultaneously test the influences of a large set of variables on fixation durations during scene perception, with a specific focus on how local image-based indexes of processing difficulty influence the fixation durations at the current, previous, and next fixation locations. An LMM approach is introduced, which allows the researcher to assess each predictor’s unique contribution to explaining variance in fixation durations for a given viewing task, and its relative importance. Specifically, the goal of the LMMs was to test simultaneously the influences of 20 variables. These are the luminance, luminance contrast, proportion of edges, visual clutter, and number of segmented units around the current, previous, and next fixation locations; the amplitudes of the incoming and outgoing saccades (in degrees of visual angle); the angular difference between the two saccades (in degrees); the current fixation’s Euclidean distance from image center (in degrees of visual angle); and the viewing time (in milliseconds).
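As a simple bookkeeping aid, the 20 input variables can be enumerated as five image features at three fixation positions plus five oculomotor and spatiotemporal covariates. The variable names below are hypothetical labels chosen for illustration; they are not the names used in the original models.

```python
from itertools import product

# Hypothetical labels for the five local image features
FEATURES = ["luminance", "contrast", "edge_density", "clutter", "n_segments"]
# Previous (n-1), current (n), and next (n+1) fixation location
POSITIONS = ["prev", "curr", "next"]
# Oculomotor and spatiotemporal covariates named in the text
COVARIATES = [
    "amp_incoming",      # amplitude of incoming saccade (deg)
    "amp_outgoing",      # amplitude of outgoing saccade (deg)
    "direction_change",  # angular difference between the two saccades (deg)
    "center_distance",   # distance of current fixation from image center (deg)
    "viewing_time",      # viewing time (ms)
]

def predictor_names():
    """Return the full list of predictor labels: 5 x 3 image terms + 5 covariates."""
    image_terms = [f"{feat}_{pos}" for feat, pos in product(FEATURES, POSITIONS)]
    return image_terms + COVARIATES
```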

Method

Participants, apparatus, and materials

Analyses were based on a large corpus of eye movements during scene viewing.^{Footnote 4} Seventy-two participants (mean age = 22.6 years; 38 females, 34 males) each viewed 135 unique full-color photographs of real-world scenes from a variety of categories (indoor and outdoor). The 92 indoor scenes came from different subcategories, ranging from common rooms in one’s house (e.g., living room, kitchen) to images from shops, garages, and so forth. Scene images were presented on a 21-in. CRT monitor with a screen resolution of 800 × 600 pixels. The scenes subtended 25.78° horizontally × 19.34° vertically at a viewing distance of 90 cm. A chinrest with a head support was used to minimize head movement. During scene presentation, eye movements were recorded using an SR Research EyeLink 1000/2K system (average accuracy: 0.25° to 0.5°; precision: 0.01° root mean square). The experiment was implemented with the SR Research Experiment Builder software.

Procedure

Participants viewed each of the 135 scenes once: 45 scenes in each of the three viewing tasks (memorization, preference judgment, and search). All scenes were presented for 8 s. In the scene memorization task, observers had to encode the scene to prepare for an old–new recognition test administered at the end of the experiment. In the aesthetic preference judgment task, participants rated how much they liked each scene. The visual search task had participants look for a prespecified object in the scene (e.g., the basket in Fig. 2a).

At the beginning of each trial, a fixation point was presented at the center of the screen and acted as a fixation check. In the search task, prior to the fixation check, a text label describing the target (e.g., basket) was presented for 800 ms. For details on selection of the search targets and their properties, see Nuthmann and Henderson (2010). To keep the viewing time constant across tasks, the scene remained on the screen until the 8 s were over. However, the present fixation duration analyses only considered fixations made until the buttonpress terminating the search.

Both the search block and the aesthetic preference block were preceded by three practice trials. After participants had completed the three viewing tasks, the memory test was administered (see Nuthmann & Henderson, 2010, for details).

Design

A dual Latin-square design was used in the study (Table 3). Participants were allocated to nine groups of eight participants (random factor Subject Group) to control for (a) which set of images they viewed in each task and (b) the order in which they performed the three viewing tasks. To control for item effects, the 135 scene images were assigned to three lists of 45 scenes each. The scene lists (random factor Scene List) were rotated over participants, such that a given participant was exposed to a scene list for only one of the three viewing-task conditions. The three orders in which the task blocks were completed were search–memorization–preference, preference–search–memorization, and memorization–preference–search. The design ensured that every order of tasks and combination of scenes with tasks was represented at least once across the nine participant groups (Table 3). Out of the 72 participants, 24 saw the same scene images in a given viewing task, and eight participants saw the same images in a given task and task order.
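The counterbalancing logic can be sketched as follows. The sketch assumes that the nine groups arise from crossing the three task orders with three rotations of the scene lists over tasks; the list labels and the specific pairing of rotations with orders are hypothetical and need not match the assignment in the design table.

```python
from collections import Counter
from itertools import product

TASK_ORDERS = [
    ("search", "memorization", "preference"),
    ("preference", "search", "memorization"),
    ("memorization", "preference", "search"),
]
TASKS = ("memorization", "preference", "search")
LISTS = ("A", "B", "C")  # hypothetical labels for the three lists of 45 scenes

def build_groups(participants_per_group=8):
    """Cross the three task orders with three rotations of scene lists over tasks."""
    groups = []
    for order, shift in product(TASK_ORDERS, range(3)):
        # Rotation: the task at index i is paired with list (i + shift) mod 3
        list_for_task = {task: LISTS[(i + shift) % 3] for i, task in enumerate(TASKS)}
        groups.append({"order": order, "list_for_task": list_for_task,
                       "n": participants_per_group})
    return groups

def participants_per_task_list(groups):
    """Count participants who see a given scene list in a given viewing task."""
    counts = Counter()
    for g in groups:
        for task, scene_list in g["list_for_task"].items():
            counts[(task, scene_list)] += g["n"]
    return counts
```

Under these assumptions, the sketch yields nine groups and 72 participants in total, with 24 participants for each combination of scene list and task, and eight participants for each combination of scene list, task, and task order.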

Data analysis

Data from the right eye were analyzed. Saccades were defined with a 50°/s velocity threshold using a nine-sample saccade detection model. The raw data were converted into a fixation sequence matrix using SR Research Data Viewer. Those data were processed further and analyzed using MATLAB 2009b (The MathWorks, Natick, MA) and the R system for statistical computing (version 3.2.0; R Development Core Team, 2015) under the GNU General Public License (Version 2, June 1991). All image processing was performed in MATLAB.

Gaze data analysis

A major goal of the present work was to test the influences of local image-based indexes of processing difficulty on the fixation durations at the current, previous, and next fixation locations. Therefore, the main analyses considered triplets of fixations (Fig. 1). Fixations were excluded if they were the first or last fixation in a trial. The triplet analyses therefore required a minimum of five fixations in a trial. To test the influences of visual image features, circular image patches were centered on each fixation point. Each circle had a radius of 1°, to approximate foveal vision while accommodating the inaccuracy of the eyetracker. A given fixation could potentially contribute to several triplets. For example, a sequence of five successive valid fixations would generate three triplets, with the middle fixation (#3) contributing to the first triplet as fixation n + 1, to the second triplet as fixation n, and to the third triplet as fixation n – 1. Fixation triplets that co-occurred with blinks were removed. For the investigation of fixation durations, it is common to exclude very short (e.g., <50 or 80 ms) and very long fixations, on the basis of the assumption that they are not determined by online cognitive processes (Inhoff & Radach, 1998). Triplets in which one or more fixations were shorter than 50 ms or longer than 1,000 ms were therefore disregarded. For the investigation of saccade properties, it is common to remove saccades with amplitudes less than 1°, to exclude corrective saccades and microsaccades (e.g., Smith & Henderson, 2009). In the present context, the inclusion of small saccades would potentially smear out the effects of distributed processing. Furthermore, the length of the next saccade from fixation n to n + 1 determined the overlap between the circular patches centered on fixations n and n + 1, and the same was true for the previous saccade and the patches centered on fixations n – 1 and n. 
When using circular patches with a 1° radius, a 1° saccade would lead to a 39 % overlap between neighboring patches. Overlap between patches would also aggravate the correlation between them. Only fixation triplets in which the circular patches around fixations n – 1, n, and n + 1 did not overlap were included in the analyses. Triplets in which the incoming or outgoing saccades (or both) were shorter than or equal to 2° were therefore removed. After exclusions, 76,685 fixation triplets (memorization: 28,442; preference judgment: 33,275; search: 14,968) remained for the analyses. Fewer data points were available for the scene search task because the analyses only considered fixations made until the buttonpress terminating the search (mean search time 3.77 s). The median saccade amplitudes were 5.1° (search), 5.2° (memorization), and 5.3° (preference).
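The patch-overlap figure and the triplet exclusion rules can be illustrated with a short sketch. The overlap function uses the standard formula for the intersection area of two equal circles. The triplet function applies the stated criteria (first and last fixation dropped, blinks, durations outside 50 to 1,000 ms, saccades of 2° or less). The data layout (a list of dictionaries with keys `x`, `y`, `dur`, and `blink`), the function names, and the order in which the exclusions are applied are assumptions for illustration; the original processing was done in MATLAB and R.

```python
import math

def patch_overlap_fraction(distance_deg, radius_deg=1.0):
    """Share of a circular patch covered by an equal-sized patch whose
    center lies distance_deg away."""
    r, d = radius_deg, distance_deg
    if d >= 2 * r:
        return 0.0  # patches touch at most in a single point
    lens_area = 2 * r**2 * math.acos(d / (2 * r)) - (d / 2) * math.sqrt(4 * r**2 - d**2)
    return lens_area / (math.pi * r**2)

def saccade_amplitude(a, b):
    """Euclidean distance (deg) between two fixation positions."""
    return math.hypot(b["x"] - a["x"], b["y"] - a["y"])

def build_triplets(fixations, min_dur=50, max_dur=1000, min_amp=2.0):
    """Build fixation triplets (n-1, n, n+1) for one trial.

    fixations: time-ordered list of dicts with hypothetical keys
    x, y (deg), dur (ms), and blink (bool).
    """
    inner = fixations[1:-1]  # drop first and last fixation of the trial
    triplets = []
    for prev, curr, nxt in zip(inner, inner[1:], inner[2:]):
        trio = (prev, curr, nxt)
        if any(f["blink"] for f in trio):
            continue  # triplet co-occurs with a blink
        if any(f["dur"] < min_dur or f["dur"] > max_dur for f in trio):
            continue  # very short or very long fixation
        amp_in = saccade_amplitude(prev, curr)
        amp_out = saccade_amplitude(curr, nxt)
        if amp_in <= min_amp or amp_out <= min_amp:
            continue  # patches around successive fixations would overlap
        triplets.append({"dur_n": curr["dur"], "amp_in": amp_in, "amp_out": amp_out})
    return triplets
```

For a 1° saccade and 1° patches, `patch_overlap_fraction(1.0)` evaluates to about 0.39, in line with the 39 % figure given above; the overlap reaches zero only at a center distance of 2°, which is why saccades of 2° or less are excluded.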

The triplet analyses required filtering the data set in various ways, such that the analyses were based on subsets of data. Therefore, the triplet analyses were complemented by control analyses that exclusively tested immediacy effects of local image features around the current fixation—that is, no lag and successor effects. As before, the LMMs included the full set of oculomotor and spatiotemporal variables. As compared with the triplet analyses, the number of observations that entered the control LMMs was much increased (memorization: 67,472; preference judgment: 69,854; search: 33,170), thereby increasing statistical power. Moreover, the control analyses allowed for testing whether the results would generalize when fixations with short incoming or outgoing saccades were included.

Computation of image features

For each scene image, five different feature maps were calculated. On the basis of the various image feature maps, local image statistics were calculated by identifying patches subtending a circular area with a radius of 1° (31 pixels) around fixation locations. Patches were computed for each participant and scene on a fixation-by-fixation basis. Thus, the local image patches were analyzed for all three fixations in a triplet (Fig. 1).
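The patch extraction step can be sketched as follows. The conversion from degrees to pixels uses the display values reported above (800 pixels across 25.78°) and assumes a linear mapping. Summarizing the patch by its mean is an assumption for illustration, because the summary statistic for each feature is not specified at this point in the text; the function name and signature are likewise hypothetical.

```python
import numpy as np

SCREEN_WIDTH_PX = 800
SCENE_WIDTH_DEG = 25.78
# About 31 pixels per degree, assuming a linear degree-to-pixel mapping
PX_PER_DEG = SCREEN_WIDTH_PX / SCENE_WIDTH_DEG

def circular_patch_mean(feature_map, x_px, y_px, radius_px=31):
    """Mean feature value inside a circular patch centered on a fixation.

    feature_map: 2-D array (rows = y, columns = x).
    Pixels of the circle that fall outside the image are ignored.
    """
    height, width = feature_map.shape
    yy, xx = np.ogrid[:height, :width]
    # Boolean mask of all pixels within radius_px of the fixation location
    mask = (xx - x_px) ** 2 + (yy - y_px) ** 2 <= radius_px ** 2
    return float(feature_map[mask].mean())
```

In a triplet analysis, such a function would be called once per feature map for each of the three fixations n – 1, n, and n + 1.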