Working memory (WM) is a fundamental cognitive function that enables us to retain, manipulate, and transform information in the absence of steady perceptual input. The efficiency of WM depends not only on how well content is remembered but also on how well the order in which it appeared is retained. Without order processing, complex cognitive skills such as language, reasoning, and learning are impossible (Baddeley, 2003). How serial order processing is accomplished is therefore considered one of the most important problems in cognitive science (Lashley, 1951). The aim of the current study is to address how serially stored information is accessed in verbal WM.

According to one of the most influential models of WM (Baddeley & Hitch, 1974), WM comprises (among other components) separate storage systems for verbal and visuospatial information, known as the phonological loop and the visuospatial sketchpad, respectively. The retention of serial order has been extensively studied within the framework of the phonological loop, which uses inner speech as a rehearsal mechanism. Inner speech naturally implies serial order, as it forms a chain of phonological items in which each item serves as a retrieval cue for the next. However, it has become clear that chaining cannot account for the error patterns typically observed in serial order recall (Baddeley, 1986). For instance, a simple chaining model predicts a cascade of subsequent errors once an error has been made; in reality, however, typical errors are transpositions whereby two items exchange positions without further affecting the rest of the sequence (e.g., Conrad, 1964; Henson et al., 1996; Lashley, 1951; Lewandowsky & Farrell, 2008; Murdock & vom Saal, 1967; see also Hurlstone et al., 2014). To overcome the limitations of the phonological loop, a number of researchers have developed computational models to specify the mechanism underlying serial order WM (Hurlstone et al., 2014; Marshuetz, 2005). In general, these models are based on the notion that serial inputs are bound to position markers, which can then be accessed for later recall. Several models incorporating position markers have been proposed, such as the start-end model (Henson, 1998), the oscillator-based model (Brown et al., 2000), and a model based on magnitude and rank codes (Botvinick & Watanabe, 2007). Despite their relative success in accounting for several empirical observations, these models were largely developed on theoretical grounds, and it remains unclear how exactly the position markers are implemented at the cognitive level.

Recently, the idea that space is used to mark positions in a sequence has received increasing support (Abrahamse et al., 2017; Abrahamse et al., 2014; Fischer-Baum, 2018). Abrahamse and colleagues (Abrahamse et al., 2017; Abrahamse et al., 2014) proposed that the representation of serial order in verbal WM is (typically) realized by spatially organizing the memoranda according to reading direction (e.g., from left to right in Western languages; see also Fischer-Baum & Benjamin, 2014). A robust observation motivating this hypothesis is that, upon retrieval, items from the beginning of a verbal WM sequence facilitate left-hand responses relative to right-hand responses, while the opposite is true for end items (Ginsburg et al., 2014; Ginsburg et al., 2017; Guida et al., 2018; Guida et al., 2016; van Dijck & Fias, 2011; see Abrahamse et al., 2017; Guida & Campitelli, 2019, for reviews). Moreover, it has been argued that searching and retrieving items from this spatial representation involves orienting spatial attention. In an adapted version of the Posner dot-detection task (Posner, 1980), in which the retrieval of an item from WM preceded the dot detection, it was observed that the later the position of the item in the memorized sequence, the faster right-sided dots were detected, suggesting that retrieving an item from WM involves a mechanism for orienting spatial attention that overlaps with orienting attention in physical space (van Dijck et al., 2014; van Dijck et al., 2013).

Although these behavioral observations show that there exists a link between serial position and space, the evidence in favor of the more specific hypothesis that spatial attention is operating on the spatially coded sequences is not conclusive. The evidence that is currently available shows that the congruency between the spatially defined position of the cue in the WM sequence and the spatial location of the target in the dot-detection task determines performance. Yet this does not prove that spatial attention is involved. In principle, any type of dimensional overlap, be it perceptual, conceptual, or verbal (Kornblum et al., 1990), between the cue position and the target location can account for the congruency effect, without necessarily involving attentional shifts. To be able to unequivocally demonstrate the involvement of spatial attention, it is necessary to directly measure attentional processing induced by the cue and preceding the probe (as was done at a neurophysiological level by Rasoulzadeh et al., 2021). Reaction times measured in response to the dot probes are not suitable for this purpose, as they reflect both cue-related and probe-related processes.

Eye-movements provide excellent opportunities because they can be measured continuously with high temporal resolution, thus allowing measurements between cue and dot probe, and because they directly relate to the location of the attentional focus (Corbetta, 1998; Rizzolatti et al., 1987). This is not only true for attentional shifts in physical space. There is also convincing evidence for the involvement of the oculomotor system in attentional shifts in mental space (i.e., visuospatial WM; Nobre et al., 2000; Postle & Hamidi, 2007). For instance, van Ede et al. (2019) showed that the eyes systematically move in the direction of memorized locations, even though nothing is physically present during retention. It is possible that WM for more abstract types of information, such as verbal WM, also recruits mechanisms of spatial attention that are shared with the processing of physical space, especially if serial order information in verbal WM is indeed based on space-based position marking with the involvement of spatial attention.

Recent work by Rinaldi and colleagues has examined the involvement of spontaneous eye-movements in verbal WM (Rinaldi et al., 2015). In a three-phase study, participants first had to memorize a sequence of verbal items, and while spontaneous eye-movements were tracked, cue-based recognition of the memory items and verbal recall of the serial order were tested in Phases 2 and 3, respectively. In line with the hypothesis that memory for serial order information is spatially coded (Abrahamse et al., 2017; Abrahamse et al., 2014; Fischer-Baum, 2018), they found evidence for a left-to-right encoding reflected in the horizontal eye position. However, they only found this during verbal recall of the serial order information and not during recognition. Such a finding is interesting, but the purported spatial coding in Rinaldi et al. (2015) is not conclusive, as it may reflect serial production (e.g., the recitation of positions) rather than the serial representation itself. It has been shown that numbers recited in incremental order, as in counting, induce rightward shifts in eye-movements (Hartmann et al., 2016). It is therefore likely that the positional context (e.g., “first,” “second,” and so on) is covertly produced during verbal recall of the serial order, leading to the rightward shifts in eye-movements. During recognition, by contrast, the underlying representational context is spontaneously activated without the explicit need to produce positional information. Conversely, from the absence of systematic eye-position shifts during the recognition phase, it cannot be concluded that no spatially determined processes are involved in accessing items at specific positions in WM. It is quite possible that the conditions were not optimal to detect systematic eye-movements as a function of the serial position of the cued item (see below).

The aim of the current study is to address whether the spatial position of serial order information is accessed during cue-induced memory search. Our central hypothesis is that a sequence of verbal WM representations is spatially coded in WM and that accessing these representations is mediated through spatial attention, as indexed by eye-movements. More specifically, we hypothesize that the processing of begin items results in leftward eye-movements and the processing of end items in rightward eye-movements. For this purpose, we continuously tracked eye-movements while participants freely viewed the screen and performed an acoustic version of the task developed by van Dijck et al. (2013). One possible reason that Rinaldi et al. (2015) did not find any spatial shifts in eye-movements during cue-induced memory search is that they used visual cues that restricted the participants’ ability to gaze freely, leaving fewer degrees of freedom for spatial attention to modulate eye movements. We therefore presented auditory cues instead, providing a restriction-free visual environment for the eyes to move in. Participants were instructed to memorize a sequence of four random spoken number words. During the retention interval, they performed a series of speeded beep-detection trials. Importantly, before each beep, a number-cue was presented that did or did not belong to the memorized sequence. To ensure WM access, participants were instructed to perform the beep-detection task only when the number-cue was part of the memorized sequence. Finally, it was verified whether the entire sequence was still accurately stored in WM until the end of the block. In Experiment 1, the to-be-detected beeps were presented in the left or right ear, while in Experiment 2, these beeps were presented centrally to further strip the task of spatial information that could have indirectly biased performance.

Experiment 1

Methods

Participants

Twenty Ghent University students (10 males, age M = 22.3 years, SD = 2.7 years) participated in return for payment. All participants were right-handed and had normal or corrected-to-normal vision. The research complied with the guidelines of the Independent Ethics Committee of the Department of Psychology and Educational Sciences of Ghent University. All participants gave written informed consent.

Apparatus

Stimuli were presented at a 70-cm distance from the participants on a 22-inch LCD monitor (1,920 × 1,080 pixels, refresh rate: 60 Hz). Stimulus presentation was controlled using MATLAB software (The MathWorks, Natick, MA) with Psychtoolbox-3 extensions (Brainard, 1997). Auditory materials were presented via headphones. An EyeLink 1000 tower-mounted eye-tracker (SR Research, Canada) was used to record eye-movements at 1000 Hz. A chin rest was used to reduce head movements. Prior to each session of 100 trials, the eye tracker was calibrated to the screen using the built-in 9-point calibration protocol. The eye tracker was recalibrated when the calibration accuracy exceeded a mean threshold of 0.5° and a maximum threshold of 1° of visual angle (VA).

Procedure and design

Eye-movements were continuously measured while participants performed an auditory version of the serial order position cuing task during which free viewing was allowed. The acoustic version was chosen based on Kinsbourne (1974) and Salvaggio et al. (2019), who argued that eye movements reflecting thought processes are maximally detectable when the position of the eyes is not constrained by task demands. The use of visual cues may have been the reason why a previous attempt to establish cue-induced shifts of spatial attention through eye movements as a function of serial position in WM failed (Rinaldi et al., 2015).

The experiment consisted of 40 blocks, each containing three phases (see Fig. 1a). In Phase 1, participants memorized four serially and binaurally presented numbers. Encoding was self-paced: participants pressed the space bar with their right index finger to proceed and were explicitly instructed to memorize the numbers in the correct order. The numbers were pseudorandomly selected from 1 to 9, excluding 5, and were balanced across WM positions, such that over the entire experiment each number occurred equally often at each WM position. For each number, a stereo audio file was recorded of the one-syllable Dutch number word spoken in a monotone male voice, with the duration adjusted to 700 ms.

Fig. 1

a Experimental paradigm. Eye-movements were continuously measured during free-viewing while participants performed an auditory version of the “serial order position cueing task” (van Dijck et al., 2013). b and d Time-resolved eye-tracking data as a function of WM position in Experiments 1 and 2. The plot represents the normalized time course of the eye-positions as ribbons, interpolated across the 31 discrete time points, whose thickness indicates ±1 standard error of the mean. The black bar on the y-axis indicates the time points at which the begin and end cues triggered statistically significant differences in eye-positions (p < .05, at a cluster-corrected level). c and e Time-averaged eye-tracking data as a linear function of WM position in Experiments 1 and 2. The average eye-position followed a left-to-right gradient as a function of WM position (dotted line)

After a rehearsal period of 2,500 ms, Phase 2 was initiated. Participants performed a go/no-go speeded beep-detection task. A number (i.e., the cue) was first binaurally presented for 700 ms. After a fixed cue-target interval of 800 ms, a beep (i.e., the target) was randomly played in either the left or the right ear for 150 ms. Participants were instructed to press the space bar if they detected the beep, but only when the preceding number was part of the memorized sequence (go trials, 60%). Note that the side of the beep was unrelated to the cue. When the cue was not part of the memorized sequence, participants had to refrain from responding (no-go trials, 40%). Each block contained 20 trials: each of the eight numbers was presented twice, followed by a left or right beep. The remaining trials were catch trials in which a go-cue (each memorized item presented once) was presented without a subsequent beep. These trials were included to make sure that participants did not simply press the space bar upon hearing a go-cue, but actually performed the beep-detection task. The response deadline and the inter-trial interval were both 1,000 ms.
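For illustration, a minimal Python sketch (not the authors’ original MATLAB/Psychtoolbox code) of how one such 20-trial block could be assembled is given below; the function name and dictionary fields are illustrative assumptions.

```python
import random

def build_block_trials(memorized, all_numbers=(1, 2, 3, 4, 6, 7, 8, 9)):
    """Assemble one 20-trial block of the beep-detection phase (sketch).

    memorized: the four numbers of the current WM sequence.
    Each of the eight numbers is cued twice, once followed by a left beep and
    once by a right beep; the four memorized numbers are additionally cued
    once more as catch trials without a beep.
    """
    trials = []
    for number in all_numbers:
        is_go = number in memorized                # go trial if the cue is in the sequence
        for beep_side in ("left", "right"):
            trials.append({"cue": number, "go": is_go, "beep": beep_side})
    for number in memorized:                       # catch trials: go-cue, no beep
        trials.append({"cue": number, "go": True, "beep": None})
    random.shuffle(trials)                         # randomize trial order within the block
    return trials

# 20 trials in total: 12 go trials (60%) and 8 no-go trials (40%).
block = build_block_trials(memorized=[3, 7, 1, 9])
assert len(block) == 20 and sum(t["go"] for t in block) == 12
```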

Finally, in Phase 3, serial-order knowledge for the learned sequence was tested. Participants had to verify three visually presented questions of the form “was 1 preceded by 8?” These questions were randomly selected from all possible pairs of WM items, the order of which did or did not match the corresponding order in the WM sequence. The questions were presented one at a time, and the words within these questions were vertically arranged to reduce horizontal associations. The factors WM position (four levels: Position 1 through 4) and beep side (two levels: left vs. right) were fully crossed (40 measurements per condition).

Following Salvaggio et al. (2019), throughout the experiment, participants were shown a static noise background. The static noise background consisted of evenly aligned boxes (0.5° × 0.5° of VA) on the horizontal and vertical axes that were randomly filled with grayscale colors. The purpose of the static noise background was to offer participants fixation points during free viewing. With empty displays, the probability is high that participants fixate the edges of the screen or other environmental stimuli. The noise pattern was randomized every new block and remained the same across trials within a block.
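A minimal sketch of how such a background could be generated (in Python with NumPy; the box size of 24 pixels is only an approximation of 0.5° VA at the 70-cm viewing distance and is therefore an assumption):

```python
import numpy as np

def make_noise_background(width_px=1920, height_px=1080, box_px=24, seed=None):
    """Generate a static noise background: a grid of boxes filled with random
    grayscale values (sketch; box_px = 24 approximates 0.5 deg VA here)."""
    rng = np.random.default_rng(seed)
    n_rows = -(-height_px // box_px)               # ceiling division
    n_cols = -(-width_px // box_px)
    boxes = rng.integers(0, 256, size=(n_rows, n_cols), dtype=np.uint8)
    # Upscale each box to box_px x box_px pixels and crop to the screen size.
    image = np.kron(boxes, np.ones((box_px, box_px), dtype=np.uint8))
    return image[:height_px, :width_px]

# A new pattern would be generated at the start of each block and kept
# constant across the trials of that block.
background = make_noise_background(seed=1)
```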

Eye-tracking data preprocessing and analyses

Eye-movements were continuously measured at a rate of 1000 Hz while participants performed the task. Only trials from WM sequences with accurate serial-order verification (Phase 3; M = 38/40, SD = 2/40) and with correctly detected go-trials (Phase 2; 97%) were considered. In each trial, the horizontal eye-position data from the onset of the number-cue to the onset of the beep-target (1,500 ms) were extracted and aggregated into bins of 50 ms. Missing data resulting from blinks were linearly interpolated. Trials in which the missing data exceeded 500 ms were excluded (1%; following Salvaggio et al., 2019). Eye-positions per trial (in pixels) were z-transformed to obtain comparable measures across trials and participants. Epochs were baseline corrected to the period from −50 to 0 ms before cue onset. These normalized traces of eye-position were subjected to statistical analyses (following Kustov & Robinson, 1996, and van Ede et al., 2019).
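The following Python sketch summarizes this single-trial preprocessing pipeline (NumPy-based, not the authors’ code; the assumption that each epoch includes 50 ms of pre-cue baseline, yielding 31 bins of 50 ms, and the reading of the 500-ms criterion as total missing data are ours):

```python
import numpy as np

def preprocess_trial(eye_x, fs=1000, bin_ms=50, baseline_ms=50, max_gap_ms=500):
    """Bin, z-score, and baseline-correct one trial's horizontal eye trace.

    eye_x: 1-D array of horizontal eye position (pixels) sampled at fs Hz,
    covering 50 ms before cue onset plus the 1,500-ms cue-target epoch;
    blinks are coded as NaN. Returns None if the trial is to be excluded.
    """
    eye_x = np.asarray(eye_x, dtype=float)
    missing = np.isnan(eye_x)
    if missing.sum() > max_gap_ms * fs / 1000:         # too much missing data: exclude
        return None
    if missing.any():                                   # linear interpolation across blinks
        idx = np.arange(eye_x.size)
        eye_x[missing] = np.interp(idx[missing], idx[~missing], eye_x[~missing])
    spb = int(bin_ms * fs / 1000)                       # samples per 50-ms bin
    n_bins = eye_x.size // spb
    binned = eye_x[:n_bins * spb].reshape(n_bins, spb).mean(axis=1)
    binned = (binned - binned.mean()) / binned.std()    # z-transform within the trial
    baseline = binned[:baseline_ms // bin_ms].mean()    # bins covering -50 to 0 ms
    return binned - baseline
```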

First, the time courses of the eye positions for cues pointing to the beginning (first item) and for cues pointing to the end (fourth item) of the verbal WM sequence were compared using paired t tests. The aim of this analysis was to first establish a spatial modulation for the extremities of the WM sequence. An alpha level of 0.05 was applied throughout the analysis. To control for multiple comparisons across time points, statistical comparisons were conducted using a cluster-based permutation approach (following Maris & Oostenveld, 2007). The permutation testing proceeded in three steps. First, two-tailed paired t tests were applied to compare eye position between the two cue conditions (i.e., beginning vs. end positions) at each time point. Consecutive time points reaching statistical significance were then grouped into temporal clusters. These clusters were then compared against a permutation distribution. More precisely, the begin and end labels of the time courses were permuted (i.e., randomly shuffled without replacement), and paired t tests were applied at each time point of the permuted data. Temporal clusters in the permuted data were then isolated. Following a conservative criterion, the largest cluster was selected and its t values were summed over all the bins of the cluster. These permutations were repeated 1,000 times to yield a permutation distribution of the summed t values of the largest clusters. Statistical significance was inferred from the position of the observed data (i.e., the summed t values of each observed cluster) under this permutation distribution. The comparison between conditions (begin vs. end) was considered statistically significant if the observed cluster statistic exceeded the 95th percentile of the permutation distribution.
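A compact Python implementation of this cluster-based permutation procedure could look as follows (a sketch, not the authors’ code; it permutes the begin/end labels by sign-flipping each participant’s difference time course, which is equivalent for a paired design):

```python
import numpy as np
from scipy import stats

def find_clusters(sig_mask):
    """Return lists of indices of consecutive True entries in sig_mask."""
    out, current = [], []
    for i, sig in enumerate(sig_mask):
        if sig:
            current.append(i)
        elif current:
            out.append(current)
            current = []
    if current:
        out.append(current)
    return out

def cluster_permutation_test(begin, end, n_perm=1000, alpha=0.05, seed=0):
    """begin, end: (n_participants, n_timebins) mean eye positions per condition.

    Returns the observed cluster statistics (summed t values) and a
    permutation p value per cluster, based on the largest cluster of each
    of n_perm random relabelings.
    """
    rng = np.random.default_rng(seed)
    diff = np.asarray(begin) - np.asarray(end)

    def cluster_sums(d):
        t, p = stats.ttest_1samp(d, 0.0, axis=0)    # paired t test via the difference scores
        return [t[c].sum() for c in find_clusters(p < alpha)]

    observed = cluster_sums(diff)
    null_max = np.empty(n_perm)
    for k in range(n_perm):
        flips = rng.choice([-1.0, 1.0], size=(diff.shape[0], 1))  # shuffle condition labels
        null_max[k] = max((abs(s) for s in cluster_sums(diff * flips)), default=0.0)

    p_values = [(null_max >= abs(s)).mean() for s in observed]
    return observed, p_values
```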

Second, we also included the middle positions to test for a linear effect of cued WM position on eye position using regression analysis (following Lorch & Myers, 1990). The presence of a linear effect on eye position would indicate that search through serially stored items in verbal WM progresses in a left-to-right fashion. For each participant, eye position was aggregated across the significant time points obtained from the comparison between the beginning and end positions in the sequence. These aggregated eye-position data were then regressed onto WM position (i.e., 1 through 4). The resulting slopes were tested for a significant increase, indicating a left-to-right modulation, using a one-tailed, one-sample t test.
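In code, this participant-level regression approach amounts to the following sketch (assuming SciPy ≥ 1.6 for the one-sided alternative; the input format is an assumption):

```python
import numpy as np
from scipy import stats

def slope_test(eye_by_position):
    """eye_by_position: (n_participants, 4) time-averaged eye position per WM position.

    Fits a slope per participant and tests the slopes for a positive mean
    (left-to-right gradient) with a one-tailed, one-sample t test.
    """
    positions = np.array([1.0, 2.0, 3.0, 4.0])
    slopes = np.array([np.polyfit(positions, row, deg=1)[0] for row in eye_by_position])
    t, p = stats.ttest_1samp(slopes, 0.0, alternative='greater')
    return slopes, t, p
```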

Finally, we wanted to explore the possibility that the numerical magnitude of the numbers could systematically shift the eye movements. According to one important tradition in the numerical processing literature, numbers are also mapped onto a spatial medium to which similar principles of spatial attention apply. These spatial attention mechanisms are believed to underlie effects such as spatial-numerical associations (SNARC; Dehaene et al., 1993). For instance, in an eye-tracking study by Myachykov et al. (2016), gaze drifted along with the numerical magnitude of spoken numbers while participants maintained fixation. Similarly, the numbers used in our experiments may have influenced the eye-movements, with smaller numbers moving attention to the left side of space and larger numbers to the right side. To rule out the possibility that it was magnitude rather than serial order that induced the observed attention shifts, we included the magnitude of the numbers as an independent variable in the analysis. Trials were grouped according to number magnitude (numbers smaller vs. larger than 5). For each participant, eye position was aggregated across the significant time points and subjected to a 2 × 2 repeated-measures analysis of variance (ANOVA), with WM position (first, last) and magnitude (small, large) as within-subjects factors.
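A hedged sketch of this control analysis, using statsmodels’ repeated-measures ANOVA on a hypothetical long-format data frame (column names are assumptions, not the authors’ variable names):

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

def magnitude_anova(df: pd.DataFrame):
    """df columns (assumed): 'subject', 'wm_position' ('first'/'last'),
    'magnitude' ('small'/'large'), and 'eye_x' (eye position averaged over
    the significant time window). Runs the 2 x 2 repeated-measures ANOVA."""
    model = AnovaRM(data=df, depvar='eye_x', subject='subject',
                    within=['wm_position', 'magnitude'])
    return model.fit()

# print(magnitude_anova(df).anova_table)  # main effects and interaction
```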

Behavioral data analysis

Previously, van Dijck et al. (2013) showed that right-sided targets, relative to left-sided targets, were detected faster when the preceding numbers cued the end of the WM sequence. As a verification, we assessed whether these findings generalized to the acoustic version of the task. Only the go-trials with a beep (i.e., excluding the catch trials) were considered for this purpose (accuracy on the beep-detection task was 97%, 99%, and 99% for go, no-go, and catch trials, respectively). Trials with a response during the beep presentation (<150 ms) were to be excluded as anticipatory, but no such trials were observed. First, we conducted a repeated-measures ANOVA with WM position (four levels) and target location (two levels: left vs. right) as independent variables. Subsequently, to have a statistically more sensitive measure (Fias et al., 1996), we calculated the average reaction time differences (dRTs) between the right- and left-sided beeps at each WM position for each participant. These dRT measures were then regressed onto WM position. The significance of the decrease in slopes was evaluated using a one-tailed, one-sample t test, given that the dRT measures have previously been reported to decrease with WM position.
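The dRT analysis mirrors the eye-position regression sketched above; a minimal sketch (the input arrays and function name are assumptions):

```python
import numpy as np
from scipy import stats

def drt_slope_test(rt_right, rt_left):
    """rt_right, rt_left: (n_participants, 4) mean RTs for right- and
    left-sided beeps at WM positions 1-4. Computes dRT = RT(right) - RT(left),
    fits a slope over WM position per participant, and tests for a decrease."""
    positions = np.array([1.0, 2.0, 3.0, 4.0])
    drt = np.asarray(rt_right, float) - np.asarray(rt_left, float)
    slopes = np.array([np.polyfit(positions, row, deg=1)[0] for row in drt])
    t, p = stats.ttest_1samp(slopes, 0.0, alternative='less')  # expect negative slopes
    return slopes, t, p
```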

Results and discussion

The ANOVA did not reveal a significant interaction between WM position and target location, F(3, 57) = 1.458, p = .235, η2 = 0.002. The regression analysis, however, showed that cueing of WM items from the end of the sequence leads to faster detection of right compared with left beeps (slope = −3.7 ms, SE = 1.8), t(19) = −2.09, p = .025 (one-tailed), replicating the compatibility effects between serial order position and the spatial location of the probes (see Table 1; van Dijck et al., 2013). More precisely, the decreasing slope indicated that right-sided beep tones, compared with left-sided tones, were detected faster the later the cued position in the serial order (see Fig. 2). As in the study of van Dijck et al. (2013), retrieving early items resulted in similar RTs for left- and right-sided target detection. In contrast, these RTs diverged increasingly the later the retrieved memory item was in the serial order, suggesting that serial search in memory progresses towards the right.

Table 1 Reaction times of Experiment 1 (means and standard errors, ms)
Fig. 2

Beep-detection performance in Experiment 1. Average reaction time (RT) differences between right- and left-sided beep detection as a function of the serial order position in WM. Positive values indicate faster responses after beep presentation to the left ear. The dotted lines represent the linear relationship between RT differences and WM position: right-sided beep tones, compared with left-sided tones, are detected faster the later the cued position in the serial order. Error bars represent the standard error of the mean

Crucially, cue-dependent differences in horizontal eye-position were observed during the cue-target interval (see Fig. 1b). After a generic leftward shift immediately following cue presentation, the eyes started to move in a specific way as a function of the position of the cue in the memorized sequence: A relatively larger rightward shift was observed when the number-cue came from the end of the sequence (cluster-corrected p = .018, two-tailed). The deviation in eye position, averaged over this significant time window, corresponds to 36 pixels (or 0.86° VA), which is of a similar order of magnitude as in studies using a similar setup (Salvaggio et al., 2019). Using the same significant time points (obtained from the comparison between the first and last items in the sequence), the presence of spatial effects in verbal WM was tested including all cued positions. This analysis further showed that eye-movements linearly progressed in space with the ordinal position of the cue in WM (see Fig. 1c, slope = 0.15, SE = .06), t(19) = 2.30, p = .016 (one-tailed). To rule out the possibility that number magnitudes might have induced the WM-based shifts of spatial attention, magnitude was included in the analysis. While the WM position effect remained significant, F(1, 19) = 5.26, p = .033, η2 = 0.091, magnitude did not change the eye position (F < 1), nor did it modulate the WM position effect (F < 1). We also performed exploratory analyses on the vertical dimension, following the same analysis steps as for the horizontal dimension. However, these did not reveal any systematic positional effects.

In sum, these findings suggest that spatial attention is involved in the processing of serial order in verbal WM. Importantly, the sequence of to-be-remembered items lacked any form of spatial information: all items were acoustically presented number words. As gaze deviated more to the left side of space when searching for items from the initial parts of the memorized sequence and more to the right side for later parts, it can be concluded that serial order WM is grounded in the spatial attention system.

Experiment 2

The systematic eye-movements as a function of the serial position of the cue in Experiment 1 confirm the involvement of spatial attention while exploring verbal WM. However, one could argue that this effect was artificially induced by the left–right spatial codes associated with the lateralized beep presentation, even though the beep’s location was task-irrelevant. Such an argument has been made in the domain of number–space associations. For instance, Pinto et al. (2019) have recently shown that spatial–numerical associations are triggered as soon as left–right spatial elements are introduced in the task, but that they do not occur if the task makes no reference to spatial elements. To rule out any form of task-induced spatial coding, we ran an additional experiment that was identical to Experiment 1, but with a centrally presented target-beep sound instead of lateralized beeps. We hypothesized that the effect of serial WM position should disappear if the eye-movements found in Experiment 1 were induced by the left–right spatial codes of the probes. If, however, spatial coding is spontaneously implemented, then eye-movements should still follow the searched position in the serial order of verbal WM.

Methods

Participants

Twenty-one different Ghent University students (three males, age M = 21.3 years, SD = 2.4 years) participated in return for payment. All participants were right-handed and had normal or corrected-to-normal vision. One female participant was excluded because of a calibration failure. The research complied with the guidelines of the Independent Ethics Committee of the Department of Psychology and Educational Sciences of Ghent University. All participants gave written informed consent.

Procedure, design, and statistical analyses

The procedure and design were identical to those of Experiment 1, except for the target-beep, which was binaurally presented. As there were no lateralized targets in Experiment 2, no dRTs could be calculated. The analysis was restricted to the eye-tracking data, following the same analysis protocol as in Experiment 1. Only trials from WM sequences with accurate serial-order verification (Phase 3; M = 37/40, SD = 3/40) were considered, and the time course of horizontal eye position upon hearing the go-cues was analyzed. As in Experiment 1, trials with excessive blinks (1%) were excluded from the analysis. No anticipatory trials were observed. The overall accuracy on the beep-detection task (i.e., Phase 2) was 96%, 99%, and 99% for go, no-go, and catch trials, respectively.

Results and discussion

The same pattern of results was observed with binaural beeps, showing that the lateralized beeps of Experiment 1 were not responsible for the effect (see Fig. 1d). The eyes moved spontaneously as a function of the ordinal position of the cue in WM: After an initial leftward shift, a relatively larger rightward shift was observed when the cue came from the end of the sequence (cluster-corrected p = .005, two-tailed). The deviation in eye position, averaged over this significant time window, corresponds to 10 pixels (or 0.30° VA). Within this interval, a regression analysis confirmed that eye-movements linearly progressed in horizontal space as a function of the ordinal position of the cue in WM (see Fig. 1e, slope = 0.08, SE = .02), t(19) = 3.64, p = .0008 (one-tailed). To rule out the possibility that number magnitudes might have induced the WM-based shifts of spatial attention, magnitude was included in the analysis. While the WM position effect remained significant, F(1, 19) = 10.817, p = .004, η2 = 0.019, magnitude did not change the eye position (F < 1), nor did it modulate the WM position effect (F < 1). We also performed exploratory analyses on the vertical dimension, following the same analysis steps as for the horizontal dimension. However, these did not reveal any systematic positional effects.

Together, the key findings of Experiment 1 were replicated in that the eye-movements followed the searched position in serial order memory. Crucially, this effect was observed even in the absence of any lateralization in the beep presentation. This observation corroborates the evidence that verbal WM for serial order relies on visuospatial processes and excludes the possibility that these effects emerge as a consequence of dimensional overlap.

General discussion

The question of how the mind is able to retain a sequence of information has a long research tradition (Lashley, 1951). Here, we addressed the question of how serial order verbal information is retained in WM. Consistent with ideas of spatial position marking, our data unequivocally demonstrate that retrieving items at a specific position in a sequence of words in WM is accompanied by horizontal eye-movements. In Experiment 1, we demonstrated that gaze deviated more to the left side of space when searching for items from the initial parts of the memorized sequence and more to the right side for later parts. These findings generalize the cue-induced shifts in spatial attention documented earlier in behavior to the level of oculomotor responses. Unlike behavioral data, which can be modulated at any stage from initial perceptual processing to the execution of the response, our time-resolved gaze data clearly demonstrate that the spatial shifts of attention were driven by the memory cues. Importantly, the acoustic word sequence did not contain any spatial reference, from which it can be concluded that the spatialization of the serial input is internally generated. In Experiment 2, we replicated and extended this finding by demonstrating that the involvement of spatial attention is an intrinsic process that is not induced by any external spatial context whatsoever. The fact that the probe-detection task was not defined in terms of left or right of space, since we used central beep tones, eliminates any form of dimensional overlap between stimulus and/or response as an alternative explanation (Kornblum et al., 1990).

Thus, across two experiments, we provide clear evidence substantiating the view that memory for serial order is grounded in the spatial attention system (Abrahamse et al., 2017; Abrahamse et al., 2014; Fischer-Baum, 2018). Remarkably, in both experiments, we observed an initial deflection towards the left regardless of the cued position in the serial order. This might indicate that spatial attention cycles through mental space by always starting from the beginning, and thus from the left side of space (Tan & Ward, 2000, 2008). However, an initial bias towards the left has also been reported in a variety of studies as a natural tendency to scan visual scenes from left to right (e.g., Foulsham et al., 2013). Since leftward eye-movements have been related to right-hemispheric dominance in the deployment of spatial attention (e.g., Meador et al., 1989), it remains to be tested whether cycling through the list by starting from the left also contributes to this initial leftward gaze bias. After the initial leftward gaze bias, however, eye-movements started to diverge incrementally towards the right upon retrieval of later items. This finding suggests that spatial attention navigates through mental space towards the right. In line with the behavioral observations of van Dijck et al. (2013), our behavioral data also converge in support of the idea that spatial shifts of attention progress from left to right. Specifically, we showed that retrieving later items, compared with early items, produced incremental shifts of spatial attention toward the right.

The fact that eye gaze moved in the direction of the cued position in the sequence is in line with the assumption that eye-movements reflect shifts of spatial attention. A framework for this assumption is provided by the premotor theory of attention, which holds that the brain circuits supporting oculomotor control are also involved in spatial shifts of attention (Corbetta, 1998; Rizzolatti et al., 1987). Today, there is compelling evidence for the recruitment of spatial attention, and by extension oculomotor codes, in the retention of visuospatial information in WM (Postle, 2006; Van der Stigchel & Hollingworth, 2018). This is not so surprising, however, given that the processing of visuospatial memoranda shares many properties and brain networks with the processing of perceived visuospatial information (Awh & Jonides, 2001). Here, we provide evidence that the recruitment of the oculomotor system also extends to verbal WM processes, where the memoranda are neither visually nor spatially defined. This resonates with the idea of cortical recycling, which suggests that brain circuits initially dedicated to evolutionarily older (cognitive) functions (e.g., the spatial attentional mechanisms used for spatial navigation) are reused in human abstract thinking (Anderson, 2010; Dehaene, 2005), leaving traces all the way to the eyes. Our study provides a promising novel approach by continuously tracking the focus of attention to gain insight into the spatial processes involved in verbal WM.

Accepting the assumption that the position of the eye reflects the position of the focus of attention, one may wonder to what extent the interplay between spatial attention and serial order processes is an inherent property of serial order processing or, alternatively, merely epiphenomenal. First, our findings argue against the idea that the involvement of spatial attention is epiphenomenal, since changes in eye-movements were spontaneously induced by the cued position in a sequence of verbal information, which is by nature nonspatial, and in a task context devoid of any spatial component (Experiment 2). Moreover, physical locations in space were never explicitly probed. Second, De Belder et al. (2015) provide evidence for the functional involvement of spatial attention in serial order verbal WM: when spatial locations were explicitly cued, retrieval from a WM sequence of verbal information was influenced by the congruency between the spatial cue and the sequential position. Specifically, retrieving items from the beginning of a sequence was facilitated when the left side of space was cued, and retrieving items from the end of a sequence was facilitated upon right-sided spatial cuing. Whether eye-movements similarly play an active functional role in retrieving serial order information from WM remains to be tested. It is possible, for instance, that memory performance could be facilitated when the position of the gaze is contingent on the spatial position of the serial order information in representational space.

Although the principles and mechanisms of position coding are becoming increasingly well specified, it is currently unknown how they can be implemented in current models of WM. According to the classic account (Baddeley & Hitch, 1974), a sequence of verbal information is maintained by inner speech that repeats the items in the phonological loop. As for visuospatial information, spatial attention is employed by iteratively cycling the attentional focus through the visuospatial sketchpad (Awh & Jonides, 2001; Baddeley, 1986; Postle, 2006). From the current study, we can conclude that verbal WM is not exclusively supported by a domain-specific store with domain-specific (i.e., phonological) rehearsal mechanisms. After all, under such a domain-specific account, eye-movements as a function of the serial order of the items would not be expected to accompany retrieval from verbal WM. This could imply that mechanisms of visuospatial WM are adopted even when verbal items must be maintained in mind. As such, the spatial effects observed in the current study could disconfirm the idea of a strict subdivision of WM into domain-specific stores. Alternatively, the visuospatial processes that we have observed in verbal WM could also reflect the operation of a domain-general mechanism. For example, Baddeley (2000) suggested that the episodic buffer (as a complementary component to the visual and verbal components) could serve as a binding site for multimodal information. The involvement of spatial processes in verbal WM highlights the need for a formal description of how the episodic buffer operates and interacts with the phonological and visuospatial subsystems. Although speculative, our observation of spatial coding governed by spatial attention may reflect the operating principles of the episodic buffer. Another interesting possibility that can be derived from the present findings is that the well-established principles of attention-based rehearsal in visuospatial WM (Awh & Jonides, 2001) may also apply to rehearsal processes in verbal WM. Indeed, it has recently been suggested that, besides phonological rehearsal, information in verbal WM can also be maintained by an attention-based rehearsal mechanism (Camos, 2015; Vergauwe & Langerock, 2017). In other words, our findings may indicate that attentional refreshing in verbal WM operates in a spatially defined way. Finally, the present results could also have implications for more recent accounts of WM. An interesting theory in this respect is the three-embedded-components model of Oberauer (2002, 2009; see also Cowan, 1995), which distinguishes three components in WM: the activated part of long-term memory, the region of direct access, and the focus of attention. To address the mechanism for serial order maintenance, the region of direct access has been proposed to recruit space as a representational coordinate system onto which nonspatial information can be bound to maintain the serial structure of the memoranda (Oberauer, 2009; see also Guida & Campitelli, 2019, for related ideas on how spatial structures can be exploited for representing serial order in verbal WM). The results of the present study align with the idea that the region of direct access is spatial in nature and that navigation along this representational medium to bring items into the focus of attention is governed by principles of spatial attention.

In summary, the observation that the oculomotor system is involved in serial order processes in verbal WM further supports the idea that serial order information in verbal WM is spatially coded and governed by spatial attention (Abrahamse et al., 2017; Abrahamse et al., 2014). While these observations do not conflict with existing theories, they should be incorporated to guide future theoretical developments in the domain.