Motion segmentation in search

Motion is a powerful cue for the guidance of attention in visual search. Irrelevant distractors differing from the target in a motion feature can be excluded from search (McLeod, Driver, & Crisp, 1988; McLeod, Driver, Dienes, & Crisp, 1991). The present study examined the functional mechanism underlying filtering by motion.

The modal framework for understanding search (e.g., Treisman & Sato, 1990; Wolfe, 1994) holds that the visual system decomposes stimuli into their elementary features (e.g., color, size, orientation, motion), each of which is represented in an independent map. Efficient search through motion-segmented displays can then be attributed to a motion map (Treisman & Sato, 1990) or a filter (McLeod et al., 1991) that directs selection toward the moving objects. In general, an approach in which feature information directs attentional selection may be termed guided search. In the accounts of Treisman and Sato (1990) and Wolfe and colleagues (e.g., Wolfe, 1994; Wolfe & Horowitz, 2004), spatial attention is guided by a feature-general activation or master map, which uses input from feature maps to code the attentional priority of candidates for selection. Observers may weight the inputs from the feature maps according to their relevance to the target. Thus, when searching for a moving target, the system is configured to give priority to locations containing movement. According to Wolfe and colleagues (see also McLeod et al., 1991), this prioritization process is excitatory, with moving locations given additional activation. In contrast, Treisman and Sato proposed an inhibitory account in which nontargets (e.g., static locations) have reduced activation.
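The contrast between excitatory and inhibitory guidance can be made concrete with a toy priority map. The sketch below is purely illustrative: it assumes a linear weighted sum of feature maps, and the function names (`master_map`, `selection_order`) and the four-location display are hypothetical, not taken from Treisman and Sato (1990) or Wolfe (1994). A color map given zero weight leaves priority unchanged, which is why guided search predicts no effect of target color when only motion is weighted.

```python
from typing import Dict, List

def master_map(feature_maps: Dict[str, List[float]],
               weights: Dict[str, float]) -> List[float]:
    """Toy activation (master) map: weighted sum of feature maps per location."""
    n_locations = len(next(iter(feature_maps.values())))
    activation = [0.0] * n_locations
    for name, feature_map in feature_maps.items():
        w = weights.get(name, 0.0)  # unweighted (task-irrelevant) maps add nothing
        for i, value in enumerate(feature_map):
            activation[i] += w * value
    return activation

def selection_order(activation: List[float]) -> List[int]:
    """Locations in order of attentional priority (highest activation first)."""
    return sorted(range(len(activation)), key=lambda i: -activation[i])

# Hypothetical display: locations 0 and 1 move; locations 2 and 3 are static.
motion = [1.0, 1.0, 0.0, 0.0]
static = [1.0 - m for m in motion]
# 1.0 where an item shares the static items' color (location 0 is the target).
shares_static_color = [1.0, 0.0, 1.0, 1.0]

# Excitatory guidance: moving locations gain activation.
excitatory = master_map({"motion": motion, "color": shares_static_color},
                        {"motion": 1.0, "color": 0.0})
# Inhibitory guidance: static locations lose activation.
inhibitory = master_map({"static": static, "color": shares_static_color},
                        {"static": -1.0, "color": 0.0})

print(selection_order(excitatory))  # [0, 1, 2, 3]: moving locations first
print(selection_order(inhibitory))  # [0, 1, 2, 3]: moving locations first
```

Under either sign of guidance, changing `shares_static_color` has no effect on the selection order as long as its weight stays at zero.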

Attentional engagement theory (AET; Duncan & Humphreys, 1989) provides an alternative framework for understanding search. In AET, search is directed by a template that describes the critical features of the sought-for target. Objects compete for selection on the basis of how well they match this template. Items matching the template gain attentional weight (similar to the notion of activation above), which governs the likelihood of selection; items not matching the template lose attentional weight. Critically, AET includes two further mechanisms, not present in guided search, that contribute to search efficiency: weight linkage and spreading suppression. Weight linkage means that the weights of items that share features, and therefore group together, change together. When items grouped by weight linkage do not match the target template, they lose attentional weight together; this is spreading suppression. Thus, in contrast to the guided search framework, attentional priority is not always controlled independently for each feature: grouping of objects, even by task-irrelevant features, can modulate selection.
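Weight linkage and spreading suppression can likewise be illustrated with a toy calculation. AET was stated verbally by Duncan and Humphreys (1989); the linear rule below, in which each item's weight is pulled toward the mean weight of the items sharing its color, and the `linkage`, `gain`, and `loss` parameters are a hypothetical formalization chosen for illustration only. The sketch shows how a moving target that shares its color with static, template-mismatching items can end up with less weight than the moving distractors.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Item:
    color: str
    moving: bool
    is_target: bool = False

def aet_weights(items: List[Item], linkage: float = 0.5,
                gain: float = 1.0, loss: float = 1.0) -> List[float]:
    """Hypothetical AET-style weights: template match, then weight linkage by color."""
    # Template matching: the template specifies motion, so moving items gain
    # weight and static items lose it (baseline weight 1.0).
    raw = [1.0 + gain if it.moving else 1.0 - loss for it in items]
    # Weight linkage: items in the same color group are pulled toward the group
    # mean, so the loss suffered by static items spreads to same-colored movers.
    groups: Dict[str, List[float]] = {}
    for it, w in zip(items, raw):
        groups.setdefault(it.color, []).append(w)
    group_mean = {color: sum(ws) / len(ws) for color, ws in groups.items()}
    return [(1.0 - linkage) * w + linkage * group_mean[it.color]
            for it, w in zip(items, raw)]

def display(target_color: str) -> List[Item]:
    """Hypothetical display: 3 static green, 2 moving red distractors, 1 moving target."""
    return ([Item("green", moving=False) for _ in range(3)]
            + [Item("red", moving=True) for _ in range(2)]
            + [Item(target_color, moving=True, is_target=True)])

for target_color in ("red", "green"):  # nonsingleton, then singleton target
    items = display(target_color)
    weights = aet_weights(items)
    best_distractor = max(w for it, w in zip(items, weights) if not it.is_target)
    print(target_color, weights[-1], best_distractor)
# red 2.0 2.0
# green 1.25 2.0
```

With `linkage=0.0` the target's color no longer matters, reproducing the guided search prediction; any positive linkage yields the camouflage effect that AET predicts.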

Temporal segmentation in search

Cues other than motion may also guide search (e.g., color, 3-D depth; see Wolfe & Horowitz, 2004, for a review). Strong effects on search are observed when search items are separated over time, suggesting that time can be used as a guidance cue. In preview search (Watson & Humphreys, 1997), one set of distractors is presented ahead of the remaining distractors plus the target. Provided that the temporal separation is greater than 400 ms, the previewed distractors can be effectively excluded from search—the preview benefit (see Watson, Humphreys, & Olivers, 2003, for a review).

Growing evidence suggests that this preview benefit is mediated, at least in part, by top-down inhibition directed toward the old distractors (see Watson et al., 2003). For example, there are costs for the detection of probe events on old items (e.g., Humphreys, Jung-Stalmann, & Olivers, 2004). Task-irrelevant features (color and orientation) of the old items also contribute negatively to search (see Olivers, Humphreys, & Braithwaite, 2006, for a review). Thus, if a task-irrelevant feature singleton in the new group shares either color or orientation with the old items, interference (e.g., Theeuwes, 1992) with search is reduced (Olivers & Humphreys, 2003). Additionally, it is difficult to detect new targets carrying a feature by which the previewed items group (e.g., color), resulting in negative effects on search guidance from feature carry-over (e.g., Braithwaite et al., 2003; Braithwaite, Humphreys, & Hodsoll, 2004).

It has been argued that negative effects of color carry-over in the preview paradigm result from spreading suppression, following inhibition of the features of the previewed stimuli (Braithwaite et al., 2003, 2004; Olivers & Humphreys, 2003). Thus, if a new item shares a feature with the old items, its effective salience is reduced, facilitating search when the item is a distractor but impeding search when it is a target. These characteristics are accounted for most naturally within an AET framework, in which the loss of attentional priority for old items spreads, through weight linkage and spreading suppression, to new items that share their features.

The present study

One open question is whether negative color effects are specific to preview search or whether they also apply to segmentation by other salient cues, including motion. Are effects seen under preview conditions reflective of general search processes, or are they exclusive to temporal segmentation? In the present article, we ask whether negative color carry-over effects occur in the context of motion-segmented search, as in preview search. Arguably, visual search with the simultaneous presence of objects is a somewhat more natural and frequent situation than preview search. Thus, demonstrating effects with motion-segmented search will help to establish the general relevance of spreading suppression.

The present study employed stimuli based on those of Braithwaite et al. (2004), extended to motion rather than time. In all cases, the stimuli were presented in a single simultaneous search display consisting of either a display where all items moved or, in the critical condition, a display where half the items were static and half the items (plus the target) moved—akin to the displays devised by McLeod et al. (1988). The static distractors were one color (i.e., green), and all the moving distractors another (i.e., red). The moving target could be either red or green, and thus was either a color singleton among the moving group (sharing its color with the static distractors) or was the same color as the moving distractors. We compared performance in this motion-segmented condition with that in baseline conditions in which motion segmentation was not possible. In the full-set condition, all the items moved, making all the items potential targets, giving a measure of performance when search cannot benefit from motion segmentation. In the half-set condition, only the moving items from the motion segmentation condition were presented, giving a measure of performance when static distractors do not contribute to search. The design is summarized in Table 1.

Table 1 Summary of the experimental conditions. Note that the mapping of static and moving distractors to the colors red and green was held constant within observers but varied across two separate groups

The proposal that search is guided independently by motion (i.e., McLeod et al., 1991) holds that static items may be filtered from search early on. In this case, performance in motion-segmented and half-set conditions should be matched, and there is no reason to expect any negative impact of the target’s sharing its color with the static objects. Indeed, given that motion-sensitive cells can be largely color-blind (see, e.g., Livingstone & Hubel, 1988), there is no reason to think that the color relations between the static and moving items will have any impact on detecting a moving target. In contrast, AET predicts that, as a consequence of weight linkage and spreading suppression, targets sharing color with static distractors will inherit a loss of attentional weight, producing a response time (RT) cost. To quote directly from Duncan and Humphreys (1989), “a tendency for two weights to covary is helpful if both are to be set low (two nontargets) or high (two targets), but must be harmful if one is to be set low and the other high. . . . Grouping between targets and nontargets will be harmful. We predict, for example, that targets could sometimes be camouflaged by placing them close to similar nontargets in an array” (p. 448). Here, we examine this prediction.

Our use of a target color singleton in the moving set on some trials permits us to consider a second, related issue: second-order parallel processing (see Friedman-Hill & Wolfe, 1995). In our implementation of the search task, whenever there is a singleton in the moving group, it is the target. In the half-set condition, we anticipate that participants will use this information to speed search, always deploying attention to the singleton. Separately from the question of whether color similarity between the target and the static distractors causes a cost, we can therefore ask whether any benefit for the singleton is preserved in the motion-segmented condition. The idea of second-order parallel processing is that the processing of feature differences along one dimension can be restricted to a subset defined by segmentation along a second dimension. For example, Friedman-Hill and Wolfe showed that search for a bar of unknown but odd orientation within a subgroup of a known color could be efficient (at large set sizes), despite the presence of bars with the target orientation in a different known color. Participants could thus use color to define a subgroup and then probe just this subgroup for an odd orientation, ignoring orientation in the other group, although this process took time. This is essentially how McLeod et al. (1988) initially explained their finding of efficient search for motion–form conjunctions: form-based differencing operations could be restricted to the moving objects because the physiological properties of a motion filter supported parallel search. It is currently unknown whether motion segmentation can restrict color processing to a moving subset, so that a singleton target in the moving group retains its power to attract attention even in the presence of other, same-colored static items. This was tested here.

Method

Participants

Eighteen undergraduate students (6 male) from the University of Birmingham, aged 18 to 20 years (M = 19), took part in return for course credit and were included in the analysis. Two additional participants with more than 15% errors were excluded. A further 12 participants (2 male), aged 18 to 24 years (M = 19.8), took part in a control experiment to establish the relative efficiency of segmentation by motion and color.

Equipment

Stimuli were generated by a PowerMac Dual G4 computer, using routines programmed with the Psychophysics Toolbox extensions to MATLAB (Brainard, 1997) and presented on a Mitsubishi DiamondPro 17-in. monitor.

Stimuli

The search displays (see Fig. 1) were made up of a random selection of the uppercase distractor letters H, I, V, and X, together with one target letter, either Z or N (which occurred equally often). Viewing distance was approximately 40 cm. The letters measured 0.6 × 0.6 cm (0.86° × 0.86°) and were composed of lines 0.6 mm (0.086°) wide. The letters were positioned at random in the cells of an 11 × 11 grid (121 cells; the center cell, which contained the fixation cross, was never used). The stimuli were bounded by an outline frame measuring 18 × 18 cm (25.4° × 25.4°), drawn with lines 0.3 mm (0.043°) wide, and the display center was marked by a fixation cross, 0.6 × 0.6 cm (0.86° × 0.86°), with each component line 0.3 mm (0.043°) wide. Static letters appeared centered within a cell, and moving letters were initially offset to the end point of the path they would move through. Motion took the form of a linear up/down oscillation (1.3 cm/s, 1.86°/s) centered on the relevant cell (magnitude of oscillation, 0.36 cm, 0.56°). The initial motion direction (up or down) was random across trials.
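The visual angles reported above follow from the standard relation θ = 2·arctan(s / 2d) for an object of size s at viewing distance d. The short check below assumes the stated viewing distance of 40 cm; the helper name `visual_angle_deg` is hypothetical and not part of the original stimulus code. The speed conversion is an approximation that treats the 1.3 cm traversed per second as a centered extent.

```python
import math

VIEWING_DISTANCE_CM = 40.0  # approximate viewing distance given in the Method

def visual_angle_deg(size_cm: float, distance_cm: float = VIEWING_DISTANCE_CM) -> float:
    """Visual angle (degrees) subtended by an object of size_cm at distance_cm."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))

# Sizes from the Stimuli section, in centimeters.
print(round(visual_angle_deg(0.6), 2))    # letter / fixation cross: 0.86
print(round(visual_angle_deg(0.06), 3))   # letter stroke width (0.6 mm): 0.086
print(round(visual_angle_deg(0.03), 3))   # frame / cross line width (0.3 mm): 0.043
print(round(visual_angle_deg(18.0), 1))   # outline frame: 25.4
# Approximate speed: extent covered in 1 s (1.3 cm) in degrees, i.e. deg/s.
print(round(visual_angle_deg(1.3), 2))    # 1.86
```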

Fig. 1

Example stimulus display from the motion-segmented condition. Different levels of gray stand in for red and green. Arrows indicate oscillating motion. Full-set and half-set displays were constructed with reference to the motion segmentation condition, such that in the full-set case, everything moved, and in the half-set case, only the moving half of items was presented

Design and procedure

The design consisted of three factors: condition (motion segmentation, full-set, half-set) × target color (target singleton, target nonsingleton) × set size (12, 24). The critical case was the motion-segmented condition (illustrated in Fig. 1). Here, either 12 or 24 letters were presented; half of the letters were static, and the other half, including the target, moved. The static items were uniformly colored (red or green), and the moving distractors were the opposite color. Target color was manipulated such that on half the trials, the target was a singleton in the moving group, sharing its color with the static items (target singleton case). On the remaining trials, the target was the same color as the moving distractors and differed in color from the static group (target nonsingleton case). Two baseline conditions were also included. In the full-set condition, all 12 or 24 items moved. Here, the target color manipulation meant only that in the target singleton case, the target belonged to a slight majority color group (7 vs. 5 or 13 vs. 11 items), whereas in the target nonsingleton case, the two color groups were of equal size. In the half-set condition, only the moving items from the motion-segmented condition were presented, so displays contained only 6 or 12 items. On half the trials, the target was a color singleton (singleton trials), and on the other half, it was identical in color to the moving distractors (nonsingleton trials). In all conditions, the task was to identify the target letter (Z or N) by pressing the Z or N key on the keyboard.

Each participant completed the three conditions in separate blocks of 144 trials (30 trials for each combination of set size and target color, plus 24 practice trials at the start of each block). The order of the three blocks was counterbalanced over participants such that each block occurred equally often in each position. Half of the participants viewed displays in which the static distractors were red and the moving distractors green, and the other half had the opposite assignment.
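Because all three conditions are built from the same motion-segmented display, the item counts in each cell can be derived mechanically. The sketch below is a hypothetical reconstruction from the design description (not the original MATLAB code); "static_color" denotes the color assigned to the static distractors and "moving_color" the color of the moving distractors, whichever of red or green that was for a given observer group. It also checks the block length of 144 trials.

```python
from itertools import product
from typing import Dict

CONDITIONS = ("motion_segmentation", "full_set", "half_set")
SET_SIZES = (12, 24)                      # nominal set sizes
TARGET_TYPES = ("singleton", "nonsingleton")
TRIALS_PER_CELL = 30
PRACTICE_TRIALS = 24

def display_composition(condition: str, set_size: int, target_type: str) -> Dict[str, int]:
    """Counts of static/moving items and of each color group (target included)."""
    half = set_size // 2
    # Moving half of the motion-segmented display: distractors plus the target.
    moving_half = {"static_color": 0, "moving_color": half - 1}
    target_group = "static_color" if target_type == "singleton" else "moving_color"
    moving_half[target_group] += 1
    if condition == "half_set":
        # Only the moving half is shown (6 or 12 items).
        return {"static": 0, "moving": half, **moving_half}
    # Otherwise the other half consists of distractors in the static color.
    colors = {"static_color": moving_half["static_color"] + half,
              "moving_color": moving_half["moving_color"]}
    if condition == "motion_segmentation":
        return {"static": half, "moving": half, **colors}
    if condition == "full_set":
        # Same color groups, but every item moves.
        return {"static": 0, "moving": set_size, **colors}
    raise ValueError(f"unknown condition: {condition}")

# Trials per block: every set size x target type cell, plus practice trials.
block_length = len(SET_SIZES) * len(TARGET_TYPES) * TRIALS_PER_CELL + PRACTICE_TRIALS

for cond, n, target in product(CONDITIONS, SET_SIZES, TARGET_TYPES):
    print(cond, n, target, display_composition(cond, n, target))
print("trials per block:", block_length)  # 144
```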

The participant initiated each trial with a keypress. Each trial began with a blank screen for 100 ms, followed by the outline square and fixation cross for 500 ms. The search stimuli then appeared and began to move immediately. The display was cleared when the participant responded, and the next trial could then be initiated.

A control experiment assessed the relative utility of each segmentation cue, to verify that the cues as implemented were effective. A separate group of participants searched for a Z or N target as in the main experiment. Three conditions were compared: (1) full-set, in which all items appeared in the target color and all items moved; (2) motion segmentation, in which all items appeared in the target color but half were stationary; and (3) color segmentation, in which all items moved but half appeared in the target color and half in the nontarget color. Target color (red or green) was counterbalanced over participants. Participants first completed a short practice block (12 trials of each condition), followed by one 72-trial block of each condition (30 trials at each set size, 12 and 24, plus 12 practice trials at the start of the block), with block order counterbalanced over participants. All other details were as for the main experiment.

Results

Control experiment: Relative effectiveness of segmentation by color and motion

Accuracy

Mean percentages of error are presented in Table 2. An ANOVA with the factors of condition (full-set, motion segmentation, color segmentation) and set size (12, 24 items) revealed no significant effects, all Fs < 2.1, all ps > .15.

Table 2 Percentages of error in the control experiment

RT

Incorrect RTs (4.4%) and RTs that were >10 s or <0.2 s (0.19%) were removed prior to analysis (the resulting mean correct RTs are plotted in Fig. 2). An ANOVA with the factors of condition (full-set, motion segmentation, color segmentation) and set size (12 or 24 items) was used. There were main effects of set size, F(1,11) = 73.017, p < .0001, and condition, F(2,22) = 22.357, p < .0001, as well as an interaction between the two, F(2,22) = 15.087, p < .0001. Performance was, overall, both faster and more efficient in the segmented conditions, as compared with the full-set condition (slopes of 72.3, 41.8, and 37.5 ms/item in the full-set, motion-segmented, and color-segmented conditions, respectively). Direct comparison of the two segmentation conditions showed that neither overall RT nor efficiency differed significantly between the two conditions, Fs < 2.5, ps > .147. Thus, motion and color segmentation, as implemented here, are equally effective cues for the guidance of search.
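The trimming rule and the search slopes reported here can be expressed compactly. The sketch below assumes a hypothetical per-trial record with `set_size`, `rt` (in seconds), and `correct` fields, and fits the slope of mean correct RT on set size by least squares; with only two set sizes this reduces to the RT difference divided by the set-size difference. Half-set trials in the main experiment would be coded with their actual set sizes (6 and 12). The example numbers are invented for illustration and are not data from this study.

```python
from statistics import mean
from typing import Dict, List

def clean_rts(trials: List[Dict], min_rt: float = 0.2, max_rt: float = 10.0) -> List[Dict]:
    """Keep correct trials with min_rt <= RT <= max_rt (seconds)."""
    return [t for t in trials if t["correct"] and min_rt <= t["rt"] <= max_rt]

def search_slope_ms_per_item(trials: List[Dict]) -> float:
    """Least-squares slope of mean correct RT (ms) against set size."""
    kept = clean_rts(trials)
    sizes = sorted({t["set_size"] for t in kept})
    mean_rt_ms = [1000.0 * mean(t["rt"] for t in kept if t["set_size"] == n)
                  for n in sizes]
    mx, my = mean(sizes), mean(mean_rt_ms)
    num = sum((x - mx) * (y - my) for x, y in zip(sizes, mean_rt_ms))
    den = sum((x - mx) ** 2 for x in sizes)
    return num / den

# Invented example: one error and one >10 s trial are excluded before averaging.
trials = [
    {"set_size": 12, "rt": 1.1, "correct": True},
    {"set_size": 12, "rt": 1.3, "correct": True},
    {"set_size": 12, "rt": 0.9, "correct": False},   # error: removed
    {"set_size": 24, "rt": 1.7, "correct": True},
    {"set_size": 24, "rt": 1.9, "correct": True},
    {"set_size": 24, "rt": 12.0, "correct": True},   # > 10 s: removed
]
print(round(search_slope_ms_per_item(trials), 1))  # 50.0
```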

Fig. 2

Search response time as a function of condition (separate lines) and set size (horizontal axis). Error bars show standard errors of the means. The search slope for each condition is shown next to the corresponding line

Main experiment: Negative color carry-over effects

Accuracy

Mean percentages of error are presented in Table 3. An ANOVA with the factors of condition (full-set, motion segmentation, half-set), target color (singleton or nonsingleton), and set size (12 or 24 items) revealed an effect of target color, F(1,17) = 21.206, p < .0001, and an interaction between target color and condition, F(2,34) = 4.814, p < .05. Separate analyses by condition revealed that error rates varied significantly with target color only in the motion-segmented and half-set conditions, F(1,17) = 14.853, p < .001, and F(1,17) = 18.204, p < .001, respectively; in neither case was the effect of set size significant, Fs < 1. There was no effect of target color in the full-set condition, F < 1.

Table 3 Percentages of error in the main experiment

RT

Incorrect RTs (3.1%) and RTs that were >10 s or <0.2 s (0.19%) were removed prior to analysis (the resulting mean correct RTs are plotted in Fig. 3).

Fig. 3

Search response time (RT) as a function of condition (separate lines), set size in the full-set and motion segmentation conditions (horizontal axis), and target color (left vs. right panel). Note that set size in the half-set condition is half of the value indicated. Target nonsingleton cases are shown in the left panel, and target singleton cases in the right panel. Error bars show standard errors of the means. Search slopes, in milliseconds/item, are shown next to each corresponding line (note that slope in the half-set condition is calculated using the actual set sizes of 6 and 12 items)

Motion segmentation versus full-set

A three-factor 2 × 2 × 2 within-subjects ANOVA was carried out with the factors of condition (full-set, motion segmentation), target color (singleton or nonsingleton), and set size (12 or 24 items). This revealed a significant three-way interaction, F(1, 17) = 6.124, p < .05. Separate analyses of each target color revealed that, in the nonsingleton case (where the target and static distractors differed in color), performance was, overall, both faster, F(1,17) = 35.194, p < .0001, and much more efficient in the motion-segmented condition [F(1,17) = 13.869, p < .005, for the set size × condition interaction; slopes of 41.2 vs. 71.3 ms/item in the motion segmentation and full-set conditions, respectively]. In contrast, with singleton displays (where the target shared its color with the static distractors), a very different pattern held. Despite faster responses, overall, in the motion segmentation condition, F(1,17) = 12.073, p < .005, performance was equally inefficient in both conditions (F < 1 for the set size × condition interaction; slopes of 66.1 vs. 69.5 ms/item in the motion segmentation and full-set conditions, respectively). Critically, in the full-set condition, performance did not vary as a function of target color, either as a main effect or in interaction with set size (Fs < 1.6, ps > .2; slopes of 71.3 vs. 69.5 ms/item for nonsingleton and singleton targets, respectively). Thus, the presence of a slight majority color group in the target singleton case did not affect performance.
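Within each target color, the condition × set size analysis is a 2 × 2 within-subjects design, so the interaction F with (1, n − 1) degrees of freedom equals the square of a one-sample t test on each participant's difference of set-size effects (the RT increase from 12 to 24 items in one condition minus that in the other). The sketch below illustrates this equivalence with invented numbers for three hypothetical participants; the function name `interaction_f` is ours, and this is not the analysis code used for the results reported here.

```python
import math
from statistics import mean, stdev
from typing import List, Tuple

def interaction_f(cond_a: List[Tuple[float, float]],
                  cond_b: List[Tuple[float, float]]) -> float:
    """F(1, n-1) for a 2 (condition) x 2 (set size) within-subjects interaction.

    Each list holds one (mean RT at small set size, mean RT at large set size)
    pair per participant, in the same participant order for both conditions.
    """
    # Per-participant difference of set-size effects (difference of differences).
    d = [(a_large - a_small) - (b_large - b_small)
         for (a_small, a_large), (b_small, b_large) in zip(cond_a, cond_b)]
    t = mean(d) / (stdev(d) / math.sqrt(len(d)))  # one-sample t against zero
    return t ** 2

# Invented RTs (ms) for three participants at set sizes 12 and 24.
full_set = [(1000, 1500), (1000, 1600), (1000, 1700)]
motion_segmented = [(900, 1400), (900, 1400), (900, 1400)]
print(round(interaction_f(full_set, motion_segmented), 6))  # 3.0, with df = (1, 2)
```

With the 18 participants of the main experiment, the same computation yields an F with (1, 17) degrees of freedom, matching the reported tests.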

Motion segmentation versus half-set

Comparisons against the half-set baseline revealed essentially the opposite pattern to that found with the full-set baseline. An ANOVA with the same factors as above was used. Again, there was a three-way interaction, F(1, 17) = 64.881, p < .0001. Separate analyses by target color revealed that, in the nonsingleton case, responses were, overall, faster in the half-set condition, F(1,17) = 26.873, p < .0001, and there was a main effect of set size, F(1,17) = 95.104, p < .0001. However, there was no sign of a condition × set size interaction, F(1, 17) = 1.762, p = .202, consistent with the static distractors being excluded from search. In contrast, in the singleton case, performance was, overall, faster, F(1,17) = 163.361, p < .0001, and much more efficient [F(1,17) = 110.434, p < .0001, for the set size × condition interaction] in the half-set condition than in the motion segmentation condition (slopes of 2.4 vs. 66.1 ms/item, respectively). In the motion segmentation condition, the search slope increased by roughly 60% (from 41.2 to 66.1 ms/item) in the singleton case, when the target was the same color as the static items [F(1, 17) = 16.342, p < .001, for the set size × target color interaction]. The opposite was true in the half-set condition, where the set size × target color interaction, F(1, 17) = 56.463, p < .0001, reflected markedly inefficient search in the nonsingleton case [a significant set size effect of 72.1 ms/item, F(1,17) = 65.534, p < .0001] but markedly efficient search in the singleton case [a nonsignificant set size effect of 2.4 ms/item, F(1,17) = 2.41, p = .139].

Discussion

When the target was a color singleton in a moving display (half-set condition), detection was advantaged, as compared with the nonsingleton case. This result is to be expected, since the presence of the singleton signaled the location of the target. Under these circumstances, the singleton will generate a strong bottom-up signal early in processing, capturing (e.g., Theeuwes, 1992) or guiding (e.g., Wolfe, 1994) attention to the target. What is remarkable about the present results is that the addition of a group of static items different in color from the moving distractors turned this singleton advantage into a large cost. Importantly, this occurred even though motion was a very effective guidance cue, as the control experiment showed.

The present findings clearly show that performance in the motion segmentation condition was not mediated independently by motion, contrary to many accounts of search (see, e.g., McLeod et al., 1991; Treisman & Sato, 1990). Search efficiency in moving displays was influenced by the color of the target relative to the other items. Clearly, color processing in these displays could not be constrained to apply only to the relevant moving group (second-order parallel processing; cf. Friedman-Hill & Wolfe, 1995). If this were possible, participants would surely have been motivated to exploit such a signal, since, when present, it assists target detection. Thus, one implication of the present result is that second-order parallel processing is not immediately available in these displays. Similarly, in the case of color and orientation, Friedman-Hill and Wolfe showed that although second-order parallel processing was possible, it was time consuming, operating only at large set sizes when overall RTs were long. In this regard, then, motion- and color-based segmentation may be similar; further experiments will be needed to determine the extent of this similarity (e.g., is second-order parallel processing of color on the basis of motion possible with longer processing?).

Crucially, in the present study, we not only abolished facilitation by the color singleton in the motion segmentation condition but also generated a significant negative cost. When the target shared its color with the moving distractors, motion segmentation could be exploited, and performance was more efficient than in the full-set condition. In contrast, when the target shared its color with the static distractors, there was a cost such that (1) target detection was no more efficient in the motion-segmented than in the full-set condition and (2) participants were many hundreds of milliseconds slower to detect the target, relative to the half-set baseline. This negative effect of color similarity between the moving target and the static distractors cannot have been due to the target color being in a slight majority in the singleton case, since there was no such cost in the full-set condition. Nor is it likely that participants fell back on searching by color (prioritizing items in the nonsingleton color) in the face of an ineffective motion cue, since, in the control experiment, motion was highly effective in guiding search through uniformly colored displays.

The present findings are interesting in relation to studies of negative color carry-over effects in preview search (Braithwaite et al., 2003, 2004; Olivers & Humphreys, 2003). Typically, in those studies, negative effects of color emerge only in a preview condition with temporal separation between two groups of stimuli. Here, however, we present the first evidence that such large negative effects of color can occur with simultaneous displays, under some circumstances. Negative carry-over costs are not confined to preview search.

The negative color effect observed in the present experiment challenges models in the guided search family, regardless of whether guidance is proposed to be excitatory (e.g., McLeod et al., 1991; Wolfe, 1994) or inhibitory (e.g., Treisman & Sato, 1990). These models hold that guidance should be based only on the task-relevant feature of motion. Clearly, this was not the case. In contrast, the negative color effect can be accounted for by the process of spreading suppression posited within AET: targets that group by color with the static objects inherit some of the loss of attentional weight suffered by the static items.