One key technique for the study of visual perception and awareness is masking (Breitmeyer & Öğmen, 2006). There are several different types of visual masking, all involving decreased discrimination or detection of a target image when a second stimulus, a mask, is presented in close spatiotemporal proximity to it. While the study of visual masking has a long history going back over a century, in 1997 a new form was discovered: masking via object substitution (Enns & Di Lollo, 1997). In a standard OSM paradigm, a sparse four-dot mask surrounds, but does not overlap with, the target in space. In addition, the features making up the mask are highly distinct from those that constitute the target. Typically, the mask onsets with the target, and when there is a delayed offset relative to the target, masking is observed (Di Lollo, Enns, & Rensink, 2000; Enns, 2004; Enns & Di Lollo, 1997, 2000; Goodhew, Pratt, Dux, & Ferber, 2013).

Until recently, it was thought that the critical condition for OSM to be observed was the dispersal of spatial attention at the point of stimulus presentation. To achieve this, most studies presented the target at an unpredictable location and in the presence of multiple distractor stimuli (Enns, 2004; Enns & Di Lollo, 1997). A key finding, supporting the role of spatial attention in OSM, was that masking magnitude increased with distractor set size. In addition, convergent validity for the role of spatial attention in this form of masking came from studies showing that when spatial attention could be rapidly oriented to the target, for example, via cueing, masking was reduced (Germeys, Pomianowska, De Graef, Zaenen, & Verfaillie, 2010). Manipulations of other forms of attention have also been used in OSM settings, including executive attention. Here, executive attention refers to “high level” attention processes relating to cognitive control, that is, the operations associated with flexibly adapting information processing based on task goals (Miller & Cohen, 2001). Specifically, Dux, Visser, Goodhew, and Lipp (2010) presented the to-be-masked target stimulus as Task 2 in a dual-task paradigm and manipulated the demands of the first task. Under these conditions, OSM was induced if the interval between the two tasks was sufficiently short (i.e., when executive/temporal attention was sufficiently taxed).

An influential account of OSM suggests masking may arise from the interplay between feedforward and recurrent processes in the brain that interact to resolve conflicting hypotheses of the target and mask representations in the visual system and more anterior executive regions (Di Lollo et al., 2000). By this model, a low-resolution representation of the target and mask is formed through feedforward processes from visual cortex to higher level regions. The resulting perceptual hypothesis is then fed back, via recurrent processes, to early visual cortex, where it is verified against the initial representation. However, under conditions where only a mask remains physically present, a new feedforward representation has formed, and hence there is conflict between the recurrent (target and mask) and feedforward (mask only) hypotheses. This account predicts that masking is more likely when the initial target representation is weakened through dispersed attention, as a greater number of recurrent iterations may be required to check the less reliable hypothesis. The theme of attention playing a key role in OSM is also present in two other key models. First, in the lateral inhibition model (Macknik & Martinez-Conde, 2007), masking occurs via lateral inhibition processes when the representations have been made sufficiently weak via dispersed attention. Second, in the attentional gating model (Põder, 2013), signal strength for the mask is greater, and essentially overrides the target, for delayed offset timings when attention was distributed at the start of the trial.

Recent studies have now provided definitive evidence that OSM magnitude does not interact with spatial attention demands (Argyropoulos, Gellatly, Pilling, & Carter, 2013; Camp, Pilling, Argyropoulos, & Gellatly, 2015; Filmer, Mattingley, & Dux, 2014). Specifically, these studies have shown that past reports finding a strong link between spatial attention and OSM have been driven by ceiling effects for the smaller distractor set size conditions. Indeed, OSM has now even been reported for a target, presented at the fovea and fully attended both in space and time, when unmasked performance is brought off ceiling (Daar & Wilson, 2016; Filmer, Mattingley, & Dux, 2015). These results have led to questions regarding whether OSM is in fact a phenomenon independent of attention.

However, attention is not a unitary construct (Woodman, Vogel, & Luck, 2001). Thus, it is still possible that nonspatial forms of attention could interact with masking. Indeed, manipulations of spatial attention inherently involve potential confounds of crowding (Camp et al., 2015; Vickery, Shim, Chakravarthi, Jiang, & Luedeman, 2009), and spatial uncertainty. Such confounds may interact with masking processes (Camp et al., 2015), and hence complicate the assessment of the role of attention in OSM. In contrast, manipulations of executive/temporal attention avoid these issues as the presentation conditions of the target and mask stimuli can be identical across attention manipulations.

Here, we reexamined the role of executive/temporal attention in OSM when unmasked performance for the target stimulus was brought off ceiling, an important issue not addressed by Dux et al. (2010). We first replicated our previous findings of foveal OSM (Filmer et al., 2015), but with novel target stimuli and a thresholding procedure. Then, across two subsequent experiments, we added a task immediately prior to the to-be-masked target stimulus (a dual-task paradigm) to assess the impact of executive/temporal attention load on OSM magnitude. To preview the results, our findings indicate no interaction between executive/temporal attention and OSM. Collectively, with the literature discussed above, this result supports a clear dissociation between attention and OSM, suggesting OSM reflects low-level processes in the visual system that are independent of attention.

Experiment 1

For the first experiment, we developed a new target stimulus and performance thresholding procedure. The aim here was to replicate masking for a fully attended and foveated stimulus (Filmer et al., 2015), with the new paradigm that we could then examine with an executive/temporal attention load manipulation in subsequent experiments.

Method

Twenty-four participants completed Experiment 1 (20 females, mean age = 20 years), all with normal or corrected-to-normal vision. Stimuli were presented to participants using a 21-in. CRT monitor set to a 100 Hz refresh rate (an example trial is shown in Fig. 1). Target images consisted of a diamond shape of visual noise with a black, semitransparent diamond overlaid (width: 1.1°; see Fig. 1a). The background color was white. The black diamond had one of the four points missing (size of the missing point = 0.24°), and participants were required to indicate which of the four points was missing when prompted with a response cue (presented in Helvetica 24-point font, 1.16° above the center of the screen) 500 ms after target onset. Responses were made using the arrow keys on a keyboard. On most trials, a standard four-dot mask onset with the target stimulus and offset at variable intervals after the target (0, 90, 180, 270, 360, or 450 ms). The mask appeared in close proximity to the target stimulus, with a gap of one pixel between the edge of the stimulus and the edge of the mask. Trials with no four-dot mask were included to check baseline performance. There were a total of 80 trials per condition for the main task, split into 10 blocks.
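As a rough check on the timing described above, the reported durations can be expressed as refresh frames. This is only a minimal sketch: it assumes each duration was realized as a whole number of frames at the reported 100 Hz refresh rate, and the helper name `ms_to_frames` is hypothetical rather than taken from the original experiment code.

```python
REFRESH_HZ = 100  # refresh rate reported in the Method


def ms_to_frames(duration_ms, refresh_hz=REFRESH_HZ):
    """Convert a duration in ms to a whole number of refresh frames.

    Hypothetical helper: raises if the duration does not fit the frame grid.
    """
    frames = duration_ms * refresh_hz / 1000
    if abs(frames - round(frames)) > 1e-9:
        raise ValueError(f"{duration_ms} ms is not a whole number of frames")
    return round(frames)


# Mask offset delays (relative to target offset) listed in the Method
MASK_OFFSETS_MS = [0, 90, 180, 270, 360, 450]
mask_offset_frames = [ms_to_frames(ms) for ms in MASK_OFFSETS_MS]

# Target duration (10 ms) as given in the Fig. 1 caption
target_frames = ms_to_frames(10)
```

Under this assumption the target occupies a single frame and every mask offset delay falls on a frame boundary.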

Fig. 1
Experiment 1 trial outline. a Example target stimulus, with the right point of the diamond missing. b Each trial began with a fixation point (600 ms), followed by the target stimulus (10 ms). On most trials, a four-dot mask was presented with the target, and offset either at the same time as the target image or after a delay of up to 450 ms. A blank screen was shown until 500 ms had passed from target onset, and participants were then prompted to indicate which of the four points of the diamond was missing using the arrow keys on a keyboard

Before completing the OSM task, participants' performance at detecting the missing point of the diamond was thresholded to approximately 70% accuracy using a PEST staircase procedure (Taylor & Creelman, 1967) when no four-dot mask was present. The difficulty of the task was manipulated by varying the transparency of the black diamond, from zero (fully transparent) to 255 (fully opaque). The mean threshold across participants was 114 (SD = 29).
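To illustrate the general idea of thresholding to a target accuracy, the sketch below uses a weighted up-down rule. This is a deliberately simpler technique swapped in for illustration, not the PEST procedure used in the study. It assumes that higher opacity makes the task easier, and the function name, step size, and defaults are all hypothetical. With a down step after correct responses and an up step after errors, the level settles where p × step_down = (1 − p) × step_up, so the step ratio sets the targeted accuracy p.

```python
def weighted_updown_step(level, correct, step_down=3.0, target=0.70,
                         lo=0, hi=255):
    """Return the next opacity level (0-255) after one trial.

    Illustrative weighted up-down rule (NOT PEST). Assumes higher
    opacity is easier: opacity drops after a correct response and
    rises after an error. The up step is scaled so that the expected
    change is zero when accuracy equals `target`.
    """
    step_up = step_down * target / (1.0 - target)
    new_level = level - step_down if correct else level + step_up
    # Keep the level inside the valid opacity range
    return min(hi, max(lo, new_level))


# One correct and one incorrect trial from a hypothetical start of 100
after_correct = weighted_updown_step(100, True)
after_error = weighted_updown_step(100, False)
```

With the defaults, a correct response lowers opacity by 3 units and an error raises it by about 7 units, which balances at 70% correct.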

Results and discussion

The results of Experiment 1 are shown in Fig. 2. Accuracy for the no-mask condition was 76% (SD = 17), suggesting the thresholding procedure was successful in removing ceiling issues from the data. Overall, as the duration of the four-dot mask increased, performance decreased, reaching its lowest level at around the 180-ms mask offset and plateauing thereafter. Indeed, this main effect of mask offset was significant, F(5, 115) = 14.362, p < .001, with a large effect size (ηp² = 0.384), supporting the presence of OSM. The results replicated those of Filmer et al. (2015), and confirmed OSM for an attended and foveated target stimulus.
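The reported effect size can be cross-checked from the F statistic and its degrees of freedom using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies it to the main effect of mask offset; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of mask offset in Experiment 1: F(5, 115) = 14.362
eta_mask_offset = partial_eta_squared(14.362, 5, 115)
```

Rounded to three decimals, this gives 0.384, matching the value reported above.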

Fig. 2
Experiment 1 results. Mean accuracy for each of the four-dot mask offset conditions. Error bars represent SEM for within-subjects variance (Loftus & Masson, 1994)
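The caption does not give the formula behind the error bars. One common reading of the Loftus and Masson (1994) approach takes the SEM as the square root of the Subject × Condition mean square divided by the number of participants. The sketch below implements that reading on made-up toy scores; the function name and data are hypothetical and are not the study's data.

```python
from math import sqrt


def within_subject_sem(scores):
    """SEM from the Subject x Condition interaction mean square.

    scores: list of per-participant lists, one value per condition.
    """
    n = len(scores)       # participants
    k = len(scores[0])    # conditions
    grand = sum(sum(row) for row in scores) / (n * k)
    subj_means = [sum(row) / k for row in scores]
    cond_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Residuals after removing participant and condition effects
    ss_interaction = sum(
        (scores[i][j] - subj_means[i] - cond_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_interaction = ss_interaction / ((n - 1) * (k - 1))
    return sqrt(ms_interaction / n)


# Toy data: 3 hypothetical participants x 2 conditions
toy_scores = [[1, 2], [2, 4], [3, 3]]
toy_sem = within_subject_sem(toy_scores)
```

Because between-participant differences in overall accuracy are removed before the error term is computed, these bars reflect only the variability relevant to within-subjects comparisons.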

Experiment 2

In Experiment 2, we tested whether OSM would interact with executive/temporal attention load. The design used a dual-task approach, similar to that employed by Dux et al. (2010), but here, following a mathematical calculation for Task 1, our new thresholded target stimulus (Experiment 1) was used for the second task. If masking does interact with executive/temporal attention, then performance in Experiment 2 should vary as a function of Task 1 load.

Method

Twenty-four participants completed Experiment 2 (19 females, mean age = 20 years), all with normal or corrected-to-normal vision. The stimuli and method were the same as in Experiment 1, with the following exceptions. Four numbers were now presented on screen (Helvetica, 30-point font) before the masking task (see Fig. 3 for a trial outline). The numbers were presented at a rate of one per second (500-ms presentation time, plus 500-ms interstimulus interval), followed by the diamond stimulus either at a short (100 ms) or a long (800 ms) Task 1–Task 2 lag. The four-dot mask onset with the diamond and offset either with the diamond (simultaneous offset) or 270 ms later (delayed offset). To manipulate Task 1 load, there were two block types: one in which participants ignored the numbers and simply responded to the diamond (single task), and one in which participants had to complete a calculation based on the numbers (#1 + #2 – #3 + #4) and indicate via a keyboard, as quickly and accurately as possible, whether the answer was odd or even before responding to the diamond stimulus (dual task). There were a total of 64 trials per condition, split into 12 blocks. The blocks alternated between the two block types, with half of the participants starting with a single-task block, and the other half with a dual-task block. Before completing the task, all participants were thresholded using the same PEST procedure described for Experiment 1 (mean threshold = 108, SD = 23).
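The Task 1 judgment described above can be written out as a small function. This is a minimal sketch of the calculation only; the function name and the example digits are hypothetical, since the actual number sets used in the experiment are not reported.

```python
def task1_answer(numbers):
    """Return 'odd' or 'even' for the Task 1 calculation #1 + #2 - #3 + #4."""
    n1, n2, n3, n4 = numbers
    total = n1 + n2 - n3 + n4
    # Python's % returns a non-negative remainder, so negative totals work too
    return "odd" if total % 2 else "even"


# Hypothetical trial: 3 + 4 - 2 + 6 = 11
example_answer = task1_answer([3, 4, 2, 6])
```

Because the running total must be updated as each number arrives, the final answer cannot be given until the fourth number has been processed, which is what places the load immediately before the masked target.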

Fig. 3
Experiment 2 trial outline. Each trial began with a fixation point (600 ms), followed by a sequence of four numbers presented serially. Each number was shown for 500 ms, followed by a 500 ms blank screen (this was not the case for #4; see below). For dual-task blocks, participants had to calculate #1 + #2 - #3 + #4, and respond via a keyboard as to whether the answer was odd or even as quickly and as accurately as possible. For single-task blocks, these numbers were ignored. After the fourth number had been presented, there was a gap of 100 or 800 ms, followed by the diamond target stimulus with the four-dot mask. The mask could offset with the target, or 270 ms after the target. A blank screen was then shown until 370 ms had passed from the target offset. Finally, participants were prompted to respond as to the missing point of the diamond. For dual-task blocks, participants had to respond to the number before the diamond

Results and discussion

Task 2

Task 2 performance is shown in Fig. 4. For the simultaneous mask offset, accuracy for the diamond stimulus was 70% (SD = 14.3), suggesting the thresholding procedure was successful. Performance for the diamond task was entered into a repeated-measures ANOVA, with the factors of mask offset (simultaneous or delayed), task load (single or dual), and lag (short or long). As expected, there was an effect of mask offset, with poorer performance for the delayed mask offset compared to the simultaneous mask offset (masking magnitude = 16.11%, SD = 8.46), F(1, 23) = 86.958, p < .001, ηp² = 0.791. Task load also modulated performance, with lower accuracy for the dual-task (mean accuracy = 59%, SD = 13.28) than single-task (mean accuracy = 65%, SD = 12.29) trials, F(1, 23) = 13.204, p = .001, ηp² = 0.365. Hence, under the more demanding dual-task conditions, accuracy was reduced for the to-be-masked target. The lag between Tasks 1 and 2 modulated Task 2 performance, F(1, 23) = 8.412, p = .008, ηp² = 0.268, with higher performance for the long (mean accuracy = 64%, SD = 12.25) than short (mean accuracy = 61%, SD = 12.31) lag conditions. In addition, the Task 1–Task 2 lag modulated masking magnitude, with greater masking under short (masking magnitude = 20%, SD = 11.09) than long (masking magnitude = 13%, SD = 8.75) lags, F(1, 23) = 10.05, p = .004, ηp² = 0.304, reflecting the added demands of having to complete two tasks in rapid succession.

Fig. 4
Experiment 2 results. Mean accuracy for Task 2 (diamond task) for the simultaneous and delayed masking conditions. Data are plotted separately for the single- and dual-task trials, for the long (800 ms) and short (100 ms) lag conditions. Error bars represent SEM for within-subjects variance (Loftus & Masson, 1994)

Of particular interest in this experiment was whether the impact of task load interacted with masking magnitude, as this could indicate a relationship between masking and executive/temporal attention. There was no indication of an effect of task load, or an interaction of task load and lag, on masking magnitude (F < 1, ηp² < 0.03, for both). In addition, Bayesian analyses were conducted, and both the interaction between task load and masking (BF10 = 0.24) and the interaction between task load, task lag, and masking (BF10 = 0.06) carried strong support for the null hypothesis. Thus, in contrast to Dux et al. (2010), in this experiment executive/temporal attention load did not appear to interact with OSM.
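Because BF10 expresses evidence for the alternative relative to the null, evidence in favor of the null is its reciprocal, BF01 = 1 / BF10. The sketch below simply inverts the two values reported above; the function name is hypothetical.

```python
def bf01_from_bf10(bf10):
    """Evidence for the null relative to the alternative (BF01 = 1 / BF10)."""
    if bf10 <= 0:
        raise ValueError("A Bayes factor must be positive")
    return 1.0 / bf10


# Experiment 2 values reported in the text
bf01_load_by_mask = bf01_from_bf10(0.24)          # task load x masking
bf01_load_by_lag_by_mask = bf01_from_bf10(0.06)   # task load x lag x masking
```

On this scale the data are roughly 4.2 times more likely under the null for the two-way interaction and roughly 16.7 times more likely for the three-way interaction.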

Task 1

Overall, Task 1 was completed with 79% accuracy (SD = 7.76) and with a reaction time of 3,100 ms (SD = 1,501). Thus, Task 1 was both challenging, as performance was off ceiling, and completed successfully. For accuracy, there was no effect of task lag on performance, F(1, 23) = 1.697, p = .206, ηp² = 0.069, or of mask offset, F(1, 23) = 0.327, p = .573, ηp² = 0.014, and no interaction between the two, F(1, 23) = 2.558, p = .123, ηp² = 0.1. For reaction time, responses were generally faster for the short (mean RT = 2,950 ms, SD = 1,480) than long (mean RT = 3,271 ms, SD = 1,557) lag trials, F(1, 23) = 11.523, p = .002, ηp² = 0.334. However, reaction times were not modulated by mask offset, F(1, 23) = 0.002, p = .967, ηp² = 0.000, and there was no interaction between mask offset and lag, F(1, 23) = 1.68, p = .208, ηp² = 0.068. Hence, performance at Task 1 was relatively consistent across the lag and mask offset conditions.

Experiment 3

The findings of Experiment 2 suggest there is no interaction between masking magnitude and executive load. However, the influence of Task 1–Task 2 lag on masking magnitude could be driven by an attentional load effect, as shorter lag conditions may be more demanding than longer lag trials. Alternatively, for short lag trials, there may be a low-level, perceptual effect, as the close temporal proximity of the final Task 1 stimulus gives rise to forward masking that leaves the target more susceptible to OSM. To clarify which of these two accounts best characterizes the results of Experiment 2, here we changed the stimuli used for Task 1 to auditorily presented numbers. If the influence of lag on masking magnitude in Experiment 2 was due to forward masking from the calculation task, then this interaction (Lag × Mask offset) should not be present in Experiment 3. If, however, the effect of lag on masking was due to executive/temporal attention, the interaction should remain.

Method

Twenty-four participants took part in Experiment 3 (16 females, mean age = 18 years, SD = 2). The methods were identical to those of Experiment 2, except the number stimuli were now auditory (spoken numbers, each lasting 500 ms) as opposed to visual. The mean threshold following the PEST procedure was 108 (SD = 34).

Results and discussion

Task 2

The results for Experiment 3 are shown in Fig. 5. For the simultaneous mask offset, accuracy for the diamond stimulus was 58% (SD = 15.52), suggesting the thresholding procedure was adequate and again that there were no ceiling issues in this experiment. Performance for the diamond task was entered into a repeated-measures ANOVA, with the factors of mask offset, task load, and lag. Again, there was an effect of mask offset, with poorer performance for the delayed than simultaneous mask offset conditions (mean masking magnitude = 13%, SD = 8.28), F(1, 23) = 58.746, p < .001, ηp² = 0.719. Task load also modulated performance, F(1, 23) = 16.909, p < .001, ηp² = 0.424, reflecting lower performance for the dual-task blocks (mean accuracy = 48%, SD = 13.41) compared to the single-task blocks (mean accuracy = 56%, SD = 10.8). Thus, again, the demands of a second task reduced overall performance. Performance was poorer at the short lag (mean accuracy = 50%, SD = 11.45) than at the long lag (mean accuracy = 54%, SD = 11.58), F(1, 23) = 11.083, p = .003, ηp² = 0.325, again reflecting the added difficulty of having to complete two tasks in rapid succession.

Fig. 5
Experiment 3 results. Mean accuracy for Task 2 (diamond task) for the simultaneous and delayed masking conditions. Data are plotted separately for the single- and dual-task trials, for the long (800 ms) and short (100 ms) lag conditions. Error bars represent SEM for within-subjects variance (Loftus & Masson, 1994)

Of import, again we found no evidence that executive/temporal attention load influences OSM. Lag and Task 1 load did not modulate masking magnitude, nor did these factors interact (F < 1, ηp² < 0.04, for all). Bayesian analyses provided support for the null hypothesis for both the interaction between task load and masking (BF10 = 0.19) and the interaction between task load, task lag, and masking (BF10 = 0.06). Overall, the data support there being no influence of attention on OSM.

Task 1

Overall, Task 1 was completed with 78% accuracy (SD = 14.86), and with a reaction time of 3,902 ms (SD = 2,351). Thus, as in Experiment 2, Task 1 was both challenging and completed successfully. For accuracy, there was no effect of lag on performance, F(1, 23) = 0.427, p = .52, ηp² = 0.018, or of mask offset, F(1, 23) = 0.234, p = .633, ηp² = 0.01, and no interaction between the two, F(1, 23) = 0.418, p = .524, ηp² = 0.018. For the reaction times, responses were generally faster for trials with a short lag (mean reaction time = 3,715 ms, SD = 2,433) than for trials with a long lag (mean reaction time = 4,089 ms, SD = 2,304), F(1, 23) = 9.672, p = .005, ηp² = 0.296. Response times were faster on the delayed offset trials (mean reaction time = 3,773 ms, SD = 2,239) relative to simultaneous offset trials (mean reaction time = 4,030 ms, SD = 2,483), F(1, 23) = 6.409, p = .019, ηp² = 0.218. The interaction between mask offset and lag was also significant, F(1, 23) = 4.544, p = .044, ηp² = 0.165, with a greater difference in response times between the mask offsets for the short lag (mean difference = 464 ms, SD = 643) than the long lag (mean difference = 50 ms, SD = 730). Hence the manipulations of task load, lag, and mask offset did affect the reaction times for responses to Task 1. Critically, although this implies the mask timings and task load conditions did modulate Task 1 performance for Experiment 3, this was not mirrored by any influence on Task 2 (masking).

Conclusions

We asked whether, in the absence of a ceiling effect on performance, executive/temporal attention interacts with foveal OSM. First, we confirmed the presence of foveal OSM with a stimulus set and thresholding procedure different from those previously employed (Filmer et al., 2015). Then, across two experiments, we manipulated executive/temporal attention load with a demanding visual (Experiment 2) or auditory (Experiment 3) calculation task presented prior to the to-be-masked stimulus. While the additional attention demands present in Experiments 2 and 3 did lead to an overall reduction in accuracy, there was no evidence that this added difficulty interacted with masking magnitude. Thus, executive/temporal attention and OSM appear to be independent of one another. This finding is at odds with those of Dux et al. (2010), and suggests a ceiling effect for the simultaneous mask offset condition may have led to the appearance of an interaction, a common issue in previous OSM studies (Enns, 2004; Enns & Di Lollo, 1997; Germeys et al., 2010).

The removal of ceiling limits in the data has provided a sensitive measure of OSM that, with adequate power, should allow for moderators of masking to be identified. In the experiments reported here, we believe we had adequate power to detect modulations from executive attention for the following reasons. First, our sample size was equal to or greater than past demonstrations of masking (Argyropoulos et al., 2013; Daar & Wilson, 2016; Enns & Di Lollo, 1997; Filmer et al., 2014, 2015; Pilling, Gellatly, Argyropoulos, & Skarratt, 2014), including Dux et al. (2010), where modulations with temporal/executive attention were detected. Second, the effect sizes are small and there is no hint of a numerical trend for an interaction between task lag, task load, and mask offset in Experiments 2 or 3. Finally, Bayesian analyses revealed strong support for the null hypothesis against an interaction for both experiments. Thus, it does not appear that temporal attention, via manipulations of task load and lag, influences masking in OSM.

While there was no influence of task load on masking magnitude, in Experiment 2 the lag between tasks did modulate masking. This was most likely due to perceptual interference because, at the short lag, the last number of the calculation sequence occurred in close temporal proximity to the diamond target. This close temporal presentation may have led to the last number in the calculation sequence acting as a forward mask that weakened the target representation and resulted in greater masking with OSM. Indeed, when, in Experiment 3, the stimuli for Task 1 were changed to the auditory domain, removing modality-specific perceptual overlap, this interaction disappeared. The increased masking with the visual number stimulus is unlikely to be due to an overall reduction in performance, as accuracy for the simultaneous offset condition was not impaired. It is not clear why a forward mask of a number would specifically increase susceptibility of the target to masking, but it could relate to the level of noise in the target percept or could lead to difficulties individuating the target as a separate event. Future research to ascertain the precise conditions that can modulate masking may provide unique insight into the mechanisms behind OSM.

Previous reports of OSM with fully attended and foveated stimuli provided compelling evidence that attention does not need to be dispersed for masking to occur (Daar & Wilson, 2016; Filmer et al., 2015). However, this did not preclude the possibility of an interaction between types of attention and masking. The results from this study, in the context of the finding that masking does not appear to interact with manipulations of spatial attention (Argyropoulos et al., 2013; Camp et al., 2015; Filmer et al., 2014), provide a compelling account of OSM as a phenomenon that is independent of attention. In turn, this suggests that the mechanism of masking in OSM is a relatively low-level process in the visual system, operating independently of higher-level attention processes.

These findings challenge accounts of OSM that implicate attention as playing a key role in masking, such as the lateral inhibition model (Macknik & Martinez-Conde, 2007), and the attentional gating model (Põder, 2013). However, they do not necessarily conflict with accounts based predominantly on nonattentional processes in the visual system (e.g., interactions between feedforward and reentrant processing; Di Lollo, 2014; Di Lollo et al., 2000). Electrophysiological recordings taken during OSM may help to elucidate the precise basis of masking, along with the application of models to data such as reported here and in previous instances of foveal OSM (Daar & Wilson, 2016; Filmer et al., 2015). Given the similarity of the conditions for OSM compared to other forms of masking that are independent of attention (e.g., metacontrast masking; Agaoglu, Breitmeyer, & Ogmen, 2016), these findings also draw into question whether OSM is indeed a “special” form of masking.