Contemporary models of written-word recognition and reading in the Roman alphabet share the assumption that lexical access takes place on the basis of case-invariant abstract letter representations that are attained early in processing (Grainger, Dufau, & Ziegler, 2016). For simplicity’s sake, these models assume a minimal/null role of visual similarity across letters in lexical access. Using the default parameters in the interactive activation model (Rumelhart & McClelland, 1982) and its successors (e.g., spatial coding model; Davis, 2010), the visually similar substituted-letter prime PEQPLE is as effective at activating the word PEOPLE as the visually dissimilar substituted-letter prime PEYPLE (i.e., each condition yielded 60 processing cycles in masked priming lexical decision using Davis’s, 2010, simulator). Note that O and Q share all features but one at the feature-letter level, whereas O and Y do not share any features. Likewise, other leading models posit that all letters are equally confusable (Bayesian reader model: Norris, 2006; rational model of eye movements in reading: Bicknell & Levy, 2010), so they would also predict similar word identification times for PEQPLE–PEOPLE and PEYPLE–PEOPLE.

Nonetheless, if we assume that it takes time for the cognitive system to encode letter identity (or letter position), visual similarity across letters should have an impact in the early phases of word processing. Clearly, if PEQPLE–PEOPLE produces faster word recognition times than PEYPLE–PEOPLE, modelers should make an effort to develop in greater depth the underpinnings of the links between the feature and letter levels (i.e., this finding could be used as a benchmark for what there is to simulate). An analogy with letter position coding is relevant here: The slot-coding schemes in the interactive activation model and the Bayesian reader were admittedly oversimplifications (see Norris, 2006, p. 346; Rumelhart & McClelland, 1982, p. 89). The literature on letter transposition effects in past decades ruled out these schemes and led to more sophisticated accounts of letter position coding (e.g., spatial coding model: Davis, 2010; noisy Bayesian reader model: Norris, Kinoshita, & van Casteren, 2010).

Visual similarity effects have been reported with letter-like digits and letter-like symbols with the masked priming technique. For example, Perea, Duñabeitia, and Carreiras (2008) found that lexical decision times for a target word like MATERIAL were faster when preceded by a visually similar digit prime (M473RI4L; i.e., a prime that included letter-like digits such as 4 = A, 3 = E, or 7 = T) or a visually similar symbol prime (e.g., MΔT€R!ΔL) than when preceded by a control prime (M568RI2L or M□T%R?□L; see Kinoshita & Lagoutaris, 2010; Lien, Allen, & Martin, 2014, for converging evidence). Furthermore, visually similar digit/symbol primes were nearly as facilitative as identity primes (Perea et al., 2008).

As visual similarity with letter-like digits/symbols plays a role early during written-word identification, one would expect a parallel effect with visually similar substituted-letter primes. Indeed, a number of experiments on letter identification have obtained effects of visual similarity (e.g., the letters B and R are more confusable than B and G; for review, see Mueller & Weidemann, 2012). However, the empirical evidence in the word recognition literature is scarce and inconclusive. In a single-presentation lexical decision task, Perea and Panadero (2014) found more “word” responses to viotin-type nonwords (i.e., one-letter different nonwords that looked visually similar to their base word [violin]) than to viocin-type nonwords in individuals with dyslexia, but the effect did not occur in normally reading individuals. To study in detail the effects of visual similarity early in processing, masked priming is a better option than a single-presentation paradigm. Kinoshita, Robidoux, Mills, and Norris (2013) conducted a masked priming lexical decision experiment in which a target word (e.g., abandon) could be preceded by (a) a visually similar digit prime (484NDON; 4 is visually similar to A and 8 is visually similar to B); (b) a visually dissimilar digit prime (676NDON); (c) a visually similar letter prime (HRHNDON; H is visually similar to A and R is visually similar to B); or (d) a visually dissimilar letter prime (DWDNDON). They also included an identity priming condition (ABANDON) and an unrelated condition (PRODUCT). As in prior research, they found faster word identification times for 484NDON–abandon than for 676NDON–abandon; the word identification times for 484NDON–abandon were nearly the same as those for ABANDON–abandon. But the critical finding was that they failed to find a significant difference between HRHNDON–abandon and DWDNDON–abandon.

To explain the null visual similarity effect for substituted-letter primes, Kinoshita et al. (2013) suggested that “the letter representations A and H may be connected by a bidirectional inhibitory link, such that the activation of one drives down the activation of the other” (p. 828). However, a closer look at the priming effects reported by Kinoshita et al. (2013) reveals an 8-ms advantage of HRHNDON–abandon over DWDNDON–abandon (p = .0982); there were only 20 items/condition (N = 37; 740 data points in each cell). In a recent simulation study, Stevens and Brysbaert (2016) claimed that “a properly powered experiment requires at least 1,600 word observations per condition for the orthographic priming study” (p. 2). Thus, it may be premature to conclude that visual similarity across letters does not play a role early in word processing.

The goal of the current masked priming lexical decision experiments was to examine whether visual similarity across letters plays a role in the early phases of written-word recognition using a large number of data points per condition (2,160; 80 items/condition; N = 27 in each experiment). Each target word was briefly preceded by (a) a lowercase identity prime; (b) a visually similar substituted-letter prime; or (c) a visually dissimilar substituted-letter prime. In Experiment 1, we used two critical letters (u and v) that had a high degree of estimated visual similarity (4.93 on a 7-point Likert scale; Simpson, Mousikou, Montoya, & Defior, 2012), yielding prime-target pairs such as neutral–NEUTRAL vs. nevtral–NEUTRAL vs. neztral–NEUTRAL. Experiment 2 was designed to replicate Experiment 1 with a different set of words; furthermore, the critical letters (i and j) had an even greater degree of visual similarity (5.17 out of 7; Simpson et al., 2012).
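
The arithmetic behind the comparison of data points per condition can be sketched as follows. This is only an illustration: the function and variable names are hypothetical, and the 1,600 threshold is simply the figure quoted above from Stevens and Brysbaert (2016).

```python
# Data points per condition = number of participants x items per condition.
# All names below are illustrative; only the numbers come from the text.

def observations_per_condition(n_participants: int, items_per_condition: int) -> int:
    """Return the number of data points in each priming condition."""
    return n_participants * items_per_condition


RECOMMENDED_MINIMUM = 1600  # figure quoted from Stevens and Brysbaert (2016)

kinoshita_2013 = observations_per_condition(n_participants=37, items_per_condition=20)
current_study = observations_per_condition(n_participants=27, items_per_condition=80)

for label, n_obs in [("Kinoshita et al. (2013)", kinoshita_2013),
                     ("Current experiments", current_study)]:
    status = "meets" if n_obs >= RECOMMENDED_MINIMUM else "falls short of"
    print(f"{label}: {n_obs} per condition, {status} the {RECOMMENDED_MINIMUM} benchmark")
```

Under these figures, the earlier study has 740 data points per cell and each of the present experiments has 2,160.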

The predictions are clear. If each letter only activates its own representation early in word processing, possibly via bidirectional inhibitory links across letters (e.g., the letter j, but not i or o, would activate the abstract representation of j), one would expect a similar advantage of the identity condition over both the visually similar and visually dissimilar letter conditions. This outcome would not require any major modifications in contemporary models of written-word recognition. Alternatively, if visual letter similarity plays a role in the early phases of written-word recognition (e.g., the letter j, and to some degree the visually similar letter i, would activate the abstract representation of j), one would expect an advantage of the visually similar letter condition over the visually dissimilar letter condition. This result would require more elaborated accounts of the feature/letter levels in models of written-word recognition.

Experiment 1

Method

Participants

The participants were 27 undergraduate students from the Universitat de València. All of them were native speakers of Spanish with normal/corrected-to-normal vision. Written informed consent was obtained from all participants.

Materials

We selected 240 Spanish words from the EsPal subtitle database (Duchon, Perea, Sebastián-Gallés, Martí, & Carreiras, 2013). The average Zipf frequency was 3.67 (range: 1.71–5.91), the average number of letters was 7.5 (range: 5–11), and the average Levenshtein distance (OLD20) was 2.1 (range: 1.2–4.3). All words had the letters u or v in an internal position (e.g., NEUTRAL; CAVERNA [cavern]). Target words were presented in capital letters and were preceded by (a) an identity prime in lowercase (identity condition; neutral–NEUTRAL; caverna–CAVERNA); (b) a nonword prime in lowercase, in which the letter u/v from the base word was replaced by v/u (visually similar letter prime condition; nevtral–NEUTRAL; cauerna–CAVERNA); or (c) a nonword prime in lowercase, in which the letter u/v from the base word was replaced by a visually dissimilar letter that kept a neutral form (i.e., a letter with no ascenders/descenders) and the same consonant/vowel status as the substituted letter in the visually similar letter prime (visually dissimilar letter prime condition; neztral–NEUTRAL; caoerna–CAVERNA). Likewise, we created 240 nonwords, matched in length to the words, using Wuggy (Keuleers & Brysbaert, 2010). All nonwords had the letter u or v in an internal position (e.g., CARCURA; OLCLIVO) and the same prime-target manipulation as that for the words (i.e., an identity condition, a visually similar letter prime condition, and a visually dissimilar letter prime condition). To counterbalance the prime-target pairs across conditions, we created three lists in a Latin square manner. The words/nonwords are available at http://www.uv.es/amarhe5/VisSim.pdf
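
The prime construction and the Latin square counterbalancing described above can be illustrated with a minimal sketch. It is not the procedure actually used to build the lists: the function names are hypothetical, the dissimilar letter is passed in by hand (it was chosen item by item), and the simple rotation scheme is an assumption about how a three-list Latin square might be built.

```python
from collections import Counter

CONDITIONS = ("identity", "similar", "dissimilar")
SIMILAR_LETTER = {"u": "v", "v": "u"}  # critical letter pair of Experiment 1


def make_primes(target: str, position: int, dissimilar_letter: str) -> dict:
    """Build the three lowercase primes for one uppercase target.

    `position` is the 0-based index of the critical letter (u or v);
    `dissimilar_letter` is supplied by hand because it varied across items.
    """
    base = target.lower()
    critical = base[position]

    def substitute(letter: str) -> str:
        return base[:position] + letter + base[position + 1:]

    return {
        "identity": base,
        "similar": substitute(SIMILAR_LETTER[critical]),
        "dissimilar": substitute(dissimilar_letter),
    }


def latin_square_lists(items: list) -> list:
    """Rotate the conditions across items so that each item appears in a
    different condition in each of the three lists (assumed scheme)."""
    n_conditions = len(CONDITIONS)
    return [
        {item: CONDITIONS[(index + shift) % n_conditions]
         for index, item in enumerate(items)}
        for shift in range(n_conditions)
    ]


print(make_primes("NEUTRAL", position=2, dissimilar_letter="z"))
print(make_primes("CAVERNA", position=2, dissimilar_letter="o"))

# Placeholder item labels stand in for the 240 word targets.
lists = latin_square_lists([f"item{i:03d}" for i in range(240)])
print(Counter(lists[0].values()))
```

With 240 targets, the rotation gives 80 items per condition in each list, and every target is seen in all three conditions across lists.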

Procedure

Participants were tested individually in a silent room. We used DMDX to present the stimuli and register the responses (Forster & Forster, 2003). Participants were informed that, in each trial, they would be presented with a letter string that could form (or not) a word in Spanish. Their task was to press, as quickly and as accurately as possible, the “yes” key or the “no” key. The sequence of each trial was as follows: (1) a pattern mask composed of a series of #’s was presented in the center of a CRT screen for 500 ms (the length of the mask was the same as the length of the prime/target); (2) a prime stimulus (in lowercase) replaced the mask in the same spatial location for 50 ms; and (3) a target (in uppercase) replaced the prime in the same spatial location until the participant responded (or 2 s had elapsed). All stimuli were presented in a fixed-width font (14-pt Consolas). Stimulus presentation was randomized for each participant. Sixteen practice trials preceded the 480 experimental trials (240 words and 240 nonwords). The whole session lasted for approximately 18–20 minutes.
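
The trial sequence can be summarized as a small data structure. The sketch below is not a DMDX script; it only restates the three events and their durations, and the `Event` type and `build_trial` function are hypothetical names introduced for illustration.

```python
from typing import List, NamedTuple


class Event(NamedTuple):
    stimulus: str
    duration_ms: int         # fixed duration, or the time-out for the target
    ends_on_response: bool   # True only for the target


def build_trial(prime: str, target: str) -> List[Event]:
    """Return the mask, prime, and target events of one masked priming trial."""
    if len(prime) != len(target):
        raise ValueError("prime and target must have the same length")
    return [
        Event("#" * len(target), 500, False),  # forward pattern mask
        Event(prime.lower(), 50, False),       # lowercase prime
        Event(target.upper(), 2000, True),     # uppercase target, 2-s deadline
    ]


for event in build_trial("nevtral", "NEUTRAL"):
    print(event)
```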

Results and discussion

Error responses (3.6 % for words; 6.0 % for nonwords) and correct response times (RTs) shorter than 250 ms (0.0 % of the data) were omitted from the latency analyses. The mean RTs for correct responses and accuracy are displayed in Table 1. As in the Kinoshita et al. (2013) experiment, we focused on the word targets; note that masked form priming effects for nonwords tend to be unreliable.

Table 1 Mean lexical decision times (in ms) and accuracy (in parentheses) for words and nonwords in Experiment 1

To examine the effect of type of prime (identity [ID], similar letter [SIM], dissimilar letter [DIS]), we conducted linear mixed-effects models using the lme4 and lmerTest R packages. Because of the positive skew of the RT data, we employed −1,000/RT (instead of raw RT) as the dependent variable in the latency analyses. There were 6,247 observations. We coded the levels of type of prime so that the model would test the two comparisons of interest (i.e., SIM vs. DIS and ID vs. SIM). The model included random intercepts for subjects and items as well as the by-subject and by-items random slopes for type of prime (i.e., the maximal random effects structure). The analyses on the accuracy data were modeled using the glmer function in R (lme4 package), where the accuracy data were coded as binary values (1 = correct, 0 = incorrect).
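
The data preparation that precedes the latency and accuracy models can be sketched as follows. The models themselves were fitted in R; this Python fragment only illustrates the exclusion rule (errors and RTs shorter than 250 ms), the −1,000/RT transformation, and the binary accuracy coding. The trial records are toy values invented for the example, and the field and function names are hypothetical.

```python
def prepare_latencies(trials, min_rt_ms=250.0):
    """Keep correct responses with RT >= min_rt_ms and add -1000/RT."""
    prepared = []
    for trial in trials:
        if not trial["correct"]:
            continue  # error responses are excluded from the latency analysis
        if trial["rt_ms"] < min_rt_ms:
            continue  # anticipatory responses are excluded
        prepared.append({**trial, "neg_inv_rt": -1000.0 / trial["rt_ms"]})
    return prepared


# Toy data for illustration only (not taken from the experiments).
toy_trials = [
    {"subject": 1, "item": "NEUTRAL", "prime": "SIM", "rt_ms": 500.0, "correct": True},
    {"subject": 1, "item": "CAVERNA", "prime": "DIS", "rt_ms": 800.0, "correct": True},
    {"subject": 2, "item": "NEUTRAL", "prime": "ID", "rt_ms": 200.0, "correct": True},
    {"subject": 2, "item": "CAVERNA", "prime": "SIM", "rt_ms": 650.0, "correct": False},
]

latency_data = prepare_latencies(toy_trials)
accuracy_data = [1 if trial["correct"] else 0 for trial in toy_trials]

for row in latency_data:
    print(row["item"], row["prime"], row["neg_inv_rt"])
print(accuracy_data)
```

The negative sign keeps the direction of the effects the same as with raw RTs: larger (less negative) values still correspond to slower responses.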

The statistical analyses of the word identification times showed an 11-ms advantage of the SIM priming condition over the DIS priming condition (613 vs. 624 ms, respectively), t = 2.83, p = .005. In addition, there was a significant 10-ms advantage of the ID priming condition over the SIM priming condition, t = -2.53, p = .012.

The statistical analyses of the accuracy showed higher accuracy in the SIM priming condition than in the DIS priming condition (0.968 vs. 0.953, respectively), z = -2.45, p = .014, whereas there were no differences in accuracy between the ID and SIM priming conditions (0.971 vs. 0.968, respectively), |z| < 1.

The results are straightforward: We found a significant 11-ms advantage of the SIM condition over the DIS condition, and this effect was virtually the same for u→v and v→u substitutions.

This finding suggests that visual similarity across letters does play a role in the early phases of written-word recognition. To reach firmer conclusions, it is important to replicate the experiment with another set of stimuli and critical letters. To that end, we designed Experiment 2. This experiment was parallel to Experiment 1, except that we employed the letters i/j instead of the letters u/v as the visually similar letters.

Experiment 2

Method

Participants

Twenty-seven students from the same population as in Experiment 1 took part in the experiment. None of them had participated in Experiment 1.

Materials

We obtained a set of 240 Spanish words from the EsPal subtitle database (Duchon et al., 2013). The average Zipf frequency was 4.08 (range: 3.33–5.50), the average number of letters was 7.6 (range: 5–11), and the average Levenshtein distance was 2.2 (range: 1.3–4.3). All words had the letters i or j in an internal position (e.g., DENTISTA [dentist]; PASAJERO [passenger]). For each target word, we created three primes: (1) an identity prime (dentista–DENTISTA; pasajero–PASAJERO); (2) a visually similar letter prime (dentjsta–DENTISTA; pasaiero–PASAJERO); (3) a visually dissimilar letter prime (dentgsta–DENTISTA; pasauero–PASAJERO). We also created 240 nonwords in the same manner as in Experiment 1. All nonwords had the letter i or j in an internal position (e.g., BESTINDA; MOMAJERA). The manipulation for the nonwords was the same as that for the words. The set of words/nonwords is available at http://www.uv.es/amarhe5/VisSim.pdf

Procedure

The procedure was the same as in Experiment 1.

Results and discussion

Incorrect responses (3.58 % for words and 6.03 % for nonwords) and correct response times (RTs) shorter than 250 ms (less than 0.02 %) were excluded from the latency analyses. The mean RTs for correct responses and accuracy are displayed in Table 2.

Table 2 Mean lexical decision times (in ms) and accuracy (in parentheses) for words and nonwords in Experiment 2

The statistical analyses were parallel to those in Experiment 1. That is, we examined the effect of type of prime (identity [ID], similar letter [SIM], dissimilar letter [DIS]) using linear mixed-effects models. There were 6,212 observations in the RT data. The model for the RT data included random intercepts for subjects and items as well as the by-subject random slopes for type of prime—the maximal random effects structure model did not converge.

The statistical analyses of the RT data showed a 19-ms advantage of the SIM priming condition over the DIS priming condition (606 vs. 625 ms, respectively), t = 5.05, p < .001, whereas there were no signs of a difference (<1 ms) between the ID and SIM priming conditions (606 ms in both conditions), |t| < 1.

The statistical analyses of the accuracy data showed results parallel to those of the latency data: The SIM priming condition was responded to more accurately than the DIS priming condition (0.968 vs. 0.954, respectively), z = -2.45, p = .015, whereas there were no signs of a difference between the ID and SIM priming conditions (0.971 vs. 0.968, respectively), |z| < 1.

Thus, as in Experiment 1, we found an advantage of the SIM condition over the DIS condition. Furthermore, the SIM condition behaved similarly to the ID condition—note that the visual similarity of the critical letter pair was higher than in Experiment 1.

General discussion

The present masked priming experiments examined, using a large number of data points per condition (2,160), whether visual similarity across letters plays a role during the early phases of written-word recognition. Results showed a sizable advantage of the visually similar letter (SIM) condition over the visually dissimilar letter (DIS) condition: 11 ms in Experiment 1 (nevtral–NEUTRAL faster than neztral–NEUTRAL) and 19 ms in Experiment 2 (dentjsta–DENTISTA faster than dentgsta–DENTISTA). This finding is consistent with previous facilitative effects of visual similarity for letter-like digits (e.g., 4 in M4TERI4L–MATERIAL) and letter-like symbols (e.g., Δ in MΔTERIΔL–MATERIAL) in masked priming experiments (Kinoshita & Lagoutaris, 2010; Lien et al., 2014; Perea et al., 2008). The divergences between the present data and Kinoshita et al.’s (2013) data with substituted-letter primes are more apparent than real. Kinoshita et al. (2013) found an 8-ms advantage of the visually similar letter condition over the visually dissimilar letter condition (p = .0982), with a lower number of data points per condition (740).

The current findings have relevant implications for models of written-word recognition and reading. The presence of a masked priming effect of visual similarity with substituted-letter primes implies that there is some degree of ambiguity concerning letter identities in the early phases of word processing (see Norris et al., 2010) and, hence, models of written-word recognition should account for this effect (i.e., it can be used as a benchmark for future simulation studies). Indeed, a number of experiments have shown that word identification times (and eye fixation times) are longer for the word BRUNCH, which has a higher frequency one-letter different neighbor (BRANCH), than for a control word (e.g., BUFFET, which does not have higher frequency neighbors; Grainger, O’Regan, Jacobs, & Segui, 1989; Slattery, 2009; see also Segui & Grainger, 1990, for masked priming evidence). Clearly, if a word’s letter identities were perfectly attained in the early phases of word processing, one would not expect neighborhood frequency effects to occur during written-word recognition and reading.

In their implementation of the Bayesian reader model, Norris (2006) acknowledged that the assumption of similar confusability for all letters was “unlikely to be an accurate characterisation of human perception” (p. 347). Similarly, in their model of eye movements in reading, Bicknell and Levy (2010) indicated that this assumption was “ignoring work on letter confusability which could be added to future model revisions” (p. 1172). The same argument applies to those computational models of written-word recognition that employ the orthographic coding scheme of Rumelhart and McClelland’s (1982) interactive-activation model (e.g., spatial coding model; Davis, 2010); these models assume an unrealistic letter feature level that only incorporates an uppercase font composed of straight lines. As Davis (2010) indicated, future implementations of these models should incorporate a more sophisticated letter coding scheme to encode letter representations from their visual features. Three of the main challenges for modelers are how to specify (1) the most diagnostic visual elements of letters (e.g., lines, curves, intersections, terminations) in the initial phases of word processing (see Blais et al., 2009; Rosa, Perea, & Enneson, 2016, for discussion); (2) the way in which these visual features are dynamically weighted (see Wiley, Wilson, & Rapp, 2016); and (3) the way in which visual information is mapped onto abstract representations (see Grainger et al., 2016). Although a thorough description of these questions would be beyond the scope of this study, it is clear that additional research is needed to help determine the time course of visual similarity effects across letters during written-word recognition.

In sum, we found an advantage of visually similar substituted-letter primes over visually dissimilar substituted-letter primes in the initial stages of word processing. This finding strongly suggests that future implementations of models of written-word recognition and reading should employ more refined letter-feature and letter levels.