The question of how we process words during reading has generated an extensive literature, probably because reading is such a frequent activity. One fruitful approach to studying reading and word recognition is to examine how characteristics of the word affect recognition using a lexical-decision task (LDT; i.e., determining whether a letter string forms a word or a nonword). Two of the commonly used manipulations of words are case mixing (e.g., mixed case [TeNnIs] vs. consistent lower case [tennis]) and word frequency (the frequency with which a word occurs in the language). Studies with a case-mixing manipulation have consistently revealed shorter response times (RTs) for consistent lower-case than for mixed-case words (the case-mixing effect). Studies with a word frequency manipulation have revealed shorter RTs for high- than for low-frequency words (the word frequency effect). Researchers generally agree that these two variables affect different processing stages, but they disagree about exactly which stages are affected (Allen, Smith, Lien, Grabbe & Murphy, 2005; Besner & McCann, 1987; Reingold, Yang & Rayner, 2010). The present study aims to shed light on this debate by using electrophysiological measures.

Case mixing versus word frequency

Most word recognition models assume that multiple processing stages are involved in a single, visual word identification process (e.g., from orthographic encoding, to lexical identification, to response decision; Monsell, Doyle & Haggard, 1989). Case mixing has been assumed to affect the early encoding of visual features (Besner & McCann, 1987). In contrast, the word frequency effect has been attributed to a later stage than encoding, such as stimulus categorization or decision (Allen et al., 2005; McCann, Remington & Van Selst, 2000). For instance, McCann et al. examined the locus of the word frequency effect using a dual-task paradigm in which the time interval between a tone Task 1 and a lexical-decision Task 2 (i.e., the stimulus onset asynchrony [SOA]) was varied. A typical finding with this dual-task paradigm is that Task 2 RTs increase as SOA decreases, which has been attributed to a central bottleneck, that is, an inability to perform central operations (e.g., decision making) for the two tasks simultaneously. McCann et al. found that the word frequency effect on the lexical-decision Task 2 was similar at all SOAs, suggesting that word frequency affects processing stages that are subject to postponement (i.e., those located at or after central operations).
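The logic of this inference can be illustrated with a minimal sketch of the standard central-bottleneck model. The stage durations below are hypothetical values chosen for illustration and are not estimates from McCann et al.; the function names are likewise assumptions. A manipulation that lengthens a precentral Task 2 stage is absorbed into the waiting time at short SOAs, whereas a manipulation that lengthens a central stage adds its full effect at every SOA.

```python
def task2_rt(soa, pre=100, central=150, post=100, t1_pre=100, t1_central=200):
    """Task 2 RT (ms, from Task 2 stimulus onset) under a central bottleneck.

    Task 2 central processing starts only when (a) Task 2 precentral
    processing is done and (b) Task 1 central processing has finished.
    All stage durations are hypothetical illustration values.
    """
    # Time at which Task 1 releases the bottleneck, relative to Task 2 onset.
    t1_central_done = t1_pre + t1_central - soa
    central_start = max(pre, t1_central_done)
    return central_start + central + post


def effect(soa, stage, size=50):
    """Effect (ms) on Task 2 RT of lengthening one Task 2 stage by `size`."""
    base = task2_rt(soa)
    if stage == "pre":
        return task2_rt(soa, pre=100 + size) - base
    return task2_rt(soa, central=150 + size) - base


for soa in (50, 150, 800):
    print(soa, effect(soa, "pre"), effect(soa, "central"))
```

Under these assumed durations, the precentral manipulation produces no RT effect at the two short SOAs and its full 50-ms effect at the long SOA, whereas the central manipulation produces a constant 50-ms effect. A word frequency effect of similar size at all SOAs therefore matches the second pattern.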

Distinct loci for case mixing and word frequency were also confirmed by Reingold et al. (2010), but with a critical difference in the conclusions drawn. Instead of arguing for an early locus of case mixing and a late locus of word frequency, Reingold et al. asserted the opposite view. They recorded participants’ eye movements during reading. The critical measure was the first-fixation duration (when the eyes first moved from the fixated word) with multiple first-pass fixations (trials in which the word was immediately refixated). They assumed that the initial saccade to the subsequent word depends on the completion of lexical access to the currently fixated word. The multiple first-pass fixations would, therefore, indicate incomplete lexical processing. By examining what factors influence the first fixation in multiple first-pass fixations, one could determine what variables influence early lexical encoding. They found additivity of the word frequency and case-mixing effects on fixation times, consistent with the claim that these factors influence different processing stages. However, word frequency, but not case mixing, affected the first-fixation duration in trials with multiple first-pass fixations (the word frequency effect was 14.8 ms, whereas the case-mixing effect was only 3 ms). In contrast to earlier studies, Reingold et al. concluded that word frequency influences early lexical encoding and case mixing influences later attentional, postlexical processing.

Although Reingold et al.’s (2010) finding raises a question regarding the exact loci for case mixing and word frequency, their conclusion was primarily based on sentence reading rather than single-word identification, as in the LDTs employed in most previous studies. Sentence reading involves not only word identification but also other processes, such as syntactic formation, semantic extraction, and sentence comprehension. It has been argued that these processes have a relatively weak influence on reading, especially for skilled readers (see, e.g., Besner & Humphreys, 1991; Humphreys, 1985); often we skip words when we read, or are unaware of misspelled, additional, or missing words. Accordingly, the sentence-reading paradigm may not be sensitive enough to determine the loci for case mixing and word frequency, or possibly it may produce different loci. Likewise, the other studies using LDTs were based on indirect behavioral measures of the time course of word recognition and the additive-factor method.

An electrophysiological measure of word processing

For the present study, we therefore used online measures of event-related potentials (ERPs) to examine word frequency and case-mixing effects in the LDT. The ERPs can provide continuous measures of single-word processing, and often reveal evidence of deeper processing than is apparent in behavioral data (see Vogel, Luck & Shapiro, 1998, for an excellent example of ERPs elicited by semantic activation even when participants could not report the targets in an attentional blink task). By examining the ERP components associated with word processing, it is possible to determine which word-processing stages are affected by word frequency and case mixing.

We used the N170 and P3 components. The N170 is a negative ERP that peaks 140–240 ms after stimulus onset. This component occurs strongly at occipito-temporal sites and relates to structural encoding that is specialized for faces and words (Rossion et al., 2000; Simon, Petit, Bernard & Rebaï, 2007). With respect to words, the N170 amplitude is larger for orthographic stimuli (e.g., words) than for nonorthographic stimuli (e.g., symbols) in the left hemisphere (Bentin, Mouchetant-Rostaing, Giard, Echallier & Pernier, 1999). These findings suggest that the N170 indexes orthographic encoding, an early process of object recognition.

The P3 component is a positive ERP that peaks 400–600 ms after stimulus onset and is larger over parietal midline sites. The P3 is often taken as a measure of context updating and is sensitive to the relative frequency of stimuli (Donchin, Ritter & McCallum, 1978; Nasman & Rosenfeld, 1990). It reflects the time required to complete stimulus classification for response selection (Luck, 1998). With respect to word recognition, Polich and Donchin (1988) found larger P3 amplitudes for high- than for low-frequency words (see also Lien, Ruthruff, Cornett, Goodin & Allen, 2008). Acosta and Nasman (1992) also found that the P3 amplitude was modulated by within-experiment repetition of words in discrimination tasks, but not in detection tasks. These results suggest that the P3 indexes stimulus categorization, a later process of object recognition that leads to decision making.

The present study

While earlier studies have shown word frequency effects on the P3, the exact locus of the case-mixing effect has not yet been determined. We thus used N170 and P3 data to examine the processing loci of word frequency and case mixing within the same LDT experiment. Word frequency and case type were varied within blocks, and because we were interested in the two effects without contamination from stimulus-repetition-based increased familiarity, we presented each word and nonword only once for each participant.

Our main interest was in the word trials only. If mixing case disrupts early logographic encoding, we would expect to find case-mixing effects on the N170; if it affects later stimulus categorization, we would expect to find case-mixing effects on the P3. Likewise, if word frequency affects early encoding, we would expect to find word frequency effects on the N170; however, if it affects later stimulus categorization, we would expect to find word frequency effects on the P3.

Method

Participants

A group of 28 undergraduates (native English speakers) at Oregon State University participated in this experiment. The data from four of these participants were excluded because of excessive eye movement artifacts in the electroencephalographic (EEG) data (see below).

Apparatus, stimuli, and procedure

The stimulus was a string of letters (0.83º × 0.63º for each letter), printed in white against a black background in the center of the screen. The letters were presented either entirely in lower case or in alternating case (mixed case). The stimuli were taken from the Kučera and Francis (1967) norms. The low-frequency words ranged from 10 to 30 occurrences per million (mean orthographic neighborhood [ON] size = 5.10; Balota et al., 2007), and the high-frequency words from 151 to 1,016 occurrences per million (ON = 5.35). Nonwords were formed by changing one of the letters of a word. Each word and nonword appeared only once for each participant.
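As an illustration of the case manipulation, the following minimal sketch converts a lowercase letter string into alternating case. It assumes that alternation begins with an uppercase letter, as in the example TeNnIs; the function name and the `start_upper` option are hypothetical, and the actual stimulus lists may have been constructed differently.

```python
def mixed_case(word, start_upper=True):
    """Return `word` in alternating case (e.g., tennis -> TeNnIs)."""
    letters = []
    for i, ch in enumerate(word.lower()):
        # Even positions are uppercase when start_upper is True,
        # odd positions are uppercase when it is False.
        make_upper = (i % 2 == 0) == start_upper
        letters.append(ch.upper() if make_upper else ch)
    return "".join(letters)


print(mixed_case("tennis"))                     # TeNnIs
print(mixed_case("tennis", start_upper=False))  # tEnNiS
```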

Each trial started with a fixation cross for 500 ms, which was then replaced with the stimulus until a response was made. Next, auditory feedback (a tone on error trials or silence on correct trials) was presented for 200 ms. The fixation cross for the next trial appeared 300 ms later.

The participants performed one practice block of 36 trials, followed by 16 experimental blocks of 72 trials each. They pressed the leftmost response-box button if a letter string was a word or the rightmost button if the stimulus was a nonword. Both speed and accuracy were emphasized.

EEG recording and analyses

EEG activity was recorded from electrodes F3, Fz, F4, C3, Cz, C4, P3, Pz, P4, O1, Oz, O2, P5, P6, PO5, PO6, T7, T8, TP7, and TP8. These sites and the right mastoid were recorded in relation to a reference electrode at the left mastoid. The EEGs were then re-referenced offline to the average of the left and right mastoids. A horizontal electrooculogram (HEOG) was recorded bipolarly from electrodes at the outer canthi of both eyes, and a vertical electrooculogram (VEOG) was recorded from electrodes above and below the midpoint of the left eye. Electrode impedance was kept below 5 kΩ. The EEG, HEOG, and VEOG were amplified using Synamps2 (Neuroscan) with a gain of 2,000 and a bandpass of 0.1–50 Hz, and they were digitized at 250 Hz.
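The offline re-referencing step follows from simple arithmetic. With the left mastoid as the online reference, each recorded channel is the scalp potential minus the left-mastoid potential, and the recorded right-mastoid channel is the right-mastoid potential minus the left-mastoid potential. Subtracting half of the recorded right-mastoid signal from each channel therefore yields the channel relative to the average of the two mastoids. The sketch below illustrates this with made-up potentials; the function and variable names are assumptions, not part of the recording software.

```python
def rereference_to_average_mastoids(channel, right_mastoid):
    """Re-reference a left-mastoid-referenced channel to averaged mastoids.

    Both inputs are sample-by-sample recordings (in microvolts) taken
    relative to the left mastoid.
    """
    return [c - 0.5 * m for c, m in zip(channel, right_mastoid)]


# Hypothetical "true" potentials, used only to check the arithmetic.
true_ch = [10.0, 12.0]
true_m1 = [2.0, 4.0]   # left mastoid (online reference)
true_m2 = [4.0, 2.0]   # right mastoid

recorded_ch = [c - m1 for c, m1 in zip(true_ch, true_m1)]
recorded_m2 = [m2 - m1 for m2, m1 in zip(true_m2, true_m1)]

rereferenced = rereference_to_average_mastoids(recorded_ch, recorded_m2)
expected = [c - (m1 + m2) / 2 for c, m1, m2 in zip(true_ch, true_m1, true_m2)]
print(rereferenced, expected)
```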

Trials with possible ocular and movement artifacts were identified using a threshold of ±75 μV for a 1,000-ms epoch running from 200 ms before stimulus onset to 800 ms after stimulus onset. Each of the artifact trials was then inspected manually. This procedure led to the rejection of 5% of the trials, with no more than 22% rejected for any individual.
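The automatic screening step can be sketched as follows. This is an assumption-laden illustration: it treats the ±75 μV criterion as an absolute-voltage limit applied to every sample of the epoch, whereas the text does not state whether the criterion was absolute or peak-to-peak, or which channels it was applied to. Flagged trials would still require the manual inspection described above.

```python
THRESHOLD_UV = 75.0  # criterion from the text; its exact application is assumed


def is_possible_artifact(epoch_uv, threshold=THRESHOLD_UV):
    """True if any sample in the epoch exceeds +/- threshold microvolts."""
    return any(abs(v) > threshold for v in epoch_uv)


def flag_trials(epochs):
    """Return the indices of epochs that should be inspected manually."""
    return [i for i, ep in enumerate(epochs) if is_possible_artifact(ep)]


# Three toy epochs (a real 1,000-ms epoch at 250 Hz would hold 250 samples).
epochs = [
    [3.0, -12.5, 20.0, 8.0],
    [5.0, 90.0, 40.0, -3.0],    # exceeds +75
    [-80.0, -10.0, 2.0, 1.0],   # exceeds -75
]
print(flag_trials(epochs))
```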

Averaged ERPs were time-locked to the stimulus onset. We conducted two different analyses on the ERPs: The first analysis concerned the N170, focusing on the occipito-parietal (electrodes O1, O2, PO5, and PO6) and temporo-parietal (electrodes T7, T8, TP7, and TP8) sites (see, e.g., Simon et al., 2007). We measured the mean amplitude of the N170 from 140 to 240 ms after stimulus onset, relative to the 200-ms baseline period before stimulus onset. The second analysis focused on the P3, using electrodes Cz and Pz (e.g., Lien et al., 2008; Luck, 1998). The mean amplitude of the P3 was measured from 400 to 600 ms after stimulus onset, relative to the 200-ms baseline period before stimulus onset.
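The mean-amplitude measure can be illustrated with a short sketch. It assumes an epoch that begins 200 ms before stimulus onset and is sampled at 250 Hz (one sample every 4 ms), as described above, and it treats each window as including its first sample and excluding its last. That boundary convention and the function names are assumptions rather than details reported in the text.

```python
SRATE_HZ = 250
EPOCH_START_MS = -200


def sample_index(t_ms):
    """Index of the sample recorded t_ms after stimulus onset."""
    return round((t_ms - EPOCH_START_MS) * SRATE_HZ / 1000)


def mean_amplitude(epoch_uv, t_start_ms, t_end_ms, baseline_ms=(-200, 0)):
    """Mean voltage in [t_start_ms, t_end_ms) minus the baseline mean."""
    def window_mean(a_ms, b_ms):
        segment = epoch_uv[sample_index(a_ms):sample_index(b_ms)]
        return sum(segment) / len(segment)

    return window_mean(t_start_ms, t_end_ms) - window_mean(*baseline_ms)


# Toy averaged waveform: baseline at +1 uV, a -2 uV deflection in the
# N170 window, and 0 uV elsewhere (250 samples in total).
erp = [1.0] * 50 + [0.0] * 35 + [-2.0] * 25 + [0.0] * 140

print(mean_amplitude(erp, 140, 240))  # N170 window
print(mean_amplitude(erp, 400, 600))  # P3 window
```

With this toy waveform, the N170 window yields −3.0 μV and the P3 window yields −1.0 μV, because the +1 μV baseline is subtracted from both.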

Analyses of variance (ANOVAs) were used for all statistical analyses. Because word frequency and lexicality are not orthogonal (i.e., nonwords do not possess word frequency categories), the ANOVAs on words included word frequency as a variable, and separate ANOVAs examined the lexicality effect (words vs. nonwords) but excluded word frequency.

Results

In addition to trials with ocular artifacts, trials were excluded from analyses of the behavioral data (RTs and proportions of errors [PEs]) and the ERP data if the RT was less than 100 ms or greater than 3,000 ms (0.05% of trials exceeded these cutoff values). Incorrect-response trials were also excluded from the RT and ERP analyses.
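These exclusion rules can be summarized in a brief sketch. The trial records and field names (`rt_ms`, `correct`, `artifact`) are hypothetical; only the cutoff values and the order of exclusions come from the description above.

```python
RT_MIN_MS = 100
RT_MAX_MS = 3000


def within_cutoffs(trial):
    """RTs below 100 ms or above 3,000 ms are excluded."""
    return RT_MIN_MS <= trial["rt_ms"] <= RT_MAX_MS


def usable_for_pe(trial):
    """Trials entering the error-proportion analysis."""
    return not trial["artifact"] and within_cutoffs(trial)


def usable_for_rt_and_erp(trial):
    """RT and ERP analyses additionally exclude incorrect responses."""
    return usable_for_pe(trial) and trial["correct"]


trials = [
    {"rt_ms": 620, "correct": True, "artifact": False},
    {"rt_ms": 85, "correct": True, "artifact": False},    # too fast
    {"rt_ms": 700, "correct": False, "artifact": False},  # error
    {"rt_ms": 3400, "correct": True, "artifact": False},  # too slow
    {"rt_ms": 650, "correct": True, "artifact": True},    # ocular artifact
]

pe_trials = [t for t in trials if usable_for_pe(t)]
rt_trials = [t for t in trials if usable_for_rt_and_erp(t)]
proportion_error = sum(not t["correct"] for t in pe_trials) / len(pe_trials)
print(len(pe_trials), len(rt_trials), proportion_error)
```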

Behavioral data analyses

The primary ANOVA (words only) was conducted as a function of word frequency (high vs. low) and case type (lower case vs. mixed case). Word frequency effects were revealed on RTs (67 ms), F(1, 23) = 151.53, p < .0001, ηp² = .87, and PEs (.097), F(1, 23) = 91.83, p < .0001, ηp² = .78. We also found case-mixing effects on RTs (50 ms), F(1, 23) = 73.02, p < .0001, ηp² = .76, and PEs (.040), F(1, 23) = 24.46, p < .0001, ηp² = .52. The interaction between word frequency and case type was significant for PEs, F(1, 23) = 9.27, p < .01, ηp² = .29, and it approached significance for RTs, F(1, 23) = 3.69, p = .0673, ηp² = .14 (lowercase frequency effect, 74 ms and .087; mixed case, 60 ms and .107; see Table 1).
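For readers who wish to relate the effect sizes to the test statistics, partial eta squared can be recovered from an F ratio and its degrees of freedom as F × df_effect / (F × df_effect + df_error). The sketch below applies this standard identity to some of the RT and PE statistics reported above; the function name is an assumption.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared computed from an F ratio and its dfs."""
    return f_value * df_effect / (f_value * df_effect + df_error)


reported_f = {
    "frequency (RT)": 151.53,
    "case mixing (RT)": 73.02,
    "case mixing (PE)": 24.46,
    "frequency x case (PE)": 9.27,
    "frequency x case (RT)": 3.69,
}

for label, f_value in reported_f.items():
    print(label, round(partial_eta_squared(f_value, 1, 23), 2))
```

For example, the case-mixing effect on RTs, F(1, 23) = 73.02, gives 73.02 / (73.02 + 23) ≈ .76, and the word frequency effect on RTs, F(1, 23) = 151.53, gives ≈ .87.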

Table 1 Mean response times (RTs, in milliseconds) and proportions of errors (PEs) as functions of lexicality (word vs. nonword), word frequency (high vs. low), and case type (lower vs. mixed)

The secondary ANOVA (excluding word frequency) was conducted on lexicality (word vs. nonword) and case type (lower case vs. mixed case). We were primarily interested in the lexicality effect; thus, only the effects involving lexicality are reported. The analyses revealed lexicality effects on RTs (63 ms), F(1, 23) = 51.86, p < .0001, ηp² = .69, and PEs (−.018), F(1, 23) = 4.66, p < .05, ηp² = .17. The lexicality effect was significantly larger for lower case than for mixed case on RTs (76 vs. 50 ms, respectively), F(1, 23) = 16.83, p < .001, ηp² = .42, and on PEs, F(1, 23) = 11.96, p < .01, ηp² = .34 (.004 vs. −.040).

ERP analyses

N170

For the primary ANOVA (words only), N170 amplitudes were analyzed as a function of word frequency (high vs. low), case type (lower case vs. mixed case), electrode site (occipito-parietal vs. temporo-parietal), and electrode location (left vs. right hemisphere). Figure 1 shows the N170 amplitudes for these electrodes.

Fig. 1

Grand average event-related brain potentials for the N170, as a function of word frequency (high vs. low) and case type (lower vs. mixed case) for the left-hemisphere (O1/PO5 and T7/TP7) and the right-hemisphere (O2/PO6 and T8/TP8) electrodes. The unfilled rectangular boxes indicate the time window used to assess the N170 (140–240 ms after stimulus onset). Negative is plotted upward, and time zero represents stimulus onset. The baseline period was the 200 ms prior to stimulus onset

No main effect or interactions involved word frequency, Fs ≤ 2.54, ps ≥ .1248. The N170 amplitude was larger for lowercase (−1.188 μV) than for mixed case (−0.720 μV), F(1, 23) = 2.95, p = .09, ηp² = .11 (i.e., the case-mixing effect). The case-mixing effect was larger for the temporo-parietal than for the occipito-parietal sites, F(1, 23) = 4.83, p < .05, ηp² = .17. Further simple main effect analyses revealed that the case-mixing effect approached significance for the temporo-parietal sites, F(1, 23) = 3.71, p = .06, ηp² = .14 (−1.241 and −0.662 μV for lower and mixed cases, respectively), but not for the occipito-parietal sites, F(1, 23) = 2.03, p = .17, ηp² = .08 (−1.136 and −0.778 μV for lower and mixed cases).

To examine the effect of lexicality, a second ANOVA was conducted. By necessity, we excluded the word frequency variable. The N170 amplitudes were analyzed as a function of lexicality (word vs. nonword), case type (lower case vs. mixed case), electrode site (occipito-parietal vs. temporo-parietal), and electrode location (left vs. right hemisphere). Figure 2 depicts the N170 amplitudes for these electrodes. Only effects involving lexicality are reported.

Fig. 2

Grand average event-related brain potentials for the N170, as a function of lexicality (word vs. nonword) and case type (lower vs. mixed case) for the left-hemisphere (O1/PO5 and T7/TP7) and the right-hemisphere (O2/PO6 and T8/TP8) electrodes. The unfilled rectangular boxes indicate the time window used to assess the N170 (140–240 ms after stimulus onset). Negative is plotted upward, and time zero represents stimulus onset. The baseline period was the 200 ms prior to stimulus onset

No main effect of lexicality was found, F < 1.0. Although the lexicality effect on the N170 was significantly larger for the temporo-parietal (0.162 μV) than for the occipito-parietal (0.022 μV) sites, F(1, 23) = 4.71, p < .05, ηp² = .17, further simple-effects analyses failed to show a significant lexicality effect for the temporo-parietal sites, F(1, 23) = 2.75, p = .11, ηp² = .11.

P3

For the primary ANOVA (words only), the P3 amplitudes were analyzed as a function of word frequency, case type, and electrode site (Cz vs. Pz). Figure 3 shows the P3 amplitudes for these electrodes. Only the main effect of word frequency was significant, F(1, 23) = 24.33, p < .0001, ηp² = .51 (6.612 and 4.679 μV for high- and low-frequency words, respectively).

Fig. 3

Grand average event-related brain potentials for the P3, as a function of word frequency (high vs. low) and case type (lower vs. mixed case) for the electrodes Cz and Pz. The unfilled rectangular boxes indicate the time window used to assess the P3 (400–600 ms after stimulus onset). Negative is plotted upward, and time zero represents stimulus onset. The baseline period was the 200 ms prior to stimulus onset

The secondary ANOVA on the P3 data (excluding the word frequency variable) was conducted as a function of lexicality, case type, and electrode site. Figure 4 shows the P3 amplitudes for these electrodes. The analyses revealed a significant lexicality effect, F(1, 23) = 57.76, p < .0001, ηp² = .72 (5.645 μV for words and 2.420 μV for nonwords). The lexicality effect was larger for the electrode Cz (3.421 μV) than for the electrode Pz (3.029 μV), F(1, 23) = 7.52, p < .05, ηp² = .25.

Fig. 4

Grand average event-related brain potentials for the P3, as a function of lexicality (word vs. nonword) and case type (lower vs. mixed case) for the electrodes Cz and Pz. The unfilled rectangular boxes indicate the time window used to assess the P3 (400–600 ms after stimulus onset). Negative is plotted upward, and time zero represents stimulus onset. The baseline period was the 200 ms prior to stimulus onset

Discussion

The present study was designed to determine the processing loci for case-mixing and word frequency effects. Earlier studies addressing this issue had used behavioral measures on an LDT (Allen et al., 2005) or eye movements in sentence reading (Reingold et al., 2010) and had reached opposite conclusions. We addressed this issue by manipulating word frequency and case mixing in an LDT and using two ERP components as “time stamps”: the relatively early N170 (an index of structural encoding) and the relatively late P3 (an index of stimulus categorization). It should be noted that although some earlier studies have used the ERP approach (e.g., Donchin et al., 1978; Lien et al., 2008; Nasman & Rosenfeld, 1990; Polich & Donchin, 1988), they focused primarily on word frequency and did not examine it together with case mixing, as the present study does.

Consistent with earlier reports, the RT data revealed word frequency (67 ms) and case-mixing (50 ms) effects. Furthermore, these effects were approximately additive in the response latencies (frequency effect for lower case, 74 ms; for mixed case, 60 ms; the interaction only approached significance), suggesting that word frequency and case mixing primarily influenced different processing stages in lexical decision.

With regard to the time courses of the case-mixing and word frequency effects, the ERP data revealed two notable results. First, case type influenced the N170 amplitude but not the P3 amplitude. Although the overall trend toward larger N170s for lower case than for mixed case (the case-mixing effect) did not reach significance, the N170 modulation by case type was reliably stronger for temporo-parietal than for occipito-parietal sites. Consistent with this finding, neuroimaging evidence has generally supported the idea that the temporo-parietal area (e.g., left fusiform gyrus) is associated with word recognition (Carreiras, Mechelli, Estévez & Price, 2007).

The second notable finding was that word frequency in the LDT affected the P3 amplitude but not the N170 amplitude. The modulation of the P3 amplitude by word frequency replicated the findings of Polich and Donchin (1988) and Lien et al. (2008). These findings suggest that, when words are presented only once, the word frequency effect emerges at a later, stimulus categorization stage. Simon et al. (2007) also found no word frequency effect on the N170 when words were presented only twice for each participant. Because they did not examine the P3 component, it is difficult to evaluate whether Simon et al.’s word frequency effect occurred later in processing. They did, however, observe a word frequency effect on the N170 when words were repeated 100 times. Thus, it is possible that the N170 is sensitive to familiarity, which could exaggerate the word frequency effect when words are repeated massively.

The present results were based on the examination of single-word identification using an LDT. Although the present conclusion is in contrast to the interpretation of Reingold et al.’s (2010) eyetracking study of sentence reading, their findings relied on fundamentally different assumptions. As we indicated above, sentence reading is a complex process involving multiple levels of processing in addition to word identification. Thus, in sentence reading it may be difficult to isolate word-level effects of case mixing and word frequency. The presentation of a single word in the LDT, however, provides a precise description of the time course of how a single word is processed. Furthermore, Reingold et al.’s examination of how fixation was modulated by word frequency and case mixing may have been confounded by the possible decoupling of fixation and attention (e.g., Yantis, 2000).

We suspect that a more parsimonious interpretation exists of the Reingold et al. (2010) data (i.e., effects of word frequency and case type on four eyetracking dependent variables, but only a word frequency effect on first-fixation gaze durations for multiple-fixated words). Namely, the data can be explained by the two-stage model proposed by Allen, Wallace and Weber (1995) and Yap and Balota (2007). The first stage involves stimulus normalization, when familiarity-based information is assessed (e.g., the orthographic similarity of a letter string to a word), and the second stage involves stimulus categorization/lexical access. This model can account for the Reingold et al. data, as well as for lexical decision and word naming/pronunciation involving stimulus quality/case mixing and word frequency manipulations (see Yap & Balota, 2007).

The present findings provide some insight into whether case mixing and word frequency are processed in a cascaded, interactive fashion, as suggested by the cascaded model (e.g., McClelland, 1979; Plaut & Booth, 2000), or in discrete serial stages, as suggested by the serial discrete model discussed above (e.g., Allen et al., 1995; Besner & McCann, 1987; Yap & Balota, 2007). While the additivity between case mixing and word frequency in the behavioral RT data could also be accounted for by the cascaded model, the independence of the effects of case mixing and word frequency on different ERP components creates challenges for this model. The observed distinct temporal effects in the present study favor the serial, discrete-stage model of visual word recognition.

In conclusion, the present behavioral and ERP data indicate different loci for case-mixing and word frequency effects in the LDT. In particular, the ERP data suggest an early locus of case mixing (structural encoding, as indexed by the N170 modulation) and a later locus of word frequency (stimulus categorization, as indexed by the P3 modulation). Lexicality also modulates the P3, which is consistent with an effect on later stimulus categorization (Monsell et al., 1989). The present study provides the first ERP demonstration of the time courses of case-mixing and word frequency effects within the same LDT. We argue that case mixing affects an earlier processing stage than does word frequency, at least with respect to lexical-decision processes.