In a seminal article, Seidenberg and Tanenhaus (1979) first documented an orthographic influence on spoken word identification: When participants performed timed rhyme judgements on spoken word pairs, rhyming pairs which were similarly spelled (pie–tie) were judged faster than pairs for which spelling differed (rye–tie). Since then, effects of spelling in spoken word identification have been documented in a wide variety of tasks (e.g., Castles, Holmes, Neath, & Kinoshita, 2003; Chéreau, Gaskell, & Dumay, 2007; Dijkstra, Roelofs, & Fieuws, 1995; Goswami, Ziegler, & Richardson, 2005; Morais, Cary, Alegria, & Bertelson, 1979; Pattamadilok, Perre, Dufau, & Ziegler, 2009; Perre, Pattamadilok, Montant, & Ziegler, 2009; Perre & Ziegler, 2008; Zecker, Tanenhaus, Alderman, & Siqueland, 1986, and others; but see Cutler & Davis, 2012; Mitterer & Reinisch, 2015, for contrary evidence). This may imply that listeners cross-activate orthographic codes online whenever a spoken word is processed, via bidirectional functional links between orthography and phonology (e.g., Chéreau et al., 2007; Pattamadilok et al., 2009). Alternatively, effects of spelling could emerge off-line, resulting from the restructuring of phonological representations during literacy acquisition (e.g., Montant, Schön, Anton, & Ziegler, 2011; Perre et al., 2009). This restructuring view assumes that the nature of phonological representations is altered during the process of learning to read and write, leading to “phonographic” representations that integrate orthographic knowledge. Note that online and off-line accounts are not mutually exclusive.

From the abundance of findings on spelling effects in receptive tasks, one might predict parallel effects in spoken production. However, the available evidence (see Table 1 for an overview) is more limited and less consistent.

Table 1 Summary of existing results on orthographic effects in spoken production, ordered by task and publication date

As can be appreciated from this overview, these results paint an inconsistent picture. Approximately half of the studies utilised the “implicit priming” task, a popular tool in research on spoken word production. Speakers repeatedly produce a small set of spoken responses within experimental blocks, and form overlap within a block is manipulated. Word-initial phonological overlap between responses within a block results in a facilitatory effect on naming latencies (e.g., Meyer, 1990, 1991), interpreted as the online use of partially available information for speech planning (Roelofs, 1997). With English speakers, Damian and Bowers (2003) reported orthographic effects: Priming arising from shared word-initial segments (“coffee”, “camel”, “climate”) was disrupted if one item was substituted with one which had a conflicting word-initial grapheme (“coffee,” “camel,” “kennel”). However, this positive result contrasts with several null findings across various languages (Dutch: Roelofs, 2006; French: Alario, Perre, Castel, & Ziegler, 2007; Mandarin: Bi, Wei, Janssen, & Han, 2009b; Zhang & Damian, 2012), suggesting that if orthographic effects are genuine, the implicit priming task does not reliably detect them (see Footnote 1).

By contrast, a number of recent contributions have demonstrated orthographic effects in spoken word production in tasks which required the learning of novel words. Rastle, McCormick, Bayliss, and Davis (2011) asked participants to associate novel spoken words with novel objects, and only subsequently introduced the spelling of the novel words. Target objects were associated with words such that their initial phonemes could be either spelled regularly or irregularly based on English spelling–sound relationships (e.g., /kIsp/ spelled as kisp or chisp). Orthographic regularity effects were obtained on naming latencies of the novel objects, with faster responses to regular items than irregular ones (as well as in perception tasks; however, not in auditory shadowing), and the authors argued that object naming involves simultaneous activation of phonological and orthographic codes. Bürki, Spinelli, and Gaskell (2012) studied the acquisition of novel French words containing consonant clusters which can be pronounced either with or without a schwa, although the reduced variants are more frequent in speech. For example, the initial consonant cluster of the novel French word /pluR/ typically results in a schwa reduction. Bürki et al. asked speakers to associate, over several days, the spoken reduced variants of these novel words with novel objects, and then introduced spelling of the novel words which either did, or did not, orthographically represent the schwa (i.e., “pelour” vs. “plour”). In a subsequent naming test, speakers produced more schwa variants for words which had been spelled accordingly than for words which had not. Moreover, novel words with an orthographic representation of the schwa were produced more slowly than those which did not orthographically represent the schwa.
The slower responses were taken to be the consequence of competition between the phonological representation of the reduced variants established by repeated auditory exposure, and the phonological representation for the schwa variant generated via the novel word’s spelling. The authors favoured an off-line account, according to which orthographic exposure changes the way in which phonological variants are stored and processed. Han and Choi (2015) used a similar word-learning technique to explore allophonic variants of /h/ in Korean, and again obtained effects of orthography, interpreted as off-line restructuring of phonological codes via their spelling.

Results from a range of additional tasks (see Table 1) have been inconsistent (see Footnotes 2 and 3). In evaluating the available evidence, it is of course wise to treat null findings with caution. Nonetheless, it should be clear that the results exhibit considerable inconsistency. Perhaps this reflects the difference between online and off-line effects, as recently argued by Bürki et al. (2012), with production tasks exclusively detecting off-line processes. Some of the tasks used might be of questionable ecological validity, such as the implicit priming task, which requires speakers to produce the same few responses over and over again. The inconsistency may also be explained by the fact that researchers of spoken production are generally quite restricted in their choice of materials. Finally, in alphabetic scripts, spelling and sound are strongly confounded; hence, it is difficult to design experiments in which the two dimensions are properly dissociated.

To make progress, one would hence ideally use a task which (a) is plausibly sensitive to potential online interactions of sound and spelling, (b) has at least some degree of ecological validity, and (c) involves a target language in which spelling and sound can be largely dissociated (i.e., a nonalphabetic orthographic system). In the experiments reported below, we provided this sort of evidence. In Experiment 1, Mandarin speakers named coloured objects with adjective–noun phrases, and on critical trials, adjective and noun were orthographically related, which resulted in a significant facilitation effect. In Experiment 2, we replicated this pattern with slightly modified materials, and modified various aspects of the design in an attempt to reduce the likelihood of strategic variables impacting the effect. The orthographic effect from the first experiment was again found. These results convincingly demonstrate the online activation of orthographic codes in spoken production.

Experiment 1

Method

Participants

Twenty-seven native speakers of Mandarin Chinese (15 females, mean age 22 years, range: 20–28 years) participated in the experiment and were paid RMB35 (approximately US$5). None were colour blind, and all had normal or corrected-to-normal vision and no history of neurological or language problems. The project was approved by the Ethics Committee of the Institute of Psychology, Chinese Academy of Sciences.

Materials and design

Four colours (blue, brown, green, and orange) and 12 line drawings of objects with no canonical colour from the Snodgrass and Vanderwart (1980) picture set were used. All colour names in Chinese were monosyllabic, and all picture names were disyllabic. The average lexical frequency of object names was 3.6 per million in the Chinese Lexicon (Chinese Linguistic Data Consortium, 2003) database, and they could be written with an average of 16 strokes. Each colour was combined with three objects to form 12 orthographically related colour–object pairings (e.g., 蓝花瓶, blue vase, /lan2hua1ping2/; 橙梳子, orange comb, /cheng2shu1zi/; the two words shared an orthographic radical; see Footnote 4). Colours and objects within the orthographically related condition were then recombined to form the unrelated condition with 12 orthographically unrelated pairings (e.g., 橙花瓶, orange vase, /cheng2hua1ping2/; 蓝梳子, blue comb, /lan2shu1zi/). In this way, identical stimuli were used across the two conditions (see Appendix A). Care was taken to minimise semantic or phonological overlap (in terms of onset, rhyme, and tone). We collected semantic ratings on a 7-point Likert scale for all colour–object combinations from a group of 16 native Chinese participants (1 = not related at all; 7 = closely related). Average rating scores were 2.8 and 2.9 for orthographically related and unrelated combinations, respectively (t < 1). Hence, stimuli were semantically well matched across orthographically related and unrelated combinations. As in English, adjectives precede nouns in Chinese.

Each participant was presented with six blocks of 24 trials with each of the related and unrelated combinations appearing once in each block, for a total of 144 trials. A new pseudorandom order of trials was generated for each participant and block, such that neither pictures nor colours were repeated on consecutive trials.
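The ordering constraint described above can be illustrated with a short Python sketch. It is not the procedure used in the experiment (which was run in E-Prime); the placeholder object labels, the scheme used to recombine colours and objects for the unrelated condition, and the use of rejection sampling are all assumptions made for illustration.

```python
import random

COLOURS = ["blue", "brown", "green", "orange"]
# Hypothetical placeholder labels; the actual items are listed in Appendix A.
OBJECTS = [f"object_{i:02d}" for i in range(1, 13)]


def build_combinations():
    """Return 12 related and 12 unrelated (colour, object, condition) trials."""
    trials = []
    for idx, obj in enumerate(OBJECTS):
        related = COLOURS[idx // 3]              # three objects per colour
        unrelated = COLOURS[(idx // 3 + 1) % 4]  # assumed recombination scheme
        trials.append((related, obj, "related"))
        trials.append((unrelated, obj, "unrelated"))
    return trials


def has_immediate_repeat(order):
    """True if any two consecutive trials share a colour or an object."""
    return any(a[0] == b[0] or a[1] == b[1] for a, b in zip(order, order[1:]))


def pseudorandom_block(rng, max_attempts=200_000):
    """Reshuffle one 24-trial block until no colour or object repeats back to back."""
    trials = build_combinations()
    for _ in range(max_attempts):
        rng.shuffle(trials)
        if not has_immediate_repeat(trials):
            return list(trials)
    raise RuntimeError("no valid order found; increase max_attempts")


def session_order(seed):
    """Six independently ordered blocks for one participant (144 trials)."""
    rng = random.Random(seed)
    return [pseudorandom_block(rng) for _ in range(6)]
```

Rejection sampling is simple but wasteful, since most shuffles violate the constraint; a constructive procedure would be preferable for larger designs.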

Procedure

Stimuli were presented using E-Prime 1.1 software (Psychology Software Tools, Pittsburgh, PA, USA). Naming latencies were measured using a voice key, connected with the computer via a PST Serial Response Box. Participants were first asked to familiarise themselves with the experimental stimuli by viewing them on the screen, with the expected name printed underneath each object. Subsequently, participants were told that they would see the objects in different colours presented on a computer screen, and their task was to name them with an adjective–noun combination as quickly and accurately as possible (e.g., 蓝椅子, /lan2yi3zi/, ‘blue chair’). Next, participants received a practice block consisting of eight objects which were not from the set of targets (each of the four colours was repeated twice). Subsequently, the six experimental blocks were presented, separated by a short break. On each trial, participants saw a fixation cross (500 ms), a blank screen (500 ms), and a picture which disappeared once the participant initiated a verbal response, or after 4,000 ms. The intertrial interval was 2,000 ms. The experimental session lasted approximately 20 minutes.

Results and discussion

Latencies on trials with incorrect responses (0.3%) and latencies faster than 200 ms or slower than 1,800 ms (3.0%) were excluded. Average response times and errors are presented in Table 2, showing a facilitatory effect (16 ms) of orthographic overlap on latencies.
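The exclusion rule can be written down as a small filter. The sketch below is illustrative only: the `Trial` record and the `trim` function are hypothetical names, and only the 200 ms and 1,800 ms cut-offs come from the text.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    rt_ms: float   # naming latency in milliseconds
    correct: bool  # whether the response was scored as correct


def trim(trials, lower=200.0, upper=1800.0):
    """Split trials into kept latencies, error count, and outlier count.

    Errors are removed first; a correct trial counts as an outlier when its
    latency is faster than `lower` or slower than `upper`.
    """
    kept, n_error, n_outlier = [], 0, 0
    for trial in trials:
        if not trial.correct:
            n_error += 1
        elif trial.rt_ms < lower or trial.rt_ms > upper:
            n_outlier += 1
        else:
            kept.append(trial)
    return kept, n_error, n_outlier
```

Whether outlier percentages are computed over all trials or over correct trials only is not stated in the text, so the sketch returns raw counts rather than percentages.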

Table 2 Response latencies (in milliseconds) and error percentages (standard deviations in brackets) for Experiments 1 and 2

Latencies were analysed using a linear mixed-effects model (Baayen, Davidson, & Bates, 2008; Bates, 2005). Preliminary data analysis showed that there was considerable variability among items in their naming times, with variability arising not only from the object, but also (and in fact more so) from the colour adjective (e.g., items in “blue” were named 100 ms faster than average, and items in “orange” 121 ms slower than average). In other words, variability of latencies for objects was confounded with variability of latencies for colours. In order to partial out the variance associated with colours, colour, which by itself is not of great interest, was included as a fixed effect in all analyses. We initially constructed a “maximal model” (Barr, Levy, Scheepers, & Tily, 2013) which contained the fixed factors relatedness and colour, as well as adjustments to intercepts and slopes for the random effects participants and object names. However, the model showed clear evidence of overparameterisation via r = 1.00 (perfect correlations between intercept and slope adjustments for object names), and such a complex random-effect structure is therefore not appropriate (Baayen et al., 2008). When the random-effect structure was stepwise reduced, the “most complex” model which did not suffer from overparameterisation included slope adjustments only for participants, and intercept adjustments for participants and object names. The comparison of the “most complex” model with the “maximal model” was not significant, suggesting that removing random slope adjustments for object names did not reduce the fit, χ2(N = 3,758) = 1.41, p = .49. Critically, the most complex model showed a significant effect of relatedness, β = 16, SE = 7.37, F = 4.80, p = .038, and colour, F = 178, p < .001 (see Footnote 5).
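The model comparison above is a likelihood-ratio test. As a rough check, the reported p value is reproduced if the χ2 statistic is evaluated with 2 degrees of freedom; the degrees of freedom are not stated in the text, so this value is an assumption. For 2 degrees of freedom the upper-tail probability has the closed form exp(−x/2), which the following sketch uses.

```python
import math


def chi2_upper_tail_df2(statistic):
    """Upper-tail probability of a chi-square variate with 2 degrees of freedom.

    The closed form exp(-x / 2) holds only for df = 2, which is assumed here.
    """
    return math.exp(-statistic / 2.0)


# Likelihood-ratio statistic reported for Experiment 1.
p_value = chi2_upper_tail_df2(1.41)
print(round(p_value, 2))  # 0.49
```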

Parallel analyses were conducted on the errors but with a binomial family due to the binary nature of the data (Jaeger, 2008). All models which included slope adjustments for participants and/or object names showed evidence of overparameterisation. The most complex model was therefore the one which included intercepts only as random effects. In this model, relatedness was not significant, β = −0.96, SE = 0.79, Wald z = −1.22, p = .220.

In summary, the results showed a significant facilitatory effect on latencies when colour and object name shared an orthographic radical. However, there were some limitations in Experiment 1. First, although in choosing our stimuli we had attempted to avoid semantic or phonological overlap between colour and object names, there were two combinations in the orthographically related condition (绿线轴, green cotton reel, /lü4xian4zhou2/; 棕松鼠, brown squirrel, /zong1song1shu3/), and one combination in the orthographically unrelated condition (棕苍蝇, brown fly, /zong1cang1ying/), in which colour and first syllable of object names had matching tone. Second, it has been suggested that colour prototypicality of objects affects naming times (Naor-Raz, Tarr, & Kersten, 2003), such that objects in typical colours (yellow banana) are named faster than objects in atypical colours (purple banana). While we generally avoided objects with highly prototypical colours in our materials, one of the related combinations (“brown squirrel”) was potentially problematic in this regard. Third, stimuli on half of the trials were orthographically related, and the same unrelated/related combinations were shown six times across the experiment, which might have directed participants’ attention to the orthographic manipulation. Finally, because participants were familiarised with the object names prior to the experiment (see “Procedure”; a standard practice in experiments on spoken word production) they were explicitly exposed to orthographic properties of the target words.

In Experiment 2, apart from attempting to replicate the central finding, we aimed to extend Experiment 1 in the following ways. First, we used a revised set of materials in which additional care was taken to avoid residual phonological overlap or colour-object association. Second, to discourage potential strategies, we added a further 12 filler pictures to reduce the percentage of related trials to 25%, and we reduced the number of repetitions of each related/unrelated combination from six to three. Moreover, in the familiarisation phase, we introduced object and colour names verbally to participants. Finally, we conducted postexperimental interviews and asked participants after each testing session whether they had noticed a relationship between colour and object name.

Experiment 2

Method

Participants

Thirty-two native Chinese speakers (20 females, mean age 22 years, age range: 21–27 years), none of whom had been in the first experiment, participated in this experiment, and were paid RMB 35 (approximately US$5).

Materials, design, and procedure

All aspects of Experiment 2 were the same as those of Experiment 1, with the following exceptions. (1) We used a slightly revised set of materials, with an average lexical frequency of 3.41 per million (Chinese Linguistic Data Consortium, 2003) and an average stroke number of 15 (see Appendix B). (2) Besides the 12 critical pictures, a further 12 filler pictures were added in order to reduce the percentage of related trials. As was the case for the critical target pictures, each filler picture was paired with two of the critical colours, thus forming 24 filler trials in which each colour appeared six times. Semantic, phonological, or orthographic overlap between adjective and noun was minimised. Each combination was repeated three times, thus generating 144 trials in total, presented in three blocks of 12 related, 12 unrelated, and 24 filler trials. (3) Expected names of the pictures and colours were not presented visually in the familiarisation stage; instead, the experimenter named them to participants. (4) After testing, participants were asked to report whether they had noticed any relation between colour and picture names.
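The trial counts of the two designs can be verified with simple arithmetic. The helper below is a hypothetical illustration; only the numbers passed to it are taken from the Method sections.

```python
def design_summary(n_related, n_unrelated, n_filler, n_blocks):
    """Return trials per block, total trials, and the proportion of related trials."""
    per_block = n_related + n_unrelated + n_filler
    return {
        "trials_per_block": per_block,
        "total_trials": per_block * n_blocks,
        "related_proportion": n_related / per_block,
    }


experiment_1 = design_summary(n_related=12, n_unrelated=12, n_filler=0, n_blocks=6)
experiment_2 = design_summary(n_related=12, n_unrelated=12, n_filler=24, n_blocks=3)
```

Both designs yield 144 trials, while the proportion of related trials drops from one half in Experiment 1 to one quarter in Experiment 2.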

The experiment was run using DMDX (Forster & Forster, 2003), and vocal responses were recorded using a microphone connected to the computer. Vocal responses were inspected and analysed using CheckVocal (Protopapas, 2007) by a research assistant who was blind to the hypotheses and design of the study.

Results and discussion

Data were analysed in the same way as described in Experiment 1. Latencies on trials with incorrect responses (5.3%) and faster than 200 ms or slower than 1,800 ms (0.7%) were excluded. Filler trials were not analysed. Average response times and errors are shown in Table 2, showing a facilitatory effect (39 ms) of orthographic overlap.

Analysis of latencies with a mixed-effects model showed that, as in Experiment 1, the maximal model with the full random-effect structure was overparameterised (perfect correlations between intercept and slope adjustments for participants). The simplified model included slope adjustments only for object names, and intercept adjustments for participants and object names. The comparison of the simplified model with the “maximal model” was not significant, suggesting that removing random slope adjustments for participants did not reduce the fit, χ2(N = 2,166) = 2.81, p = .25. The simplified model showed a significant effect of relatedness, β = 38, SE = 14.6, F = 6.84, p = .03, and colour, F = 27.97, p < .001.

Parallel analyses were conducted on the errors but with a binomial family due to the binary nature of the data (Jaeger, 2008). All models which included slope adjustments to participants and/or object names showed evidence of overparameterisation. The most complex model was therefore the one which only included random intercepts. In this model, relatedness was not significant, β = −0.12, SE = 0.19, Wald z < 1, p = .537.

Postexperimental interviews revealed that none of the participants had recognised the orthographic relation between colour and object names.

General discussion

Current evidence on whether for literate individuals the preparation of spoken language is affected by orthographic properties is mixed and inconsistent (e.g., Alario et al., 2007; Damian & Bowers, 2003; see Table 1 for details). In two experiments we presented strong evidence for such an involvement of orthography: when Mandarin speakers named coloured objects via adjective–noun phrases, orthographic overlap between the two words induced a facilitatory effect. The coloured object naming task is a well-established tool in research on spoken production and has been used both with speakers of Western languages (e.g., Damian & Dumay, 2009), and Mandarin (Qu, Damian, & Kazanina, 2012). The task has at least superficial ecological validity and the current results are unlikely to have arisen from strategies that participants developed.

We acknowledge that the size of the orthographic effect varied considerably across the two experiments, for reasons not yet determined. Differences in materials are unlikely to be relevant, as stimuli overlapped to a large extent between the two studies. Interestingly, the second experiment in which additional unrelated filler items had been inserted showed a larger orthographic effect than the first experiment, further arguing against a strategic origin of the effect. Participants also named objects faster in the second (853 ms) compared with the first (978 ms) experiment, with the larger orthographic effect arising in the faster experiment. A speed–accuracy trade-off is a possibility, with Experiment 2 showing faster response latencies but higher error rates (5.4%) than Experiment 1 (0.3%). It remains to be determined whether in the coloured picture-naming task the size of the orthographic effect varies with overall speed or depends on some other property of materials or participants.

Our results underscore the usefulness of nonalphabetic languages in order to investigate a potential role of orthography in speaking: Because sound and spelling are largely independent, experiments can vary orthographic properties (in this case, radical overlap) while avoiding phonological overlap. Equivalent experiments in languages with alphabetic orthography are difficult, although not impossible, to implement (see, e.g., Roux & Bonin, 2011, in which orthography and phonology were manipulated independently in a written picture naming task so that French target and context pictures shared the initial letter but not the initial sound, as in “cigar–camion”, or they shared the initial phonemes but not the initial letter, as in “singe–ceinture”). Given our evidence supporting a role of orthography in speaking in the current experiments, how can one account for the considerable degree of inconsistency in previous findings (see the Introduction, and Table 1)? Of course, some of the failures to obtain effects of spelling might simply be false negatives. Nonetheless, some null findings are obtained fairly consistently (e.g., in the “implicit priming” task for which one positive finding is countered by six negative findings). What could account for such null findings if orthographic effects in other tasks are accepted as genuine? As briefly summarised in the Introduction, a general distinction is between “online” and “off-line” sources of potential effects, with the former reflecting direct processing cross-talk between spelling and sound, whereas the latter attributes effects of orthography to a restructuring of phonology during literacy acquisition. Bürki et al. (2012) suggested that there is no “online” cross-talk between spelling and sound in production tasks; rather, to the extent that orthographic effects in production arise, they reflect “off-line” restructuring of phonological representations during acquisition of literacy.
According to Bürki et al., this could account for the positive findings on word learning tasks, but the null findings on “implicit priming” tasks (see Table 1). However, the results from the current experiments do quite clearly reflect “online” cross-activation between spelling and sound, so the off-line/online distinction favoured by Bürki et al. appears less relevant.

A different possibility is that phonological and orthographic codes are accessed at different speeds in different tasks, with orthographic effects only emerging in tasks in which orthographic codes are accessed simultaneously with, or perhaps even slightly ahead of, access to phonological codes. This possibility was discussed by Rastle et al. (2011, p. 1592) in order to account for their positive findings of orthography in picture naming, but a null finding in auditory shadowing. In the shadowing task, participants hear a spoken word and are instructed to immediately repeat it, thus phonological activation can guide responses before activation of orthographic codes can exert an influence. By contrast, in picture naming, a task in which responses are much slower than in shadowing, orthographic and phonological representations are activated simultaneously, hence there is opportunity for orthographic feedback to influence spoken responses. This account, however, appears somewhat ad hoc when jointly considering all existing evidence from Table 1. It is acknowledged that direct evidence on the time course of access to phonological versus orthographic codes in spoken production, perhaps via EEG, would be extremely useful (see Zhang & Damian, 2009, for an initial attempt).

Our findings demonstrate activation of orthographic codes during phonological encoding, arising from orthographic (i.e., radical) overlap within a two-word spoken phrase. How could a mechanism which explains this finding be integrated into computational accounts of word production (Dell, 1986; Dell, Schwartz, Martin, Saffran, & Gagnon, 1997; Levelt, Roelofs, & Meyer, 1999)? Naming a coloured object requires phonological encoding of both adjective and picture name. Orthographic forms of adjective and noun could be either directly activated from meaning (as they presumably would in written picture naming), or, alternatively, phonological encoding of the response might result in coactivation of corresponding orthographic forms via bilateral links. Under either scenario, orthographically related colour–object pairs would prime each other at the orthographic level, and activation would be required to be transmitted, via bilateral links, to the phonological level, resulting in a priming effect in naming latencies. Note that such an account does not necessitate sublexical correspondences between sound and spelling, which in nonalphabetic languages such as Mandarin Chinese (our target language) are obviously much reduced compared to alphabetic languages. In the domain of language perception (rather than production), frameworks exist which incorporate such bilateral links between orthography and phonology. For instance, the bimodal interactive activation model (BIAM; Grainger & Ferrand, 1994) implements visual and spoken word recognition via orthographic and phonological pathways which are bidirectionally connected both at the sublexical and the lexical level. A recent extension (Diependaele, Ziegler, & Grainger, 2010) additionally implements an output phonological layer. To adapt this architecture to semantically driven language production, one would need to add higher level representations (conceptual; lexical-semantic; lexical-syntactic) which activate output phonology. 
Via cross-activation of the implemented orthographic and phonological pathways, such a model could plausibly account for orthographic effects such as those shown here. In summary, our experiments provide evidence for a genuine orthographic effect in spoken phrase production by Mandarin speakers. The results eventually will need to be accounted for in a computational framework of spoken production which implements online cross-talk between phonological and orthographic representations.