The role of language in cognitive arithmetic remains controversial (Gelman & Butterworth, 2005; Rusconi, Galfano, & Job, 2007). One view is that memory for arithmetic facts (e.g., 2 + 3 = 5, 6 × 5 = 30) is often based on language-specific codes (Dehaene, Spelke, Pinel, Stanescu, & Tsivkin, 1999; Rusconi et al., 2007; Venkatraman, Siong, Chee, & Ansari, 2006), whereas others have argued that arithmetic memory is based on abstract or language-independent representations (Brysbaert, Fias, & Noël, 1998; McCloskey, 1992; Noël & Fias, 1999; Whalen, McCloskey, Lindemann, & Bouton, 2002).

Here, we examined interoperation transfer to test whether Chinese–English bilinguals’ memory for simple multiplication (6 × 8 = 48) and addition (6 + 8 = 14) included distinct representations in both Chinese (L1) and English (L2). Transfer refers to the positive or negative effects of practicing a task on subsequent performance of the same or a different task (Singley & Anderson, 1989). One form of negative transfer in arithmetic is retrieval-induced forgetting (RIF): Retrieval practice of a multiplication fact (6 × 5 = 30) produces inhibition of its addition counterpart (6 + 5 = 11), slowing retrieval and increasing errors (Campbell & Phenix, 2009). This phenomenon is relevant to bilingual arithmetic because it represents direct manipulation of the long-term memory representations of addition facts. Campbell and Phenix (see also Campbell & Thompson, 2012) showed that RIF of addition facts occurred following retrieval practice of multiplication (repeatedly answering 5 × 6 = ?), but not following study practice of multiplication facts (repeatedly reading equations silently and stating the answer; e.g., 5 × 6 = 30). The retrieval dependency of RIF is well established in the general RIF literature (Anderson, 2003). Studying a fact without retrieval does not require resolution among competitors; consequently, potential competitors may be activated but are not inhibited. Once a retrieval attempt of the target memory occurs, however, retrieval competitors are a source of interference and are inhibited. The evidence that RIF reflects retrieval-dependent inhibition of long-term memory representations (see, e.g., Anderson, 2003) is what makes it a powerful test of the linguistic hypothesis of arithmetic memory. Consequently, if RIF of addition facts occurs only when multiplication practice and addition test involve the same language, this will support language-specific representation of arithmetic facts.

We also anticipated positive transfer effects from multiplication practice to addition performance. Campbell and Thompson (2012) proposed that, in the absence of RIF, addition facts may be primed by multiplication practice and retrieved faster, as compared with control problems. This priming effect presumably would operate in opposition to RIF, so that the net transfer effect reflects their superimposed counteracting influences. RIF, when it occurs, however, appears to be strong enough to overcome priming and produce negative transfer from multiplication to addition.

Despite these opposing influences, the predictions for interoperation transfer in the present study were straightforward. Figure 1 represents the assumption that number-fact representations may be duplicated in L1 and L2. According to the model, arithmetic facts can be stored in each language as a sequence of lemma-level word representations (Dehaene, Piazza, Pinel, & Cohen, 2005) or perhaps as phonological codes (De Smedt, Taylor, Archibald, & Ansari, 2010). Setting a goal to multiply or to add directs activation primarily to the family of facts associated with that operation, but problems in Arabic format activate related facts in both operations. In the example, the goal is to answer 6 × 7 in L1. The problem activates the corresponding multiplication and addition facts in both languages, and the goal to answer with L1 triggers retrieval of the corresponding multiplication fact in L1 (see Kroll, Bobb, Misra, & Guo, 2008, for a discussion of bilingual language-selection mechanisms). There are two important consequences of this process. First, the activation of related facts primes them, which potentially facilitates their subsequent retrieval. Second, retrieval of the L1 multiplication fact inhibits the corresponding L1 addition fact. This inhibition counteracts priming and produces a net reduction in the accessibility of the L1 addition fact (i.e., RIF). In contrast, the L2 addition fact is primed but not inhibited, which would produce retrieval facilitation. The same reasoning applies if L1 and L2 are exchanged in the preceding example.

Fig. 1

A model of interoperation transfer in Chinese–English (L1–L2) bilinguals’ arithmetic memory. Arithmetic facts may be stored in each language as a sequence of word representations. In this example, the goal is to answer 6 × 7 in L1. The problem primes addition facts in both languages, but retrieval of the multiplication fact in L1 causes retrieval-induced forgetting (RIF; i.e., inhibition) of the corresponding L1 addition fact. Consequently, transfer to addition when multiplication practice is in the same language reflects counteracting effects of RIF and priming, whereas there is only positive transfer from priming when practice is in the other language. As a result, there is a response time cost for addition following multiplication practice in the same language, relative to practice in the other language

A key assumption of this prediction is that inhibition and RIF of arithmetic facts would be stronger within language than between languages. This follows from the assumption that arithmetic facts are stored as linguistic representations of the two operands and answer. RIF depends on retrieval competition; specifically, categorically related competitors receive more inhibition than do unrelated or weak competitors (Anderson, 2003; Norman, Newman, & Detre, 2007). Given language-specific representations, the problem operands (i.e., the addends or factors) constitute distinct retrieval categories in each language (i.e., there is a language-specific collection of facts associated with each operand in each language). As a result, within-language retrieval competition is relatively strong, which leads to language-specific inhibition and RIF.

To test this, adult Chinese–English bilinguals practiced multiplication problems (e.g., 4 × 5 = ?), answering a subset in L1 and another subset in L2. In a subsequent addition test phase, they were divided into two groups and answered corresponding addition problems (4 + 5 = ?) and control addition problems in either L1 (N = 24) or L2 (N = 24), which provided data for the analysis of transfer. On the basis of the model in Fig. 1, we expected performance costs for addition following multiplication practice in the same language, relative to practice in the other language.

These predictions might seem contrary to previous research showing that repeatedly naming pictures in L2 can reduce memory for the translation equivalent word in L1 (Levy, McVeigh, Marful, & Anderson, 2007; but see Runnqvist & Costa, 2012). In the context of arithmetic, this type of inhibition would be expected to impair performance on the same problem (e.g., 4 × 7 = 28) when answer retrieval is attempted in the other (i.e., not-practiced) language, because the retrieval targets are translation equivalents (e.g., L1 and L2 representations for “28”). In contrast, inhibition of translation equivalents during arithmetic practice in one language would not be expected to transfer to a different arithmetic fact or operation in a bilingual’s other language. For example, in our task, after practicing retrieval of “28” in L2 given 4 × 7, we then measure access to “11” in L1 given 4 + 7. Output competition between translation equivalents is not a factor, because lexical production of “28” would not compete with (or prime) production of “11.”

Campbell and Thompson (2012, Experiment 1) used a paradigm similar to that of the present study, but tested non-Asian Canadians (NACs) who responded only in English. They found that multiplication practice produced RIF for small addition problems (sum ≤ 10) but not large additions (sum > 10). Small arithmetic problems generally have greater memory strength than do large problems (Zbrodoff & Logan, 2005). Consistent with this, for large additions, their NAC participants reported nonretrieval strategies (e.g., counting, transformation) on 35 % of trials, as compared with only 15 % for small problems. Campbell and Thompson concluded that their NACs’ memory strength for large additions was too weak overall to attract inhibition and RIF during multiplication practice. In contrast, the participants in the present study were all Chinese nationals who received their primary and secondary education in China. This group was expected to have well-developed memory retrieval skills for virtually all the simple multiplication and addition facts (Campbell & Xue, 2001; Penner-Wilger, Leth-Steensen, & LeFevre, 2002). Consequently, we did not anticipate differences in RIF between small and large addition problems.

Method

Participants

Forty-eight Chinese–English bilinguals studying at the University of Saskatchewan were recruited. The sample comprised 28 women and 20 men, 18–30 years of age (M = 23.1); 46 were right-handed. Remuneration was $10 CA or course marks. Participants were alternately assigned to the L1 or L2 group for addition trials. All participants reported receiving their primary and secondary education in China. L1 was Mandarin (N = 45) or Cantonese (N = 3). Mean self-rated proficiency in English on a 6-point scale from poor (1) to excellent (6) was 4.0 (range, 3–6).

Design, stimuli, and apparatus

Stimuli were constructed from three sets of 12 single-digit pairs. Set 1 included 14 18 25 28 34 26 38 49 57 67 79 66, Set 2 included 15 16 24 27 37 33 39 48 56 59 89 77, and Set 3 comprised 13 17 23 45 35 44 29 47 68 69 78 99. The three sets were counterbalanced across the multiplication practice conditions (Chinese, English, not practiced). Addition problems corresponding to the not-practiced multiplication problems served as control problems. We also included problem size as a factor. Problems with operands that summed to 10 or less were small, and those with a sum >10 were large (Campbell & Thompson, 2012; LeFevre, Sadesky, & Bisanz, 1996). We used the same size definition for addition and multiplication in order to match them on the operand pairs belonging to small and large problem sets.
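The size rule above can be checked directly against the three stimulus sets. The following is a minimal sketch, assuming each two-digit token in the set lists encodes one operand pair (e.g., "14" is the pair 1 and 4); the function names are illustrative and are not part of the original materials.

```python
# Operand pairs as listed in the text; each token is read as two single digits.
SETS = {
    "Set 1": "14 18 25 28 34 26 38 49 57 67 79 66",
    "Set 2": "15 16 24 27 37 33 39 48 56 59 89 77",
    "Set 3": "13 17 23 45 35 44 29 47 68 69 78 99",
}


def parse_pairs(text):
    """Turn a string such as '14 18' into [(1, 4), (1, 8)]."""
    return [(int(token[0]), int(token[1])) for token in text.split()]


def problem_size(a, b):
    """Small if the operands sum to 10 or less, large otherwise."""
    return "small" if a + b <= 10 else "large"


def size_counts(text):
    """Count small and large problems in one stimulus set."""
    counts = {"small": 0, "large": 0}
    for a, b in parse_pairs(text):
        counts[problem_size(a, b)] += 1
    return counts


if __name__ == "__main__":
    for name, text in SETS.items():
        print(name, size_counts(text))
```

Under this rule, each of the three sets divides into six small and six large problems, and no operand pair appears in more than one set.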

The experiment began with an addition pretest in which all 36 pairs were tested in L1 or L2 in random order in two consecutive blocks. The pretest served to activate the addition facts and, thereby, promote RIF of them during multiplication practice. For the subsequent multiplication practice phase, there were six blocks of 12 multiplication problems in random order, with the Chinese-practiced and English-practiced problem sets alternating across blocks. Response language for the first multiplication practice block was counterbalanced across participants. Finally, there was a postpractice block that included all 36 addition problems tested in L1 or L2.
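The session structure implies a fixed number of trials per participant. The sketch below tallies them under two assumptions that the text does not state explicitly: that each pretest block presented all 36 pairs once, and that each practice block presented one full 12-problem set once. The constant and function names are illustrative.

```python
N_PAIRS = 36               # three sets of 12 operand pairs
PRETEST_BLOCKS = 2         # assumption: each pretest block contains all 36 pairs
PRACTICE_BLOCKS = 6        # alternating Chinese-practiced and English-practiced sets
PROBLEMS_PER_PRACTICE_BLOCK = 12


def session_trial_counts():
    """Trials per participant in each phase, plus the session total."""
    pretest = PRETEST_BLOCKS * N_PAIRS
    practice = PRACTICE_BLOCKS * PROBLEMS_PER_PRACTICE_BLOCK
    posttest = N_PAIRS
    return {
        "pretest": pretest,
        "practice": practice,
        "posttest": posttest,
        "total": pretest + practice + posttest,
    }


def retrievals_per_practiced_problem():
    """Blocks alternate between two languages, so each practiced set
    appears in half of the practice blocks."""
    return PRACTICE_BLOCKS // 2


if __name__ == "__main__":
    print(session_trial_counts())
    print(retrievals_per_practiced_problem())
```

On these assumptions, each participant completed 180 trials, and each practiced multiplication problem was retrieved three times in its assigned language.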

Stimuli and instructions were displayed on a high-resolution monitor controlled by E-Prime 2.0 (Psychology Software Tools, Inc.) running on MS Windows. Problems appeared horizontally in Courier New 14-point font as black Arabic digits on a white background, with the operands separated by the operation sign and adjacent spaces. The smaller number was always on the left, which is the order for multiplication that may be preferred by Chinese students (LeFevre & Liu, 1997). Participants had a lapel microphone that triggered the sound-activated stop signal for a software clock accurate to ±1 ms.

Procedure

The study took about 20 min and occurred in a quiet, well-lit room, with a Chinese–English bilingual experimenter. English instructions appeared on the monitor and were explained in both English and Chinese. Participants were informed that they would be tested on simple addition or multiplication problems and that both speed and accuracy were important. Before each block, the operation and language for that block were explained verbally by the experimenter in either Chinese or English, as appropriate.

A fixation dot for each trial appeared at the center of the screen for 1 s and then flashed twice for 1 s. The problem then appeared and remained visible until the participant responded. The operation sign (+ or ×) appeared at fixation. Response timing began when the problem appeared and stopped when the microphone detected a signal, which also cleared the screen, allowing the experimenter to flag failures of the microphone. No feedback about accuracy or response time (RT) was provided.

Results

A total of 402 RTs (4.6 %) were excluded, either because they were flagged by the experimenter or because they fell more than 2.5 SD from the participant's mean for the relevant problem size × practice condition (English, Chinese, no-practice control) cell. In each of the following analyses, only effects with an observed significance level of p < .05 are reported, and significance levels were p < .001, unless stated otherwise. All reported F tests had (1, 46) degrees of freedom.
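The outlier criterion can be illustrated with a short sketch. It assumes a single trimming pass using the sample standard deviation within one participant's design cell; the text does not specify these details, and the RT values below are invented for illustration only.

```python
from statistics import mean, stdev


def trim_outliers(rts, criterion=2.5):
    """Keep RTs within `criterion` sample SDs of the cell mean.

    Assumptions: one pass (no iterative re-trimming) and the sample SD
    (n - 1 denominator). `rts` holds one participant's correct RTs (ms)
    for a single problem size x practice condition cell.
    """
    if len(rts) < 2:
        return list(rts)
    cell_mean = mean(rts)
    cell_sd = stdev(rts)
    if cell_sd == 0:
        return list(rts)
    return [rt for rt in rts if abs(rt - cell_mean) <= criterion * cell_sd]


if __name__ == "__main__":
    # Hypothetical RTs for one cell; the last value is an obvious outlier.
    example_cell = [650, 700, 720, 690, 710, 680, 705, 695, 715, 685, 700, 2000]
    print(trim_outliers(example_cell))
```

In this invented example, only the 2,000-ms response exceeds the criterion and is dropped.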

Prepractice addition

We analyzed the prepractice addition blocks to confirm that there were no significant differences prior to multiplication practice. Mean RT for correct answers and percentage of errors each received a group (English addition, Chinese addition) × multiplication practice condition (English, Chinese, control) × size (small or large) ANOVA. The means appear in Table 1.

Table 1 Mean response times (in milliseconds) and percentages of errors for the addition pretest by multiplication practice language, addition language group, and problem size

Addition was faster in L1, on average (731 ms), than in L2 (882 ms), F = 8.1, p = .007, ηp² = .15. The group × size effect reflected a larger problem-size effect for L2 (+189 ms) than for L1 (+32 ms), F = 19.6, ηp² = .30 (see also Campbell & Epp, 2004). The error analysis indicated only fewer errors on small problems (2.1 %) than on large problems (4.7 %), F = 15.7, ηp² = .25. There were no significant effects to indicate that performance on the addition problems assigned to the three practice conditions differed prior to multiplication practice.
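For readers who wish to relate the reported F ratios to the partial eta squared effect sizes, the standard conversion is F × df_effect / (F × df_effect + df_error). The sketch below applies it with the (1, 46) degrees of freedom stated above; the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


if __name__ == "__main__":
    # F ratios from the prepractice addition analysis, each with (1, 46) df.
    for f_value in (8.1, 19.6, 15.7):
        print(f_value, round(partial_eta_squared(f_value, 1, 46), 2))
```

Because the reported F values are themselves rounded, an effect size recomputed this way may differ from the published value in the last digit.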

Multiplication practice

Mean RT for correct answers and percentage of errors for the three multiplication practice blocks in each language received a group (English addition, Chinese addition) × multiplication practice language (English, Chinese) × size (small or large problems) ANOVA. The means appear in Table 2.

Table 2 Mean response times (in milliseconds) and percentages of errors for multiplication practice by practice language and addition language group

With respect to RT, the two groups performed equivalently, with p ≥ .50 for all tests involving group. Participants responded faster in L1 (826 ms) than in L2 (1,012 ms), F = 70.3, ηp² = .61. The language × size effect reflected a smaller effect of problem size in L1 (+176 ms) than in L2 (+322 ms), F = 16.3, ηp² = .26. There were also fewer errors in L1 (3.9 %) than in L2 (7.1 %), F = 12.4, ηp² = .21, and again the language × size effect reflected a smaller problem size effect in L1 (+4.3 %) than in L2 (+8.2 %), F = 7.1, p = .01, ηp² = .13. There were no significant effects of group in the error analysis (all ps ≥ .07).

Postpractice addition

The control addition problems (i.e., those whose multiplication counterparts had not been practiced in either language) provided the baseline for measuring transfer to addition from multiplication practice. Table 3 presents mean RTs for correct answers and the mean percentages of errors for control problems by addition language group (L1, L2), multiplication practice language (same, different), and problem size (small, large). The table also includes mean transfer effects. Transfer was calculated by subtracting mean correct RT (or mean error rate) for addition problems tested in the same or different language as multiplication from the mean for control problems. Negative transfer values indicate interference, and positive values indicate facilitation.
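The transfer score and its sign convention can be made concrete with a short sketch. The means used here are hypothetical and are not values from Table 3; the function names are illustrative.

```python
def transfer(control_mean, practiced_mean):
    """Transfer = control mean minus practiced-problem mean.

    Works for RT (ms) or error rate (%). A negative value means the
    practiced counterparts were slower or more error prone than controls.
    """
    return control_mean - practiced_mean


def describe(transfer_value):
    """Label a transfer score using the sign convention in the text."""
    if transfer_value < 0:
        return "interference"
    if transfer_value > 0:
        return "facilitation"
    return "no transfer"


if __name__ == "__main__":
    # Hypothetical mean RTs (ms) for one participant group.
    control, same_language, different_language = 900, 940, 880
    for label, practiced in (("same", same_language), ("different", different_language)):
        score = transfer(control, practiced)
        print(label, score, describe(score))
```

With these invented means, the same-language condition yields a transfer of −40 ms (interference) and the different-language condition a transfer of +20 ms (facilitation).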

Table 3 Mean response times (in milliseconds) and percentages of errors for control addition problems and transfer relative to control problems with practice and test languages same or different for small and large problems

Addition RT transfer received a 2 × 2 × 2 ANOVA with the factors of group (Chinese or English addition) × language (multiplication practice in same or different language) × size (small or large problems). The only significant effect was the language × size interaction, F = 5.0, p = .03, ηp² = .10, and there was no evidence for a three-way interaction, F < 1. As Fig. 2 shows, the language × size interaction occurred because the direction of transfer was positive for large addition problems following multiplication practice in a different language but was negative following multiplication practice in the same language. A separate analysis of the large problems confirmed a significant 75-ms RT cost for same, relative to different, practice and test languages, F = 5.0, p = .03, ηp² = .10. This cost is consistent with language-specific RIF. There was no such difference for small problems, F = 0.004, p = .95, ηp² < .001 (see Footnote 1).

Fig. 2

Addition response time transfer relative to control addition problems as a function of problem size, with multiplication-practice and addition-test language the same or different. Error bars are the 95 % within-subjects confidence intervals based on the mean square error for the language × size interaction (Jarmasz & Hollands, 2009). Large problems revealed a response time cost for same, relative to different, practice and test languages

A group × size ANOVA of control addition problems indicated that L1 was faster than L2 (702 vs. 846 ms), F = 10.1, p = .003, ηp² = .18, and, as in the prepractice addition data, this advantage was greater for large problems (231 ms) than for small problems (56 ms), F = 19.0, ηp² = .29.

The group × language × size analysis of transfer effects on errors indicated no significant effects (all ps ≥ .24; see Footnote 2). A group × size ANOVA of percentage of errors on control problems revealed only fewer errors for small problems (1.0 %) than for large problems (6.9 %), F = 8.4, ηp² = .15.

Discussion

There was a 75-ms RT cost for large addition problems with multiplication practiced and addition tested in the same language, relative to different practice and test languages. This 75-ms cost reflects within-language RIF, taking into account that the inhibitory RIF effect is superimposed on a facilitative priming effect. Facilitative priming of large addition facts following practice of the corresponding multiplication problems appeared when language switched at test (Fig. 2). We propose (Fig. 1) that this priming effect was driven by the visual problem stimulus presented in Arabic-digit format and, therefore, would occur whether practice and test languages were the same or different; in fact, it seems plainly implausible that facilitative priming would occur only across languages. Consequently, the inhibitory effect of RIF in the same-language condition must be superimposed on this priming effect. For this reason, the different-language condition is an appropriate baseline for estimating costs owing to RIF in the same-language condition (see also Dehaene et al., 1999). Language-specific RIF would arise only if retrieval competition between corresponding addition and multiplication facts was language specific. Given that RIF reflects inhibition of retrieval processes in long-term memory (Anderson, 2003; Campbell & Phenix, 2009), the results provide evidence for language-specific representation of arithmetic facts in our Chinese–English bilingual sample.
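The baseline logic in this paragraph can be expressed as a simple additive sketch: priming is assumed to contribute equally in both conditions, and inhibition is assumed to apply only when practice and test languages match. Additivity is a simplifying assumption made here for illustration, and the 40-ms priming value is hypothetical; only the 75-ms cost comes from the data.

```python
def net_transfer(priming_ms, inhibition_ms, same_language):
    """Net transfer (ms) to addition; positive values mean facilitation.

    Simplifying assumption: priming and inhibition combine additively,
    and inhibition operates only within the practiced language.
    """
    inhibition = inhibition_ms if same_language else 0.0
    return priming_ms - inhibition


def rif_cost(transfer_different, transfer_same):
    """Estimate of RIF: different-language transfer minus same-language transfer."""
    return transfer_different - transfer_same


if __name__ == "__main__":
    priming = 40.0      # hypothetical priming benefit (ms)
    inhibition = 75.0   # set equal to the observed same-vs-different cost (ms)
    same = net_transfer(priming, inhibition, same_language=True)
    different = net_transfer(priming, inhibition, same_language=False)
    print(same, different, rif_cost(different, same))
```

Whatever priming value is assumed, it cancels in the subtraction, which is why the different-language condition can serve as the baseline for estimating the inhibitory component.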

There was little evidence of transfer for small addition problems, but this is not necessarily surprising. RIF is subject to a number of boundary conditions (see, e.g., Anderson, 2003; Goodmon & Anderson, 2011) that could differ between small and large problems. For example, our participants’ very high memory strength for small multiplication and addition facts could moderate RIF. RIF is interference dependent (Anderson, 2003); consequently, when a target memory is very strong, target retrieval encounters little interference from competitors and produces little or no RIF of potential competitors. Consistent with this, Campbell and Phenix (2009) demonstrated weaker RIF of addition facts after multiplication counterparts were repeatedly retrieved (and thereby repeatedly strengthened) than following a single retrieval. Our Chinese groups’ performance on small multiplications was very fast and accurate, as compared with the large multiplications (see Table 2), which indicates high memory strength and, in theory, little need to inhibit addition competitors. For large multiplications, mean RT was 28 % longer, and participants were almost four times more error prone, than for small multiplications, which is diagnostic of relatively weaker memory that could be more susceptible to interference, resulting in inhibition and RIF of large addition facts (see Footnote 3).

A comparison with the results of Campbell and Thompson (2012) supports the view that RIF in arithmetic depends on memory strength, rather than on problem size per se. As was described earlier, Campbell and Thompson (Experiment 1) tested NAC adults in a similar RIF paradigm with problem size defined as in the present study and found RIF for small but not large addition problems. The NACs’ performance on small multiplications (mean of 877 ms for correct responses) was comparable to our Chinese groups’ performance on large multiplications (914 ms), suggesting similar memory strength and, therefore, a similar basis for RIF of addition counterparts (small additions for the NACs and large additions for our Chinese group). For large addition problems, NAC participants reported using calculation strategies (e.g., counting, decomposition) rather than direct memory retrieval on 35 % of trials. Campbell and Thompson concluded that their NACs’ memory for large additions was too weak to attract inhibition and RIF. In contrast, previous research with Asian-Chinese university students has indicated almost exclusive use of retrieval for large additions, albeit slower retrieval than for small additions (e.g., Campbell & Xue, 2001; Penner-Wilger et al., 2002). Near exclusive reliance on memory retrieval for large additions would make our Asian-Chinese participants more susceptible to RIF for these problems than Campbell and Thompson’s NAC participants.

Conclusions

We tested novel transfer predictions to investigate how language mediates Chinese–English bilinguals’ performance on everyday arithmetic facts. The experiment demonstrated that mechanisms of RIF provide a model for the retrieval dynamics of language-based arithmetic in Chinese–English bilinguals. The RIF approach provided evidence of both cross-language priming and within-language inhibition of arithmetic facts, theoretical mechanisms that no previous research had considered. Thus, in addition to leading to novel converging evidence for the linguistic hypothesis of arithmetic memory, the model in Fig. 1 makes new theoretical links between RIF, bilingualism, and cognitive arithmetic research. Chinese students develop excellent memory for arithmetic facts through a combination of distinct linguistic, cultural, and pedagogical factors (Campbell & Xue, 2001; Miller, Kelly, & Zhou, 2005), and it remains to be determined whether the model for bilingual arithmetic depicted in Fig. 1 extends to other multilingual cultural groups.