Elsevier

Cognition

Volume 114, Issue 2, February 2010, Pages 227-252

The dark side of incremental learning: A model of cumulative semantic interference during lexical access in speech production

https://doi.org/10.1016/j.cognition.2009.09.007

Abstract

Naming a picture of a dog primes the subsequent naming of a picture of a dog (repetition priming) and interferes with the subsequent naming of a picture of a cat (semantic interference). Behavioral studies suggest that these effects derive from persistent changes in the way that words are activated and selected for production, and some have claimed that the findings are only understandable by positing a competitive mechanism for lexical selection. We present a simple model of lexical retrieval in speech production that applies error-driven learning to its lexical activation network. This model naturally produces repetition priming and semantic interference effects. It predicts the major findings from several published experiments, demonstrating that these effects may arise from incremental learning. Furthermore, analysis of the model suggests that competition during lexical selection is not necessary for semantic interference if the learning process is itself competitive.

Introduction

Retrieving a word from memory has consequences for later retrieval. This is particularly true when retrieval occurs in a semantic memory task such as picture naming. It is well known that the second presentation of a picture to be named speeds the naming response and diminishes the chance of error. This phenomenon, known as repetition priming, can be explained by the fact that each retrieval event is also a learning event, and so the second retrieval benefits from the learning that occurred the first time (e.g. Mitchell & Brown, 1988). Somewhat less well known is the fact that repetition priming has a “dark side”. Retrieving a word has negative consequences for the subsequent retrieval of other words from the same semantic category (e.g. Abdel Rahman and Melinger, 2007, Belke, 2008, Belke et al., 2005, Blaxton and Neely, 1983, Brown, 1981, Damian and Als, 2005, Damian et al., 2001, Howard et al., 2006, Hsiao et al., 2009, Kroll and Stewart, 1994, Schnur et al., 2006, Vigliocco et al., 2002, Wheeldon and Monsell, 1994). Following Oppenheim, Dell, and Schwartz (2007), we refer to these negative consequences as cumulative semantic interference. In this paper, we explain the mechanisms behind cumulative semantic interference in the domain of picture naming. This explanation takes the form of a computational model of lexical access in speech production that simulates the major phenomena in this domain. The model addresses meaning-based lexical retrieval in general, whether this is elicited by picture-naming, naming-to-definition, or spontaneous production. Our focus, however, is on persistent changes to lexical processing that result from the natural retrieval of a single word. The central theoretical point that the model implements is that repetition priming and cumulative semantic interference are two sides of the same coin. They both result from an error-based implicit learning process that tunes the language production system to recent experience.

Although our model is formally developed only for lexical access in speech production, our theoretical goals are more general. Cumulative semantic interference is a manifestation in speech production of a set of phenomena known in the memory literature as retrieval-induced forgetting or RIF. Retrieval-induced forgetting studies demonstrate that the episodic memory for a word or association can be impaired by the previous retrieval of a related memory (e.g. Anderson, Bjork, & Bjork, 1994; but see also Anderson & Neely, 1996, for a discussion of retrieval-induced forgetting in semantic memory). Currently, the explanation for such impairment is debated, with some claiming it results from suppressing previous competitors (often termed inhibition or unlearning; e.g. Anderson et al., 1994, Melton and Irwin, 1940, Norman et al., 2007, Postman et al., 1968) while others claim it stems from strengthening previous targets (occlusion or ‘blocking’; e.g. MacLeod et al., 2003, McGeoch, 1932, Mensink and Raaijmakers, 1988). Our analysis of cumulative semantic interference in speech production will, we claim, speak to this debate. More generally, our model reflects a recent trend in cognition to link psycholinguistics with theories of learning and memory by developing accounts of how experience changes language processing (e.g. Chang et al., 2006, Goldinger, 1998, Kraljic and Samuel, 2005).

Much of the theoretical importance of cumulative semantic interference hinges on the claim that it requires a competitive mechanism for lexical selection (e.g. Howard et al., 2006). The most prominent theories of lexical access (e.g. Levelt, Roelofs, & Meyer, 1999) assume competitive lexical selection. Empirical support for this assumption has often come from picture-word interference studies (e.g. Schriefers, Meyer, & Levelt, 1990), in which speakers name pictures as they are presented at short offsets from distractor words. However, since Mahon, Costa, Peterson, Vargas, and Caramazza (2007) presented an analysis demonstrating that picture-word interference studies have not reliably supported the claims of competitive lexical selection, the search for empirical support has turned to a simpler task: picture naming, specifically with regard to cumulative semantic interference.

Two serial picture-naming paradigms have been particularly common in studies of cumulative semantic interference. First is the blocked-cyclic naming paradigm (e.g. Damian et al., 2001). In each block, subjects repeatedly cycle through naming a small set of pictures (e.g. one block might consist of four cycles through a set of six pictures). In the homogeneous condition, all the pictures in the block represent the same semantic category (e.g. farm animals), and in the mixed condition each picture represents a different semantic category. Cumulative semantic interference is indexed by greater difficulty naming pictures in the homogeneous condition relative to the mixed condition (the semantic blocking effect). Typically, the semantic blocking effect is not present in the first cycle and grows over subsequent cycles (e.g. Belke et al., 2005). The second important serial picture-naming paradigm, used by Brown (1981, Experiment 4) and Howard et al. (2006), can be called the continuous paradigm. In this method, pictures drawn from several categories (e.g. animals, vehicles) are named without repeating any item, but with multiple exemplars from each category. Here, cumulative semantic interference is demonstrated by naming times that increase linearly as a function of the number of previously named pictures in that category. Importantly, the number of interspersed pictures between each category exemplar is irrelevant to the effect (Howard et al., 2006). For example, in the sequence GOAT, CAR, TOMATO, TRUCK, HORSE, the naming time for HORSE would be slower than that for GOAT, and would be unaffected by the number of unrelated intervening items.
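The predictor used in the continuous paradigm can be made concrete with a short sketch. The helper name `ordinal_positions` and the category labels attached to the example items are illustrative assumptions, not part of any published stimulus list. The code simply counts how many pictures from the same category have been named so far, which is the quantity that naming times are said to track, and it ignores the number of intervening unrelated items.

```python
from collections import defaultdict


def ordinal_positions(sequence):
    """Return (item, ordinal position within its category) for each trial.

    `sequence` is a list of (item, category) pairs in presentation order.
    Only same-category items are counted, so intervening unrelated
    pictures (the lag) never change an item's position.
    """
    seen = defaultdict(int)  # category -> number of exemplars named so far
    result = []
    for item, category in sequence:
        seen[category] += 1
        result.append((item, seen[category]))
    return result


# Category labels below are assumed for illustration only.
trials = [
    ("GOAT", "animal"),
    ("CAR", "vehicle"),
    ("TOMATO", "food"),
    ("TRUCK", "vehicle"),
    ("HORSE", "animal"),
]
positions = ordinal_positions(trials)
print(positions)
```

In this sequence HORSE is the second animal even though three other pictures separate it from GOAT, so on the account above it should be named more slowly than GOAT.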

Howard et al. (2006) argued that three specific properties of the lexical retrieval process must interact to produce cumulative semantic interference in naming latencies: shared activation, competitive selection, and priming. The idea is that each time a target word is activated, semantically related competitors are also activated (shared activation), and strongly activated competitors slow down the selection of target words (competitive selection). Retrieving a word once primes its future retrieval (priming), making it a stronger competitor when related words are retrieved in subsequent trials, thereby causing those subsequent target words to be retrieved more slowly. We will use these three properties to structure our review of the phenomenon and its implications for lexical retrieval.

When a target word such as DOG is activated during its attempted retrieval, its semantic relatives such as GOAT are also activated, thereby setting the scene for lexical competition. This principle of shared activation for semantically related words is what makes cumulative semantic interference specifically semantic in nature.

While the idea of shared activation is compatible with most current theories of semantic representation, it arises naturally from the use of distributed (or feature-based) semantic representations such as those commonly employed in connectionist models (see McClelland & Rogers, 2003, for a review). Distributed mechanisms would predict graded effects of semantic similarity, and indeed blocked-cyclic picture naming studies have demonstrated that more closely related items generate stronger interference effects than those more distant (Vigliocco et al., 2002). So, for the purpose of understanding cumulative semantic interference, it may be useful to think of shared activation arising from shared semantic features rather than all-or-none category membership. That is how shared activation is implemented in our model.

As noted by Howard et al. (2006), however, shared semantic activation does not require distributed representations. It may occur with non-decomposed (localist) lexical concepts (e.g. Roelofs, 1992) provided that related concepts connect either directly (e.g. Collins & Loftus, 1975) or indirectly through shared category or property nodes (e.g. Collins & Quillian, 1969), and each activated concept sends activation to neighboring concepts. Moreover, any graded effects can be attributed to gradations in the number or strength of such connections. Thus, the finding of graded cumulative semantic interference does not allow us to distinguish between distributed and localist semantic representations.

The second property of lexical retrieval that is required for cumulative semantic interference, according to Howard et al. (2006), is that lexical selection be competitive. That is, increasing the activation of non-target words should decrease the speed and accuracy with which a target word is selected. In a competitive selection process, words compete in the manner of two athletic teams during “sudden-death overtime”: the competition continues until a single winner emerges. This might be implemented via either a differential threshold (e.g. Levelt et al., 1999) or lateral inhibition (e.g. Howard et al., 2006), but the key is that having multiple strong competitors makes it harder to select a winner (Wheeldon & Monsell, 1994). A non-competitive selection process (e.g. Mahon et al., 2007), in contrast, is more like a horse race that ends when the first contestant crosses a pre-determined absolute threshold. To illustrate this difference, let us imagine selecting a target word, DOG, when a competitor, GOAT, is also activated. According to a sudden-death competition method, the two compete until one clearly wins, so selecting DOG should be slower and less accurate when GOAT is more active. Thus, cumulative semantic interference in response time would occur if the semantic manipulations raise the activation of competitors. With a horse-race selection method, the speed of DOG’s selection is entirely a function of DOG’s own activation. The activation of GOAT does not enter into the equation, so this non-competitive selection offers no obvious way to account for cumulative semantic interference.
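The contrast between the two selection schemes can be illustrated with a toy calculation. Both timing rules below are deliberately simplified assumptions made for this sketch (a linear race to an absolute threshold, and a lead that grows at a rate equal to the activation difference); neither is taken from any of the cited models, and the function names and parameter values are hypothetical.

```python
def horse_race_time(target, competitor, threshold=1.0):
    """Toy non-competitive rule: the target alone races to an absolute
    threshold at a rate equal to its own activation. The competitor's
    activation is accepted but deliberately ignored."""
    return threshold / target


def sudden_death_time(target, competitor, margin=1.0):
    """Toy competitive rule: selection waits until the target leads the
    competitor by `margin`, with the lead growing at a rate equal to
    the activation difference."""
    lead_rate = target - competitor
    if lead_rate <= 0:
        return float("inf")  # the target never pulls ahead
    return margin / lead_rate


# DOG at 0.8 with GOAT weakly (0.2) or strongly (0.6) active.
race_weak = horse_race_time(0.8, 0.2)
race_strong = horse_race_time(0.8, 0.6)
death_weak = sudden_death_time(0.8, 0.2)
death_strong = sudden_death_time(0.8, 0.6)
print(race_weak, race_strong, death_weak, death_strong)
```

Under these assumptions the horse-race time is identical in both cases, whereas the sudden-death time grows as GOAT becomes more active, which is the asymmetry the paragraph above describes.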

Locating the competition within the word production process is difficult, but several studies constrain it to a point after semantic access and before phonological access. Two findings argue for a post-semantic locus. First, performing non-verbal semantic judgments on pictures in the blocked-cyclic paradigm has proven insufficient to elicit a semantic blocking effect (Damian et al., 2001). So any competition that occurs during stages before lexical access does not appear sufficient to drive the cumulative semantic interference effect. Second, bilingual continuous paradigm experiments indicate that cumulative semantic interference accumulates independently for each language, suggesting that the competitive selection process is language-specific and hence post-semantic (Castro, Strijkers, Costa, & Alario, 2008, Experiments 3 and 4).

Some evidence suggests that the competition may instead characterize the selection of abstract, pre-phonological word-forms, or lemmas. Blocked-cyclic word naming (reading words aloud) appears to produce semantic facilitation rather than interference, suggesting that the competition that results when naming pictures must arise before retrieving phonological word-forms (Damian et al., 2001, Experiment 2a). Retrieving gender-marked determiners during word naming may bring back the semantic blocking effect, suggesting that the competition affects the post-semantic, pre-phonological retrieval of abstract lexical concepts (Damian et al., 2001, Experiment 2b).

Together, the principles of shared lexical activation and competitive lexical selection are sufficient to produce the sort of semantic interference that might be seen within a single trial (e.g. as in the picture-word interference effect, e.g. Schriefers et al., 1990). Shared activation causes semantically related competitors to become active, and a competitive lexical selection mechanism allows these competitors to hinder selection of a target word. Making semantic interference cumulative, however, requires some mechanism by which processes during one trial can affect subsequent trials. This is the function of priming.

Priming is Howard et al.’s final necessary property for cumulative semantic interference. Retrieving a word once should facilitate its future retrieval by either making the word itself more accessible or making its competitors less accessible.

While priming can be implemented in a number of ways, its effects can be characterized as either temporary or persistent. Temporary effects occur when priming is ascribed to changes in activation levels – either positive (e.g. Crowther et al., 2008, Howard et al., 2006, Wheeldon and Monsell, 1994) or negative (inhibitory) changes (e.g. Brown, 1981, McCarthy and Kartsounis, 2000) that are carried over from previous trials. For example, selecting DOG may require the temporary suppression of the urge to say CAT, which might make it more difficult to access CAT for a short time. Persistent accounts (e.g. Damian and Als, 2005, Howard et al., 2006, Schnur et al., 2006) instead describe priming as a consequence of relatively permanent changes to the way words are accessed, such as incremental learning. Inspired in part by neural network models in which incremental learning is attributed to changes in connection weights rather than activation levels, persistent priming is an example of the learning that continually adjusts the cognitive system to suit its environment (e.g. Gupta & Cohen, 2002).

A critical property of the priming mechanism that underlies cumulative semantic interference is that the interference accumulates incrementally as a function of relevant experience (such as naming semantically related pictures), and is unaffected by irrelevant experience (such as naming unrelated pictures). In Howard et al.’s (2006) continuous paradigm study, naming pictures from a single category, such as DOG and then GOAT, produced the same linear accumulation of semantic interference whether the relevant pictures were separated by two, four, six, or eight unrelated items, suggesting that only the relevant experience matters. Moreover, in a variant of Howard et al.’s (2006) continuous paradigm, Navarrete, Mahon, and Caramazza (2008) showed that repeating an item, such as DOG, GOAT, DOG produced the same cumulative interference as accessing an additional novel exemplar from the category, thus demonstrating that each act of retrieval contributes to the effect.

Further evidence of robustness to irrelevant experience comes from the blocked-cyclic paradigm. Damian and Als (2005) showed that performing nonlinguistic tasks (Experiment 1) and naming unrelated items (Experiments 2 and 3) in between naming DOG and GOAT failed to disrupt the semantic blocking effect. Thus, the priming from each relevant experience contributes separately and robustly to the cumulative semantic interference effect, and irrelevant experience affects neither its accumulation nor diminution.

Filler trials, as used in the Howard et al. (2006) and Damian and Als (2005) studies, required additional time to present and process, increasing the chronological time between the retrieval of related words. So these studies speak to residual activation (or inhibition) accounts of priming. Priming by residual activation should be strongly affected by the time between prime(s) and target. As argued by Bock and Griffin (2000), the activation levels that control language production must decay quickly in order for production – the rapid sequential activation of linguistic units – to succeed. For example, computer simulations of multi-word production have required that activation levels decay with time constants such that activated linguistic units lose nearly all of their activation within a second or two (e.g. Dell, 1986). Similarly, the effects of inhibitory processes in production are also time-bound. For example, many production theories assume that selection of a linguistic unit entails a reduction to a zero or negative activation value for the selected unit (e.g. Dell et al., 1997, Houghton, 1990). However, the effects of this inhibition are quite temporary and are designed to prevent an immediate perseveratory error. The fact that the filler trials in Howard et al. (2006) and Damian and Als (2005) failed to affect cumulative semantic interference suggests that the priming that underlies this effect is reasonably persistent. Hence, cumulative semantic interference is unlikely to be based largely on the positive or negative changes in activation levels that arise solely through the spreading activation mechanisms of the production system.

A more direct demonstration of the temporal insensitivity of priming comes from Experiment 1 of Schnur et al. (2006), who compared naming latencies in a blocked-cyclic naming paradigm in which pictures were presented either 1-s or 5-s after the previous response (i.e. a 1-s or 5-s response-stimulus interval, or RSI). Any time-based decay of residual activation predicts a statistical interaction between presentation rate and semantic blocking condition, specifically less effect of the blocking with the long RSI (e.g., Wilshire & McCarthy, 2002). Both presentation rates produced reliable cumulative semantic interference effects, with no interaction between RSI and semantic blocking condition, demonstrating that the priming is insensitive to the passage of time, at least at these intervals.

To summarize, the priming that causes cumulative semantic interference is temporally persistent, it accumulates with relevant experience, and it is insensitive to irrelevant experience. These properties offer an awkward fit for mechanisms based on residual activation or inhibition of linguistic units, both of whose effects should be expected to decay rather quickly. Instead, we follow Damian and Als (2005), Schnur et al. (2006), and Howard et al. (2006) in suggesting that the priming that underlies cumulative semantic interference emerges from small, persistent, experience-driven, post-selection adjustments to the mapping from semantics to words, i.e. incremental learning within the production system.

It appears that cumulative semantic interference does more than slow lexical retrieval; it also causes lexical selection errors. In a blocked-cyclic naming task with healthy older controls, Schnur et al. (2006, Experiment 1) found that naming latencies were higher and errors more frequent in the homogeneous condition, relative to the mixed condition. They also tested aphasic patients (Schnur et al., 2006, Experiment 2), who made many more errors, and reported two important findings. First, patients made more semantic errors (e.g. naming DOG as CAT) and omissions in the homogeneous than mixed condition. Second, these semantic blocking effects increased across cycles, while other types of errors (e.g. phonological) showed the opposite pattern. The patients’ increasing semantic blocking effects for semantic and omission errors thus resembled healthy adults’ increasing blocking effects for naming latencies, suggesting that they might stem from the same underlying causes.

The link between the blocking effects on errors in patients and on latencies in unimpaired speakers also has some support from studies that attempt to associate these effects with brain regions. Neuroimaging of healthy subjects demonstrated that activation in the left inferior frontal gyrus (LIFG) correlates with increases in naming latencies due to semantic blocking and related manipulations (Moss et al., 2005, Schnur et al., 2009). The LIFG, for reference, corresponds to Brodmann’s areas (BA) 44, 45, and 47, the posterior part of which (BA 44/45) is Broca’s area. And lesion analyses of patients from the Schnur et al. (2006) study revealed an association between LIFG damage and the increase in errors across blocking cycles (Schnur et al., 2005, Schnur et al., 2009).

A second important finding from these patient studies is that patients’ error effects are also robust to timing manipulations. Schnur et al. (2006) found that the blocking effect on errors – like the blocking effect on naming latency with unimpaired speakers – was not influenced by whether pictures were named with a 1-s or a 5-s RSI. Further examining these patients’ naming errors, Hsiao et al. (2009) found that their within-set perseverations tended to match the words that they had used most recently. For instance, if a patient named pictures of a pig, a dog, and a goat correctly (i.e. saying PIG DOG GOAT) before incorrectly naming a picture of a horse, then she was more likely to name the horse as DOG than as PIG. Crucially, the key measure of recency was not time, but the number of intervening items (henceforth item-lag). Specifically, the chance-corrected perseveration lag functions were the same regardless of whether pictures were presented 1-s or 5-s after a patient’s most recent response. Together with the temporal insensitivity of unimpaired speakers’ response time effects, these results support our previous suggestion that relevant intervening experience, not timing, matters for the build-up or dissipation of cumulative semantic interference.

Howard et al. (2006) presented an elegant model of the effect of cumulative semantic interference on response time. In their model, shared activation is implemented by assuming that words receive continuous (integrated over computationally discrete timesteps) activation from semantic nodes, and each time one semantic node is activated, similar semantic nodes are also activated to a lesser degree. Lexical competition is implemented by inhibitory connections running from each word to every other word (i.e. lateral inhibition). A word is only selected upon reaching an absolute selection threshold, but since activated words inhibit each other, strong competitors can slow down the selection of a target word. Finally, each time a word is selected, its connection from its semantic node grows stronger, implementing a priming function.

The Howard et al. model is noteworthy because it instantiates the principles of shared activation, competitive selection, and priming and because it attributes the interference to processes that are insensitive to time and to unrelated interference. Our goal is to extend this approach. We do so in three respects. First, we identify the priming mechanism with error-driven connectionist learning. This learning mechanism has the natural property that each act of retrieval in a certain context strengthens the target of retrieval (repetition priming) while at the same time making it less likely that similar memories are retrieved instead in that context (similarity-sensitive interference). We will show how this mechanism is consistent with both the perseveratory gradient in the production of word errors and the insensitivity of cumulative semantic interference to unrelated items or the passage of time. By attributing these effects to error-driven learning, we link up with the many cognitive models that are based on such learning (e.g. Chang et al., 2006, Gupta and Cohen, 2002, Plaut et al., 1996) and, as we later demonstrate, this attribution addresses the question of whether retrieval-induced forgetting is caused by “inhibition”. Second, we develop the model in conjunction with theories of word production so that it can account for errors as well as response times. This requires a decision process that allows for lexical competition to play out in time, and for errors of commission and errors of omission. This proposed decision process may underlie the correlation between cumulative semantic interference and activation of Broca’s area. Finally and most importantly, we offer the hypothesis that cumulative semantic interference does not, in fact, require a competitive mechanism for lexical selection. Specifically, we demonstrate that competition in the lexical selection process is unnecessary when lexical selection is combined with error-driven learning. The resulting non-competitive model, we claim, can explain the major findings concerned with cumulative semantic interference.

The key components of our model concern lexical activation, lexical selection, and learning. As in many models of lexical retrieval in production, retrieving a word begins with activating a set of semantic features (e.g. Dell et al., 1997, Gordon and Dell, 2003, Rapp and Goldrick, 2000). These semantic features each connect to a number of words, and thus activate those words in proportion to the strength and number of these connections (lexical activation, Fig. 1). Thus, multiple words are activated, requiring some kind of decision. For the model, we assume that the most active word is chosen. However, when more than one word is activated, it is assumed to be difficult to identify the most active one (i.e. if the difference in activations is slight, the winner is hard to “see”), so a ‘booster’ mechanism kicks in to tease the activations apart. This booster repeatedly amplifies each word’s activation until a winner can be selected (lexical selection), or until this boosting process times out. Response time is assumed to be correlated with the number of boosts needed for the winner to emerge. Errors of commission, such as semantic errors, occur when the wrong word is chosen, and errors of omission occur if the booster times out. Finally, after lexical selection has concluded, an error-driven learning process adjusts the semantic-to-lexical connections so as to facilitate future retrieval of the target word (learning). In the following sections, we describe the details of the model’s architecture, and its lexical activation, lexical selection, and learning mechanisms.

The model is a feedforward two-layer network. Semantic feature nodes (e.g. FURRY or AQUATIC) form the input layer of the network. Each feature node connects directly to each of the word nodes (such as DOG or BOAT) in the output layer. Connection weights are initialized at zero and are continually adjusted through an error-driven learning process, as detailed later in this description. There are no lateral connections between semantic feature nodes or between word nodes, and no reverse connections from words to features.

Lexical activation. When semantic features are activated (as we assume happens when a picture is presented), these features in turn activate words. The net input, $net_i$, to any lexical node $i$ sums the activation, $a_j$, of each semantic feature, $j$, times the weight of its connection to the lexical node, $w_{ij}$ (Eq. (1)):

$$net_i = \sum_j w_{ij}\, a_j \tag{1}$$

This net input, $net_i$, is then converted to an activation, $a_i$, via a logistic function (Eq. (2)):

$$a_i = \frac{1}{1 + e^{-net_i}} \tag{2}$$

Thus, the activations range from zero to one. We assume that lexical activation is imprecise and therefore add a small amount of normally distributed noise, $\nu$ (with a mean of 0 and a standard deviation of $\theta$), to the net input, $net_i$, yielding Eq. (3):

$$a_i = \frac{1}{1 + e^{-(net_i + \nu)}} \tag{3}$$
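The activation step in Eqs. (1)–(3) can be sketched in a few lines of Python. This is a minimal illustration, not the published implementation: the function name, the four-feature vocabulary, the weight values, and the noise level are all assumptions chosen for the example.

```python
import math
import random


def lexical_activations(weights, features, noise_sd=0.0, rng=None):
    """Compute word activations from semantic feature activations.

    weights[i][j] is the connection weight w_ij from feature j to word i;
    features[j] is the feature activation a_j. noise_sd plays the role of
    the noise standard deviation (theta in the text).
    """
    rng = rng or random.Random()
    activations = []
    for word_weights in weights:
        net = sum(w * a for w, a in zip(word_weights, features))  # Eq. (1)
        if noise_sd > 0.0:
            net += rng.gauss(0.0, noise_sd)  # noise term of Eq. (3)
        activations.append(1.0 / (1.0 + math.exp(-net)))  # Eqs. (2)/(3)
    return activations


# Hypothetical weights: rows are words (DOG, GOAT, BOAT);
# columns are features (MAMMAL, BARKS, BLEATS, AQUATIC).
weights = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
dog_features = [1.0, 1.0, 0.0, 0.0]

clean = lexical_activations(weights, dog_features)
noisy = lexical_activations(weights, dog_features, noise_sd=0.5,
                            rng=random.Random(0))
untrained = lexical_activations([[0.0] * 4] * 3, dog_features)
print(clean, noisy, untrained)
```

With these assumed weights, DOG receives the most input, GOAT receives some through the shared MAMMAL feature, and BOAT receives none. With all weights at zero, every word sits at an activation of 0.5, since the logistic of zero is one half.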

Lexical selection. The next stage applies a competitive winner-take-all process to the lexical activations, linking increased lexical competition to increased naming latencies. A booster mechanism floods the network with additional activation that combines nonlinearly with the existing lexical activation until either one word grows discernibly more active than the rest or the boosting process times out. Notice that this booster process is “dumb” in the sense that it does not know which word is the target. It repeatedly boosts all words. But because it boosts them in a multiplicative manner, the most active one gradually increases its lead on the other words.

The booster is engaged only to the extent necessary to select a single word (that is, it operates more when selection is difficult), recalling Schnur et al.’s (2009) reports of greater LIFG activity as a function of increased lexical competition. Therefore we tentatively identify this booster with the competition-biasing mechanisms that are hypothesized to be a function of the LIFG (e.g. Kan and Thompson-Schill, 2004, Thompson-Schill et al., 1997), but we acknowledge that our implemented booster has arbitrary properties that lack neural motivation. That is, we commit to the functions of the booster (aiding selection when competition is present) and its possible association to the LIFG (Broca’s area), rather than to its implemented details.

The boosting process plays out over time. To determine whether a winner has emerged, at each timestep, $t_n$, we compare the difference between the activation of each word node, $a_i^{t_n}$, and the mean activation of the other word nodes, $\bar{a}_{others}^{t_n}$, to a threshold value, $\tau$ (Eq. (4)):

$$\tau < \left(a_i^{t_n} - \bar{a}_{others}^{t_n}\right) \tag{4}$$

If no word’s activation difference exceeds the difference threshold (i.e. Eq. (4) is false for all $i$), then the booster multiplies each word’s current activation level, $a_i^{t_n}$, by a constant boosting factor, $\beta > 1.0$. The result becomes its new activation level, $a_i^{t_{n+1}}$ (Eq. (5)):

$$a_i^{t_{n+1}} = a_i^{t_n}\,\beta \tag{5}$$

Then this testing and boosting process repeats.

A word is selected, that is, the boosting stops, if and when its activation advantage over other words (per Eq. (4)) is great enough. The timestep at which this selection occurs, $t_{selection}$, is treated as an index of the duration of the lexical selection process, which should correlate with naming latency. If, for the sake of simplicity, we assume no variation in the repeated boosting, this iterative process becomes computationally equivalent to Eq. (6):

$$t_{selection} = \log_\beta\left(\frac{\tau}{a_i^{t_1} - \bar{a}_{others}^{t_1}}\right) \tag{6}$$
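The step from the iterative boosting procedure to Eq. (6) can be spelled out as follows. This is our reconstruction of the reasoning under the stated assumption of noise-free boosting; the shorthand $d_k$ for the leading word's activation advantage after $k$ boosts is introduced here for illustration only.

```latex
\begin{align*}
d_k &= a_i - \bar{a}_{others}
  && \text{advantage after } k \text{ boosts; } d_0 \text{ is the advantage at } t_1 \\
d_{k+1} &= \beta a_i - \beta \bar{a}_{others} = \beta\, d_k
  && \text{Eq. (5) scales every activation by } \beta \\
d_k &= \beta^{k} d_0 \\
\beta^{k} d_0 > \tau
  &\iff k > \log_\beta\!\left(\frac{\tau}{d_0}\right)
  && \text{for } d_0 > 0,\ \beta > 1
\end{align*}
```

The number of boosts actually needed is therefore the smallest whole number exceeding the value given by Eq. (6), or zero if the initial advantage already exceeds $\tau$. Selection time grows as the initial advantage shrinks, and if the initial advantage is exactly zero, boosting never produces a winner.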

However, if no node reaches the difference threshold within a certain number of boosts, Ω, then no word is selected and the trial is an omission. This corresponds to a simple “wait and give up” theory of omissions. So we do not consider an omission as a special state that may be achieved, but rather a lack of sufficient evidence for any particular word, making it difficult to select a word quickly enough.
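The selection procedure, including the omission deadline, can be sketched as a short loop. This is a minimal noise-free illustration: the function name and the parameter values for the threshold, boosting factor, and deadline are assumptions chosen so that the arithmetic is easy to follow, not the values used in the simulations. Because the most active word always has the largest advantage over the mean of the others, the sketch only tests the current leader.

```python
def select_word(activations, tau, beta, max_boosts):
    """Boost all activations until the leader's advantage exceeds tau.

    Returns (index_of_selected_word, boosts_used), or (None, max_boosts)
    when the deadline is reached without a winner (an omission).
    """
    acts = list(activations)
    for boosts in range(max_boosts + 1):
        leader = max(range(len(acts)), key=lambda i: acts[i])
        others = acts[:leader] + acts[leader + 1:]
        mean_others = sum(others) / len(others)
        if acts[leader] - mean_others > tau:  # Eq. (4) holds for the leader
            return leader, boosts
        acts = [a * beta for a in acts]  # Eq. (5): boost every word
    return None, max_boosts


# Illustrative (assumed) parameters: tau = 1.0, beta = 2.0, deadline = 5.
easy = select_word([0.6, 0.4, 0.2], tau=1.0, beta=2.0, max_boosts=5)
hard = select_word([0.6, 0.55, 0.2], tau=1.0, beta=2.0, max_boosts=5)
omission = select_word([0.51, 0.5, 0.5], tau=1.0, beta=2.0, max_boosts=5)
print(easy, hard, omission)
```

In this sketch the same target needs more boosts when a competitor is nearly as active, and a near-tie runs out the deadline and is scored as an omission. The booster never needs to know which word is the intended target.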

Note that while the implemented boosting process may be deterministic, based on the initial activations, the target word will not necessarily be selected. The combination of a discernible-difference threshold and a selection deadline may preclude selecting any word if the difference in lexical activations is too small. Adding noise to the lexical activations, as we have done, increases this chance and further opens the possibility that a competitor will be selected instead. Furthermore, although we have not done so here, one could assume that the boosting process is subject to noise either in its normal operation, or in pathological cases (e.g. LIFG damage), by allowing for boosts to randomly fail for particular words at particular time steps. A noisy booster would then have properties in common with sequential stochastic decision mechanisms such as a random-walk process.

We do not implement any residual activation or inhibition in this model. When the trial ends, either by selecting a word or by failing to select a word before the deadline, all activations return to zero.

Learning. At the end of each trial, semantic-to-lexical connection weights are adjusted according to Eq. (7), which is the Widrow–Hoff or delta rule tailored for the logistic activation function (Rumelhart et al., 1986, Widrow and Hoff, 1960): $\Delta w_{ij}$ is the weight change for the connection to node i from node j, η is the learning rate, and $d_i$ is the desired activation of node i.

$$\Delta w_{ij} = \eta \bigl( a_i (1 - a_i)(d_i - a_i) \bigr)\, a_j \tag{7}$$

Since this equation will prove crucial for understanding the behavior of the model, we should unpack it a bit more. We have said that learning is error-driven. This means that connections are adjusted according to $(d_i - a_i)$, the discrepancy between the desired activation of output node i, $d_i$, and its actual activation, $a_i$ (that is, its activation before boosting). So the error in a receiving node’s activation affects both the degree and the direction of the weight change. Hence, when the error for an output node $(d_i - a_i)$ is strongly positive, connections feeding it will be greatly strengthened. When the error for an output node is strongly negative, the connections feeding it will be greatly weakened. Notice that because the logistic activation function precludes activations that are actually 0 or 1, every word unit will experience at least some error on all trials, either positive or negative, if the desired activations are 0 or 1. Next, including the $a_i(1 - a_i)$ component scales weight adjustments to $a_i$, such that weight changes are greatest at $a_i = 0.5$, and decrease as $a_i$ approaches 0 or 1. Thus, weight changes are strongest for connections that contribute to moderate activations. Adding the $a_j$ specifies that connections from input j should only be modified to the extent that j is activated. And finally, the learning rate, η, is simply an arbitrary global parameter, used to adjust how rapidly weight changes occur.

Thus the learning algorithm increases the connection weights from active semantic features to the target word, and decreases weights from those features to all other words, to the extent that those words were active before boosting. Since this learning is based on the deviation between $d_i$ and $a_i$, it occurs regardless of whether the target was ultimately selected. So if the network encountered a dog (activating semantic features MAMMAL and TERRESTRIAL), then the connections from MAMMAL and TERRESTRIAL to DOG would strengthen, and the connections from MAMMAL and TERRESTRIAL to any other activated words (e.g. BAT) would weaken. The next time the network encounters a dog, those same semantic features will activate DOG more efficiently (i.e. activating DOG more and competitors less), increasing the speed and likelihood of its selection.
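The dog example can be traced with a small Python sketch of the delta rule in Eq. (7). It is illustrative only: it assumes that a word's activation is the logistic function of its summed weighted input, that desired activations are 1 for the target and 0 for all other words, and that noise is omitted. The starting weights, the AERIAL feature, and the learning rate `eta=0.5` are hypothetical values chosen for the example.

```python
import math


def activation(word_weights, features):
    """Logistic activation of one word given active features (assumed form)."""
    net = sum(word_weights[f] * a_j for f, a_j in features.items())
    return 1.0 / (1.0 + math.exp(-net))


def learn(weights, features, target, eta=0.5):
    """Apply Eq. (7) in place to every word's incoming connections.

    weights:  {word: {feature: weight}}
    features: {feature: activation a_j} for the currently active features
    """
    for word, word_weights in weights.items():
        a_i = activation(word_weights, features)   # activation before boosting
        d_i = 1.0 if word == target else 0.0       # desired activation
        delta = eta * a_i * (1.0 - a_i) * (d_i - a_i)
        for f, a_j in features.items():            # only active inputs change
            word_weights[f] += delta * a_j


if __name__ == "__main__":
    weights = {
        "DOG": {"MAMMAL": 0.2, "TERRESTRIAL": 0.2, "AERIAL": 0.0},
        "BAT": {"MAMMAL": 0.2, "TERRESTRIAL": 0.0, "AERIAL": 0.2},
    }
    dog_features = {"MAMMAL": 1.0, "TERRESTRIAL": 1.0}
    before = {w: activation(ws, dog_features) for w, ws in weights.items()}
    learn(weights, dog_features, target="DOG")
    after = {w: activation(ws, dog_features) for w, ws in weights.items()}
    print(before)
    print(after)
```

After one trial, the same features activate DOG more strongly and BAT less strongly, while the BAT connection from the inactive AERIAL feature is untouched. The weakened MAMMAL-to-BAT connection is what later slows retrieval of BAT when a bat is presented.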

Simulations

Our lexical learning model integrates several features common to theories of lexical access: lexical retrieval begins when distributed semantic features activate words (e.g. Dell et al., 1997, Rapp and Goldrick, 2000); lexical selection uses a differential threshold (e.g. Levelt et al., 1999); and semantic-to-lexical connections are adjusted through experience (e.g. Gordon and Dell, 2003, Howard et al., 2006).

Can this model account for the major behavioral manifestations of cumulative semantic

Summary and implications of findings

Lexical retrieval leads to lexical learning. The light side of learning is well known. Retrieving the same word again becomes faster and more accurate. But learning also has a dark, competitive, side that hinders the subsequent retrieval of semantically related words. In our theoretical framework, this dark side of learning leads to the behaviors identified with cumulative semantic interference.

Remarkably, this framework does not require competitive lexical selection. Competitive learning

Conclusion

The model instantiates a dynamic view of lexical knowledge. Shared semantic representations put competing words in a dynamic equilibrium where no semantic feature connects too strongly to any one word. Each act of lexical retrieval produces persistent, competitive, learning that perturbs this balance. It facilitates repeating the same word and impairs access to competing words. But retrieving a competitor shifts the balance back again. So not only are we capable of learning new words every day,

Acknowledgments

This research was supported by National Institutes of Health Grants DC000191, HD44458, and MH1819990. We thank Aaron Benjamin, Kara Federmeier, Matt Goldrick, John Hummel, Brad Mahon, Randi Martin, Sharon Thompson-Schill, Tatiana Schnur, and the Bock and Dell joint lab meeting for comments on this work and other contributions.

References (76)

  • G.M. Oppenheim et al. (2007). Cumulative semantic interference as learning. Brain and Language.
  • L. Postman et al. (1968). Temporal changes in interference. Journal of Verbal Learning and Verbal Behavior.
  • A. Roelofs (1992). A spreading-activation theory of lemma retrieval in speaking. Cognition.
  • T.T. Schnur et al. (2005). When lexical selection gets tough, the LIFG gets going: A lesion analysis study of interference during word production. Brain and Language.
  • T.T. Schnur et al. (2006). Semantic interference during blocked-cyclic naming: Evidence from aphasia. Journal of Memory and Language.
  • H. Schriefers et al. (1990). Exploring the time course of lexical access in language production: Picture-word interference studies. Journal of Memory and Language.
  • G. Vigliocco et al. (2002). Semantic distance effects on object and action naming. Cognition.
  • L.R. Wheeldon et al. (1994). Inhibition of spoken word production by priming a semantic competitor. Journal of Memory and Language.
  • R. Abdel Rahman et al. (2007). When bees hamper the production of honey: Lexical interference from associates in speech production. Journal of Experimental Psychology: Learning, Memory and Cognition.
  • R. Abdel Rahman et al. (2009). Semantic context effects in language production: A swinging lexical network proposal and a review. Language and Cognitive Processes.
  • M.C. Anderson et al. (1994). Remembering can cause forgetting: Retrieval dynamics in long-term memory. Journal of Experimental Psychology: Learning, Memory and Cognition.
  • M.C. Anderson et al. (1995). On the status of inhibitory mechanisms in cognition: Memory retrieval as a model case. Psychological Review.
  • E. Belke (2008). Effects of working memory load on lexical-semantic encoding in language production. Psychonomic Bulletin and Review.
  • E. Belke et al. (2005). Refractory effects in picture naming as assessed in a semantic blocking paradigm. Quarterly Journal of Experimental Psychology Section A – Human Experimental Psychology.
  • K.A. Biegler et al. (2008). Consequences of an inhibition deficit for word production and comprehension: Evidence from the semantic blocking paradigm. Cognitive Neuropsychology.
  • T.A. Blaxton et al. (1983). Inhibition from semantically related primes: Evidence of a category-specific inhibition. Memory and Cognition.
  • K. Bock et al. (2000). The persistence of structural priming: Transient activation or implicit learning? Journal of Experimental Psychology: General.
  • A.S. Brown (1979). Priming effects in semantic memory retrieval processes. Journal of Experimental Psychology: Human Learning and Memory.
  • A.S. Brown (1981). Inhibition in cued retrieval. Journal of Experimental Psychology: Human Learning and Memory.
  • Castro, Y. G., Strijkers, K., Costa, A., & Alario, X. (2008). When cat competes with dog but not with perro: Evidence...
  • F. Chang et al. (2006). Becoming syntactic. Psychological Review.
  • L. Cohen et al. (1998). Competition between past and present: Assessment and interpretation of verbal perseverations. Brain.
  • A.M. Collins et al. (1975). A spreading activation theory of semantic memory. Psychological Review.
  • Crowther, J. E., Martin, R. C., & Biegler, K. A. (2008). The semantic blocking effect in naming: A computational model...
  • M.F. Damian et al. (2005). Long-lasting semantic context effects in the spoken production of object names. Journal of Experimental Psychology: Learning, Memory and Cognition.
  • M.F. Damian et al. (2001). Effects of semantic context in the naming of pictures and words. Cognition.
  • G.S. Dell (1986). A spreading-activation theory of retrieval in sentence production. Psychological Review.
  • G.S. Dell et al. (1997). Language production and serial order: A functional analysis and a model. Psychological Review.