In short-term recognition memory, Sternberg (1969) reported flat serial-position functions in reaction time (RT). If the study items are presented rapidly and the pause before the recognition probe is brief, however, there is a modest advantage for the first-presented item and a dramatic advantage for the final item (Burrows & Okada, 1971; Corballis, Kirby, & Miller, 1972; McElree & Dosher, 1989).

Initially, serial-position effects in short-term memory were interpreted as support for strength theory (Corballis et al., 1972; McElree & Dosher, 1989). That is, encoding an item strengthens its representation in permanent memory: The better the encoding, the greater the strength. Encoding varies with serial position, with the first item receiving more attention than later items and the final item suffering least from interference (see Oberauer, 2003). Probes are evaluated by checking their strength; strong items yield fast decisions.

Models of recognition in long-term memory—the global-matching models—have proposed that decision depends on the probe’s similarity to the entire study set (see Clark & Gronlund, 1996). Applied to short-term memory, such models also explain serial-position effects in terms of encoding.

Strength and global-matching models agree that decision is based on a single index of strength/similarity: Values above and below a criterion yield positive and negative responses, respectively. Decisions are most difficult for values near the criterion, so that strong targets are easy, while strong lures are difficult. We shall refer to strength and global-matching theories as single-factor accounts, because they both posit unidimensional evidence evaluated against a single decision criterion.
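To make the single-factor decision rule concrete, the following minimal Python sketch classifies a probe from one strength value. The criterion (0.5), the strength values, and the RT function, in which decisions slow as evidence approaches the criterion, are illustrative assumptions, not estimates from any experiment. The sketch shows only why such accounts predict fast acceptance of strong targets and slow rejection of strong lures.

```python
def single_factor_decision(strength, criterion=0.5):
    """Return (response, RT in ms) for one probe under a single-factor rule.

    Evidence above the criterion yields "yes", below it "no". The RT function
    is hypothetical: decisions slow down as evidence nears the criterion.
    """
    response = "yes" if strength > criterion else "no"
    base_ms, scale_ms = 400.0, 100.0  # hypothetical constants
    rt_ms = base_ms + scale_ms / (abs(strength - criterion) + 0.1)
    return response, rt_ms


# Toy strengths: a strong and a weak target, a weak and a strong lure.
strong_target = single_factor_decision(0.9)
weak_target = single_factor_decision(0.6)
weak_lure = single_factor_decision(0.1)
strong_lure = single_factor_decision(0.45)
```

Here strong_target is faster than weak_target, whereas strong_lure, whose strength lies just below the criterion, is slower than weak_lure.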

Serial-position effects occur only for targets; by definition, lures do not have positions within study lists. Recency for lures, however, has been examined in studies where stimuli repeat across trials. A lure was harder to reject if it was studied on a recent trial than if it had not occurred previously or had occurred earlier in the experiment (McElree, 1998; McElree, Dolan, & Jacoby, 1999; McElree & Dosher, 1989; Monsell, 1978). Thus, recency helps targets and hinders lures, as required by single-factor theory.

The claim that recency helps targets but hinders lures assumes that recency for targets within a list is comparable to recency for lures across lists. The evidence confounds probe type (target vs. lure) with the context in which recency is measured (within the study list vs. across experimental trials). An alternative interpretation is that recency within the list helps decision, whereas recency across lists hinders decision. The second view is consistent with studies that manipulated frequency in multi-trial recognition (Johns & Mewhort, 2003). On each trial, a list of three independent items was studied; the test sequence involved multiple copies of targets and related lures. Both targets and lures were speeded when repeated within the test sequence. Across trials of the experiment, items varied according to the frequency with which their features occurred. Both targets and lures were slowed if their features occurred frequently across trials. For feature frequency, performance reflected the context in which frequency was measured. Resolving the corresponding confound for recency requires either within- or across-list data for both targets and lures.

We can manipulate within-list recency for lures by selecting lures that are similar to studied items and assigning them to the corresponding serial positions (e.g., McElree, 1998; Öztekin & McElree, 2010). A lure’s strength should be correlated with that of the study item it resembles: Lures that are similar to the strongest study items should be the strongest lures. If strength makes targets easy and lures difficult, the serial positions associated with optimal performance for targets should show the poorest performance for lures: Serial-position curves for targets and lures should mirror each other.
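The mirror-image prediction can be illustrated with a short Python sketch under stated assumptions: hypothetical target strengths with modest primacy and strong recency, lure strength taken as a fixed proportion (SIMILARITY) of the strength of the study item it resembles, and a hypothetical RT rule in which decisions slow as evidence nears the criterion. None of the numbers comes from data.

```python
# Hypothetical encoding strengths for serial positions 1-4: modest primacy, strong recency.
TARGET_STRENGTH = [0.80, 0.65, 0.70, 0.95]
SIMILARITY = 0.5  # assumed scaling from a study item to its orthographic neighbour
CRITERION = 0.5


def rt_from_strength(strength, criterion=CRITERION):
    # Hypothetical RT rule: the closer the evidence is to the criterion, the slower the decision.
    return 400.0 + 100.0 / (abs(strength - criterion) + 0.1)


target_rt = [rt_from_strength(s) for s in TARGET_STRENGTH]
lure_rt = [rt_from_strength(SIMILARITY * s) for s in TARGET_STRENGTH]

# Serial positions ordered from fastest to slowest for each probe type.
target_order = sorted(range(1, 5), key=lambda p: target_rt[p - 1])
lure_order = sorted(range(1, 5), key=lambda p: lure_rt[p - 1])
```

Because every lure stays below the criterion, the lure ordering is the exact reverse of the target ordering: the position that is fastest for targets is slowest for lures.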

Experiment 1

Experiment 1 examined serial-position curves for both targets and lures in short-term recognition memory. Each lure was an orthographic neighbour of a studied item; that is, it shared three of its four letters with one study item. It was assigned to the serial position of its studied neighbour. Single-factor theory anticipates a standard serial-position curve for targets and an inverted serial-position curve for lures.

Method

Subjects

Twelve volunteers participated in return for credit in their introductory psychology course. All subjects were fluent in English.

Apparatus

An MS-DOS computer controlled the experiment. Three micro-switch response buttons (Start, Yes, and No) were connected to the computer's games port.

Word pool

The pool was composed of 360 four-letter, monosyllabic English words (Johns & Mewhort, 2002, Appendix). The sequence of letters within each word was consonant, vowel, consonant, consonant. Only one homophone pair, worn/warn, was included.

Study and test items

Each study list comprised four words drawn from the pool so as to include four different initial consonants, four different vowels, four different third-position consonants, and four different final consonants—for example, fund, help, mist, and barn. Targets were copied from the appropriate serial position in the study list. Lures were selected so that three of the letters, including the initial letter, matched the three letters in the corresponding positions of the word in the appropriate serial position. The mismatched letter did not occur in that letter position in any study word. In our example, the lures fond, hemp, miss, and bark would test serial positions 1–4, respectively. No study or test word had occurred on the preceding trial.
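The constraints on study lists and lures can be stated as checkable rules. The Python sketch below encodes our reading of them; the function names are hypothetical, and the sketch does not check the consonant-vowel-consonant-consonant pattern, membership in the word pool, or the rule that no word had occurred on the preceding trial.

```python
def positional_matches(a, b):
    """Number of letter positions at which two words carry the same letter."""
    return sum(x == y for x, y in zip(a, b))


def is_valid_study_list(words):
    """Four four-letter words with four different letters in every letter position."""
    return len(words) == 4 and all(len(w) == 4 for w in words) and all(
        len({w[i] for w in words}) == 4 for i in range(4)
    )


def is_valid_neighbour_lure(lure, study_list, position):
    """Check the Experiment 1 lure rules for the lure testing serial position 1-4."""
    source = study_list[position - 1]
    if len(lure) != 4 or lure[0] != source[0]:
        return False
    if positional_matches(lure, source) != 3:
        return False
    k = next(i for i in range(4) if lure[i] != source[i])  # the single mismatched position
    # The substituted letter must not occur in that position in any study word.
    return all(word[k] != lure[k] for word in study_list)


study = ["fund", "help", "mist", "barn"]
lures = ["fond", "hemp", "miss", "bark"]
```

For example, "mint" would not be an acceptable lure for mist: it matches in three positions, but its substituted letter n already occupies the third position of fund.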

There were six blocks of 48 trials. Within blocks, the trials were equally divided by probe type (target/lure), serial position within probe type (1–4), and, for lures within serial position, position within the word of the mismatched letter (2, 3, or 4).
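One block of this design can be generated as follows. This is a sketch, not the original experimental software: the tuple layout is hypothetical, random order within the block is an assumption (the text states it explicitly only for Experiment 2), and the replacement of error trials is not modelled.

```python
import random
from collections import Counter


def build_block(seed=None):
    """Return one 48-trial block as (probe_type, serial_position, mismatch_position) tuples."""
    rng = random.Random(seed)
    trials = []
    for position in range(1, 5):
        trials += [("target", position, None)] * 6  # 24 targets, 6 per serial position
        for mismatch in (2, 3, 4):
            trials += [("lure", position, mismatch)] * 2  # 24 lures, 2 per position x mismatch cell
    rng.shuffle(trials)  # assumed random order within the block
    return trials


block = build_block(seed=7)
cell_counts = Counter(block)
```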

Procedure

Each trial began with, "When ready, press START" at the centre of the screen. After the subject pressed the Start button, the study words appeared, each in lowercase letters and centred on line 13 of the 24-line display. Each word was shown for 500 ms, with a blank screen of 50 ms between words. The last study word was followed by two rows of four ampersands, centred on lines 13 and 14 of the display, shown for 200 ms, and followed by a 50-ms blank screen. Next, a single test word appeared, in uppercase letters, centred on line 14. The probe remained on the screen until the subject pressed either the Yes or the No button. After a 250-ms pause, "When ready, press START" appeared, to start the next trial.

At the end of each block, the number of errors in that block and the message "Take a break" were displayed. The experimental blocks were preceded by a block of 12 practice trials, drawn at random from a further 48 trials.

Accuracy was stressed. Subjects were informed that an auditory 'groan' would signal an error and that the trial would be replaced. The replacement trial was inserted at random amongst the trials remaining in the block. They were also informed that RTs were being recorded.

Results

All results exclude practice trials.

As is shown in Table 1, accuracy was higher for targets than for lures, F(1, 11) = 42.93, MSE = 8.52, p < .001. Accuracy varied with serial position, F(3, 33) = 3.73, MSE = 17.77, p < .05. Serial position did not interact with probe type, F(3, 33) = 1.09, p > .35.

Table 1 Accuracy (percentage correct) for targets and lures from four serial positions, Experiments 1 and 2

In the Sternberg paradigm, accuracy should approach ceiling, and the data of interest are the RTs for correct responses. As is shown in Fig. 1, correct responses were slower for lures than for targets, F(1, 11) = 67.79, MSE = 8,563, p < .001. RT varied with serial position, F(3, 33) = 8.62, MSE = 6,851, p < .001, and serial position did not interact with probe type, F(3, 33) < 1.

Fig. 1 Mean reaction time (RT) as a function of serial position for correct targets and lures, Experiment 1

We used orthogonal contrasts to test the recency and primacy components of the serial-position curve; the comparisons are defined in Table 2. Both recency and primacy were reliable, F(1, 11) = 11.91, MSE = 13,172, p < .01, and F(1, 11) = 8.60, MSE = 1,698, p < .05, respectively. Neither component interacted with probe type, all Fs(1, 11) < 1.
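Because the weights of Table 2 are not reproduced here, the weights below are hypothetical: one plausible orthogonal pair in which recency compares position 4 with positions 1–3 and primacy compares position 1 with positions 2 and 3. The sketch computes each subject's contrast score and a single-df within-subject F as the squared one-sample t on those scores; it illustrates the logic and need not reproduce the values reported above.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical orthogonal weights over serial positions 1-4.
# Signs are chosen so that a positive score means the favoured position is faster.
RECENCY = (1, 1, 1, -3)  # positions 1-3 vs. position 4
PRIMACY = (-2, 1, 1, 0)  # position 1 vs. positions 2-3


def contrast_scores(rt_by_subject, weights):
    """Apply one contrast to each subject's mean RTs at positions 1-4."""
    return [sum(w * rt for w, rt in zip(weights, rts)) for rts in rt_by_subject]


def contrast_F(rt_by_subject, weights):
    """Single-df within-subject F: the squared one-sample t on the contrast scores."""
    scores = contrast_scores(rt_by_subject, weights)
    n = len(scores)
    t = mean(scores) / (stdev(scores) / sqrt(n))
    return t * t, (1, n - 1)
```

With toy RTs such as [[700, 750, 740, 600], [720, 760, 750, 610], [690, 740, 735, 590]] (invented for illustration), every recency score is positive, reflecting a faster final position.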

Table 2 Weights used to calculate orthogonal contrasts for analyses of reaction time data

Discussion

Three aspects of the data suggest that Experiment 1 was more difficult than is typical in the Sternberg paradigm (e.g., Burrows & Okada, 1973): Errors were more numerous; RTs, especially for lures, were longer; and false alarms exceeded misses. The greater difficulty presumably reflects our use of lures that were similar to study items.

Targets and lures varied the same way with serial position: The functions were parallel. If the most recently presented study item is the strongest, a matching target should be easy, but a related lure should be so strong that it is difficult to reject. The parallel functions therefore contradict the mirror-image serial-position curves predicted by single-factor theory.

Other manipulations in variants of Sternberg’s short-term memory paradigm also affect RTs for targets and lures in the same way. In the fixed-set variant, each study list is tested with a series of probes. As was noted earlier, both targets and lures benefited when repeated within the test series (e.g., Theios, Smith, Haviland, Traupmann, & Moy, 1973); when features were repeated across trials of the experiment, however, targets and lures with repeated features were both slowed (Johns & Mewhort, 2003). Furthermore, both targets and lures were speeded if preceded in the test series by a related test (Johns & Mewhort, 2009). Finally, Oberauer (2008) modified the paradigm by presenting study items in different contexts; subjects decided whether probes had occurred in their studied contexts. Serial-position curves for context identification were parallel for targets and lures.

Instead of basing decisions on strength/familiarity, our subjects appear to have retrieved the studied item most like the probe. The best-encoded (first and last) items were retrieved most quickly. Subjects decided whether the retrieved item matched the probe and responded accordingly. That is, instead of general information, they used information about specific studied items.

The idea that lures are evaluated against particular study items has been proposed for several long-term memory paradigms involving difficult discriminations. Brainerd, Reyna, and Kneer (1995) showed that the bias to false alarm on a lure that is related to a studied item can reverse if the lure immediately follows its related target in the test sequence. Rotello, Macmillan, and Van Tassel (2000) used lures that differed from studied items by only the plural marker; such lures frequently elicited high-confidence rejections. Hintzman and Curran (1994) included changed-plurality lures in a judgment-of-frequency task; when required to respond early, subjects assigned the frequency of the related studied items to such lures, but when prompted to respond later, they correctly assigned them a frequency of zero. Finally, using a process-dissociation task, McElree et al. (1999) found that the correlation between false alarm rate and number of presentations of the lure in an irrelevant list was positive early in processing but negative late in processing. Apparently, general familiarity information governs performance early in processing but can be corrected subsequently by item-specific information.

In the present experiment, the well-encoded end items were both familiar and easy to retrieve. For targets, both general and item-specific information pointed to a positive response. For lures, the general information encouraged a positive response, but the item-specific information indicated a negative response. If combined, the two sources should produce a flatter serial-position curve for lures than for targets. Hence, the parallel curves imply that our subjects relied entirely on item-specific information.

Experiment 2

In Experiment 1, subjects made little use of general (familiarity) information. In Experiment 2, we attempted to bring the choice of information under experimental control. The test sequence included not only targets and orthographic neighbours of study items (high-overlap lures), but also lures that shared only one letter with a study item (low-overlap lures). Low-overlap lures are not intrinsically interesting but were included because they should be rejected quickly and accurately on the basis of general information. If subjects consider general information whenever a probe is presented, some high-overlap lures will be rejected on this basis. The serial-position curve for high-overlap lures will, therefore, be flatter than the curve for targets (unlike the parallel curves obtained in Experiment 1). Although relying on general information risks false alarms on high-overlap lures, that cost should be offset by the few errors expected on low-overlap lures.

Method

Subjects

Fourteen undergraduates participated on the same basis as in Experiment 1. One subject who misunderstood the instructions was replaced.

Apparatus, word pool, and procedure

The apparatus, word pool, and procedure were the same as in Experiment 1.

Study and test items

Study items, targets, and high-overlap lures were selected as in Experiment 1. Low-overlap lures shared a single letter with one studied item. If fund, help, mist, and barn were studied, we assigned such, poll, toss, and gown to serial positions 1–4, respectively. Although our assignment balances the experimental design, assigning such lures to serial positions is arbitrary; we would not claim that a low-overlap lure is necessarily most similar to the study item with which it shares a letter.
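The low-overlap construction can also be written as a checkable rule. The Python sketch below assumes that sharing a letter means having the same letter in the same position, which fits all four examples, and that a low-overlap lure shares no positional letter with the other three study words; the function names are hypothetical.

```python
def positional_matches(a, b):
    """Number of letter positions at which two words carry the same letter."""
    return sum(x == y for x, y in zip(a, b))


def is_valid_low_overlap_lure(lure, study_list, position):
    """Assumed rule: one shared positional letter with the assigned study word, none with the others."""
    overlaps = [positional_matches(lure, word) for word in study_list]
    return overlaps[position - 1] == 1 and sum(overlaps) == 1


study = ["fund", "help", "mist", "barn"]
low_overlap = ["such", "poll", "toss", "gown"]
```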

There were six blocks of 48 trials, preceded by 12 practice trials. Trials per block were equally divided by probe type (target/lure) and serial position within probe type (1–4). Lures within serial position were equally often high-overlap and low-overlap lures. The unstudied letter (high-overlap lures) or studied letter (low-overlap lures) was equally often in letter position 2, 3, or 4. The sequence of trials within each block was determined randomly.

Results and discussion

As is shown in Table 1, accuracy varied with probe type, F(2, 26) = 92.64, MSE = 12.03, p < .001. We used two orthogonal contrasts to pinpoint the differences among the probe types. First, we compared low-overlap lures against the average of targets and high-overlap lures, anticipating that performance on low-overlap lures would reflect general information and, therefore, be distinguished from the other probes. Accuracy on low-overlap lures (99.0%) was, indeed, higher than the average of the other two probe types (93.4%), F(1, 13) = 165.05, MSE = 6.79, p < .001. Second, we compared targets against high-overlap lures; accuracy was higher on targets than on high-overlap lures, F(1, 13) = 64.13, MSE = 17.26, p < .001. As in Experiment 1, accuracy varied with serial position, F(3, 39) = 4.05, MSE = 23.06, p < .05. Also as in Experiment 1, the interaction between probe type and serial position was not reliable, F(6, 78) = 1.60, .15 < p < .25.

Accuracy was too near ceiling to support strong conclusions. The important data are the RTs, shown in Fig. 2. RT varied with probe type, F(2, 26) = 45.96, MSE = 10,475, p < .001: Low-overlap lures were faster than the average of the other two probe types, F(1, 13) = 7.97, MSE = 14,135, p < .05. Targets were faster than high-overlap lures, F(1, 13) = 124.75, MSE = 6,815, p < .001.

Fig. 2 Mean reaction time (RT) as a function of serial position for correct targets and lures, Experiment 2

As is shown in Fig. 2, targets and high-overlap lures showed typical serial-position curves, with a flat function for low-overlap lures, F(3, 39) = 8.21, MSE = 5,871, p < .001. The flat function was based on assigning each low-overlap lure to the study position with which it shared one letter. As was indicated earlier, the assignment is open to question. Because our interest was in how subjects process targets and high-overlap lures when encouraged to use general information, we excluded the low-overlap lures from further analysis.

We used the same orthogonal contrasts as in Experiment 1 to assess recency and primacy (see Table 2). There was strong recency, F(1, 13) = 14.10, MSE = 11,085, p < .01, but only marginal primacy, F(1, 13) = 3.86, MSE = 5,208, .05 < p < .1. Importantly, and unlike in Experiment 1, targets showed greater recency than did high-overlap lures, F(1, 13) = 11.09, MSE = 987, p < .01. At the editor’s request, we compared recency across the two experiments; the three-way interaction between experiment, probe type, and recency was reliable, F(1, 24) = 5.54, MSE = 1,439, p < .05.

Both the accuracy and RT data suggest that including low-overlap lures promoted the use of general information. If they were using general information, subjects should incorrectly identify some high-overlap lures as studied items. Table 1 documents more false alarms for high-overlap lures than occurred in Experiment 1. In the RT data, high-overlap lures showed less recency than did targets. The interaction implies that subjects based some correct rejections on general information, instead of relying solely on item-specific information, as they did in Experiment 1.

General discussion

In long-term recognition, dual-process theories naturally involve two types of information (see Yonelinas, 2002). Jacoby and Dallas (1981) proposed an automatic judgment of familiarity, followed, if necessary, by an attempt to recollect relevant information. Lures that resemble studied items are, initially, too familiar to reject, but the subject may recollect the closest study item and note the mismatch (recall-to-reject). We have already summarized several results that fit the dual-process framework. A lure preceded by a test of a related study item is easily rejected (Brainerd et al., 1995) because the related study item is easily recalled. If the related studied item is recalled, a reversed-plurality lure is rejected (Rotello et al., 2000). Finally, in the judgment-of-frequency task, a new item is correctly classified if the subject recollects the related item (Hintzman & Curran, 1994).

The present experiments involved sub-span lists, almost perfect encoding, and brief retention intervals. Because our results indicate the use of two types of information, however, we will consider our data in terms of dual-process ideas.

According to standard dual-process theory, familiarity information is calculated automatically and becomes available early. If decisions are made on the basis of familiarity, targets from the end positions should be easy to accept, and lures related to the end positions should be difficult to reject. Subjects can correct the familiarity judgment by recalling items from the list, with the end items easiest to recall. For targets, recall favours the end items, just as familiarity did. For lures, however, recall favours rejecting lures related to the end items, whereas familiarity makes those same lures difficult to reject. To the extent that familiarity information is corrected by recalled information, the serial-position curve for lures will be moderated and can even bend in the same direction as the curve for targets. So long as it includes a contribution from familiarity, however, it must be flatter than the curve for targets.
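The argument can be expressed as a weighted combination of the two sources. In the Python sketch below, the per-position profile is hypothetical, familiarity and recollection are assumed to share that profile, and w_fam is an assumed weight on familiarity. Under these assumptions the lure curve equals (1 - 2 * w_fam) times the target curve: it is parallel to the target curve only when w_fam = 0, flatter for any weight between 0 and 1, and the inverted single-factor curve when w_fam = 1.

```python
# Hypothetical "ease" of serial positions 1-4 for one information source:
# end items are easiest (primacy and recency), as in the text.
PROFILE = [0.6, 0.2, 0.3, 1.0]


def ease_curves(w_fam, familiarity=PROFILE, recollection=PROFILE):
    """Combine the two sources with weight w_fam on familiarity (0 <= w_fam <= 1).

    For targets both sources favour the end items; for related lures the
    familiarity of the end items works against rejection, so it enters
    with a negative sign.
    """
    targets = [w_fam * f + (1 - w_fam) * r for f, r in zip(familiarity, recollection)]
    lures = [-w_fam * f + (1 - w_fam) * r for f, r in zip(familiarity, recollection)]
    return targets, lures


def span(curve):
    """Size of the serial-position effect: best minus worst position."""
    return max(curve) - min(curve)
```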

Dual-process theory can accommodate the parallel serial-position curves in Experiment 1 only if all the decisions reflected recollection. That is, subjects evaluate the familiarity of each probe, but, because it is always inconclusive, invariably base decisions on recollection. Alternatively, subjects could eschew familiarity completely, although this would be “a nonstandard if not controversial claim” (McElree et al., 1999, p. 576) for a dual-process model. The results of Experiment 2 are easily explained by dual-process theory: a classic serial-position curve for targets where both familiarity and recollected information favour the same items and an attenuated curve for lures where familiarity and recollection work in opposition.

Somewhat paradoxically, dual-process models do not define the mechanisms of familiarity and recollection so much as they specify how performance should be apportioned to the two processes. Dual-process models work because they retrieve two kinds of information, but the use of two kinds of information does not imply two processes.

Mewhort and Johns (2005) outlined a single-process account of item recognition, the iterative resonance model (IRM), which retrieves a continuum of information, from general to item specific. A probe resonates with memory, producing an echo (as in Minerva-2; Hintzman, 1988). Features in the echo vary in strength with (1) how frequently they occurred in studied items, (2) the strength of those encoded items, and (3) the similarity of the probe to those items. Unlike Minerva-2, if the echo does not support a clear decision, the process iterates. With successive iterations, the weighting by similarity between the probe and each memory item is increased, producing a series of echoes that approach the studied item most like the probe (see also Jamieson & Mewhort, 2009). Although neither the retrieval process nor the retrieved information is dichotomous, the model involves two factors: Positive decisions reflect the similarity of the probe and echo, but negative decisions occur when features of the probe contradict features in the echo. Note that contradictory features can be detected before item-specific information is retrieved if a feature of the probe does not occur in any study item.
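The iterative weighting can be illustrated with a toy Python sketch in the spirit of Minerva-2. It is not Mewhort and Johns's implementation: the +1/-1 feature coding, the distortion rates used to make the study items similar, and the schedule of odd exponents applied to similarity are all assumptions, and the decision stage (similarity-based acceptance, contradiction-based rejection) is omitted. The sketch shows only that, as similarity is weighted more steeply, the echo converges on the studied item most like the probe.

```python
import random


def similarity(a, b):
    """Minerva-2-style similarity for +1/-1 feature vectors (all features relevant)."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


def echo(probe, traces, exponent):
    """Echo = sum of traces weighted by similarity raised to an odd power (sign is kept)."""
    activations = [similarity(probe, t) ** exponent for t in traces]
    return [sum(a * t[k] for a, t in zip(activations, traces)) for k in range(len(probe))]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sum(x * x for x in a) ** 0.5 * sum(y * y for y in b) ** 0.5)


def iterate_resonance(probe, traces, exponents=(1, 3, 5, 9, 15)):
    """Hypothetical schedule: each iteration weights similarity more steeply."""
    return [echo(probe, traces, e) for e in exponents]


def distort(vector, p, rng):
    """Flip each +1/-1 feature with probability p."""
    return [-x if rng.random() < p else x for x in vector]


rng = random.Random(1)
prototype = [rng.choice((-1, 1)) for _ in range(400)]
study = [distort(prototype, 0.15, rng) for _ in range(4)]  # four similar study items
lure = distort(study[3], 0.10, rng)  # a close neighbour of the fourth study item

echoes = iterate_resonance(lure, study)
closeness = [cosine(e, study[3]) for e in echoes]  # how item-specific each echo is
```

For the lure, the final echo approximates its studied neighbour, so a comparison of probe and echo can expose the mismatched features that would support rejection.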

When a recognition task involves similar targets and lures, as in Experiment 1, item-specific information is required to classify both targets and lures, and subjects must allow the resonance process to approach the item most like the probe. The end items are at an advantage, and serial-position curves for targets and lures will be parallel. If negative decisions can frequently be made without item-specific information, as in Experiment 2, subjects may respond before resonance isolates the most similar studied item, and an attenuated serial-position curve occurs.

Although the evidence thus far does not distinguish dual-process models from the IRM, we conclude that a single-factor model cannot accommodate our serial-position curves.