Cognitive flexibility, or the ability to adapt one’s behavior to environmental change, depends on several factors, one of which is learning that the reward value of a stimulus can change suddenly. If a previously rewarded stimulus ceases to be rewarding, an organism must adjust its behavior accordingly. If an organism is influenced only by its past history of reinforcement with the stimuli it has experienced, it should respond rather slowly to a change in reward value. If, however, it can benefit from experience with changes in the value of a stimulus (“learning-to-learn”), it should acquire new learning faster, perhaps by learning to ignore irrelevant cues and to inhibit earlier behavior quickly.

To investigate the cognitive flexibility of monkeys, Harlow (1949, 1956) gave them a series of problems, each involving a choice between two three-dimensional objects. Choice of one object, but not the other, resulted in reinforcement. After a number of trials with this discrimination, the objects were replaced with a new pair, and again one of them was arbitrarily defined as correct. In this manner, Harlow’s monkeys received over 300 pairs of novel objects. He observed that on early problems, choices of the correct object increased only gradually over trials. On later problems, however, the monkeys began choosing the correct object on the second trial of a problem. He attributed this change in accuracy to the acquisition of a learning set, which he defined as “learning how to learn efficiently in a situation an animal frequently encounters” (Harlow, 1949, p. 51).

An alternative approach to the question of learning-to-learn in animals has been to test them with a serial reversal task, in which animals are given a simultaneous discrimination that is reversed following acquisition (what was once correct is now incorrect) and then reversed again and again (Mackintosh, McGonigle, Holgate, & Vanderver, 1968). The question is whether animals show improved reversal learning over successive reversals. If original learning is used as a baseline against which to measure improvement, one should be able to control for the difficulty of the original discrimination and thereby make comparisons among species. That is, the degree of improvement relative to baseline should be a measure of the animal’s cognitive flexibility (Bitterman, 1975). Research has shown that a variety of animals, including apes and monkeys (Beran, Klein, Evans, Chan, Flemming, Harris et al., 2008; Warren, 1966), horses (Martin, Zentall, & Lawrence, 2006), rats (Bushnell & Stanton, 1991; Reid & Morris, 1992), and birds (Bond, Kamil, & Balda, 2007; Ploog & Williams, 2010), show substantial improvement with reversals, which suggests that this type of flexibility has adaptive value (Shettleworth, 1998).

The optimal strategy in a task that involves multiple reversals is to base one’s choice on the consequence of the last trial: if that response was rewarded, stay with it; if it was not rewarded, shift to the alternative response. To maximize reinforcement, animals often have to learn to behave in ways that deviate from their natural tendencies. Under laboratory conditions, one can examine these tendencies by setting up arbitrary rules for the availability of reinforcement and testing how flexibly subjects respond to these manipulations. Pigeons appear to have a natural predisposition toward stay behavior. This hypothesis is supported by findings that pigeons tend to perseverate, especially following a reinforced response (Randall & Zentall, 1997; Zentall, Steirn, & Jackson-Smith, 1990). Similarly, pigs (Mendl, Laughlin, & Hitchcock, 1997), cattle (Hosoi, Rittenhouse, Swift, & Richards, 1995), and sheep and goats (Hosoi, Swift, Rittenhouse, & Richards, 1995) also appear to have a win–stay response bias. By contrast, other species, such as rats (Olton & Schlosberg, 1978), Siamese fighting fish (Roitblat, Tham, & Golub, 1982), and honeybees (Demas & Brown, 1995), appear to be predisposed toward shift behavior. Learning-set experiments with human children show that, from a young age, they are able to adopt specific response strategies according to the nature of the task (Berman, 1973; Piaget & Inhelder, 1966). In studies of reversal learning with human adults, normal controls show great flexibility in the use of reversal strategies; however, performance on such tasks is impaired after damage to the ventral prefrontal cortex (Bechara, Tranel, & Damasio, 2000; Rolls, Hornak, Wade, & McGrath, 1994).
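The stay/shift rule described above can be made concrete with a short simulation. The following is a minimal Python sketch, not part of the original study: it assumes an idealized subject that applies win–stay/lose–shift perfectly on an 80-trial session with a single reversal, and the function names are hypothetical. Provided its first choice is correct, such a subject makes only one error per session, on the first trial after the contingencies change.

```python
from typing import List


def correct_stimulus(trial: int, reversal_after: int) -> str:
    """Stimulus reinforced on a 1-indexed trial: S1 up to the reversal, S2 afterward."""
    return "S1" if trial <= reversal_after else "S2"


def win_stay_lose_shift(n_trials: int, reversal_after: int, first_choice: str = "S1") -> List[str]:
    """Simulate an idealized subject that repeats a rewarded choice and switches after nonreward."""
    choices: List[str] = []
    choice = first_choice
    for trial in range(1, n_trials + 1):
        choices.append(choice)
        rewarded = choice == correct_stimulus(trial, reversal_after)
        if not rewarded:
            # Lose-shift: switch to the other stimulus on the next trial.
            choice = "S2" if choice == "S1" else "S1"
        # Win-stay: otherwise the same choice is kept.
    return choices


def count_errors(choices: List[str], reversal_after: int) -> int:
    """Count trials on which the chosen stimulus was not the reinforced one."""
    return sum(
        choice != correct_stimulus(trial, reversal_after)
        for trial, choice in enumerate(choices, start=1)
    )


if __name__ == "__main__":
    for reversal_after in (10, 25, 40, 55, 70):
        session = win_stay_lose_shift(80, reversal_after)
        print(reversal_after, count_errors(session, reversal_after))
```

Run as a script, this prints an error count of 1 for each of the five reversal points, which is the benchmark against which the anticipatory and perseverative errors reported below can be compared.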

A variation on the reversal procedure is one in which each session of a simple simultaneous discrimination begins with one stimulus (S1) as the correct (positive, S+) stimulus and a different stimulus (S2) as the incorrect (negative, S–) stimulus (i.e., S1+, S2–), and halfway through the session the discrimination reverses (S2+, S1–). The question is how pigeons that are given sufficient experience with this procedure will deal with the reversal.

Cook and Rosen (in press) investigated such a midsession reversal in pigeons using matching-to-sample and oddity-from-sample tasks. For the first half of each session, reinforcement was provided for selecting the comparison stimulus that matched the sample stimulus, but for the remainder of the session, reinforcement was provided for selecting the comparison stimulus that did not match the sample. After many training sessions, Cook and Rosen found that the pigeons responded accurately at the start and end of each session, but accuracy declined at a relatively constant rate until the middle of the session, where the pigeons were responding close to chance, and then improved from that point to the end of the session. It appeared that matching had generalized from the beginning of the session and oddity had generalized from the end of the session, and that comparison choice was controlled by the temporal (or trial-number) distance from either end of the session.

The purpose of the present experiments was to investigate whether pigeons and humans could learn to adopt an efficient reversal strategy in dealing with a single predictable reversal involving a simple simultaneous discrimination. In Experiment 1, the reversal occurred predictably after 40 trials of an 80-trial session. In Experiment 2, the single reversal was made less predictable by semirandomly varying when it occurred during each session. In Experiment 3, we tested pigeons on the variable reversal procedure with 20 pecks required on each trial, to make their prior choice more salient. In Experiments 4 and 5, using the designs of Experiments 1 and 2, we tested humans for their ability to deal, respectively, with a predictable and a less predictable reversal of a simultaneous discrimination.

Experiment 1

Method

Subjects

Ten White Carneaux pigeons (Columba livia) ranging in age from 2 to 12 years served as subjects. All subjects had had experience in previous, unrelated studies involving simultaneous color discriminations, but they had never been exposed to a discrimination reversal procedure. The pigeons were maintained at 85% of their free-feeding weight throughout the experiment. They were individually housed in wire cages with free access to water and grit in a colony room that was maintained on a 12:12-hr light:dark cycle. The pigeons were maintained in accordance with a protocol approved by the Institutional Animal Care and Use Committee at the University of Kentucky.

Apparatus

The experiment was conducted in a BRS/LVE (Laurel, MD) sound-attenuating standard operant test chamber measuring 34 cm high, 30 cm from the response panel to the back wall, and 35 cm across the response panel. Three circular response keys (2.5 cm diameter) were aligned horizontally on the response panel and separated from each other by 6.0 cm, but only the left and right side keys were used in this experiment. The bottom edge of the response keys was 24 cm from the wire-mesh floor. A 12-stimulus in-line projector (Industrial Electronics Engineering, Van Nuys, CA) with 28-V, 0.1-A lamps (GE 1820) that could project red and green hues (Kodak Wratten Filters Nos. 26 and 60, respectively) was mounted behind each response key. Mixed-grain reinforcement (Purina Pro Grains, a mixture of corn, wheat, peas, kafir, and vetch) was provided from a raised and illuminated grain feeder located behind a 5.1 × 5.7 cm aperture horizontally centered and vertically located midway between the response keys and the floor of the chamber. Reinforcement consisted of 1.5-s access to mixed grain. The experiment was controlled by a microcomputer and interface located in an adjacent room.

Procedure

At the start of each experimental session, one side key was illuminated red and the other green. For half of the subjects, a single response to the red hue (S1) turned off both stimuli and produced 1.5-s access to grain followed by a 3.5-s intertrial interval, whereas a response to the green hue (S2) turned off both stimuli and resulted in a 5-s intertrial interval. For the other half of the subjects, choice of the green hue (S1) was reinforced and choice of the red hue (S2) was not. For the first 40 trials of each 80-trial session, subjects were trained with S1+/S2–; on Trials 41–80, the contingencies were reversed (S2+/S1–). Subjects were trained for 50 sessions.
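The trial contingencies of Experiment 1 can be summarized as a function of trial number and choice. The Python sketch below is an illustration only, with hypothetical names (`Outcome`, `trial_outcome`); it encodes the counterbalancing of hues described above and shows that reinforced and nonreinforced choices were followed by the same total of 5 s (1.5 s of grain plus a 3.5-s intertrial interval vs. a 5-s intertrial interval).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    reinforced: bool
    food_s: float  # duration of grain access, in seconds
    iti_s: float   # intertrial interval that follows, in seconds


def trial_outcome(trial: int, choice: str, s1_hue: str, reversal_after: int = 40) -> Outcome:
    """Return the consequence of choosing `choice` ("red" or "green") on a 1-indexed trial.

    `s1_hue` is the hue that is correct before the reversal for this bird
    (red for half of the subjects, green for the other half).
    """
    s2_hue = "green" if s1_hue == "red" else "red"
    correct = s1_hue if trial <= reversal_after else s2_hue
    if choice == correct:
        return Outcome(reinforced=True, food_s=1.5, iti_s=3.5)
    return Outcome(reinforced=False, food_s=0.0, iti_s=5.0)
```

For example, for a bird with `s1_hue="red"`, choosing red is reinforced on Trial 40 but not on Trial 41, when choosing green becomes the reinforced response.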

Results

Pigeons reached a stable level of choice accuracy within about 20 sessions of training. The percentage choice of the first correct stimulus (S1) as a function of trial number, averaged over subjects for the last 20 training sessions, appears in Fig. 1, plotted in blocks of five trials. The pigeons chose S1 almost exclusively on early trials in each session; choice of S1 then declined before the reversal and continued to decline after it, until S2 was chosen almost exclusively. As can be seen in Fig. 1, accuracy was only about 70% correct (choice of S1) on the five trials immediately before the reversal and only about 55% correct (45% choice of S1) on the five trials immediately after it. The initial dip in accuracy at the start of the session can be attributed to 1 pigeon that began each session at chance; the mean accuracy over the first five trials of each session for the remaining 9 pigeons was 98.2%. To confirm that the mean data accurately represent the data from individual subjects, the data from individual pigeons pooled over the last 20 training sessions appear in Fig. 2.
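Block means of the kind plotted in Fig. 1 can be computed as sketched below. This is a minimal Python illustration with a hypothetical function name; it assumes each session is stored as a list of choices ("S1" or "S2") ordered by trial, and it pools trials over sessions. Because every session contributes the same number of trials to each block, pooling is equivalent to averaging the session percentages; a group curve would then be the mean of the individual birds' curves.

```python
from typing import List, Sequence


def percent_s1_by_block(sessions: Sequence[Sequence[str]], block_size: int = 5) -> List[float]:
    """Percentage of S1 choices in consecutive blocks of trials, pooled over sessions.

    All sessions are assumed to have the same number of trials.
    """
    n_trials = len(sessions[0])
    percentages: List[float] = []
    for start in range(0, n_trials - block_size + 1, block_size):
        s1_count = sum(
            1
            for session in sessions
            for choice in session[start:start + block_size]
            if choice == "S1"
        )
        percentages.append(100.0 * s1_count / (block_size * len(sessions)))
    return percentages
```

For an 80-trial session and five-trial blocks, this returns 16 values, one per block.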

Fig. 1

Experiment 1: mean percentage choices of the first correct stimulus (S1) as a function of trial number for the last 20 sessions of training (Sessions 31–50) with a fixed reversal point after Trial 40 (dotted line). Error bars represent standard errors of the means

Fig. 2

Experiment 1: percentage choices of the first correct stimulus (S1) as a function of trial number for individual birds for the last 20 sessions of training (Sessions 31–50) with a fixed reversal point after Trial 40. The dotted line indicates the point at which the reversal occurred during the sessions

A more detailed view of the pigeons’ sensitivity to the reversal can be seen in the trial-by-trial plot of choice of the first correct stimulus over the trials immediately prior to the reversal (Trials 36–40) and immediately after it (Trials 41–45; see Fig. 3). The drop in choice of the first correct stimulus from Trial 41 to Trial 42 averaged 12%, compared with an average trial-to-trial drop of only 3% over the five trials prior to the reversal (Trials 36–41), but that difference was not statistically reliable, t(9) = 1.19, p > .05. Thus, the pigeons were relatively insensitive to the immediate feedback from the first nonreinforced response to S1 (or perhaps from the first reinforced response to S2).
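The comparison reported above can be sketched as follows. This Python illustration makes assumptions that are not stated in the text: each bird contributes its percentage choice of S1 on Trials 36–42, the pre-reversal drop is averaged over the five transitions from Trial 36 to Trial 41, and the two drops are compared across birds with a paired-samples t test. The function names are hypothetical; the resulting t would be referred to a t distribution with n − 1 degrees of freedom (9 for the 10 pigeons of Experiment 1).

```python
import math
import statistics
from typing import Sequence, Tuple


def reversal_drops(pct_s1: Sequence[float]) -> Tuple[float, float]:
    """Pre- and post-reversal drops in percentage choice of S1 for one bird.

    `pct_s1` holds the percentage choice of S1 on Trials 36-42 (seven values).
    Returns (mean trial-to-trial drop from Trial 36 to Trial 41, drop from Trial 41 to Trial 42).
    """
    if len(pct_s1) != 7:
        raise ValueError("expected percentages for Trials 36-42")
    pre = [pct_s1[i] - pct_s1[i + 1] for i in range(5)]  # 36->37, ..., 40->41
    post = pct_s1[5] - pct_s1[6]                          # 41->42
    return statistics.mean(pre), post


def paired_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int]:
    """Paired-samples t statistic for the differences y - x, with n - 1 degrees of freedom."""
    diffs = [b - a for a, b in zip(x, y)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return statistics.mean(diffs) / se, n - 1
```

Collecting `reversal_drops` for every bird gives two lists (pre and post drops) that can be passed to `paired_t`.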

Fig. 3

Experiment 1: mean percentage choice of the first correct stimulus (S1) as a function of trial number for Trials 36–45 for the last 10 sessions of training (Sessions 41–50). The dotted line indicates the point of reversal. Error bars represent standard errors of the means

Finally, although the pigeons could have made a single shift from S1 to S2 during a session, they did not. In fact, the pigeons made an average of 7.46 switches (SEM = 1.07) during the course of a session.
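A switch count of this kind can be computed directly from a session's sequence of choices. The minimal Python sketch below assumes that a switch is any trial on which the bird chose a different stimulus from the one it chose on the preceding trial; the function name is hypothetical. A bird that made a single shift from S1 to S2 would score 1.

```python
from typing import Sequence


def count_switches(choices: Sequence[str]) -> int:
    """Number of trials on which the choice differed from the choice on the previous trial."""
    return sum(1 for prev, cur in zip(choices, choices[1:]) if prev != cur)
```

Averaging this count over sessions and birds gives a per-session mean like the one reported above.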

Discussion

The results of Experiment 1 indicate that when pigeons are trained on a simple simultaneous discrimination that involves a single reversal midway through the session, they begin to anticipate the reversal well before it occurs, that is, well before the change in reinforcement contingencies. Furthermore, after the contingencies reverse, the pigeons continue to choose the initially correct but now incorrect stimulus for at least 10 trials. Thus, although they show some anticipation of the reversal, they also persist in choosing the once correct but now incorrect hue. This pattern of anticipatory and perseverative choices suggests that the pigeons were not using the immediate feedback from each trial as the primary indication of how to respond, but were relying on their memory of where in past sessions the reversal had occurred. More specifically, it suggests that the pigeons were using a temporal or trial-based reference memory in performing this task, rather than (or, more accurately, in addition to) the recent (or local) feedback from their choices on trials before and after the reversal. That is, the pigeons appeared to be judging the time or number of trials at which the reversal had been experienced in preference to the more effective cue, the outcome of their choice on the immediately preceding trials. Although this strategy may appear relatively inefficient because it resulted in a large number of errors, especially on trials close to the point of reversal, it did produce a relatively high overall accuracy when averaged over the entire session (90.7% correct).

In Experiment 2, we asked whether time or trial number from the start of the session would result in less control over stimulus choice if it could not always serve as a reliable cue for the reversal. That is, would the pigeons be more sensitive to the local history of reinforcement (i.e., the occurrence of reinforcement and nonreinforcement experienced on the most recent trials) if the reversal occurred at a point in the session that varied from session to session?

Experiment 2

Method

Subjects

The subjects were 8 White Carneaux pigeons (Columba livia) similar in age and experience to those used in Experiment 1. The subjects were housed and maintained under the same conditions as the previous subjects.

Apparatus

The apparatus was similar to that used in Experiment 1.

Procedure

The procedure was the same as in Experiment 1, except that the location of the reversal was varied from session to session in a semirandom fashion that equated the number of sessions at each reversal location (i.e., each location was equally likely, and the reversal occurred equally often after 10, 25, 40, 55, or 70 trials). The same semirandom series of sessions was used for all pigeons. Subjects were tested on the variable reversal procedure for a total of 100 sessions (20 sessions at each reversal location).
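One way to generate such a balanced semirandom series is sketched below in Python. The text does not describe any randomization constraints, so this sketch simply shuffles 20 copies of each reversal location using a fixed, arbitrary seed (an assumption), so that the same 100-session series can be given to every pigeon; the function name is hypothetical. The mean reversal location in any such series is 40, the midpoint of the 80-trial session.

```python
import random
from collections import Counter
from typing import List, Sequence


def balanced_reversal_schedule(
    locations: Sequence[int] = (10, 25, 40, 55, 70),
    sessions_per_location: int = 20,
    seed: int = 0,
) -> List[int]:
    """Return a shuffled list of reversal points (the trial after which the reversal occurs).

    Each location appears exactly `sessions_per_location` times; a fixed seed yields
    the same series on every call, so it can be shared by all birds.
    """
    schedule = [loc for loc in locations for _ in range(sessions_per_location)]
    random.Random(seed).shuffle(schedule)
    return schedule


if __name__ == "__main__":
    schedule = balanced_reversal_schedule()
    print(len(schedule), Counter(schedule))
```

A constraint such as limiting runs of the same location could be added by reshuffling until it is met, but nothing in the procedure says one was used.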

Results

When the reversal occurred early in the session, the pigeons made few anticipatory errors but a large number of perseverative errors, and when the reversal occurred late in the session, the subjects made many anticipatory errors. When the reversal occurred at the midpoint of the session, the subjects made both anticipatory and perseverative errors. The mean percentage choice of the first correct stimulus as a function of trial number pooled over the last 25 sessions of training (the last 5 sessions at each reversal location) appears in Fig. 4. Compared with the results from Experiment 1, subjects showed somewhat more sensitivity to the local history of reinforcement, as indicated by the separation of the functions among the points of reversal. The overall percentage correct was highest when the reversal point was at the midpoint of the session and lowest when the reversal point was near the end of the session. For the reversals after 10, 25, 40, 55, and 70 trials, the mean percentage errors were 13.7%, 10.0%, 9.6%, 11.6%, and 21.8%, respectively. A one-way repeated measures ANOVA performed on the data from the last 20 sessions of training as a function of location of the reversal indicated that there was a significant difference in numbers of errors among the reversal points, F(4, 28) = 17.06, p < .01. The pigeons made significantly fewer errors when the reversal occurred after Trial 40 rather than after Trial 70, F(1, 7) = 57.97, p = .0001, suggesting that the pigeons tended to average the reversal points experienced in training. Errors made when the reversal occurred after Trial 10 also differed from those made when the reversal occurred after Trial 40, but only marginally so, F(1, 7) = 4.51, p = .07. However, the fact that they made significantly fewer errors when the reversal occurred after Trial 10 than after Trial 70, F(1, 7) = 15.84, p < .01, suggests that the local history of reinforcement played a role as well.
Trend analyses performed on the errors as a function of the reversal locations indicated that the linear trend was not significant, F(1, 36) = 1.41, p > .05, but that there was a significant quadratic trend, F(1, 36) = 30.19, p < .03.
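Linear and quadratic trends across five equally spaced conditions are conventionally tested with orthogonal polynomial contrasts. The Python sketch below (hypothetical names) only computes the contrast values from condition means; a significance test, such as the F ratios reported above, additionally requires an error term from the subject-level data, which is not reproduced here. A U-shaped error function, with the fewest errors near the middle of the session, yields a positive quadratic contrast.

```python
from typing import Dict, Sequence

# Orthogonal polynomial coefficients for five equally spaced levels
# (the reversal points 10, 25, 40, 55, and 70 are 15 trials apart).
LINEAR = (-2, -1, 0, 1, 2)
QUADRATIC = (2, -1, -2, -1, 2)


def contrast(means: Sequence[float], weights: Sequence[int]) -> float:
    """Weighted sum of condition means; each weight set sums to zero."""
    if len(means) != len(weights):
        raise ValueError("means and weights must have the same length")
    return sum(w * m for w, m in zip(weights, means))


def trend_contrasts(means: Sequence[float]) -> Dict[str, float]:
    """Linear and quadratic contrast values for five equally spaced conditions."""
    return {"linear": contrast(means, LINEAR), "quadratic": contrast(means, QUADRATIC)}


if __name__ == "__main__":
    # Mean percentage errors in Experiment 2 for reversals after Trials 10, 25, 40, 55, and 70.
    print(trend_contrasts([13.7, 10.0, 9.6, 11.6, 21.8]))
```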

Fig. 4

Experiment 2: mean percentage choices of the first correct stimulus (S1) as a function of trial number for the last 25 sessions of training (the last 5 sessions for each reversal point). The dotted lines indicate the point of reversal for each function

Comparison of the overall percentage correct on the last 20 sessions for the pigeons in Experiment 1 (90.7%) and for the pigeons in Experiment 2 on sessions in which the reversal occurred on Trial 41 (91.0%) indicated that the difference was not statistically significant, t < 1.

Examination of Fig. 4 suggests that anticipatory errors increased as the reversal occurred later in the session. A measure of anticipatory errors is the difference in errors between the first block of five trials in the session and the last block of five trials prior to the reversal. When the reversal occurred after Trial 10, there was no significant increase in anticipatory errors, t(14) = 1.53, p > .05, nor was there when it occurred after Trial 25, t(14) = 1.53, p > .05. However, when the reversal occurred after Trial 40, the increase in anticipatory errors was marginally significant, t(14) = 1.92, p = .08; when it occurred after Trial 55, the increase was significant, t(14) = 5.37, p < .0001; and when it occurred after Trial 70, the increase was also significant, t(14) = 5.83, p < .0001.

Examination of Fig. 4 also suggests that perseverative errors decreased as the reversal occurred later in the session. A measure of perseverative errors is the difference in errors between the first block of five trials after the reversal and the last block of trials (Trials 76–80). Note that the five trials following a reversal include the first reversal trial. Because the feedback (nonreinforcement) on this trial came only after the choice, the highest probability of reinforcement that could be expected for that five-trial block was .80. For this reason, the error rate calculated for perseverative errors on the first block following the reversal was reduced by 20%. At all points in the session at which a reversal occurred, there were significant numbers of perseverative errors, all ts > 5.36, all ps < .001.
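The two error measures can be written as functions of a session's choices. This Python sketch uses hypothetical names and makes one interpretive assumption: the 20% reduction is taken to mean subtracting 20 percentage points (one trial in five) from the error rate of the first post-reversal block. Blocks are five trials long, and the final block is Trials 76–80 of an 80-trial session.

```python
from typing import Sequence


def block_error_pct(choices: Sequence[str], reversal_after: int, first: int, last: int) -> float:
    """Percentage of errors on Trials first..last (1-indexed, inclusive)."""
    trials = range(first, last + 1)
    errors = 0
    for t in trials:
        correct = "S1" if t <= reversal_after else "S2"
        errors += choices[t - 1] != correct  # True counts as 1
    return 100.0 * errors / len(trials)


def anticipatory_measure(choices: Sequence[str], reversal_after: int) -> float:
    """Errors on the last five trials before the reversal minus errors on Trials 1-5."""
    return (block_error_pct(choices, reversal_after, reversal_after - 4, reversal_after)
            - block_error_pct(choices, reversal_after, 1, 5))


def perseverative_measure(choices: Sequence[str], reversal_after: int,
                          n_trials: int = 80, correction_pts: float = 20.0) -> float:
    """Corrected errors on the first five post-reversal trials minus errors on the final block.

    `correction_pts` is the assumed adjustment for the unavoidable first-reversal-trial error.
    """
    first_block = block_error_pct(choices, reversal_after, reversal_after + 1, reversal_after + 5)
    last_block = block_error_pct(choices, reversal_after, n_trials - 4, n_trials)
    return (first_block - correction_pts) - last_block
```

The same functions apply to any of the five reversal points; for a reversal after Trial 70, the first post-reversal block is Trials 71–75.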

Another way to compare the results of Experiment 2 with those of Experiment 1 is to compare the pigeons’ ability to react to the reversal at the only common point of reversal (Trial 41). When pooled over Sessions 31–50, pigeons in Experiment 2 showed a decrease in responses to S1 of 29.0% on the first trial block following the reversal, whereas those in Experiment 1 showed a decrease in responses to S1 of 25.4%, a difference that was not statistically significant, t < 1.

Once again, a more detailed view of the pigeons’ sensitivity to the reversal can be seen in the trial-by-trial plot of choice of the first correct stimulus over the trials immediately prior to and immediately after the reversal (see Fig. 5). The average drop in choice of the first correct stimulus between the first and second reversal trials was 7.5%, whereas the trial-to-trial drop had averaged only 0.9% over the five trials prior to the reversal. This difference was not statistically reliable, t(7) = 1.99, p > .05. The small change in choice following the first nonreinforced response to S1 (or the first reinforced response to S2) suggests that the pigeons were relatively insensitive to the immediate feedback from their responses.

Fig. 5

Experiment 2: mean percentage choices of the first correct stimulus (S1) as a function of relative trial number (five trials prior to and five trials after the reversal) for the last 10 sessions of training (Sessions 41–50). The dotted line indicates the point of reversal. Error bars represent standard errors of the means

Once again, although the pigeons could have made a single shift from S1 to S2 during a session, they did not. In fact, the pigeons made an average of 8.81 switches (SEM = 1.11) per session.

Discussion

The rationale for varying the location in the session at which the reversal occurred was to discourage timing or counting of trials and to encourage use of the local history of reinforcement. The results of Experiment 2 indicate that in a within-session reversal task in which the location of the reversal varies less predictably across sessions, pigeons show some control by the local history of reinforcement but continue to show control by time or trial number as well. If subjects had been relying solely on local reinforcement contingencies, they would not have made as many perseverative errors when the reversal occurred early in the session, nor as many anticipatory errors when the reversal occurred late in the session. The facts that anticipatory errors were so high when the reversal occurred after Trial 70 (an average of 49.5% errors on the five trials prior to the reversal) and that overall accuracy was best when the reversal occurred at the midpoint of the session suggest that the pigeons tended to average the reversal locations over the training sessions. That is, when the reversal occurred equally often after Trials 10, 25, 40, 55, and 70 (the average of which is 40), the pigeons tended to act as if the reversal occurred near the middle of the session. This was especially surprising for perseverative errors, because all of the local reinforcement history should have been inconsistent with these errors. Whereas the pigeons made only 11.5% errors on the third block of five trials following the reversal when it occurred after Trial 40, they made 35% perseverative errors on the corresponding block when the reversal occurred after Trial 10.
Thus, time or trials into the session continued to play a role in the pigeons’ choices, in spite of the fact that they were not reliable cues, whereas the local history of reinforcement, which did provide a reliable cue, only moderately controlled pigeons’ choice behavior.

However, the relative asymmetry of perseverative errors and anticipatory errors indicates that there was some sensitivity to the differential cues provided by those two kinds of errors. Perseverative errors should be less likely because they provide valid cues about the future contingencies of reinforcement (any error after the reversal predicts the reinforcement contingencies on the remaining trials in the session), whereas anticipatory errors do not provide valid cues about future contingencies (choice of a stimulus that resulted in an anticipatory error does not predict the results of choice of that same stimulus on a later trial). Support for this difference can also be seen in the significant difference between the percentage errors found when the reversal occurred after Trial 10 rather than after Trial 70.

In Experiment 2, when the peck requirement to receive the outcome on a given trial was a single peck, subjects showed some sensitivity to the reinforcement contingencies, but they also appeared to rely on timing to predict the reversal. That is, although they were somewhat sensitive to the outcome of their response on each trial, they tended to average the points of reversal such that they were most efficient when the reversal occurred after Trial 40, in spite of the fact that the reversal was no more likely to occur after Trial 40 than after any one of the other four points in the session.

One possible reason that the pigeons were not more sensitive to the consequences of their choices on recent trials is that their memory for those choices was poor. Only 1 peck was required on each trial, and there is evidence that pigeons’ memory for a stimulus to which a single response is made is substantially worse than when multiple pecks are required (e.g., Rayburn-Reeves, Miller, & Zentall, 2010, in counting research; Roberts, 1972, in matching-to-sample research). Thus, the purpose of Experiment 3 was to determine whether increasing the response requirement on each trial would produce greater sensitivity to the reinforcement contingencies. In Experiment 3, subjects were exposed to the variable reversal procedure used in Experiment 2, but 20 pecks to a stimulus were required on each trial.

Experiment 3

Method

Subjects

The subjects were 8 White Carneaux pigeons (Columba livia) similar to those used in Experiments 1 and 2. The subjects were housed and maintained under the same conditions as in the previous experiments.

Apparatus

The apparatus was the same as that used in Experiments 1 and 2.

Procedure

Pigeons were given the same training as in Experiment 2, except that a minimum of 20 pecks was required to turn off either stimulus. The choice of a stimulus was defined as the first stimulus to which 20 pecks were made. As in Experiment 2, subjects were trained with the variable reversal procedure from the start of the experiment, for 100 sessions.
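The choice rule for Experiment 3 can be expressed as a small function over the sequence of pecks on a trial. This Python sketch is illustrative, with a hypothetical name; the text does not say whether pecks to the other stimulus reset the count, so it assumes that the counts for the two stimuli accumulate independently. With `requirement=1`, it reduces to the single-peck rule of Experiments 1 and 2.

```python
from collections import Counter
from typing import Iterable, Optional


def choice_under_peck_requirement(pecks: Iterable[str], requirement: int = 20) -> Optional[str]:
    """Return the first stimulus to accumulate `requirement` pecks, or None if neither does.

    Pecks to the two stimuli are counted separately; pecking the other stimulus
    does not reset a stimulus's count (an assumption, not stated in the procedure).
    """
    counts: Counter = Counter()
    for stimulus in pecks:
        counts[stimulus] += 1
        if counts[stimulus] >= requirement:
            return stimulus
    return None
```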

Results

The results of Experiment 3 were similar to those of Experiment 2. When the reversal occurred early in the session, subjects made few anticipatory errors but a large number of perseverative errors, and when the reversal occurred late in the session, subjects made many anticipatory errors. When the reversal occurred at the midpoint of the session, pigeons again made both anticipatory and perseverative errors. The mean percentage choices of the first correct stimulus as a function of trial number pooled over the last 25 sessions of training (the last 5 sessions at each reversal point) appear in Fig. 6.

Fig. 6

Experiment 3: mean percentage choices of the first correct stimulus (S1) as a function of trial number for the last 25 sessions of training (the last 5 sessions for each reversal point) for pigeons that were required to peck the chosen stimulus 20 times, rather than 1 time as in Experiments 1 and 2. The dotted lines indicate the point of reversal for each function

Anticipatory errors

Examination of Fig. 6 suggests that, as in Experiment 2, anticipatory errors increased as the reversal occurred later in the session. Again we examined the change in anticipatory errors as the difference in errors between the first block of five trials in the session and the last block of five trials prior to the reversal. When the reversal occurred after Trial 10, there was no significant increase in anticipatory errors, t(7) = 1.53, p > .05, nor was there when it occurred after Trial 25, t(7) = 0.70, p > .05. However, when the reversal occurred after Trial 40, the increase in anticipatory errors was marginally significant, t(7) = 2.09, p = .07; when it occurred after Trial 55, the increase was significant, t(7) = 5.88, p < .001; and when it occurred after Trial 70, the increase was also significant, t(7) = 5.98, p < .001.

Perseverative errors

Examination of Fig. 6 also suggests that perseverative errors decreased as the reversal occurred later in the session. As before, perseverative errors were measured as the difference in errors between the first block of five trials after the reversal and the last block of trials (Trials 76–80). Once again, the five trials following a reversal included the first reversal trial. Because the feedback (nonreinforcement) on this trial came only after the choice, the highest probability of reinforcement that could be expected for the first five-trial block after the reversal was .80. For this reason, the error rate calculated for perseverative errors on the first block following the reversal was reduced by 20%. At all points in the session at which a reversal occurred, there were significant numbers of perseverative errors, all ts > 5.59, all ps < .001.

Comparison with the results of Experiment 2

Compared with the results from Experiment 2, subjects showed no more sensitivity to the local history of reinforcement and very similar effects of varying the reversal points. The tendency to make anticipatory errors on the five trials prior to the reversal was actually somewhat greater at all reversal points when 20 pecks were required than when 1 peck was required. A two-way mixed-model ANOVA performed on the anticipatory-error data from Experiments 2 and 3 (the difference between errors on the first block of five trials and on the last block of five trials prior to the reversal), with point of reversal as the within-subjects factor and experiment as the between-groups factor, indicated that the main effect of experiment was not significant, F < 1, but that the main effect of point of reversal was significant, F(4, 56) = 34.52, p < .0001, as was the Experiment × Point of Reversal interaction, F(4, 56) = 5.61, p < .003.

Anticipatory errors

Evaluation of the difference in anticipatory errors at each point of reversal indicated that the difference in anticipatory errors between the experiments reached statistical significance only when the reversal occurred after Trial 40, t(14) = 2.30, p = .037. In that condition, anticipatory errors on the five trials prior to the reversal were 27.0% when 20 pecks were required (see Fig. 6), whereas they were only 10.5% when a single peck was required (see Fig. 4).

Sensitivity to reversal

Another means of comparing the results of Experiments 2 and 3 is to ask how sensitive the pigeons were to the feedback from the first five reversal trials. The differences in percentage choices of the initially correct stimulus on the five trials prior to and the five trials after the reversal, pooled over the five points of reversal, were comparable for the pigeons that were required to make 20 pecks (24.1%, Experiment 3) and the pigeons that were required to make only 1 peck (21.9%, Experiment 2), t < 1. Furthermore, similar comparisons made at each point of reversal in the session indicated that the reductions in choice of the initially correct stimulus on the five trials prior to the reversal and the five trials after the reversal were similar for the pigeons required to make 20 pecks and for those required to make only 1 peck, all ts < 1.

Perseverative errors

A two-way mixed model ANOVA performed on the perseverative errors from Experiments 2 and 3 (the difference between errors on the first block of five trials and the last block of five trials prior to the reversal), with point of reversal as the within-subjects factor and experiment as the between-groups factor, indicated that the effect of experiment was significant, F(1, 14) = 13.12, p < .001, and that both the main effect of point of reversal, F(4, 56) = 34.52, p < .001, and the Experiment×Point of Reversal interaction, F(4, 56) = 40.79, p < .001, were significant. On average, fewer perseverative errors were made in Experiment 3 than in Experiment 2, and this was especially true when the reversal occurred at the midpoint in the session. But, as can be seen in Figs. 4 and 6, this difference was likely affected by the fact that the pigeons in Experiment 3 made more anticipatory errors when the reversal occurred in the middle of the session. With more anticipatory errors, there was less opportunity for perseverative errors.

Once again, for the pigeons in Experiment 3, the overall percentage correct was highest when the reversal location was at the midpoint of the session and lowest when it was near the end of the session. For the reversals after 10, 25, 40, 55, and 70 trials, mean errors were 15.2%, 11.8%, 11.6%, 17.6%, and 24.7%, respectively. A one-way repeated measures ANOVA performed on the data from the last 25 sessions of training as a function of location of reversal indicated that there was a significant difference among reversal points, F(4, 28) = 32.63, p < .01. As before, the pigeons made significantly fewer errors when the reversal occurred after Trial 40 than after Trial 70, F(1, 7) = 70.31, p < .0001; in Experiment 3, they also made significantly fewer errors when the reversal occurred after Trial 40 than when it occurred after Trial 10, F(1, 7) = 9.81, p = .017, suggesting that they still showed a tendency to average reversal points over training. However, the fact that they made significantly fewer errors when the reversal occurred after Trial 10 than after Trial 70, F(1, 7) = 19.81, p = .003, suggests that the local history of reinforcement played a role as well. As before, trend analyses performed on the errors as a function of the reversal point indicated that the linear trend was not significant, F(1, 36) = 3.36, p > .05, but that there was a significant quadratic trend, F(1, 36) = 83.17, p < .01.

Again, a more detailed view of the pigeons’ sensitivity to the reversal can be seen in the trial-by-trial plot of choices of the first correct stimulus over the blocks of trials immediately prior to and immediately after the reversal (see Fig. 7). The average drop in choice of the first correct stimulus between the first and second reversal trials was 7.0%, whereas it had averaged only 2.6% over the five trials prior to the reversal. This difference was not statistically reliable, t < 1. Again, the small difference in response to the first nonreinforced response to S1 (or the first reinforced response to S2) suggests that the pigeons were relatively insensitive to the immediate feedback from their responses.

Fig. 7

Experiment 3: mean percentage choices of the first correct stimulus (S1) as a function of relative trial number (five trials prior to and five trials after the reversal) for the last 10 sessions of training (Sessions 41–50). The dotted line indicates the point of reversal. Error bars represent standard errors of the means

Once again, the number of switches during a session could have been as few as 1, but it was much greater. In fact, the pigeons made an average of 11.42 switches (SEM = 1.70) per session.
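Counting switches amounts to counting the trials on which the chosen stimulus differs from the one chosen on the preceding trial. A minimal sketch, assuming a session is stored as a list of "S1"/"S2" choices (the function name is hypothetical):

```python
def count_switches(choices):
    """Number of trials on which the chosen stimulus differs from the previous choice."""
    return sum(1 for prev, cur in zip(choices, choices[1:]) if cur != prev)


# A session with one clean reversal contains exactly one switch ...
print(count_switches(["S1"] * 40 + ["S2"] * 40))  # 1
# ... whereas alternating between stimuli around the reversal adds switches.
print(count_switches(["S1"] * 38 + ["S2", "S1", "S2", "S1"] + ["S2"] * 38))  # 5
```

An average of 11.42 switches per session therefore implies a good deal of alternation between the stimuli beyond the single switch that a perfect reversal requires.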

Discussion

The results of Experiment 3 indicate that when the reversal point varied over sessions unpredictably and 20 pecks were required to a stimulus on each trial, pigeons continued to show control by time or trial number. Apparently, although the added response requirement may have enhanced the pigeons’ memory for the stimulus last chosen, it may not have enhanced the pigeons’ memory for the consequence of that trial. Furthermore, it may have resulted in reduced memory for the consequences of the trials prior to the immediately preceding trial. The only appreciable difference in the results of Experiments 2 and 3 was the greater increase in anticipatory errors and decrease in perseverative errors in Experiment 3 when the reversal occurred near the middle of the session.

Given the results from Experiments 1, 2, and 3, it appears that pigeons may not be able to refrain from using mean time or number of trials into the session as a cue for reversal of the discrimination. In the limit, the efficient use of the local history of reinforcement, if applied to this single-reversal simultaneous discrimination, would result in the use of a win–stay/lose–shift response strategy. That is, if subjects based their choice on each trial on the outcome of the choice from the preceding trial, it would result in a high level of accuracy (only one error per session).
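The claim that a pure win–stay/lose–shift strategy yields only one error per session, wherever the reversal falls, can be checked with a short simulation. This sketch assumes an 80-trial session (inferred from the midpoint reversal after Trial 40) and a chooser that begins the session on S1; both the session length and the function name are assumptions for illustration.

```python
def simulate_wsls(n_trials, reversal_after, first_choice="S1"):
    """Count errors made by a pure win-stay/lose-shift chooser in one session.

    S1 is correct on trials 1..reversal_after and S2 on the remaining trials.
    """
    choice = first_choice
    errors = 0
    for trial in range(1, n_trials + 1):
        correct = "S1" if trial <= reversal_after else "S2"
        if choice != correct:
            errors += 1
            # lose-shift: switch to the other stimulus on the next trial
            choice = "S2" if choice == "S1" else "S1"
        # win-stay: otherwise keep the same choice
    return errors


for reversal_after in (10, 25, 40, 55, 70):
    print(reversal_after, simulate_wsls(80, reversal_after))  # 1 error each
```

If the first choice of the session is wrong, the same rule costs one additional error, which is why starting each session on S1 matters for the one-error bound.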

In Experiments 4 and 5, we asked if humans, who should have a much greater ability to use rule-based strategies, would learn to use a win–stay/lose–shift strategy with a similar task in which the reversal occurred at the midpoint of a session (Experiment 4), as in Experiment 1, or at semirandom points during a session (Experiment 5), as in Experiments 2 and 3. Specifically, in Experiment 4 we asked whether humans given ten 24-trial sessions of reversal discrimination training, in which the reversal occurred at the midpoint of each session (Trial 13), would use a win–stay/lose–shift choice strategy. In Experiment 5, we asked whether humans would choose differently if the within-session reversal was less predictable (i.e., if it occurred semirandomly on Trial 5, 9, 13, 17, or 21).

Experiment 4

Method

Participants

Ten undergraduate students (5 males, 5 females, ages 18–23) at the University of Kentucky participated in this study in partial fulfillment of a psychology course requirement. The study was conducted according to a protocol approved by the University of Kentucky Institutional Review Board.

Apparatus

The apparatus consisted of a personal computer located in a small room. Responses were recorded by means of a standard mouse. The task was programmed using Visual Basic 6 (Microsoft).

Procedure

Participants were run individually. They were placed in front of the computer, where instructions were presented in a white rectangular window on the screen.

You are about to play a card game. You will see two different cards on a table. On each trial you will have to guess which one is the winning card. To play this game, we ask you not to count cards. When you are ready to begin, click below.

Instructions to avoid counting or performing rhythmic activities have been found to be successful in preventing the biases that may be produced by adopting such chronometric counting strategies (Grondin, Ouellet, & Roussel, 2004). A gray button with the word “start” was presented at the bottom center of the screen.

On discrimination trials, the computer screen displayed a green rectangle (14 cm high×18 cm wide) centered on a dark background. A button labelled “play” appeared at the bottom of the green rectangle. Each trial began when the participant clicked the “play” button. Two playing cards (the 10 of spades and the 10 of clubs, approximately 2.5 cm high×1.8 cm wide) served as the discriminative stimuli (S1 and S2) and were displayed on the right and left above the “play” button. S1 and S2 were randomly assigned to the two cards for each participant. Just below each discriminative stimulus and equidistant from the “play” button was a button labelled “choice.” S1 and S2 appeared equally often on the left and the right, but neither appeared on the same side for more than three trials in a row.
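The side-balancing constraint can be illustrated with a simple generator. The original task was programmed in Visual Basic 6 and its randomization routine is not described, so everything below is an assumption: balancing within each 24-trial session, the function names, and the use of rejection sampling (reshuffle until no side repeats more than three times in a row).

```python
import random


def longest_run(seq):
    """Length of the longest run of identical consecutive items."""
    best = run = 1
    for prev, item in zip(seq, seq[1:]):
        run = run + 1 if item == prev else 1
        best = max(best, run)
    return best


def side_sequence(n_trials=24, max_run=3, rng=None):
    """Left/right positions of S1 for one session (S2 takes the other side).

    S1 appears on each side equally often, and no side is repeated on more
    than `max_run` consecutive trials; uses simple rejection sampling.
    """
    if n_trials % 2:
        raise ValueError("n_trials must be even to balance the two sides")
    rng = rng or random.Random()
    sides = ["left", "right"] * (n_trials // 2)
    while True:
        rng.shuffle(sides)
        if longest_run(sides) <= max_run:
            return list(sides)


print(side_sequence(rng=random.Random(0)))
```

Rejection sampling keeps every admissible sequence reachable; a constructive method would also work but is harder to verify at a glance.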

Participants gave their responses by clicking the button of their choice with the mouse, which caused the cards to disappear. If the choice was correct, a gray rectangular window (1 cm high×4 cm wide) at the top of the green rectangle turned yellow and displayed “You Win!” If the choice was incorrect, the gray rectangular window turned red and displayed “You lose.” Once this feedback was displayed, the “play” button became active again. Clicking the “play” button turned the feedback window gray again and presented the discriminative stimuli for the next trial.

Participants were given ten 24-trial sessions in which responses to one stimulus (S1+) were correct and responses to the other stimulus (S2–) were incorrect for the first half of the session (Trials 1–12), followed by 12 additional trials in which the contingencies were reversed (S1–/S2+). Immediately after completion of 24 trials, the following instructions appeared on the screen: “Let’s play again! Remember, we ask you not to count. When you are ready to begin, click below.”

The task was the same for each session: 12 trials of S1+/S2– followed by 12 trials of S1–/S2+. After ten sessions of reversal discrimination training, there was a postexperimental question in which participants were asked how they made their choices during the card game.

Results

Participants acquired the midsession reversal task quickly. Percentage choices of the first correct stimulus (S1) as a function of trial number, averaged across all 10 participants, are shown in Fig. 8. Choices of S1 on Sessions 1–5 and 6–10 are depicted separately. For the first five sessions, almost all participants chose the first correct stimulus from Trial 2 to Trial 13 (the first reversal trial). Accuracy on the first trial was somewhat lower than on other trials (approximately 80%) because participants had to guess which card was correct on Trial 1 of Session 1, and on Trial 1 of Session 2 the most recently correct card was S2. For Sessions 1–5, mean choice of S1 dropped from 96% on Trial 13 to 18% on Trial 14, and then to almost exclusive choice of S2.

Fig. 8

Experiment 4: mean percentage choices of the first correct stimulus (S1) as a function of trial number for sessions 1–5 and sessions 6–10 of training with a fixed reversal point after trial 12 (dotted line)

For Sessions 6–10, however, choice was somewhat different. As can be seen in Fig. 8, participants began responding to S2 prior to the reversal. Whereas choice of S1 was 96% on Trial 11, it fell to 86% on Trial 12 and to 74% on Trial 13 (the first trial on which feedback would indicate that S2 was now correct). A repeated measures ANOVA performed on the data from Trials 11, 12, and 13 indicated that the drop in S1 choice over trials was statistically significant, F(2, 18) = 5.14, p = .017. Further, correlated t tests on the percentage choices of S1 indicated that although the drop in S1 choice from Trial 11 (96%) to Trial 12 (86%) was not statistically significant, t(9) = 1.86, p > .05, the drop from Trial 12 (86%) to Trial 13 (74%) was significant, t(9) = 2.25, p < .05. Thus, these human participants appear to have been anticipating the reversal.

Although the humans made anticipatory errors, it is of interest to ask whether those errors cost the participants in percentage correct relative to a presumably optimal win–stay/lose–shift strategy, which would produce 1.0 error per session. On Sessions 6–10, participants made 0.04 errors per session on Trial 11, 0.14 errors on Trial 12, and 0.74 errors on Trial 13. Thus, total errors on these three trials were 0.92 per session, almost exactly the number that would have been produced had the participants used a win–stay/lose–shift strategy. In other words, the anticipatory errors did not increase the total number of errors beyond the optimal number.
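The error arithmetic above follows directly from the percentage choices of S1 reported for Sessions 6–10: before the reversal an error is a choice of S2, and on Trial 13 an error is a choice of S1. A minimal sketch (variable and function names are illustrative):

```python
# Percentage choices of S1 on Sessions 6-10, taken from the text above.
pct_s1 = {11: 96, 12: 86, 13: 74}
REVERSAL_TRIAL = 13  # S1 correct on Trials 1-12, S2 correct from Trial 13 on


def error_rate(trial, pct):
    """Mean errors per session on one trial, from the % choice of S1."""
    wrong_pct = pct if trial >= REVERSAL_TRIAL else 100 - pct
    return wrong_pct / 100


errors = {t: error_rate(t, p) for t, p in pct_s1.items()}
total = round(sum(errors.values()), 2)
print(errors)  # {11: 0.04, 12: 0.14, 13: 0.74}
print(total)   # 0.92, versus 1.0 for win-stay/lose-shift
```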

Discussion

In Experiment 4, we asked whether human participants would produce results similar to those found with pigeons when tested under somewhat similar conditions. Although the anticipatory effect was considerably smaller and occurred later with humans than with pigeons, the humans' choices nonetheless departed somewhat from a win–stay/lose–shift strategy. On the other hand, unlike the pigeons, the human participants showed little indication of perseveration following the reversal. In other words, although the human participants did not consistently adopt a win–stay rule (they tried to anticipate the reversal), they did quite consistently adopt a lose–shift rule.

It may be that when a reversal occurs at a predictable point in the session, humans will sometimes try to anticipate the reversal in an effort to avoid making an error. Although anticipatory errors did not lead to an increase in total errors made relative to a win–stay/lose–shift strategy, we were interested in whether humans would use a more consistent win–stay/lose–shift strategy if the location of the reversal in the session was less predictable. In Experiment 5, we semirandomly varied the point in the session at which the reversal occurred.

Experiment 5

Method

Participants

Ten undergraduate students (5 males, 5 females, ages 18–23) from the University of Kentucky were recruited to participate in this study in partial fulfillment of a course requirement.

Apparatus and procedure

The apparatus was the same as in Experiment 4. In Experiment 5, participants were given the same simple discrimination procedure with a single reversal, with the exception that in each session, the trial on which the reversal occurred was semirandomly selected (Trial 5, 9, 13, 17, or 21), with the constraint that each point of reversal occurred equally often. All participants were trained for 10 sessions, as in Experiment 4.
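One way to implement a semirandom schedule in which each reversal point occurs equally often over 10 sessions is to repeat the five points twice and shuffle them. The exact randomization used in the experiment is not reported, so the shuffling approach and the helper names below are assumptions.

```python
import random

# First reversal trial in each 24-trial session (from the text).
REVERSAL_TRIALS = (5, 9, 13, 17, 21)


def reversal_schedule(n_sessions=10, rng=None):
    """Order of reversal trials across sessions, each used equally often."""
    reps, leftover = divmod(n_sessions, len(REVERSAL_TRIALS))
    if leftover:
        raise ValueError("n_sessions must be a multiple of the number of reversal points")
    rng = rng or random.Random()
    schedule = list(REVERSAL_TRIALS) * reps
    rng.shuffle(schedule)
    return schedule


def correct_card(trial, reversal_trial):
    """S1 is correct before the reversal trial, S2 from the reversal trial on."""
    return "S1" if trial < reversal_trial else "S2"


print(reversal_schedule(rng=random.Random(1)))
```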

Results

The participants acquired the task quickly and, because the point of reversal was no longer predictable, appeared to use a win–stay/lose–shift strategy. The data from Experiment 5, averaged over the participants and plotted relative to the location in the session at which the reversal occurred, appear in Fig. 9. The percentage choices of the first correct stimulus (S1) are plotted as a function of relative trial number. Because the number of trials before and after the reversal varied with the location in the session at which the reversal occurred, only four trials prior to and four trials after the reversal appear in the figure. As can be seen, independently of the point of reversal, participants consistently chose S1 on the four trials prior to the reversal as well as on the first trial of the reversal, and then consistently chose S2 from that point on.
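The window of four trials follows from the design: a reversal on Trial 5 leaves only Trials 1–4 before it, and a reversal on Trial 21 leaves only Trials 21–24 from the reversal on. Aligning sessions on their reversal trial and pooling can be sketched as below; the data layout and function name are assumptions, and the two sessions are hypothetical win–stay/lose–shift records.

```python
from collections import defaultdict


def align_to_reversal(sessions, window=4):
    """Percentage S1 choice by trial number relative to the reversal.

    sessions: list of (choices, reversal_trial), where choices is a list of
    "S1"/"S2" and reversal_trial is the 1-based first trial on which S2 is correct.
    Relative trials -window..-1 precede the reversal; 1..window start at it.
    """
    tallies = defaultdict(lambda: [0, 0])  # relative trial -> [S1 choices, trials]
    offsets = list(range(-window, 0)) + list(range(1, window + 1))
    for choices, reversal_trial in sessions:
        first = reversal_trial - 1  # 0-based index of the first reversal trial
        for rel in offsets:
            idx = first + rel if rel < 0 else first + rel - 1
            if 0 <= idx < len(choices):
                tallies[rel][0] += choices[idx] == "S1"
                tallies[rel][1] += 1
    return {rel: 100.0 * s1 / n for rel, (s1, n) in sorted(tallies.items())}


sessions = [
    (["S1"] * 5 + ["S2"] * 19, 5),   # reversal on Trial 5
    (["S1"] * 21 + ["S2"] * 3, 21),  # reversal on Trial 21
]
print(align_to_reversal(sessions))
# {-4: 100.0, -3: 100.0, -2: 100.0, -1: 100.0, 1: 100.0, 2: 0.0, 3: 0.0, 4: 0.0}
```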

Fig. 9

Experiment 5: mean percentage choices of the first correct stimulus (S1) as a function of trial number for sessions 6–10 relative to the variable reversal point (indicated by the dotted line) for the four trials prior to and the four trials following the reversals that were experienced, pooled over sessions

Discussion

The results of Experiment 5 indicate that when human participants are given a simple simultaneous discrimination with a single reversal in which the location at which the reversal occurs varies unpredictably from session to session, they readily adopt a win–stay/lose–shift strategy. That is, they choose the first correct stimulus (S1) until the first trial in which that stimulus is incorrect, and they use that information to switch from S1 to S2 for the remainder of the session.

General discussion

The results of Experiments 1, 2, and 3 indicate that pigeons did not adopt an efficient choice strategy. The facts that both anticipatory and perseverative errors occurred when the reversal occurred at a constant location in the session and that subjects’ accuracy did not show much improvement when the reversal’s location in the session was less predictable suggest that there was some temporal generalization. That is, even for pigeons in Experiments 2 and 3, there was a tendency to average the reversal locations over sessions. In support of this hypothesis, choice accuracy was highest on sessions in which the reversal occurred at the session’s midpoint. Thus, pigeons tended to use the time into the session (or trial number) as at least part of the basis for their choices. It appears that under the present conditions, time or number as a cue is both readily adopted and difficult to overcome.

In a time–place learning task, animals learn that food can be found at different locations at different times of day or at different times into the session. There is evidence from research on time–place learning in animals that pigeons trained to use the passage of time to predict the availability of food tend to switch responding to the next location before the first location ceases to provide food (Wilkie, Saksida, Samson, & Lee, 1994; Wilkie & Willson, 1992). Even stronger evidence that time rather than the absence of food is often used as a cue for moving to the next food location is the finding that when rats are trained on a time–place learning task and test trials are inserted in which all locations provide food at all times, the rats continue to respond to the different locations as if the training contingencies were still in effect (Carr, Tan, Thorpe, & Wilkie, 2001). That is, they base their criterion for moving to the next location on the passage of time rather than on the absence of food at the current location.

In the present experiments, although time or trial number appears to maintain some control over reversal behavior in pigeons, the asymmetry between anticipatory and perseverative errors suggests that the local history of reinforcement plays some role as well. That is, anticipatory errors late in the session (e.g., when the reversal occurs after Trial 70) are substantially more frequent than perseverative errors early in the session (e.g., when the reversal occurs after Trial 10).

It is interesting to speculate about why there would not be more use of local reinforcement history by pigeons in this task. In other words, why do they show such a strong tendency to anticipate the reversal when the reversal occurs late in the session?

Although the variable reversal location procedure was introduced to make the reversal less predictable, in fact, the later in the session a reversal occurred, the more predictable it was. That is, the probability of a reversal after Trial 10 was .25, whereas if it had not occurred already, the probability of a reversal after Trial 55 was .50, and if it had not occurred by then, it was guaranteed to occur after Trial 70. This could explain the relative preponderance of anticipatory errors late in the session when the reversal occurred late. We could have used an exponential distribution of the frequencies of reversal points to make the probabilities of reversing and not reversing equal at each of the reversal points in the session, but that would have resulted in a strong bias to choose the S2 stimulus. At minimum, there would have been 1 session with a reversal after Trial 70 and 1 without a reversal at all (an all-S1 session). Then, to equate the probability of a reversal at each of the other reversal points, for each session with a reversal after Trial 70 there would have had to be 2 sessions with a reversal after Trial 55, 4 with a reversal after Trial 40, 8 with a reversal after Trial 25, 16 with a reversal after Trial 10, and, given that there was a session with all S1 trials, 32 all-S2 sessions. Thus, on 84% of the trials, S2 would have been correct, and a strong S2 bias likely would have resulted. Even if the 32 all-S2 sessions were omitted, S2 would have been correct on over two-thirds of the trials.
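The arithmetic behind the exponential schedule can be checked directly. The sketch assumes 80-trial sessions (inferred from the midpoint reversal after Trial 40) and treats an all-S2 session as a reversal before Trial 1; the names are illustrative. It confirms that the conditional probability of a reversal, given that none has yet occurred, is .50 at every point, and that S2 would be correct on a little under 85% of trials (about 70% without the all-S2 sessions).

```python
SESSION_LENGTH = 80  # assumed: reversal points 10-70, midpoint after Trial 40

# Sessions per schedule, keyed by the number of S1 trials before the reversal:
# 80 = no reversal (all-S1 session), 0 = all-S2 session.
SCHEDULE = {80: 1, 70: 1, 55: 2, 40: 4, 25: 8, 10: 16, 0: 32}


def reversal_hazards(schedule):
    """P(reversal at each point | no reversal has occurred yet)."""
    hazards = {}
    for point in sorted(p for p in schedule if p < SESSION_LENGTH):
        at_risk = sum(n for p, n in schedule.items() if p >= point)
        hazards[point] = schedule[point] / at_risk
    return hazards


def s2_fraction(schedule):
    """Proportion of all trials on which S2 is the correct stimulus."""
    s2_trials = sum(n * (SESSION_LENGTH - p) for p, n in schedule.items())
    return s2_trials / (SESSION_LENGTH * sum(schedule.values()))


print(reversal_hazards(SCHEDULE))  # 0.5 at every point
print(round(s2_fraction(SCHEDULE), 3))  # 0.848
without_all_s2 = {p: n for p, n in SCHEDULE.items() if p != 0}
print(round(s2_fraction(without_all_s2), 3))  # 0.695
```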

On the other hand, the purpose of manipulating the location of the reversal was to reduce the pigeons’ reliance on timing, and given the fact that there was a much more valid cue available (the feedback from the preceding trial), it is remarkable how tenaciously the pigeons maintained their use of the passage of time as a cue. Nevertheless, it would be interesting to see whether errors, especially anticipatory errors, would be reduced if the probability of a reversal occurring (or not) were the same at each potential reversal point in the session. Whatever the outcome of such an experiment, it remains a puzzle that in the present experiments the pigeons were not able to use the local reinforcement history efficiently before and after a reversal.

Surprisingly, the data from the human participants showed a similar but smaller tendency to make anticipatory errors when the reversal point was predictable, but they made few perseverative errors. However, when the reversal point was less predictable, they made few anticipatory or perseverative errors (i.e., they adopted a win–stay/lose–shift strategy). The presence of anticipatory errors when the reversal was predictable is reminiscent of the results when humans perform a discrimination in which one alternative is probabilistically correct on the majority of trials and the other is correct on the remaining trials (see, e.g., Humphreys, 1939). Under these conditions, humans tend to match the probabilities. That is, their choices tend to match the probability that each alternative is correct, in spite of the fact that a better strategy, one that produces a larger number of reinforcements, is to always choose the alternative with the higher probability of reinforcement (i.e., to maximize). Humans apparently choose to match probabilities in an effort to do better than maximizing, but in so doing they do worse. In the present procedure, subjects can maximize reinforcement by adopting a win–stay/lose–shift strategy, but that means that they must make at least one error. Subjects may attempt to avoid making that error by anticipating when the reversal will take place. Although such a strategy might appear maladaptive because it could lead to more errors, in this case subjects did not actually make more errors: their anticipatory errors were balanced by the 26% correct anticipatory choices of S2 on Trial 13 (the first reversal trial).
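Why matching does worse than maximizing can be shown with the standard expected-accuracy calculation. If alternative A is correct with probability p and is chosen with probability q, expected accuracy is q·p + (1 − q)(1 − p). The value p = .7 below is hypothetical and is not taken from Humphreys (1939) or from the present experiments.

```python
def expected_accuracy(p_choose_a, p_a_correct):
    """Expected proportion correct when A is chosen with probability p_choose_a."""
    return p_choose_a * p_a_correct + (1 - p_choose_a) * (1 - p_a_correct)


p = 0.7  # hypothetical probability that alternative A is correct
matching = expected_accuracy(p, p)      # choose A on 70% of trials
maximizing = expected_accuracy(1.0, p)  # always choose A
print(round(matching, 2), round(maximizing, 2))  # 0.58 0.7
```

For any p above .5, matching gives p² + (1 − p)², which is always below p, so the shortfall is a property of the strategy rather than of the particular value chosen here.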

It is possible that the ability of humans to develop a rule-based strategy comes from the fact that they live in a world where rules are abundant. Learning specific rules and being flexible enough to create multiple hypotheses and determine which hypothesis should be adopted is characteristic of our lives. That is, humans may have had considerably more prior experience with the adoption of win–stay/lose–shift strategies than pigeons. One source of such experience in humans is the acquisition of language, in which children learn to generalize rules, but they also learn that when they overgeneralize they must quickly learn to apply a different rule (Berman, 1973; Piaget & Inhelder, 1966). Thus, given all of this past experience, humans are clearly better than pigeons at acquiring this reversal task quickly, and they readily learn to make minimal errors. The fact that when the reversal occurs at a predictable time, humans, like pigeons, make anticipatory errors, but unlike pigeons, do not make perseverative errors, suggests that the learning mechanisms for these two species may be quite different.