Elsevier

Neuropharmacology

Volume 63, Issue 6, November 2012, Pages 1012-1021

Establishing a probabilistic reversal learning test in mice: Evidence for the processes mediating reward-stay and punishment-shift behaviour and for their modulation by serotonin

https://doi.org/10.1016/j.neuropharm.2012.07.025

Abstract

Valid animal models of psychopathology need to include behavioural readouts informed by human findings. In the probabilistic reversal learning (PRL) task, human subjects are confronted with serial reversal of the contingency between two operant stimuli and reward/punishment and, superimposed on this, a low probability (0.2) of punished correct responses/rewarded incorrect responses. In depression, reward-stay and reversals completed are unaffected but response-shift following punished correct response trials, referred to as negative feedback sensitivity (NFS), is increased. The aims of this study were to: establish an operant spatial PRL test appropriate for mice; obtain evidence for the processes mediating reward-stay and punishment-shift responding; and assess the effects thereon of genetically and pharmacologically altered serotonin (5-HT) function. The study was conducted with wildtype (WT) and heterozygous mutant (HET) mice from a 5-HT transporter (5-HTT) null mutant strain. Mice were mildly food deprived; reward was a sugar pellet and punishment was a 5-s time-out. Mice exhibited high motivation and adaptive reversal performance. Increasing the probability of punished correct response (PCR) trials per session (p = 0.1, 0.2 or 0.3) led to a monotonic decrease in reward-stay and reversals completed, suggesting accurate reward prediction. NFS differed from chance level at p PCR = 0.1, suggesting accurate punishment prediction, whereas NFS was at chance level at p PCR = 0.2–0.3. At p PCR = 0.1, HET mice exhibited lower NFS than WT mice. The 5-HTT blocker escitalopram was studied acutely at p PCR = 0.2: low doses (0.5–1.5 mg/kg) decreased NFS and increased reward-stay and reversals completed, similarly in WT and HET mice. 
This study demonstrates that testing PRL in mice can provide evidence on the regulation of reward and punishment processing that is, albeit within certain limits, of relevance to human emotional-cognitive processing, its dysfunction and treatment.

Highlights

► Depressed patients exhibit deficits in probabilistic reversal learning (PRL).
► A PRL paradigm was developed for C57BL/6 mice.
► Operant responding was sensitive to probability of punished correct responses (PCR).
► Heterozygous serotonin transporter mutant mice exhibited reduced sensitivity to PCR.
► Acute low-dose administration of the SSRI escitalopram also reduced sensitivity to PCR.

Introduction

Altered emotional-cognitive processing of negative and positive events is central to depression psychopathology, as indicated by the two core diagnostic symptoms of depressed mood (sadness, emptiness) and loss of interest or pleasure (anhedonia) (APA, 2000). Depressed mood reflects high focus on past, current or potential future negative (aversive) events, which elicits feelings and states such as sadness, frustration and catastrophisation (Abramson et al., 1989; Eshel and Roiser, 2010; Nandrino et al., 2004). Anhedonia reflects low focus on past, current, and potential future positive events (rewards), which leads to reduced pleasure and interest (Berridge and Robinson, 2003; Henriques and Davidson, 2000; Sloan et al., 2001). Automated psychological (operant) tasks that require development of cognitive associations between stimuli and outcomes allow for quantitative assessment of affective responsiveness to negative events (“error” feedback) and positive events (“correct” feedback). It has been reported that depression is characterised by high sensitivity to negative feedback: on memory and planning tasks, when patients receive error feedback on a trial they are more likely to also make another incorrect decision on the next trial, relative to healthy subjects. That is, depression is associated with increased emotional-cognitive reactivity to error feedback, possibly due to exaggerated punishment expectancy/prediction, manifested as increased likelihood of additional errors on subsequent trials (Elliott et al., 1996, 1997).

The probabilistic reversal learning (PRL) task is specifically designed to assess the cognitive-emotional ability to develop appropriate expectations/predictions about stimulus–feedback associations on the basis of a combination of accurate and misleading feedback (Chamberlain et al., 2006; Cools et al., 2002; Evers et al., 2005; Jocham et al., 2009). Using two-way operant-stimulus or spatial-response discrimination on a computer touch-screen, the subject is instructed to select one stimulus on each trial in order to maximize correct feedback in the form of rewards and to minimize error feedback in the form of punishers. Rewards and punishers can be symbolic or monetary. The identity of the correct stimulus is reversed when a criterion of consecutive correct responses is attained. The reversal criterion varies within certain limits and is therefore to an extent predictable. In addition to the accurate correct and accurate error feedback that the subject experiences, at a certain probability (e.g. 0.2) correct and incorrect responses receive misleading error and correct feedback, respectively. The subject is informed that at any one time one stimulus will usually be correct. Of particular interest are the subject's decisions on trials that follow misleading error feedback, i.e. when the subject is punished for a correct response. The proportion of such trials on which the subject shifts on the next trial to the incorrect stimulus, i.e. punishment-shift-punishment responses, provides a measure of punishment expectancy/prediction and is referred to as negative feedback sensitivity (NFS). High NFS is indicative of a cognitive-emotional over-estimation of punishment expectancy/prediction for that stimulus. Depressed patients, whilst being largely unimpaired in acquisition and reversal in the PRL task, exhibit 2–3 times higher NFS than do healthy controls (Murphy et al., 2003; Taylor Tavares et al., 2008).
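The two trial-by-trial measures described above can be illustrated with a minimal sketch. It assumes a hypothetical per-trial log; the `Trial` record and both function names are illustrative and are not taken from any task software. Reward-stay is taken here to be the proportion of rewarded trials followed by a repeat of the same choice, which is an assumption about the usual definition, and NFS follows the definition in the text (proportion of punished correct trials followed by a shift). For simplicity, a shift that happens to coincide with a reversal is not treated specially.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trial:
    choice: int      # 0 or 1: which stimulus (or side) was selected
    correct: bool    # True if the choice was the currently correct stimulus
    rewarded: bool   # feedback actually delivered (False = punished)


def reward_stay(trials: List[Trial]) -> Optional[float]:
    """Proportion of rewarded trials followed by the same choice (assumed definition)."""
    stays = total = 0
    for cur, nxt in zip(trials, trials[1:]):
        if cur.rewarded:
            total += 1
            stays += int(nxt.choice == cur.choice)
    return stays / total if total else None


def negative_feedback_sensitivity(trials: List[Trial]) -> Optional[float]:
    """Proportion of punished correct trials followed by a shift to the other stimulus."""
    shifts = total = 0
    for cur, nxt in zip(trials, trials[1:]):
        if cur.correct and not cur.rewarded:  # misleading error feedback
            total += 1
            shifts += int(nxt.choice != cur.choice)
    return shifts / total if total else None
```

Both functions return `None` when a session contains no qualifying trials, so that an undefined proportion is not silently reported as zero.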

There is a current deficiency of animal tests for detailed study of depression-relevant emotional-cognitive processes (Pryce and Seifritz, 2011). This is despite the clinical and therapeutic importance of altered emotional-cognitive processing in depression psychopathology, the availability of non-verbal emotional-cognitive tasks such as PRL to quantify such states, and the potential for these tasks to be adapted to animal species. One exception is indeed the PRL task, a version of which has been recently described for the rat (Bari et al., 2010). This rat automated PRL test is based on operant responding (nose poking) in a two-way spatial (rather than stimulus) discrimination, and a 0.2 probability for both correct and incorrect responses being followed by inaccurate (i.e. non-reversal) negative and positive feedback, respectively. Subjects were food deprived to induce high motivation for correct responding with sugar pellet reinforcement, and incorrect responses received error feedback in the form of no reward plus a time-out delay until the next trial. Compared to human performance, rats exhibited relatively low reward-stay behaviour (p = 0.6–0.8 versus close to 1.0 in human) and relatively high NFS (p = 0.4–0.6 versus 0.1 in human) and across 200 trials achieved 2–3 reversals on a reversal contingency of eight consecutive responses (Bari et al., 2010).
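The symmetric feedback rule used in the human and rat tasks, in which both correct and incorrect responses receive misleading feedback with the same probability (e.g. 0.2), can be sketched as follows. This is only an illustration under stated assumptions: the function name `symmetric_feedback` and the parameter `p_mislead` are hypothetical, and the sketch does not reproduce the control software of any cited study.

```python
import random


def symmetric_feedback(correct: bool, p_mislead: float, rng: random.Random) -> bool:
    """Return True for reward (correct feedback) or False for punishment (error feedback).

    With probability p_mislead the feedback is inverted, so a correct response
    is punished or an incorrect response is rewarded.
    """
    misleading = rng.random() < p_mislead
    # Accurate feedback equals `correct`; misleading feedback is its negation.
    return correct != misleading
```

For example, `symmetric_feedback(True, 0.2, random.Random(0))` draws one random number and returns whether a correct response is rewarded on that trial; over many trials roughly 20% of correct responses would be punished and roughly 20% of incorrect responses rewarded.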

The first aim of the present study was to attempt to establish a mouse automated PRL test based on nose-poke responding in a two-way spatial discrimination as used in the rat. Mice were trained to a high level of reversal learning performance prior to introducing PRL conditions. The effects of different per-session probabilities of punished correct responding on reward-stay, punishment-shift and reversals completed were then studied. It was reasoned that these data would provide insights into the emotional-cognitive processes underlying PRL behaviour in mouse. One assumption underlying the human PRL task is that subjects are cognitively able to acquire accurate reward and punishment expectancy/prediction and thereby to exhibit both (1) accurate rule reversal learning and (2) a probabilistic strategy of maximising correct responses without frequent shifting from the just-punished stimulus (Evers et al., 2005; Murphy et al., 2003). Compared to the human data, relatively low reward-stay probability and relatively high NFS were observed in the rat study (Bari et al., 2010). These findings suggest that applying human PRL test parameters one-to-one to rodents could run counter to the assumption that subjects are cognitively able to maintain accurate reward and punishment expectancy. Extrapolating these important rat findings to the current mouse study, in an attempt to maximize the likelihood of establishing a mouse automated PRL test where subjects are able to acquire accurate reward and punishment expectancy/prediction, the protocol was a priori designed to be cognitively less demanding (i.e. less inaccurate/misleading feedback) than the human task. Therefore, misleading feedback was given exclusively in the form of punished correct responding and no rewarded incorrect response trials were used. It is punished correct responding (and not rewarded incorrect responding) which is the basis of one of the major PRL parameters, NFS. 
In fact, there is at least one example of a human PRL task that is also without rewarded incorrect responses (Ersche et al., 2008). A recent study of a mouse manual PRL test that used a spatial maze and probability of 0.2 for both correct and incorrect responses being followed by inaccurate feedback, reports that reward-stay was only p = 0.5, i.e. chance level (Amodeo et al., 2012), thereby providing further evidence for the utility of reducing cognitive demand. Therefore, whilst not using rewarded incorrect response trials is a proviso that needs to be taken into account when comparing the findings of the current study with existing rat and mouse PRL data, these existing data indicate the need for this adjustment in order to better satisfy the assumption of accurate reward/punishment expectancy underlying the human PRL test and thereby to increase the relevance of mouse PRL findings to human.
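The reduced-demand protocol described above, in which misleading feedback occurs only as punished correct responses and incorrect responses are never rewarded, can be sketched as a small session simulation. Several details are assumptions made for illustration only: the reversal criterion is passed in as a parameter because the value used for mice is not stated here, punished correct responses are assumed to count towards the criterion, and the win-stay/lose-shift chooser is a hypothetical stand-in for an animal rather than a model proposed in the study.

```python
import random


def run_session(n_trials, p_pcr, criterion, choose, seed=0):
    """Simulate one two-choice spatial PRL session with asymmetric misleading feedback.

    Returns (log, reversals); log holds (choice, correct, rewarded) per trial.
    """
    rng = random.Random(seed)
    correct_side = 0
    streak = 0        # consecutive correct responses since last error or reversal
    reversals = 0
    prev = None       # (choice, rewarded) from the previous trial
    log = []
    for _ in range(n_trials):
        choice = choose(prev, rng)
        correct = choice == correct_side
        # Only correct responses can be misleadingly punished (probability p_pcr);
        # incorrect responses are always punished.
        rewarded = correct and rng.random() >= p_pcr
        log.append((choice, correct, rewarded))
        prev = (choice, rewarded)
        streak = streak + 1 if correct else 0
        if streak >= criterion:   # criterion reached: reverse the contingency
            correct_side = 1 - correct_side
            streak = 0
            reversals += 1
    return log, reversals


def win_stay_lose_shift(prev, rng):
    """Hypothetical agent: repeat a rewarded choice, switch after punishment."""
    if prev is None:
        return rng.randrange(2)
    choice, rewarded = prev
    return choice if rewarded else 1 - choice
```

A call such as `run_session(200, 0.1, 8, win_stay_lose_shift)` returns the trial log and the number of reversals completed. Because this agent shifts after every punishment, raising `p_pcr` would be expected to interrupt more runs of correct responses and so reduce the number of reversals completed.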

The second aim of the study was to investigate the effects of manipulation of serotonergic function on mouse behaviour in the PRL test. Aversive stimuli (punishments) induce 5-HT release in cortico-limbic-striatal regions (Millan, 2003). Accordingly, 5-HT has been proposed as a major mediator of adaptive behavioural responses to punishing stimuli, including those experienced in tasks such as PRL (Boureau and Dayan, 2011; Cools et al., 2008, 2011). Genetic polymorphisms that impact on 5-HT signalling could also impact on aversive stimulus processing. For example, the “short” form of the 5-HT transporter-linked polymorphic region of the gene (SLC6A4, 5-HTT) is associated with decreased 5-HTT/increased 5-HT activity, increased reactivity to aversive stimuli, and is a risk factor for depression (Canli and Lesch, 2007; Caspi et al., 2003; Hariri et al., 2002). One of the major hypotheses for depression pathophysiology is 5-HT deficiency (Sharp and Cowen, 2011), and altered sensitivity to rewarding and punishing feedback could mediate the effects of 5-HT deficiency on depression psychopathology. Accordingly, a mouse translational PRL test for the study of effects of 5-HT manipulations would be beneficial. The present study was conducted with wildtype (WT) and heterozygous mutant (HET) mice from a 5-HTT (Slc6a4) null mutant strain. Relative to WT, the 5-HTT HET mouse exhibits reduced 5-HT clearance (Montanez et al., 2003) and increased extracellular 5-HT levels (Mathews et al., 2004), but otherwise normal 5-HT transmission (Jennings et al., 2010). It provides a model for human 5-HTT polymorphisms that are associated with reduced 5-HTT function (Murphy and Lesch, 2008). The selective 5-HTT blocker citalopram, when administered acutely as a tool compound, was found to increase NFS in healthy humans (Chamberlain et al., 2006) and to increase and decrease NFS at low and high doses, respectively, in rat (Bari et al., 2010). 
In the present study the potent 5-HTT blocker escitalopram was used as a tool compound in order to investigate the effects of acute, specific blocking of 5-HTT in WT and HET mice, using the acute doses demonstrated to be effective in mouse antidepressant- and anxiolytic-screening tests (Sanchez et al., 2003). Therefore, the effects of genetically- and pharmacologically-induced reduced 5-HTT function were studied in a novel mouse PRL test. It is important to note that the present study did not aim to produce a mouse model of depression-relevant deficits in PRL behaviour, but rather to establish and validate a mouse PRL test. The major application of such a test would then be to investigate whether environmental and genetic manipulations induce depression-relevant deficits in PRL behaviour, as observed in human depression, and to use such a model to study their neuropharmacological reversal.

Section snippets

Animals

Male and female mice of a 5-HTT null mutant strain on a C57BL/6J background (>20 backcross generations) were transferred from the University of Würzburg (Bengel et al., 1998) and breeding was established in-house with WT dams and HET sires. Male offspring were weaned at age 4 weeks and caged as brother pairs throughout the study. Mice were maintained on a reversed 12:12 h light–dark cycle (white lights off at 07:00 h) in an individually-ventilated cage system, with temperature at 20–22 °C and

Operant training and reversal learning performance

For autoshaping, mice required 3.5 ± 1.6 sessions to eat ≥30 sugar pellets in two consecutive sessions. There was no significant effect of genotype on autoshaping e.g. total number of sessions to attain criterion (F (1, 28) < 1, p < 0.58). For operant nose-poke learning, mice required 344 ± 80 total trials over 7.6 ± 2.0 sessions to achieve ≥30 rewards in each of two consecutive sessions. There was no significant effect of genotype on operant learning e.g. total number of trials to criterion (F

Discussion

The present study describes an operant two-way spatial discrimination task for the study of probabilistic reversal learning in mice, the responsiveness of behaviour to changes in the probability of punishment of correct responding, and the effects on PRL behaviour of constitutive genetic and acute pharmacological manipulation of serotonin function.

Reversal learning is the basis of the PRL task and it is important to briefly review the evidence for 5-HT modulation of this prior to discussing the

Acknowledgements

This research was funded by the Swiss National Science Foundation (grant 31003A_130499) and the National Center for Competence in Research “Neural Plasticity and Repair”. We thank Daniel Schuppli and his team for animal care and H. Lundbeck A/S for the provision of escitalopram.

References (38)

  • J.V. Taylor Tavares et al. Neural basis of abnormal response to negative feedback in unmedicated mood disorders. NeuroImage (2008)
  • L.Y. Abramson et al. Hopelessness depression: a theory-based subtype of depression. Psychol. Rev. (1989)
  • APA. Diagnostic and Statistical Manual of Mental Disorders (2000)
  • A. Bari et al. Serotonin modulates sensitivity to reward and negative feedback in a probabilistic reversal learning task in rats. Neuropsychopharmacology (2010)
  • D. Bengel et al. Altered brain serotonin homeostasis and locomotor insensitivity to 3,4-methylenedioxymethamphetamine (“ecstasy”) in serotonin transporter-deficient mice. Mol. Pharmacol. (1998)
  • Y.-L. Boureau et al. Opponency revisited: competition and cooperation between dopamine and serotonin. Neuropsychopharmacology (2011)
  • J.L. Brigman et al. Pharmacological or genetic inactivation of the serotonin transporter improves reversal learning in mice. Cereb. Cortex (2010)
  • T. Canli et al. Long story short: the serotonin transporter in emotion regulation and social cognition. Nat. Neurosci. (2007)
  • A. Caspi et al. Influence of life stress on depression: moderation by a polymorphism in the 5-HTT gene. Science (2003)