Establishing a probabilistic reversal learning test in mice: Evidence for the processes mediating reward-stay and punishment-shift behaviour and for their modulation by serotonin
Highlights
► Depressed patients exhibit deficits in probabilistic reversal learning (PRL).
► A PRL paradigm was developed for C57BL/6 mice.
► Operant responding was sensitive to probability of punished correct responses (PCR).
► Heterozygous serotonin transporter mutant mice exhibited reduced sensitivity to PCR.
► Low-acute SSRI escitalopram administration also reduced sensitivity to PCR.
Introduction
Altered emotional-cognitive processing of negative and positive events is central to depression psychopathology, as indicated by the two core diagnostic symptoms of depressed mood (sadness, emptiness) and loss of interest or pleasure (anhedonia) (APA, 2000). Depressed mood reflects high focus on past, current or potential future negative (aversive) events, which elicits feelings and states such as sadness, frustration and catastrophisation (Abramson et al., 1989; Eshel and Roiser, 2010; Nandrino et al., 2004). Anhedonia reflects low focus on past, current, and potential future positive events (rewards), which leads to reduced pleasure and interest (Berridge and Robinson, 2003; Henriques and Davidson, 2000; Sloan et al., 2001). Automated psychological (operant) tasks that require development of cognitive associations between stimuli and outcomes allow for quantitative assessment of affective responsiveness to negative events (“error” feedback) and positive events (“correct” feedback). It has been reported that depression is characterised by high sensitivity to negative feedback: on memory and planning tasks, when patients receive error feedback on a trial they are more likely to also make another incorrect decision on the next trial, relative to healthy subjects. That is, depression is associated with increased emotional-cognitive reactivity to error feedback, possibly due to exaggerated punishment expectancy/prediction, manifested as increased likelihood of additional errors on subsequent trials (Elliott et al., 1996, 1997).
The probabilistic reversal learning (PRL) task is specifically designed to assess cognitive-emotional ability to develop appropriate expectations/predictions about stimulus–feedback associations on the basis of a combination of accurate and misleading feedback (Chamberlain et al., 2006; Cools et al., 2002; Evers et al., 2005; Jocham et al., 2009). Using two-way operant-stimulus or spatial-response discrimination on a computer touch-screen, the subject is instructed to select one stimulus on each trial in order to maximize correct feedback in the form of rewards and to minimize error feedback in the form of punishers. Rewards and punishers can be symbolic or monetary. The identity of the correct stimulus is reversed when a criterion of consecutive correct responses is attained. The reversal criterion varies within certain limits and is therefore predictable only to an extent. In addition to the accurate correct and accurate error feedback that the subject experiences, at a certain probability (e.g. 0.2) correct and incorrect responses receive misleading error and correct feedback, respectively. The subject is informed that at any one time one stimulus will usually be correct. Of particular interest are the subject's decisions on trials that follow misleading error feedback, i.e. when the subject is punished for a correct response. The proportion of such trials on which the subject shifts on the next trial to the incorrect stimulus, i.e. punishment-shift-punishment responses, provides a measure of punishment expectancy/prediction and is referred to as negative feedback sensitivity (NFS). High NFS is indicative of a cognitive-emotional over-estimation of punishment expectancy/prediction for that stimulus. Depressed patients, whilst being largely unimpaired in the acquisition and reversal of probabilistic reversal in the PRL task, exhibit 2–3 times higher NFS than do healthy controls (Murphy et al., 2003; Taylor Tavares et al., 2008).
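As an illustration of how the two trial-by-trial measures described above could be scored, the following minimal Python sketch computes NFS (the proportion of punished correct responses that are followed by a shift) and reward-stay (the proportion of rewarded responses that are followed by a repeat of the same choice). The `Trial` record, the function names and the example data are hypothetical and are not taken from any of the cited studies; for simplicity the sketch also ignores whether a reversal falls between the two trials of a pair.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trial:
    """One trial of a two-choice task (hypothetical record layout)."""
    choice: int      # stimulus or side chosen: 0 or 1
    correct: int     # stimulus or side designated correct on this trial
    rewarded: bool   # feedback actually delivered (False = punished)


def negative_feedback_sensitivity(trials: List[Trial]) -> Optional[float]:
    """Proportion of punished correct responses followed by a shift."""
    shifts = 0
    opportunities = 0
    for prev, nxt in zip(trials, trials[1:]):
        # Misleading error feedback: the response was correct but punished.
        if prev.choice == prev.correct and not prev.rewarded:
            opportunities += 1
            if nxt.choice != prev.choice:
                shifts += 1
    return shifts / opportunities if opportunities else None


def reward_stay(trials: List[Trial]) -> Optional[float]:
    """Proportion of rewarded responses followed by the same choice."""
    stays = 0
    opportunities = 0
    for prev, nxt in zip(trials, trials[1:]):
        if prev.rewarded:
            opportunities += 1
            if nxt.choice == prev.choice:
                stays += 1
    return stays / opportunities if opportunities else None


# Made-up example sequence with stimulus 0 correct throughout.
trials = [
    Trial(0, 0, True),    # rewarded correct response
    Trial(0, 0, False),   # stay; correct response punished (misleading)
    Trial(1, 0, False),   # shift to the incorrect stimulus; accurately punished
    Trial(0, 0, False),   # back to correct; punished again (misleading)
    Trial(0, 0, True),    # stay despite punishment; rewarded
    Trial(0, 0, True),    # stay; rewarded
]

nfs = negative_feedback_sensitivity(trials)   # 1 shift out of 2 opportunities
stay = reward_stay(trials)                    # 2 stays out of 2 opportunities
```

Both functions return `None` when a session contains no qualifying trial, so that a missing value is not confused with a probability of zero.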
There is a current deficiency of animal tests for detailed study of depression-relevant emotional-cognitive processes (Pryce and Seifritz, 2011). This is despite the clinical and therapeutic importance of altered emotional-cognitive processing in depression psychopathology, the availability of non-verbal emotional-cognitive tasks such as PRL to quantify such states, and the potential for these tasks to be adapted to animal species. One exception is indeed the PRL task, a version of which has been recently described for the rat (Bari et al., 2010). This rat automated PRL test is based on operant responding (nose poking) in a two-way spatial (rather than stimulus) discrimination, and a 0.2 probability for both correct and incorrect responses being followed by inaccurate (i.e. non-reversal) negative and positive feedback, respectively. Subjects were food deprived to induce high motivation for correct responding with sugar pellet reinforcement, and incorrect responses received error feedback in the form of no reward plus a time-out delay until the next trial. Compared to human performance, rats exhibited relatively low reward-stay behaviour (p = 0.6–0.8 versus close to 1.0 in human) and relatively high NFS (p = 0.4–0.6 versus 0.1 in human) and across 200 trials achieved 2–3 reversals on a reversal contingency of eight consecutive responses (Bari et al., 2010).
The first aim of the present study was to attempt to establish a mouse automated PRL test based on nose-poke responding in a two-way spatial discrimination as used in the rat. Mice were trained to a high level of reversal learning performance prior to introducing PRL conditions. Then the effects of different per session probabilities of punished correct responding on reward-stay, punishment-shift and reversals completed, were studied. It was reasoned that these data would provide insights into the emotional-cognitive processes underlying PRL behaviour in mouse. One assumption underlying the human PRL task is that subjects are cognitively able to acquire accurate reward and punishment expectancy/prediction and thereby to exhibit both (1) accurate rule reversal learning and (2) a probabilistic strategy of maximising correct responses without frequent shifting from the just-punished stimulus (Evers et al., 2005; Murphy et al., 2003). Compared to the human data, relatively low reward-stay probability and relatively high NFS were observed in the rat study (Bari et al., 2010). These findings suggest that applying human PRL test parameters one-to-one to rodents could run counter to the assumption that subjects are cognitively able to maintain accurate reward and punishment expectancy. Extrapolating these important rat findings to the current mouse study, in an attempt to maximize the likelihood of establishing a mouse automated PRL test where subjects are able to acquire accurate reward and punishment expectancy/prediction, the protocol was a priori designed to be cognitively less demanding (i.e. less inaccurate/misleading feedback) than the human task. Therefore, misleading feedback was given exclusively in the form of punished correct responding and no rewarded incorrect response trials were used. It is punished correct responding (and not rewarded incorrect responding) which is the basis of one of the major PRL parameters, NFS. 
In fact, there is at least one example of a human PRL task that is also without rewarded incorrect responses (Ersche et al., 2008). A recent study of a mouse manual PRL test that used a spatial maze and probability of 0.2 for both correct and incorrect responses being followed by inaccurate feedback, reports that reward-stay was only p = 0.5, i.e. chance level (Amodeo et al., 2012), thereby providing further evidence for the utility of reducing cognitive demand. Therefore, whilst not using rewarded incorrect response trials is a proviso that needs to be taken into account when comparing the findings of the current study with existing rat and mouse PRL data, these existing data indicate the need for this adjustment in order to better satisfy the assumption of accurate reward/punishment expectancy underlying the human PRL test and thereby to increase the relevance of mouse PRL findings to human.
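The task structure outlined above (reversal after a run of consecutive correct responses, with misleading feedback given only as punished correct responses) can be sketched as a small simulation. This is a toy illustration under stated assumptions, not the protocol of the present study: the function name, the 200-trial session length, the criterion of eight consecutive correct responses (borrowed from the rat study cited above), the default probability of 0.2, the rule that punished correct responses still count towards the criterion, and the simple "stay after reward, shift with a fixed probability after punishment" agent are all hypothetical choices.

```python
import random
from typing import List, Tuple


def simulate_session(
    n_trials: int = 200,                 # assumed session length
    p_pcr: float = 0.2,                  # probability of punished correct response
    criterion: int = 8,                  # assumed consecutive-correct criterion
    p_shift_after_punish: float = 0.5,   # assumed agent parameter
    seed: int = 0,
) -> Tuple[List[Tuple[int, int, bool]], int]:
    """Return (trial log, number of reversals) for one simulated session.

    Each log entry is (choice, correct side, rewarded).
    """
    rng = random.Random(seed)
    correct = 0                  # side currently designated correct
    choice = rng.randint(0, 1)   # agent's first choice is arbitrary
    run = 0                      # current run of consecutive correct responses
    reversals = 0
    log: List[Tuple[int, int, bool]] = []

    for _ in range(n_trials):
        is_correct = choice == correct
        # Misleading feedback is applied to correct responses only:
        # an incorrect response is never rewarded.
        rewarded = is_correct and rng.random() >= p_pcr
        log.append((choice, correct, rewarded))

        # Assumption: punished correct responses still count towards the run.
        run = run + 1 if is_correct else 0
        if run >= criterion:
            correct = 1 - correct
            run = 0
            reversals += 1

        # Toy agent: stay after reward, shift with fixed probability otherwise.
        if not rewarded and rng.random() < p_shift_after_punish:
            choice = 1 - choice

    return log, reversals


log, n_reversals = simulate_session()
```

Varying `p_pcr` across simulated sessions gives a rough feel for how the number of completed reversals depends on the proportion of misleading punishment for a given shifting tendency; real subjects would of course not follow such a fixed rule.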
The second aim of the study was to investigate the effects of manipulation of serotonergic function on mouse behaviour in the PRL test. Aversive stimuli (punishments) induce 5-HT release in cortico-limbic-striatal regions (Millan, 2003). Accordingly, 5-HT has been proposed as a major mediator of adaptive behavioural responses to punishing stimuli, including those experienced in tasks such as PRL (Boureau and Dayan, 2011; Cools et al., 2008, 2011). Genetic polymorphisms that impact on 5-HT signalling could also impact on aversive stimulus processing. For example, the “short” form of the 5-HT transporter-linked polymorphic region of the gene (SLC6A4, 5-HTT) is associated with decreased 5-HTT/increased 5-HT activity, increased reactivity to aversive stimuli, and is a risk factor for depression (Canli and Lesch, 2007; Caspi et al., 2003; Hariri et al., 2002). One of the major hypotheses for depression pathophysiology is 5-HT deficiency (Sharp and Cowen, 2011), and altered sensitivity to rewarding and punishing feedback could mediate the effects of 5-HT deficiency on depression psychopathology. Accordingly, a mouse translational PRL test for the study of effects of 5-HT manipulations would be beneficial. The present study was conducted with wildtype (WT) and heterozygous mutant (HET) mice from a 5-HTT (Slc6a4) null mutant strain. Relative to WT, the 5-HTT HET mouse exhibits reduced 5-HT clearance (Montanez et al., 2003) and increased extracellular 5-HT levels (Mathews et al., 2004), but otherwise normal 5-HT transmission (Jennings et al., 2010). It provides a model for human 5-HTT polymorphisms that are associated with reduced 5-HTT function (Murphy and Lesch, 2008). The selective 5-HTT blocker citalopram, when administered acutely as a tool compound, was found to increase NFS in healthy humans (Chamberlain et al., 2006) and to increase and decrease NFS at low and high doses, respectively, in rat (Bari et al., 2010).
In the present study the potent 5-HTT blocker escitalopram was used as a tool compound in order to investigate the effects of acute, specific blocking of 5-HTT in WT and HET mice, using the acute doses demonstrated to be effective in mouse antidepressant- and anxiolytic-screening tests (Sanchez et al., 2003). Therefore, the effects of genetically- and pharmacologically-induced reduced 5-HTT function were studied in a novel mouse PRL test. It is important to note that the present study did not aim to produce a mouse model of depression-relevant deficits in PRL behaviour, but rather to establish and validate a mouse PRL test. The major application of such a test would then be to investigate whether environmental and genetic manipulations induce depression-relevant deficits in PRL behaviour, as observed in human depression, and to use such a model to study their neuropharmacological reversal.
Animals
Male and female mice of a 5-HTT null mutant strain on a C57BL/6J background (>20 backcross generations) were transferred from the University of Würzburg (Bengel et al., 1998) and breeding was established in-house with WT dams and HET sires. Male offspring were weaned at age 4 weeks and caged as brother pairs throughout the study. Mice were maintained on a reversed 12:12 h light–dark cycle (white lights off at 07:00 h) in an individually-ventilated cage system, with temperature at 20–22 °C and
Operant training and reversal learning performance
For autoshaping, mice required 3.5 ± 1.6 sessions to eat ≥30 sugar pellets in two consecutive sessions. There was no significant effect of genotype on autoshaping e.g. total number of sessions to attain criterion (F (1, 28) < 1, p < 0.58). For operant nose-poke learning, mice required 344 ± 80 total trials over 7.6 ± 2.0 sessions to achieve ≥30 rewards in each of two consecutive sessions. There was no significant effect of genotype on operant learning e.g. total number of trials to criterion (F
Discussion
The present study describes an operant two-way spatial discrimination task for the study of probabilistic reversal learning in mice, the responsiveness of behaviour to changes in the probability of punishment of correct responding, and the effects on PRL behaviour of constitutive genetic and acute pharmacological manipulation of serotonin function.
Reversal learning is the basis of the PRL task and it is important to briefly review the evidence for 5-HT modulation of this prior to discussing the
Acknowledgements
This research was funded by the Swiss National Science Foundation (grant 31003A_130499) and the National Center for Competence in Research “Neural Plasticity and Repair”. We thank Daniel Schuppli and his team for animal care and H. Lundbeck A/S for the provision of escitalopram.
References (38)
- et al. Differences in BTBR T+ tf/J and C57BL/6J mice on probabilistic reversal learning and stereotyped behaviors. Behav. Brain Res. (2012)
- et al. Parsing reward. TINS (2003)
- et al. The neuropsychology of ventral prefrontal cortex: decision-making and reversal learning. Brain Cogn. (2004)
- et al. Serotonergic regulation of emotional and behavioural control processes. Trends Cogn. Sci. (2008)
- et al. Reward and punishment processing in depression. Biol. Psychiatry (2010)
- et al. Gene dose-dependent alterations in extraneuronal serotonin but not dopamine in mice with reduced serotonin transporter expression. J. Neurosci. Methods (2004)
- The neurobiology and control of anxious states. Prog. Neurobiol. (2003)
- et al. Emotional information processing in first and recurrent major depressive episodes. J. Psychiat. Res. (2004)
- et al. Establishing a learned helplessness effect paradigm in C57BL/6 mice: behavioural evidence for emotional, motivational and cognitive effects of aversive uncontrollability per se. Neuropharmacology (2012)
- et al. A translational research framework for enhanced validity of mouse models of psychopathological states in depression. Psychoneuroendocrinology (2011)
- Neural basis of abnormal response to negative feedback in unmedicated mood disorders. NeuroImage
- Hopelessness depression: a theory-based subtype of depression. Psychol. Rev.
- Diagnostic and Statistical Manual of Mental Disorders
- Serotonin modulates sensitivity to reward and negative feedback in a probabilistic reversal learning task in rats. Neuropsychopharmacology
- Altered brain serotonin homeostasis and locomotor insensitivity to 3,4-methylenedioxymethamphetamine (“ecstasy”) in serotonin transporter-deficient mice. Mol. Pharmacol.
- Opponency revisited: competition and cooperation between dopamine and serotonin. Neuropsychopharmacology
- Pharmacological or genetic inactivation of the serotonin transporter improves reversal learning in mice. Cereb. Cortex
- Long story short: the serotonin transporter in emotion regulation and social cognition. Nat. Neurosci.
- Influence of life stress on depression: moderation by a polymorphism in the 5-HTT gene. Science