Introduction

Adequate adaptation to the environment requires the anticipation of biologically relevant events, i.e., rewards and punishments, by learning signals of their occurrence. However, in our constantly and rapidly changing environment, this learning has to be highly flexible. In financial markets, for instance, stocks that were previously growing in value can suddenly and rapidly plummet; previously valuable possessions can become valueless. The emerging field of neuroeconomics seeks to understand the neural and pharmacological underpinnings of the flexible value-based decision making necessary to adapt to such changes (Sanfey et al. 2006).

It is thought that flexible outcome-dependent learning occurs when predicted and actual outcomes differ (i.e., prediction errors). A putative mechanism for prediction error learning is the phasic firing of midbrain dopamine (DA) neurones (Schultz 1998). The striatum, which is a target of these dopamine neurones, has frequently been implicated in reward-based prediction error learning (Frank et al. 2004; Frank 2005; Cools et al. 2006b, 2009). Recent work has extended this to show that individual differences in DA synthesis capacity within the striatum can predict the extent to which individuals learn from reward and punishment prediction errors (Cools et al. 2009). Individuals with high dopamine synthesis capacity were shown to be better at reversal based on unexpected reward than unexpected punishment, while individuals with low DA synthesis capacity were better at reversal based on unexpected punishment than unexpected reward. The DA D2 receptor agonist bromocriptine affected reversal in a baseline-dependent manner. It improved reward-based relative to punishment-based learning in those with low DA synthesis whilst impairing it in those with high DA synthesis, likely reflecting an ‘inverted U’-shaped relationship between DA levels and cognitive performance, whereby both too much and too little DA result in sub-optimal behaviour. Contrary to the pervasive hypothesis that DA plays a greater role in reward than in punishment processing, effects of bromocriptine on punishment-based reversal in this study were actually more robust than those on reward-based reversal. This is consistent with previously observed disproportionate effects of dopaminergic medication withdrawal on punishment-based learning in patients with Parkinson's disease (PD) (Cools et al. 2006a; Frank et al. 2007b) and with a growing literature implicating the striatum in punishment-based prediction error learning (Frank 2005; Seymour et al. 2007; Cohen and Frank 2009).

It is difficult to determine, however, whether the punishment-selective effects of dopaminergic drugs in these prior studies reflect a general property of DA's effect on learning or whether they reflect specifics of the employed approaches. For example, bromocriptine's specific effect on punishment-based reversal might reflect its relative selectivity for (presynaptic) DA D2 receptors (Frank 2005; Frank et al. 2007a; Cools et al. 2009). Moreover, punishment-selective effects of dopaminergic medication in PD might be restricted to PD, in which receptors are influenced asymmetrically and in which there is a spectrum of changes in addition to the DA alterations. In the present study, we therefore extended our prior research by reducing DA synthesis in healthy individuals via the acute phenylalanine and tyrosine depletion (APTD) procedure. This procedure is a dietary manipulation that reduces DA by depriving the brain of its amino acid precursors and is thought to both reduce overall DA synthesis and reduce DA transmission at all postsynaptic DA receptor subtypes (Montgomery et al. 2003). Here, we employed a reversal-learning task that disentangles reward and punishment processing and that has previously been shown to be sensitive to DA manipulation (Cools et al. 2006a, 2009). Based upon our prior work with this task, we predicted that non-selectively reducing DA via APTD would disproportionately influence punishment processing over reward processing. Furthermore, given the above-described baseline-dependency of the effects of dopaminergic manipulations (Cools et al. 2009), and given that there is an association between gender and variation in baseline DA (females have significantly higher striatal dopamine synthesis capacity than males (Laakso et al. 2002; Haaxma et al. 2007)), we recruited enough males and females to include gender as an effect of interest in analysis of task performance.

In this study, we therefore examined the effects of DA depletion in healthy males and females performing the same reward- and punishment-based reversal-learning task previously employed in studies on bromocriptine and levodopa (Cools et al. 2006a).

Materials and methods

Procedures were approved by the Hertfordshire Research Ethics Committee (08/H0311/25). All participants gave written informed consent prior to commencing the study. Testing took place at the Wellcome Trust Clinical Research Facility in Addenbrooke's hospital in Cambridge.

Subjects

From the Cambridgeshire area, 15 female and 14 male subjects aged between 19 and 49 years (mean, 27.28 years; standard deviation, 7.00) were recruited through email and poster advertisements. Two female participants withdrew because they felt unwell following the amino acid drink (one on visit 1, one on visit 2).

Prior to recruitment all participants were screened by telephone interview to ensure that they met the study criteria. The exclusion criteria were as follows: cigarette smoking, history of psychiatric disorder or neurological disorder, history of major illness, first degree relative with a history of axis 1 psychiatric disorder, drug abuse, excessive alcohol intake, head injury resulting in unconsciousness and current use of psychoactive medication. Subjects were excluded following initial interview. They were also independently interviewed by trainee clinician HS and by nursing staff on the research ward on their first visit to the hospital. Subject demographic and trait characteristics are reported in Table 1.

Acute phenylalanine/tyrosine depletion (APTD) procedure

Participants made two visits to the clinical research facility, which were separated by at least a week. On the day preceding their visit, they were instructed to follow a low protein diet (less than 20 g protein) and then to fast from 7 pm. They arrived on the test day at approximately 9.15 am. Upon arrival, they were asked to complete a set of visual analogue scales (VAS) to determine mood state. The VAS consisted of a number of questions including “how happy are you?” to which subjects had to respond by making a mark on a 10-cm line. A baseline sample of blood was then obtained, following which the amino acid drink was given. For males, the TYR drink contained 15 g isoleucine, 22.5 g leucine, 17.5 g lysine, 5 g methionine, 17.5 g valine, 10 g threonine and 2.5 g tryptophan. The BAL drink contained the same but with the addition of 12.5 g tyrosine and 12.5 g phenylalanine. Female subjects received 20% less of each amino acid in order to account for a lower average body weight. The amino acids were dissolved in approximately 300 ml of water, and lemon flavouring was added to make the drink more palatable. Fourteen subjects (seven females) received the TYR drink on their first visit and the BAL on their second visit. This order was reversed for the other 14 (seven females) participants. Drink order was randomly assigned, and both the participant and the researcher were blind to which drink was being administered. Participants were unable to distinguish between the drinks. After consuming the drink, the participants were given free time, although asked to remain at the research facility. They had unlimited access to water, and at 12 pm, they were given an apple to avoid hypoglycaemia and hunger (this technique was shown to be effective in prior tryptophan depletion studies (Robinson et al. 2009; Robinson and Sahakian 2009)). Approximately 4.5 h after consuming the drink, a second blood sample was taken; the VAS ratings were taken once again. Neuropsychological testing was then carried out, and once completed, participants were given a meal and allowed to go home. Data from additional tasks completed on the same day will be presented elsewhere and clearly cite this paper.
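
As a worked check of the drink composition described above, the following minimal sketch tabulates the male doses stated in the text and scales them by 0.8 for female participants. The dictionary and function names are illustrative assumptions and are not part of the study materials.

```python
# Male amino acid doses in grams, as stated in the text.
TYR_DRINK_MALE_G = {
    "isoleucine": 15.0,
    "leucine": 22.5,
    "lysine": 17.5,
    "methionine": 5.0,
    "valine": 17.5,
    "threonine": 10.0,
    "tryptophan": 2.5,
}
# The balanced (BAL) drink adds the two DA precursors.
BAL_EXTRA_MALE_G = {"tyrosine": 12.5, "phenylalanine": 12.5}

FEMALE_SCALE = 0.8  # females received 20% less of each amino acid


def drink_composition(drink, sex):
    """Return grams per amino acid for drink 'TYR' or 'BAL' and sex 'male' or 'female'."""
    if drink not in ("TYR", "BAL"):
        raise ValueError("drink must be 'TYR' or 'BAL'")
    if sex not in ("male", "female"):
        raise ValueError("sex must be 'male' or 'female'")
    doses = dict(TYR_DRINK_MALE_G)
    if drink == "BAL":
        doses.update(BAL_EXTRA_MALE_G)
    scale = FEMALE_SCALE if sex == "female" else 1.0
    return {name: grams * scale for name, grams in doses.items()}


def total_grams(drink, sex):
    """Total amino acid load of a drink in grams."""
    return sum(drink_composition(drink, sex).values())
```

Under these figures the male TYR drink sums to 90 g and the male BAL drink to 115 g; the female drinks are 80% of those totals.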

Neuropsychological testing

Reward and punishment processing were measured behaviourally using the deterministic reversal paradigm employed previously (Cools et al. 2006a, 2008). Readers are referred to Cools et al. (2006a) for a more comprehensive description and a diagram. Briefly, subjects were presented with two stimuli, one scene and one face, throughout the experiment. At any given point in time, one stimulus was associated with reward, whilst the other was associated with punishment. On each trial, one stimulus was highlighted by a thick black border, and subjects had to predict (via a button box) whether the highlighted stimulus would lead to reward or punishment. Subjects were then shown the actual outcome. Reward consisted of a green smiley face, a ‘+$100’ sign and a high-frequency jingle tone. Punishment consisted of a red sad face, a ‘−$100’ sign and a single low-frequency tone. Subjects were not given direct performance feedback and could only determine their accuracy by remembering their prediction and comparing it to the actual outcome. The stimulus–outcome contingencies reversed multiple times, each reversal following attainment of a variable learning criterion (mean = 6.9, SD = 1.8, range from 5 to 9 trials).

There were two types of experimental block. In the reward condition, reversals were signalled by providing subjects with unexpected reward following a stimulus previously associated with punishment (i.e., subjects predicted punishment based upon the previous trials, but the stimulus was unexpectedly followed by reward), whilst in the punishment condition, reversals were signalled by unexpected punishment. The maximum number of reversal stages per experimental block was 16, although the block terminated automatically after completion of 120 trials (6.6 min). Subjects completed two blocks of reward reversals and two blocks of punishment reversals. Thus each subject performed 480 trials (four blocks) per experimental session. The order of conditions was counterbalanced between subjects.
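
The block structure described above can be sketched as a small simulation. This is an illustration under stated assumptions, not the task software: the learning criterion is taken to be a run of 5 to 9 consecutive correct predictions, the block is assumed to continue without further reversals once 16 have occurred, and `OneTrialLearner` is a hypothetical idealised subject that updates after a single outcome.

```python
import random

STIMULI = ("face", "scene")


def other(stimulus):
    """Return the stimulus that was not passed in."""
    return STIMULI[1] if stimulus == STIMULI[0] else STIMULI[0]


class OneTrialLearner:
    """Hypothetical subject: remembers the last outcome and infers the other stimulus."""

    def __init__(self):
        self.belief = {}

    def predict(self, stimulus):
        return self.belief.get(stimulus, "reward")  # arbitrary first guess

    def learn(self, stimulus, outcome):
        self.belief[stimulus] = outcome
        self.belief[other(stimulus)] = "punishment" if outcome == "reward" else "reward"


def run_block(condition, agent, n_trials=120, max_reversals=16, seed=0):
    """Simulate one block; condition is 'reward' or 'punishment'."""
    rng = random.Random(seed)
    rewarded = rng.choice(STIMULI)      # stimulus currently paired with reward
    criterion = rng.randint(5, 9)       # assumed: consecutive correct predictions
    streak, reversals, forced, log = 0, 0, None, []
    for _ in range(n_trials):
        is_reversal = streak >= criterion and reversals < max_reversals
        if is_reversal:
            rewarded = other(rewarded)  # swap the contingencies
            reversals += 1
            # Highlight the stimulus whose new outcome matches the block valence.
            stimulus = rewarded if condition == "reward" else other(rewarded)
            streak, criterion = 0, rng.randint(5, 9)
            forced = stimulus           # same stimulus is highlighted on the next trial
        elif forced is not None:
            stimulus, forced = forced, None
        else:
            stimulus = rng.choice(STIMULI)
        outcome = "reward" if stimulus == rewarded else "punishment"
        correct = agent.predict(stimulus) == outcome
        agent.learn(stimulus, outcome)
        if not is_reversal:
            streak = streak + 1 if correct else 0
        log.append({"stimulus": stimulus, "outcome": outcome,
                    "correct": correct, "reversal": is_reversal})
    return log
```

With this idealised learner, every reversal trial is necessarily a prediction error; the measure of interest in the study is instead whether the prediction on the trial that follows the unexpected outcome is correct.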

Performance was assessed by examining proportional errors on four trial types: expected (non-reversal) reward, expected punishment, unexpected (reversal) reward and unexpected punishment trials. Only trials that followed correct responses were included in the non-reversal trial analysis. Errors on the trial immediately following the unexpected feedback were included in the reversal error analysis. Unexpected reward trials were collapsed across the two unexpected reward blocks (and vice versa for punishment), whilst expected reward and punishment trials were collapsed across all unexpected reward and unexpected punishment blocks. Reversal and non-reversal trials were assessed using separate repeated measures ANOVAs. Error rates were transformed into proportional scores (as the number of errors varied as a function of performance; reversal errors were proportional to the number of reversal trials and non-reversal errors were proportional to the number of non-reversal trials) and arcsine transformed (2 × arcsine (√x)) as is appropriate when the variance is proportional to the mean (Howell 2002). Analysis was performed with these transformed proportional scores.
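
The transformation described above, 2 × arcsine (√x) applied to a proportional error score, can be written out as a short sketch. The function names are illustrative assumptions; only the formula itself comes from the text.

```python
import math


def proportional_errors(n_errors, n_trials):
    """Errors as a proportion of the trials of that type (reversal or non-reversal)."""
    if n_trials <= 0:
        raise ValueError("n_trials must be positive")
    if not 0 <= n_errors <= n_trials:
        raise ValueError("n_errors must lie between 0 and n_trials")
    return n_errors / n_trials


def arcsine_transform(p):
    """Return 2 * arcsin(sqrt(p)) for a proportion p in [0, 1]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    return 2.0 * math.asin(math.sqrt(p))


# Example: 3 errors on 12 reversal trials gives p = 0.25,
# and 2 * arcsin(0.5) = pi / 3.
score = arcsine_transform(proportional_errors(3, 12))
```

The transform maps proportions from the interval 0 to 1 onto 0 to π, stretching values near the extremes so that the variance depends less on the mean.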

Results

Biochemical measures

The ratio of TYR/PHE:∑LNAAs was calculated from measures at baseline and at (approximately) 4.5 h post drink. A repeated measures ANOVA with two within subject factors (drink: BAL vs TYR; time: baseline vs 4.5 h) and one between subject factor (gender) was carried out. As expected, there was a highly significant drink × time interaction (F 1,24 = 51.6, p < 0.001), with no effects of gender (drink × time × gender F 1,24 = 0.021, p = 0.89). Breakdown into simple effects showed that there was a significant effect of time on the depletion day (F 1,24 = 157.3, p < 0.001, which did not interact with gender, F 1,24 = 1.0, p = 0.33) but not on the balanced day (F 1,24 = 2.7, p = 0.11; also no time × gender interaction, F 1,24 = 1.2, p = 0.29). Moreover, the ratio at 4.5 h post drink was significantly lower after the TYR drink compared with the BAL drink (F 1,24 = 83, p < 0.001; no interaction with gender, F 1,24 = 1.0, p = 0.98), whereas there was no difference between the ratios at baseline on each of the two test days (F 1,24 = 1.2, p = 0.29; no interaction with gender, F 1,24 = 1.0, p = 0.84). This is shown in Tables 2 and 3.
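
For readers unfamiliar with the index, the ratio can be sketched as the summed tyrosine and phenylalanine concentrations divided by the summed concentrations of the other large neutral amino acids (LNAAs). This section does not list which LNAAs entered the denominator, so the set used below is an assumption, and the concentrations are placeholder numbers rather than study data.

```python
def tyr_phe_lnaa_ratio(tyr, phe, other_lnaas):
    """Return (TYR + PHE) divided by the sum of the other LNAA concentrations."""
    denominator = sum(other_lnaas.values())
    if denominator <= 0:
        raise ValueError("summed LNAA concentration must be positive")
    return (tyr + phe) / denominator


# Placeholder plasma concentrations in arbitrary units (NOT study data).
# The choice of competing LNAAs is an assumption for illustration only.
example_lnaas = {"leucine": 100.0, "isoleucine": 50.0, "valine": 200.0, "tryptophan": 50.0}
example_ratio = tyr_phe_lnaa_ratio(tyr=60.0, phe=40.0, other_lnaas=example_lnaas)
```

Because the drink loads the competing LNAAs while withholding tyrosine and phenylalanine, depletion lowers the numerator and raises the denominator, so the ratio falls.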

Table 1 Group demographic and trait characteristics: BDI, Beck Depression Inventory-II; BIS, behavioural inhibition system score; BAS, behavioural activation system score; IVE, impulsiveness venturesomeness empathy questionnaire
Table 2 Biochemical measures. APTD causes a significant decrease in the tyrosine/phenylalanine to large neutral amino acids ratio (TYR + PHE:∑LNAAs) in both males and females (standard error of the mean, SEM)
Table 3 Biochemical measures. APTD causes a significant decrease in the tyrosine/phenylalanine to large neutral amino acids ratio (TYR + PHE:∑LNAAs) in both males and females (standard error of the mean, SEM)

State measures

There was a main effect of time (F 2,23 = 6.5, p = 0.006) that was driven by a shift away from the happy end of the VAS scale over the course of the day, but there was no effect of drink (F 1,24 < 0.0001, p = 0.99) nor a drink × time interaction (F 2,23 = 0.46, p = 0.6). There was also no drink by gender interaction (F 1,24 = 0.65, p = 0.4) or a drink × time × gender interaction (F 2,23 = 0.3, p = 0.7). Thus, the APTD manipulation had no significant effect upon subjects' moods, and the behavioural effects cannot be attributed to differential mood states. This absence of mood effects is consistent with the effect of tryptophan depletion in healthy individuals (Robinson et al. 2009; Robinson and Sahakian 2009) and with previous studies of APTD (Booij et al. 2003; Ruhe et al. 2007). We aimed to test all female subjects outside of menses; however, due to time constraints, 5 of the 28 female testing sessions fell either during, or in the week prior to, menses.

Neuropsychological testing

The results from the experimental paradigm are summarised in Table 4 and Fig. 1.

Table 4 Neuropsychological measures. Percentage error rates (standard error of the mean, SEM) on punishment (PUN) and reward (REW) A) reversal trials and B) non-reversal trials in the placebo (BAL) and acute tyrosine/phenylalanine depletion (APTD) conditions. N.B. there was no effect of gender in the non-reversal trials; data stratified for illustration purposes only
Fig. 1 Dissociable effects on valence specific reversal learning in males and females. In females, tyrosine depletion significantly improved punishment-based reversal (a) relative to the non-depleted day. There was no similar effect on reward-based reversal (b) or in males (see also Table 4). Values represent the mean; error bars represent standard error of the mean

Comparison between reward- and punishment-based reversal trials

There was no main effect of either drink (F 1,25 = 1.1, P = 0.31) or valence (F 1,25 = 0.31, P = 0.60) on reversal trial errors. There was, however, a drink × valence × gender interaction (F 1,25 = 6.5, P = 0.02) which was driven by a valence × gender interaction on the depletion day (F 1,26 = 4.4, P = 0.05) but not on the balanced day (F 1,26 = 1.1, P = 0.31). To break this interaction down further, we next examined the reward and punishment blocks separately.

Punishment-based reversals

Consistent with the overall model there was a significant interaction between the effect of APTD on punishment-based reversal trial errors and gender (drink × gender, F 1,25 = 8.4, P = 0.007). This interaction was driven by a significant improvement after APTD in punishment reversal trial performance in female (F 1,12 = 13.3, P = 0.003) but not male subjects (F 1,13 = 2.0, P = 0.184). The female-specific effect did not interact with menses status (drink × menses status, F 1,10 = 0.001, P = 0.97). There was no overall effect of drink (F 1,25 = 0.25, P = 0.62). In sum, APTD improved the ability to reverse predictions after receiving unexpected punishment in female but not male subjects (Fig. 1).

Reward-based reversals

There was no effect of APTD on reward-based reversals (no effect of drink, F 1,26 = 0.87, P = 0.36). There was also no interaction with gender (drink × gender, F 1,26 = 0.58, P = 0.45). APTD therefore had no effect upon the ability to predict reward.

Non-reversal trials

There was a main effect of valence (F 1,25 = 12.2, P = 0.002), which was driven by increased errors on non-reversal punishment trials, but no main effect of drink (F 1,25 = 0.45, P = 0.51) or valence × drink interaction (F 1,25 = 0.26, P = 0.62). There was also no drink × valence × gender interaction (F 1,25 = 0.62, P = 0.44). This increased non-reversal punishment error rate is consistent with previous research with the same task (Cools et al. 2008).

Discussion

Consistent with our hypothesis, we demonstrate that DA depletion improves punishment-based reversal learning, while leaving reward-based reversal learning unaffected. However, this effect was gender-specific so that effects were only observed in female subjects, who are known to exhibit higher baseline DA levels than do male subjects.

These findings extend a growing body of research implicating DA in punishment as well as reward processing (Frank 2005; Seymour et al. 2007; Cohen and Frank 2009; Cools et al. 2009; Robinson et al. 2010). More specifically, prior work has shown that increases in DA impair punishment-based reversal in patients with Parkinson's disease and healthy volunteers with low DA synthesis capacity (Cools et al. 2006a, 2009). Thus, low levels of DA seem beneficial for learning from punishment. This concurs with recent theoretical modelling in which increases in DA impair punishment-based learning by blocking the effects of punishment-associated DA dips on cortico-striatal action suppression (Frank 2005; Frank et al. 2007a). The present data reveal the opposite side of the same coin, i.e., that decreases in DA can improve punishment-based learning (here, we do not distinguish between receipt of punishment and reward omission). Recent fMRI work with the task adopted in this study suggests that such a shift in responses may occur either via modulation of a valence non-specific Pavlovian signal within the anterior ventral striatum or a reward-specific instrumental-like signal within the posterior dorsal striatum (Robinson et al. 2010).

It is worth noting that, following the unexpected outcome, the same stimulus was always highlighted in this task. This meant that punishment-based reversal required primarily the breaking of a prepotent stimulus–reward link, as well as the formation of a new stimulus–punishment link. Conversely, reward-based reversal required primarily the breaking of a prepotent stimulus–punishment link and the formation of a new stimulus–reward link. We hypothesise that the effect of APTD on punishment-based reversal likely reflected the breaking of the old prepotent stimulus–reward link rather than the formation of a new stimulus–punishment link, based on two observations. First, we observed no effect of APTD on the punishment non-reversal trials (although we cannot be sure that the non-reversal trials were sufficiently sensitive to detect punishment-related learning, given that the task contingencies were deterministic and learning likely occurred on an approximately single trial basis). Second, there is precedent for the importance of dopamine in the breaking of prepotent stimulus–reward links. Specifically, injection of d-amphetamine in the NAc of rats potentiates behavioural control by stimuli formerly associated with reward (i.e., conditioned reinforcement) in a DA-dependent way (Robbins et al. 1989). The observation that this d-amphetamine-induced potentiation of control by previously rewarded stimuli was abolished by lesions of the ventral striatum (the nucleus accumbens) (Parkinson et al. 2002) strengthens the hypothesis that the striatum might play a role in the effects shown here. Evidence for this hypothesis comes from a series of recent studies (Cools et al. 2007; Dodds et al. 2008; Clatworthy et al. 2009) showing that effects of DA-enhancing drugs (L-DOPA in PD and methylphenidate in healthy volunteers) are accompanied by modulations of BOLD signals in the ventral striatum during punishment-based reversal learning. Furthermore, Goto and Grace (2005) have revealed that increases in tonic DA release in the ventral striatum and administration of a DA (D2) receptor agonist in the ventral striatum disrupted PFC-evoked responses in the ventral striatum and impaired behavioural reversal learning in rats. Thus, the DA-enhancing drugs in these studies may have induced aberrant potentiation of control by previously rewarded stimuli and disrupted input to the striatum, signalling the need for a switch. By analogy, DA depletion in the present study may have attenuated potentiation of control by previously rewarded stimuli and may have enhanced input to the striatum, signalling the need for a switch.

The present findings suggest, however, that this effect may depend on gender. This is consistent with the previously observed gender disparity in DA synthesis capacity. Female subjects have been shown to have increased DA synthesis relative to males (Laakso et al. 2002), and punishment processing is shown to be impaired in individuals with high baseline DA synthesis (Cools et al. 2009). As such, the improvement in punishment processing following DA reduction in females may be driven by a greater APTD-induced deviation from this putatively higher baseline DA synthesis rate in females relative to males (although future work in which we directly measure DA synthesis [rather than levels of amino acid precursors] across genders under both conditions is necessary to test this hypothesis more thoroughly).

Intriguingly, the effect that we see here is distinct from the effects of reducing serotonin on the same task. Acute tryptophan depletion (ATD), which is similar in principle to APTD, reduces the serotonin precursor tryptophan and has been used to reduce serotonin in subjects completing this task. ATD influenced performance on non-reversal punishment trials, rather than on the punishment reversal trials affected in the present findings (Cools et al. 2008) (although, note that we do replicate the increased expected (non-reversal) punishment errors). It also should be noted, however, that although gender effects were not seen in the prior ATD study, the sample size in that study was not large enough to enable such analysis. Other studies within our lab have in fact shown gender-specific effects of ATD on cognitive processing (Robinson et al. 2009; Robinson and Sahakian 2009), but additional research with the present task would be required to determine the effects of gender on serotonin manipulation of this task.

The gender biases in the response to neurotransmitter precursor depletion are particularly interesting in the light of the gender biases in the susceptibility to affective disorders. Females are, for instance, much more likely to become depressed than males (Nolen-Hoeksema et al. 1999) but tend to experience a more benign form of PD with a later onset than males (Haaxma et al. 2007). The gender-specific effects of depletion may be driven by, for example, differential baseline levels of neurotransmitters, differential rates of precursor absorption or differential rates of neurotransmitter production, and this may, in turn, influence susceptibility to affective disorders. Indeed, susceptibility to affective biases following precursor depletion may provide a means of predicting susceptibility to affective disorders, although future work is clearly necessary to clarify this.

It should be noted that the affective bias that we see here was found in the absence of any effects on mood. This is consistent with a number of prior APTD (and ATD) studies in healthy individuals (Booij et al. 2003), and it is now thought that the link between monoamines and mood state is indirect (Ruhe et al. 2007). As such, the affective biases seen here may represent alterations in ‘emotional’ processing (the short-term affective response to distinct stimuli) rather than alterations in ‘mood’ states (long-term changes in affective state with diverse, unclear causes; Robinson and Sahakian 2009). It should also be noted that whilst the BAL drink did not cause depletion, it is still a somewhat artificial situation compared with normal food consumption. As such, caution should be exercised when comparing the BAL condition to “healthy” performance. As a further caveat, it should be noted that we did not have the facility for more comprehensive screening methods that are sometimes used (e.g., a urine drug screen). However, we would point out that the mean BDI score (3; Table 1) puts the sample firmly within the lowest category (cut off 13) of depressive symptoms (“minimal depression”). Crucially, this was also equivalent across males and females (as, indeed, were all the trait measures taken). Indeed, given that over 50% of the sample had an education level higher than BA, there is a possibility that this is a hyper-performing sample. Again, care should be exercised when extrapolating these findings to the population as a whole.

One advantage of the APTD method is that it should promote a global reduction in DA rather than focusing upon a specific subtype (i.e., D1 or D2), but it is, nevertheless, an imperfect technique. Previous research has shown effects to be modest on some aspects of cognition (Booij et al. 2003; Roiser et al. 2005) and, given that noradrenaline (NA) is synthesised from DA, it may also influence NA levels. However, as highlighted by Roiser et al. (2005), a number of lines of evidence suggest that this is not the case. Firstly, APTD attenuates d-amphetamine-induced DA efflux (Leyton et al. 2003; McTavish et al. 1999a; McTavish et al. 1999b) but has no effect upon NA response to amphetamine or idazoxan; secondly, APTD influences only the DA-mediated subjective effects of d-amphetamine (e.g., the ‘buzz’) and does not influence the NA-associated effects (e.g., hunger) (McTavish et al. 1999c; although see Leyton et al. 2005, 2007); and thirdly, APTD has no effect on melatonin levels (which are controlled by NA) but does influence prolactin levels (which are controlled by DA) (Sheehan et al. 1996; Harmer et al. 2001). It should be noted as a limitation that the venipuncture method of blood sample collection used in this study causes short-term stress (influencing prolactin levels) and is therefore unreliable as a means of assessing prolactin levels. Future research should use cannulation to adequately assay prolactin.

In sum, this study strengthens previous observations that DA plays an important role in punishment processing but suggests that this role may be dependent upon gender. It is conceivable that this disproportionate sensitivity of females to DA depletion is due to increased levels of baseline DA in females relative to males, although further work is necessary to clarify this. However, these findings further cement the role of DA in punishment processing and underline the importance of gender in the neuropharmacology of cognitive processing. Such gender differences may shed light on the gender biases in susceptibility to and severity of neurological and psychiatric disorders such as Parkinson's disease and depression.