In everyday life, we constantly monitor our behavior and adapt our actions following performance errors. Depending on the social, monetary, or other type of feedback we receive from our environment, we can feel either encouraged or discouraged to continue with a specific behavior. According to the reinforcement learning theory (Holroyd & Coles, 2002), reward-based learning is driven by the dopamine (DA) system. A dopaminergic prediction error signal originates in the midbrain and is reflected by an increase of activity for positive and a reduction for negative prediction errors. This signal is projected to the anterior cingulate cortex (ACC), where activation increases for unexpected outcomes are thought to underlie action selection (Schultz, 1998, 2001; Schultz, Dayan, & Montague, 1997).

In humans, the analysis of event-related potentials (ERPs) yielded a negative feedback-locked component, the so-called feedback-related negativity (FRN), which has been associated with the processing of unexpected negative performance feedback (Bellebaum & Daum, 2008; Hajcak, Holroyd, Moser, & Simons, 2005; Hajcak, Moser, Holroyd, & Simons, 2007; Holroyd, Nieuwenhuis, Yeung, & Cohen, 2003; Yasuda, Sato, Miyawaki, Kumano, & Kuboki, 2004) and is generated in the ACC (Gehring & Willoughby, 2002). Typically, the processing of feedback stimuli also evokes a later, positive ERP component termed P300, modulated by the probability of stimulus occurrence (Hajcak et al., 2005) and by task relevance (Polich, 2007), with enhanced amplitudes for infrequent stimuli and more motivationally salient tasks (Carrillo-de-la-Pena & Cadaveira, 2000; Pfabigan, Alexopoulos, Bauer, & Sailer, 2011). The role of the P300 in the context of feedback processing is as yet unclear. While some authors reported larger P300 amplitudes for positive than for negative, for unexpected than for expected, and for larger than for smaller outcomes (Hajcak et al., 2005; Leng & Zhou, 2010; Ma et al., 2011; Yeung & Sanfey, 2004), others could, for example, not find P300 amplitude modulations by feedback valence (Sato et al., 2005; Yeung & Sanfey, 2004).

Generally, humans can also use the errors and performance feedback they observe in other individuals to modify their own behavior. FRN and P300 components are also seen in the processing of observed feedback but show smaller amplitudes in comparison with active learning (Bellebaum, Kobza, Thiele, & Daum, 2010; Itagaki & Katayama, 2008; Leng & Zhou, 2010; Ma et al., 2011). Functional magnetic resonance imaging (fMRI) studies have revealed shared activation networks associated with error processing during active and observed performance, involving the posterior medial frontal cortex (including the dorsal ACC) and the anterior insula (Shane, Stevens, Harenski, & Kiehl, 2008). It has been shown that these shared activations do not reflect the reward outcome for the observer; that is, they do not depend on whether the observed error signals a relative gain or a relative loss (de Bruijn, de Lange, von Cramon, & Ullsperger, 2009). On the other hand, they appear to mark potential conflicts between one’s own and someone else’s interests (Koban, Pichon, & Vuilleumier, 2013). With respect to feedback processing, activation overlaps for active and observed behavior have been reported in the striatum, dorsomedial frontal cortex, orbitofrontal cortex, and anterior insula (Bellebaum, Jokisch, Gizewski, Forsting, & Daum, 2012; de Bruijn et al., 2009; Koban, Corradi-Dell’Acqua, & Vuilleumier, 2013; Koban et al., 2013b).

Whether we feel sorry for the failures and happy about the successes of individuals we observe during learning may also depend on our relationship with the observed person and on the empathic concern we feel toward him or her. The degree to which we empathize with the observed individual may also determine how much we learn from him or her. Empathy is currently thought of as a multifaceted concept involving at least the distinction between an affective and a cognitive component (Davis, 1980; Decety & Lamm, 2006; Polich, 2007; Shamay-Tsoory, Tibi-Elhanany, & Aharon-Peretz, 2006). The affective component of empathy assesses the emotional reactivity to other people’s experiences (e.g., pain), whereas cognitive empathy denotes the ability to adopt and understand another person’s perspective. In a quantitative meta-analysis, Fan, Duncan, de Greck, and Northoff (2011) analyzed fMRI studies of empathy for pain, anxiety, happiness, or disgust. Most consistently, the left dorsal anterior midcingulate cortex was associated with cognitive empathy, and activation of the right anterior insula was related to affective empathy. As for active versus observational learning, the shared network hypothesis of empathy postulates that the same areas, involving the ACC in particular, that are automatically involved during the first-hand experience of (aversive) situations such as painful stimulation are also activated during the mere observation of other individuals making these experiences (Singer & Lamm, 2009). A recent study by Koban et al. (2013a) corroborated an overlap between error and pain processing in the dorsomedial frontal cortex. While activations for error processing showed an agency effect with stronger activity for own errors, no difference was seen in this brain region for pain observed in another person when it was caused by oneself, relative to when it was caused by the other.

Thoma and Bellebaum (2012) have recently reviewed the evidence linking empathy to the ERN/FRN elicited during active and observational learning. One of the conclusions they reached is that most studies establish an indirect association—for example, showing changes of these components in clinical and nonclinical populations exhibiting altered empathic responding, such as individuals characterized by symptoms of depression (Mies et al., 2011), anxiety (Weinberg, Olvet, & Hajcak, 2010), or psychopathy (von Borries et al., 2010). Also, on the basis of the reasoning that observational learning should elicit empathic reactions more strongly than active learning and that this ought to be most pronounced for individuals we feel emotionally closest to (Singer et al., 2004), a modulation of the oERN/oFRN by the relationship between the observer and the observed person has been demonstrated (Carp, Halenar, Quandt, Sklar, & Compton, 2009; Kang, Hirsh, & Chasteen, 2010; Ma et al., 2011). Similarly, enhanced P300 amplitudes for a friend’s, relative to a stranger’s, feedback were reported (Leng & Zhou, 2010; Ma et al., 2011).

So far, few studies have established more direct associations between empathy and the neural correlates of performance monitoring. Santesso and Segalowitz (2009) reported a link between higher trait empathy and larger ERN amplitudes in adolescents. Larson, Fair, Good, and Baldwin (2010) confirmed this association using a Stroop paradigm but, additionally, controlled for state negative affect. Both research groups interpreted larger ERN amplitudes in highly empathic individuals in terms of more concern over their outcomes. It is intuitively plausible that empathic responding is even more likely to arise within the context of an interpersonal interaction, such as during observational learning. Depending on the specific context of the interaction—for example, whether the observer is also involved in active learning or not (Ma et al., 2011)—either processes evaluating the consequences for oneself or empathy-related processes evaluating the outcome for the other person might dominate and differentially affect the neural signal underlying the processing of observed response feedback (Marco-Pallares, Kramer, Strehl, Schroder, & Munte, 2010).

Koban, Pourtois, Vocat, and Vuilleumier (2010) showed that the ERN elicited by active learning was not affected by a competitive versus cooperative social context, while the oERN showed an early component during cooperation and a late component elicited by competition. While trait empathy was not related to any of these components, state measures of rivalry and competition toward the observed participant correlated with a diminished early oERN, and state measures of sympathy and friendship toward that person were related to an attenuated late oERN. Even fewer studies have linked empathy to feedback processing. Fukushima and Hiraki (2006) had participants complete a competitive gambling task where one player’s gain meant the opponent’s loss and vice versa. Female, but not male, participants showed a small but discernible oFRN when watching their opponents’ outcomes, even if these incurred losses for them. Overall, higher trait-empathic concern was related to a lower attenuation of the oFRN in response to opponents’ gains. Subsequently, Fukushima and Hiraki (2009) employed a paradigm involving a human and two computer-simulated players, and gains were independent of the other participants’ performance. They found that the oFRN correlated positively with trait empathy only in the human and not in the computer player observation condition. The authors interpreted the oFRN in terms of an automatic reaction to affective states. More recently, Koban, Pourtois, Bediou, and Vuilleumier (2012) reported that cooperation, in contrast to competition (defined as rewarding of joint performance vs. rewarding of the best performer on a go/no-go task), increased both FRN and P300 amplitudes to one’s own feedback, particularly in individuals scoring higher on perspective taking, while empathic concern was related to increased P300 amplitudes in the observer. However, in some cases, higher trait empathy might also disrupt observational learning performance.
In a previous study from our own lab, Kobza, Thoma, Daum, and Bellebaum (2011) examined feedback learning and processing in an observational feedback learning task, which required insight into stimulus–outcome contingencies. In a small sample of participants, they found that higher trait empathy scores were related to lower performance, suggesting that empathy can actually hinder observational learning. For the FRN, a relationship with trait empathy emerged for the amplitude difference between negative and positive feedback. In contrast to previous studies, which mainly involved gambling tasks or tasks where no complex rules had to be inferred from the observed person’s performance, a task was used that actually required participants to gain insight into probabilistic stimulus–outcome contingencies. Although this may seem counterintuitive at first glance, it is conceivable that both cognitive aspects of empathy—for example, trying to focus on the strategies the observed person might have used—and affective aspects of empathy, such as shared negative affect related to problems with figuring out difficult stimulus–outcome contingencies, might distract from the actual task of gaining insight into these contingencies. A stronger tendency to adopt someone else’s cognitive/affective perspective might particularly disrupt learning in those situations where the observer gains insight into the stimulus–outcome contingencies earlier than the observed person and might thus feel puzzled about the other person’s choices.

It is sometimes difficult to distinguish observational learning from imitation learning, since imitation learning by definition includes learning by observing other people’s actions. Although the paradigm used in the present study involves imitation of someone else’s motor response as an attentional control (see the Method section), we do not aim to assess pure “copying” of other people’s motor responses. Instead, we focus on observational learning as defined by gaining insight into action–outcome relationships as a result of observing other people’s behavior (see Heyes, 1993, for the distinction between different forms of social learning, such as observational and imitation learning).

The aim of the present study was to systematically investigate the relationship between empathy, on the one hand, and learning from and processing of feedback during both active and observational learning, on the other hand. On the basis of the studies reviewed above and, particularly, the previous finding from our lab (Kobza et al., 2011), it was hypothesized that higher scores for trait empathy would predict lower FRN and P300 amplitudes and lower performance in observational, but not active, learning. Due to the nature of the present learning task, the strongest relationships were expected for cognitive aspects of empathy.

Method

Participants

Thirty-four (32 right-handed, 2 left-handed) healthy undergraduate students (11 male, 23 female), between 18 and 35 years of age (M = 24.76, SD = 4.09), were recruited for participation at the Faculty of Psychology, Ruhr University Bochum. All participants showed normal or corrected-to-normal vision, and none of them suffered from any neurological or psychiatric disorders, as determined by a semistructured informal interview assessing past and present disorders, medication, and drug consumption history based on self-report. Mean verbal IQ amounted to 113.5 (SD = 12.4). All participants gave written informed consent and were reimbursed for their participation with 10–15 €, depending on their performance on the learning tasks.

Learning tasks

Active learning task

The active learning task represents a modification of a design by Frank, Seeberger, and O’Reilly (2004). The stimulus material consisted of six Hiragana characters. Unknown to participants, stimuli were presented in fixed pairs, and each stimulus pair was associated with a specific probabilistic reward pattern. A correct response was defined as the choice of the stimulus with the higher probability of reward (A, C, and E). On the basis of the different distributions of reward probabilities within the pairs, the stimulus pair AB (80 % vs. 20 % reward probability) was easier to learn than CD (70 % vs. 30 %) and EF (60 % vs. 40 %). Participants were instructed to select one of the two stimuli by buttonpress. Responses were either rewarded with 20 cents or punished by a deduction of 10 cents from the participants’ gains. The aim was to maximize one’s monetary gain. Instructions were followed by practice trials. There were four learning blocks (60 trials each) involving performance feedback (+20 cents, −10 cents) and four test blocks (30 trials each, 10 for each symbol pair) without feedback. After each block, participants were informed about the amount of money they had already earned and about the block type to follow (learning vs. test). The learning blocks enabled the participants to gain insight into the contingencies between stimuli and rewards. The subsequent test blocks without feedback measured the learning performance and were introduced to provide a comparable performance measure in active and observational learning. Each trial started with a fixation cross at the center of the screen, followed by two stimuli being displayed on the left and the right sides of the screen (e.g., AB). Participants had to respond by pressing the left or right control key within 3,500 ms. If participants were too slow, they were requested to react more quickly. However, participants hardly ever missed a response. 
Misses occurred in only 5 of the 34 participants during learning trials: 3 participants showed one miss each, and the other 2 showed two misses each. In test trials, 6 participants missed responses (5 participants one response each, 1 participant seven responses). The selected stimulus was highlighted with a red circle displayed for 300 ms. During learning trials, after presentation of a blank screen for 500 ms, a feedback screen followed for 500 ms informing participants whether they had gained 20 cents or lost 10 cents (see Fig. 1a for an illustration of the trial structure).
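The reward schedule of the active learning task can be illustrated with a minimal simulation sketch. The probabilities, payoffs, and block length are taken from the description above; the independent random draw per trial, the equal presentation frequency of the three pairs, and all function names (`feedback`, `is_correct`, `run_learning_block`) are assumptions made for illustration only and do not reproduce the original Presentation script.

```python
import random

# Reward probability of each stimulus, as stated in the task description.
REWARD_PROB = {"A": 0.80, "B": 0.20, "C": 0.70, "D": 0.30, "E": 0.60, "F": 0.40}
PAIRS = [("A", "B"), ("C", "D"), ("E", "F")]
GAIN_CENTS, LOSS_CENTS = 20, -10


def feedback(chosen, rng):
    """Return +20 or -10 cents for the chosen stimulus (assumed independent draw)."""
    return GAIN_CENTS if rng.random() < REWARD_PROB[chosen] else LOSS_CENTS


def is_correct(chosen, pair):
    """A choice counts as correct if it has the higher reward probability in its pair."""
    return chosen == max(pair, key=REWARD_PROB.get)


def run_learning_block(choose, n_trials=60, seed=0):
    """Simulate one learning block; `choose` maps a displayed pair to one stimulus."""
    rng = random.Random(seed)
    total_cents, n_correct = 0, 0
    for trial in range(n_trials):
        pair = PAIRS[trial % len(PAIRS)]  # assumption: pairs are shown equally often
        chosen = choose(pair)
        total_cents += feedback(chosen, rng)
        n_correct += is_correct(chosen, pair)
    return total_cents, n_correct


# Hypothetical learner that always picks the first (better) stimulus of each pair.
total_cents, n_correct = run_learning_block(lambda pair: pair[0])
```

Even a learner who always chooses the better stimulus receives negative feedback on a sizeable share of trials, which is why the EF pair (60 % vs. 40 %) is harder to learn than AB.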

Fig. 1 Sequence of events in a single learning trial for the (a) active and (b) observational learning tasks. A test trial included the same sequence of events as in a single learning trial from the active learning task without feedback

Observational learning task

The observational learning task also consisted of four learning and four test blocks, basically involving the same trial structure as described above for the active learning task. During learning trials, participants had to learn the correct responses by observing the performance of a second participant. Unknown to the participants, they were shown the performance data of the previously tested person and did not observe anyone perform in real time. The symbol of a hand choosing one of the two stimuli indicated the responses of the observed person to the participants, and the chosen stimulus was then highlighted with a red circle for up to 3,500 ms. Within this time frame, the participants had to imitate the other person’s reaction by pressing the corresponding key (left or right; Fig. 1b). If they were too slow, they were requested to react more quickly (this happened once in 6 participants and twice in 1 participant), and in case of an incorrect response (e.g., pressing the left key although the right stimulus had been selected), they were requested to respond correctly. Incorrect responses were seen more frequently than misses (i.e., in 12 participants, 6 showed more than one incorrect response). Participants were also told that the monetary feedback (+20 cents, −10 cents) given after each response applied only to the performance of the observed person and had no influence on the monetary gains of the participant. The subsequent test phase without feedback was identical to the one for the active learning task. Here, 6 participants missed one response each. In the observational learning task, only the performance during the test blocks determined the monetary gain of the participants.

Background measures

An estimate of general verbal intelligence was obtained using a German Multiple-Choice Vocabulary Intelligence Test (Lehrl, Triebig, & Fischer, 1995), which requires participants to identify a real word among four nonwords for each of 37 items of increasing difficulty.

Trait empathy was assessed using two self-report questionnaires: the German version of the Cambridge Empathy Scale yielding the so-called “empathy quotient” (EQ; Baron-Cohen & Wheelwright, 2004) and an abbreviated German version of the Interpersonal Reactivity Index (IRI; Davis, 1980; Paulus, 2007). The Cambridge Empathy Scale comprises 60 items, 20 of which serve as filler or distractor items, and participants have to indicate their agreement or disagreement with item content on a 4-point Likert scale, ranging from strongly agree to strongly disagree. The German IRI comprises four subscales of 4 items each, requiring participants to rate their agreement or disagreement on a 5-point Likert scale, ranging from not at all to very strongly. Cognitive empathy is measured by the two scales “Perspective Taking,” assessing the ability to look at issues from a different point of view, and “Fantasy,” as the tendency to identify oneself with fictional characters in films and novels. Affective empathy is measured by the two scales “Empathic Concern,” as the disposition to feel compassion and apprehension for people having negative experiences, and “Personal Distress,” assessing feelings of anxiety and discomfort seeing people confronted with negative experiences. To ensure that participants kept the second participant in mind, a questionnaire was filled in between blocks of the observational learning task. Participants were asked to rate their affective state empathy toward the second participant on a 5-point scale (ranging from not at all to very much). 
The questionnaire included 6 questions about affective states that were either congruent or incongruent between observer and observed person: (1) “How happy were you for the other participant when he was gaining money?” (congruent), (2) “How angry were you about the other participant’s gains?” (incongruent), (3) “Did you pity the other participant when he was losing money?” (congruent), (4) “Were you angry with the other participant when he was losing money?” (incongruent), (5) “Were you angry together with the other participant when he was losing money?” (congruent), and (6) “Were you delighted when the other participant was losing money?” (incongruent).

After the experiment, participants rated their sympathy for the second participant on a 5-point rating scale. They also indicated, on 5-point rating scales, how often they had thought about him or her during the experiment and how attentively they believed they had observed the other’s decisions.

Procedure

The participant, the experimenter, and two confederates (because the cover story involved a second participant and a second experimenter) arrived at the EEG laboratory of the Department of Neuropsychology at Ruhr University Bochum. It was explained that for the 2 participants to be able to observe each other, the two learning tasks had to be performed with predetermined delays. For that reason, 1 of the 2 participants would start with the active learning task, whereas the other participant would fill out questionnaires first. While the two confederates stayed in one laboratory, the experimenter and the real participant went next door to another laboratory.

Since participants performed both the active and the observational learning tasks, there were two versions of the task involving stimulus sets with different Hiragana characters (version A, version B), so that participants could not make use of learned contingencies from their first learning task. During the observational learning task, a state empathy questionnaire was filled out after each learning block. After the experiment, a final questionnaire was completed that mainly measured sympathy for the second participant. Afterward, all participants were thoroughly debriefed about the study’s goals and the cover story.

Electroencephalography recording

The participant was comfortably seated at a distance of 75 cm in front of a computer screen. Electroencephalography (EEG) recordings were performed using a 30-channel Brainvision BrainAMP system. The 30 silver–silver-chloride electrodes were applied according to the international 10–20 system: F7, F3, Fz, F4, F8, FT7, FC3, FCz, FC4, FT8, T7, C3, Cz, C4, T8, TP7, CP3, CPz, CP4, TP8, P7, P3, Pz, P4, P8, PO7, PO3, POz, PO4, and PO8. Reference electrodes were placed on both mastoids. The sampling rate was 500 Hz, and impedance was kept under 10 kΩ. Stimulus presentation and recording of the participants’ responses were controlled using Presentation software (Neurobehavioral Systems Inc., Albany, CA).

Data analysis

Behavioral data

In a first step, a repeated measures analysis of variance (ANOVA) was carried out, with the number of correct choices as the dependent variable and three within-subjects factors—learning task (active/observational), test block (1–4), and stimulus pair (AB, CD, EF)—to analyze learning performance for the whole group of participants. If the Mauchly’s test of sphericity was statistically significant (p < .05), sphericity could not be assumed, and Greenhouse–Geisser corrections were applied. Post hoc paired t-tests were conducted to resolve significant main effects and interactions. In a second step, separate Pearson correlations were computed to investigate the association between empathy state (six measures, averaged over all four observational learning blocks) or trait measures (five measures: EQ, IRI subscales), on the one hand, and learning performance averaged over all active and over all observational learning blocks, respectively, on the other hand, yielding 16 correlations in total (note that the trait measures were correlated with both active and observational learning performance and the state measures only with observational learning). To correct for multiple correlations, the alpha level was adjusted using a Bonferroni correction (p = .05/16 = .003125).
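The count of correlations and the resulting Bonferroni threshold can be retraced with a short sketch. The measure names follow the questionnaires described above; the variable names, the helper `is_significant`, and the example p values are hypothetical.

```python
# Five trait measures (EQ plus the four IRI subscales), correlated with
# performance in both tasks; six state measures, correlated with
# observational learning performance only.
TRAIT_MEASURES = [
    "EQ",
    "IRI Fantasy",
    "IRI Perspective Taking",
    "IRI Empathic Concern",
    "IRI Personal Distress",
]
N_STATE_MEASURES = 6
N_TASKS_FOR_TRAIT = 2  # active and observational
N_TASKS_FOR_STATE = 1  # observational only

n_tests = len(TRAIT_MEASURES) * N_TASKS_FOR_TRAIT + N_STATE_MEASURES * N_TASKS_FOR_STATE
alpha_corrected = 0.05 / n_tests


def is_significant(p_value, alpha=alpha_corrected):
    """True if a single correlation survives the Bonferroni-corrected threshold."""
    return p_value < alpha


# Hypothetical p values, for illustration only.
example = {p: is_significant(p) for p in (0.001, 0.010, 0.040)}
```

With 16 tests, a correlation with p = .010 would be significant at the conventional level but not after correction.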

Subsequently, the trait empathy measure showing the strongest correlation with learning performance was dichotomized into two levels according to a median split (low/high). The two empathy groups resulting from this median split were then compared with independent-samples t-tests and Pearson’s chi-square tests, respectively, in terms of group differences regarding age, sex, and intelligence. In a third step, another repeated measures ANOVA was calculated involving the within-subjects factors defined above and the between-subjects factor empathy group (low/high).
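A median split of this kind might be sketched as follows. The rule that scores equal to the median fall into the high group follows the convention reported for the IRI Fantasy split in the Results (low, <16; high, ≥16); the function name, participant labels, and scores are hypothetical.

```python
from statistics import median


def median_split(scores):
    """Assign each participant to 'low' (< median) or 'high' (>= median)."""
    cutoff = median(scores.values())
    groups = {pid: ("high" if score >= cutoff else "low") for pid, score in scores.items()}
    return groups, cutoff


# Hypothetical questionnaire scores, for illustration only.
scores = {"p01": 12, "p02": 16, "p03": 18, "p04": 14, "p05": 16, "p06": 19}
groups, cutoff = median_split(scores)
```

Because ties at the median are all assigned to one side, the two groups need not be of equal size, as is the case for the 16 versus 18 participants reported below.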

EEG analyses

EEG data were analyzed using Brain Vision Analyzer 1.1 (BrainVision, München). The data were filtered off-line through a 0.5305- to 40-Hz passband zero phase Butterworth filter. Afterward, an independent component analysis (Lee, Girolami, & Sejnowski, 1999) was performed for each participant. This procedure yields an unmixing matrix decomposing the multichannel scalp EEG into a sum of temporally independent and spatially fixed components. The number of components is equivalent to the number of channels. Each component is characterized by a time course of activation and a topographical map. The components were screened for typical activation maps indicating eye blinks—that is, for symmetric frontally positive topographies. These components were then removed from raw data by performing a back transformation. In a few cases, two components were removed due to remaining blink artifacts detected by a visual inspection of the back-transformed data. Segments from 200-ms prefeedback onset until 800-ms postfeedback onset were separately extracted for every participant, for each learning task (active/observational) and feedback-type (reward/nonreward). An automatic artifact rejection removed all trials with amplitudes above 100 μV and below −100 μV before the remaining segments per condition were averaged for each participant. The segments were baseline corrected using the 200-ms prefeedback window. On average, 4.3 % and 3.7 % of the trials were excluded due to artifacts in the active nonreward and reward conditions, respectively. For observational learning, 3.6 % (nonreward) and 3.1 % (reward) of the trials were excluded. The mean number of trials entering analysis was 94 and 136 for active nonreward and reward and 96 and 136 for observational nonreward and reward.
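The segment-level steps (amplitude-based rejection, baseline correction, and averaging) can be sketched in plain Python for a single channel. This is not the Brain Vision Analyzer implementation: the order of rejection and baseline correction, the use of absolute amplitude as the rejection criterion, and all function names are assumptions. The threshold, baseline window, and sampling rate are taken from the text.

```python
SAMPLING_RATE_HZ = 500
PRE_MS = 200            # segment starts 200 ms before feedback onset
THRESHOLD_UV = 100.0    # reject segments exceeding +/-100 microvolts
N_BASELINE = PRE_MS * SAMPLING_RATE_HZ // 1000  # 100 baseline samples


def has_artifact(segment):
    """True if any sample lies above +100 or below -100 microvolts."""
    return any(abs(value) > THRESHOLD_UV for value in segment)


def baseline_correct(segment, n_baseline=N_BASELINE):
    """Subtract the mean of the prefeedback window from every sample."""
    baseline = sum(segment[:n_baseline]) / n_baseline
    return [value - baseline for value in segment]


def average_erp(segments):
    """Average the artifact-free, baseline-corrected segments of one condition."""
    kept = [baseline_correct(seg) for seg in segments if not has_artifact(seg)]
    if not kept:
        raise ValueError("no artifact-free segments left")
    erp = [sum(samples) / len(kept) for samples in zip(*kept)]
    n_rejected = len(segments) - len(kept)
    return erp, n_rejected


# Two hypothetical 1,000-ms segments (500 samples each) at one electrode.
clean = [1.0] * 100 + [5.0] * 400
noisy = [0.0] * 499 + [150.0]
erp, n_rejected = average_erp([clean, noisy])
```

Applied per participant, learning task, and feedback type, the share of rejected segments corresponds to the exclusion percentages reported above.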

Further analyses were performed using MATLAB 7.01 (MathWorks, Natick, MA). The FRN was most pronounced at FCz. The FRN for each participant was defined as the most negative peak in the time window from 200 to 330 ms after feedback onset, relative to the most positive peak in the time window from 150 ms after feedback onset up to the latency of the negative peak. The latter peak was defined as the P200 and was analyzed separately. The P300 peak was larger at FCz than at parietal electrodes and was, therefore, defined as the most positive peak in the time window from 300 to 500 ms after feedback onset at FCz for each participant.
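The peak definitions above can be sketched as follows for an averaged waveform at FCz. This is an illustration in Python, not the original MATLAB code. It assumes that "peak" means the extreme value within the window, that the FRN is expressed as the negative peak minus the preceding positive peak (so larger FRNs are more negative), and that the segment starts 200 ms before feedback onset at 500 Hz; all names are hypothetical.

```python
SAMPLING_RATE_HZ = 500
PRE_MS = 200  # segment starts 200 ms before feedback onset


def ms_to_index(t_ms):
    """Sample index for a latency given in ms after feedback onset."""
    return (t_ms + PRE_MS) * SAMPLING_RATE_HZ // 1000


def index_to_ms(index):
    """Latency in ms after feedback onset for a sample index."""
    return index * 1000 // SAMPLING_RATE_HZ - PRE_MS


def frn_peak_to_peak(erp):
    """P200 and peak-to-peak FRN from an averaged waveform (list of microvolts)."""
    # Most negative value between 200 and 330 ms after feedback onset.
    neg_idx = min(range(ms_to_index(200), ms_to_index(330) + 1), key=lambda i: erp[i])
    # Most positive value between 150 ms and the latency of the negative peak (P200).
    pos_idx = max(range(ms_to_index(150), neg_idx + 1), key=lambda i: erp[i])
    return {
        "p200_amplitude": erp[pos_idx],
        "frn_amplitude": erp[neg_idx] - erp[pos_idx],  # assumed sign convention
        "frn_latency_ms": index_to_ms(neg_idx),
    }


def p300_peak(erp):
    """Most positive value between 300 and 500 ms after feedback onset."""
    return max(erp[ms_to_index(300):ms_to_index(500) + 1])


# Hypothetical waveform: flat except for three deflections.
erp = [0.0] * 500
erp[ms_to_index(180)] = 3.0   # P200-like positivity
erp[ms_to_index(250)] = -5.0  # FRN-like negativity
erp[ms_to_index(400)] = 8.0   # P300-like positivity
peaks = frn_peak_to_peak(erp)
p300 = p300_peak(erp)
```

A peak-to-peak measure of this kind reduces the influence of the overlapping positivity on the FRN, at the cost of making the FRN depend on the P200 estimate.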

In a first step, repeated measures ANOVAs with P200, FRN, or P300 amplitude as dependent variable and the two within-subjects factors learning task (active/observational) and feedback (reward/nonreward) were calculated to analyze feedback processing. If the Mauchly’s test of sphericity was statistically significant (p < .05), sphericity could not be assumed, and Greenhouse–Geisser corrections were applied. Post hoc paired t-tests were computed to resolve significant interactions. Similar to the procedure for the behavioral data, correlations between empathy state or trait measures and P200, FRN, or P300 amplitude were carried out next, computed separately for active and observational learning, for positive and negative feedback, and for the difference between positive and negative feedback, using Pearson correlations. The empathy variable showing the strongest correlation with ERPs was then dichotomized into two levels (low/high) according to a median split. Again, the alpha level was adjusted using a Bonferroni correction to correct for multiple correlations. For each of the three components, 48 correlations were calculated (positive and negative feedback and the difference wave, on the one hand, each with 5 empathy measures for the active and 11 empathy measures for the observational task, on the other hand). Thus, the level of significance was set to p = .05/48 = .00104. The two empathy groups that resulted from the median split were compared in terms of age, sex, and intelligence. In a third step, another repeated measures ANOVA was calculated involving the within-subjects factors defined above and the between-subjects factor empathy group (low/high).

Results

IQ and empathy group data

Participants’ mean IQ score was 113.50 (SD = 12.42). Regarding the empathy trait measures, participants achieved an EQ of 43.61 (SD = 10.61), an IRI Fantasy score of 15.09 (SD = 3.26), an IRI Perspective Taking score of 15.29 (SD = 2.33), an IRI Empathic Concern score of 15.12 (SD = 2.14), and an IRI Personal Distress score of 9.59 (SD = 2.60).

Behavioral data

Figure 2 shows the learning performance during the four test blocks on the active and observational learning tasks. A repeated measures ANOVA involving the three within-subjects factors learning task, test block, and stimulus pair showed highly significant main effects of test block, indicating a general performance increase from test block 1 to 4 [linear trend: F(1, 33) = 13.12, p = .001, ηp² = .285], and of stimulus pair, F(2, 66) = 9.33, p = .001, ηp² = .220. As expected, the stimulus pair AB was easier to learn than the CD, t(33) = 4.00, p = .001, ηp² = .327, and EF, t(33) = 4.42, p = .001, ηp² = .371, pairs, whereas learning performance for the latter two did not differ (p = .541). Neither the main effect of learning task (p = .916) nor any interaction (all ps > .051) reached significance.

Fig. 2 Learning performance of all participants during the four test blocks of the active and observational learning tasks

Controlling for imitation effects

In the observational learning task, participants were required to associate information about observed responses and the accompanying outcomes. However, observational learners may also have solved the task by just imitating the behavior of the observed person, at least when observing a good performer. In this case, one would expect that observational learners’ performance in a given block of test trials correlates with the observed performance in the preceding block of learning trials. To examine observers’ tendency to imitate, correlation analyses between observed learning block performance and active test block performance were performed, separately for each block (1–4) and stimulus pair (AB, CD, EF) in the observational learning condition. None of the correlations reached a corrected significance threshold (.05/12 correlations = .0042), suggesting that participants in the observational learning condition did not imitate the choices they observed. The only near-significant correlation was seen for stimulus pair CD in block 3, r = .434, p = .010; all other correlations were far from reaching significance (all ps > .085). In contrast, all 12 correlations between performance in the learning and test blocks of the active learning task were significant (all rs > .489, all ps < .004), with correlation coefficients reaching values as high as r = .797 (p < .001), showing that test blocks without feedback actually assessed learning reliably.
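The logic of this control analysis can be sketched with a plain-Python Pearson correlation. The number of tests (four blocks by three stimulus pairs) and the corrected threshold follow the text; the function names and the example data are hypothetical, and the conversion of the t statistic to a p value (which needs a t distribution from a statistics package) is deliberately left out.

```python
from math import sqrt

BLOCKS = (1, 2, 3, 4)
STIMULUS_PAIRS = ("AB", "CD", "EF")
n_tests = len(BLOCKS) * len(STIMULUS_PAIRS)
alpha_corrected = 0.05 / n_tests


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r ** 2))


# Hypothetical data for one block and one stimulus pair: correct choices the
# observer saw in the learning block vs. the observer's own test performance.
observed_correct = [14, 18, 11, 16, 19, 12]
own_test_correct = [7, 6, 8, 9, 7, 6]
r = pearson_r(observed_correct, own_test_correct)
t_cd_block3 = t_statistic(0.434, 34)  # r reported for pair CD in block 3, N = 34
```

If observers merely copied what they saw, these correlations should be consistently positive; repeating the calculation for all 12 block-by-pair combinations and comparing each p value with the corrected threshold implements the test described above.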

Correlations between the behavioral data, empathy scores, and the belief in the cover story

Trait empathy measures were not related to the number of correct responses in the active learning task (all ps > .097). For observational learning, no significant associations between any of the state empathy measures and the number of correct responses were found (all ps > .097). However, a significant negative correlation emerged between learning performance in the observational learning task and IRI Fantasy scores, r = −.523, p < .002. No other correlation between EQ/IRI measures and the total number of correct responses in the observational learning task reached the corrected significance level (all ps > .014). The degree to which participants believed the cover story did not correlate with any measure of trait empathy (all ps > .085). Believing the cover story was inversely associated with the total number of correct responses (summed across stimulus pairs) in the observational task, r = −.434, p = .010.

Learning in high and low empathizers

On the basis of the correlations reported above, a median split was performed to classify participants as high or low empathizers. Learning performance correlated highly significantly with IRI Fantasy subscores. The median score on the IRI Fantasy subscale was 16, so that 16 participants were classified into the low-empathy group_fant (<16) and 18 participants into the high-empathy group_fant (≥16). The two groups_fant did not differ with respect to age, sex, or general intelligence (all ps > .162). Descriptive statistics are displayed in Table 1. Figure 3 shows learning curves for the active and observational learning tasks in both empathy groups. A repeated measures ANOVA involving both the three within-subjects factors defined above and the empathy group_fant as group factor was performed, yielding a significant effect of empathy group_fant, F(1, 32) = 5.50, p = .025, partial η² = .147, indicating a better learning performance in low empathizers. Furthermore, a significant interaction emerged for empathy group_fant and learning task, F(1, 32) = 12.01, p = .002, partial η² = .273. Two post hoc independent-samples t-tests revealed that there was a significant difference on the observational, t(32) = 4.32, p = .001, partial η² = .369, but not on the active (p = .879), learning task, reflecting reduced performance of the high empathizers relative to the low empathizers. Figure 4 shows the mean number of correct responses in the active and observational learning task, separately for the low- and the high-empathy group_fant. Further main effects or interactions did not reach significance (all ps > .175).
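The median-split rule described above (scores below the median form the low group, scores at or above it the high group) can be illustrated with a short sketch. The scores below are invented for illustration and are not the study data; the function name is hypothetical.

```python
from statistics import median


def median_split(scores):
    """Label each score 'low' (< median) or 'high' (>= median).

    Mirrors the rule stated in the text (low, <16; high, >=16),
    but computes the cutoff from whatever scores are passed in.
    """
    cutoff = median(scores)
    labels = ["low" if score < cutoff else "high" for score in scores]
    return cutoff, labels


if __name__ == "__main__":
    # Hypothetical IRI Fantasy scores, NOT the data from the study.
    example_scores = [10, 12, 14, 16, 16, 18, 20, 22]
    cutoff, labels = median_split(example_scores)
    print("median cutoff:", cutoff)
    print("low group size:", labels.count("low"))
    print("high group size:", labels.count("high"))
```

Because participants scoring exactly at the median are assigned to the high group, the two groups need not be of equal size, as is the case for the 16 versus 18 participants reported above.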

Table 1 Means (SDs) of background measures of the low- and high-empathy groups resulting from the median split of IRI Fantasy score (low, <16; high, ≥16)
Fig. 3 Learning performance on the four test blocks of the (a) active and (b) observational learning tasks, separately for each empathy group resulting from the median split on the IRI (Davis, 1980) Fantasy score: low, <16; high, ≥16

Fig. 4 Mean correct responses averaged over all stimulus pairs and all test blocks in the active and observational learning tasks, separately for each empathy group resulting from the median split on the IRI (Davis, 1980) Fantasy score: low, <16; high, ≥16, with bars indicating standard errors. *p < .05; **p < .01; ns = not significant

ERPs

ERPs for the two learning tasks are depicted in Fig. 5. Topographies of the three analyzed components for ERPs following negative feedback are shown in Fig. 6. Repeated measures ANOVAs were computed separately for P200, FRN, and P300 components, involving learning task (active/observational) and feedback (reward/nonreward) as factors. For the P200, a main effect of learning task was found, F(1, 33) = 37.302, p < .001, partial η² = .531, indicating larger amplitudes in active than in observational learning. The main effect of feedback and the interaction between both factors did not reach significance (both ps > .150). Analysis of FRN amplitudes revealed a main effect of feedback, F(1, 33) = 14.730, p = .001, partial η² = .309, as well as an interaction between learning task and feedback, F(1, 33) = 7.777, p = .009, partial η² = .191. FRN amplitudes were generally larger after nonreward than after reward. Post hoc paired t-tests carried out to resolve the interaction showed a significant difference between reward and nonreward for the active learning task, t(33) = 4.118, p = .001, partial η² = .339, and only a tendency for the observational learning task, t(33) = 1.975, p = .057, partial η² = .106. There was no main effect of learning task (p = .359).
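The partial eta squared effect sizes reported alongside the F and t statistics can be related to those statistics with the standard conversion, partial eta squared = (F × df_effect) / (F × df_effect + df_error), with F = t² for a t-test. The sketch below applies this conversion to the values in the preceding paragraph. Whether the authors derived their effect sizes this way is an assumption, and the function names are hypothetical.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def partial_eta_squared_from_t(t_value: float, df_error: int) -> float:
    """Same conversion for a t-test, using F = t**2 with one effect degree of freedom."""
    return partial_eta_squared(t_value ** 2, 1, df_error)


if __name__ == "__main__":
    # F and t values as reported in the text; df_error = 33 throughout.
    reported_f = {
        "P200, learning task": 37.302,
        "FRN, feedback": 14.730,
        "FRN, task x feedback": 7.777,
    }
    for label, f_value in reported_f.items():
        print(f"{label}: {partial_eta_squared(f_value, 1, 33):.3f}")

    reported_t = {
        "FRN, active (reward vs. nonreward)": 4.118,
        "FRN, observational (reward vs. nonreward)": 1.975,
    }
    for label, t_value in reported_t.items():
        print(f"{label}: {partial_eta_squared_from_t(t_value, 33):.3f}")
```

Rounded to three decimals, the conversion agrees with the effect sizes given in the text for these five tests, which offers a quick consistency check on the reported statistics.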

Fig. 5 Grand averages of event-related potentials at FCz for (a) active and (b) observational learning trials. Zero represents feedback onset applied to both reward and nonreward

Fig. 6 Topographic activation maps of the three analyzed ERP components following negative feedback

Analysis of P300 amplitudes showed a main effect of learning task, F(1, 33) = 69.41, p = .001, partial η² = .678, with higher amplitudes during active learning, and an interaction between learning task and feedback, F(1, 33) = 17.12, p = .001, partial η² = .342, but no main effect of feedback (p = .113). Post hoc t-tests computed to resolve the interaction indicated no amplitude difference between negative and positive feedback on the observational learning task (p = .262) but a significantly larger P300 amplitude for nonreward than for reward on the active learning task, t(33) = −2.90, p = .007, partial η² = .203.

Correlations between ERPs, empathy scores, and the belief in the cover story

None of the correlations between P200 amplitude in the active or observational learning task and empathy state or trait measures reached the corrected significance threshold (all ps > .015). Near-significant positive correlations emerged between the P200 following positive and negative feedback in the active learning condition and the IRI Empathic Concern subscale, r = .356, p = .039, and r = .410, p = .016 (for all other correlations, p > .050). Near-significant negative correlations were found between the P200 for negative feedback in the observational learning condition and the EQ, r = −.369, p = .041, and the IRI Perspective Taking subscale, r = −.388, p = .023. Correlation analyses for the FRN did not reveal any significant relationships with state or trait empathy (all ps ≥ .049).

Significant or near-significant negative correlations emerged between P300 amplitude and IRI Perspective Taking (for nonreward, r = −.560, p = .001; for reward, r = −.416, p = .015) for observational but not for active learning (all ps > .050). There were no correlations between the P300 amplitude and empathy state measures (all ps > .285).

A significant correlation emerged between the degree to which the participant believed the cover story was true and oFRN difference waves (reward − nonreward), r = −.393, p = .022. There were no correlations between belief strength and P300 amplitude (all ps > .500).

Feedback processing in high and low empathizers

For ERPs, correlations with IRI Perspective Taking showed the highest coefficient and the highest significance level. The median score on the IRI Perspective Taking scale was 16, so that 15 participants were classified into the low-empathy group_pt (<16) and 19 participants into the highly empathic group_pt (≥16). The two groups_pt did not differ with respect to age, sex, or intelligence (all ps > .389). Descriptive statistics are displayed in Table 2.

Table 2 Means (SDs) of background measures of the low- and high-empathy groups resulting from the median split of IRI Perspective Taking score (low, <16; high, ≥16)

Figure 7 shows the ERPs for positive and negative feedback in active and observational learning in both groups. The effect of empathy group_pt on P300 amplitude did not reach significance, F(1, 32) = 3.02, p = .092, partial η² = .086, reflecting somewhat lower P300 amplitudes in the high-empathy group_pt. Furthermore, a significant interaction emerged between empathy group_pt and learning task, F(1, 32) = 4.91, p = .034, partial η² = .133. Figure 8 shows the mean P300 amplitude in the active and observational learning task, separately for the low- and the high-empathy group_pt. Independent-samples t-tests were computed to compare P300 amplitudes between high- and low-empathy groups_pt on the active and observational learning tasks, pooled over reward and nonreward: The group difference was significant on the observational, t(32) = 2.26, p = .031, partial η² = .138, but not on the active (p = .459) learning task. The higher-empathy group_pt showed reduced P300 amplitudes. Further main effects or interactions did not reach significance (all ps > .158).

Fig. 7 Grand averages of event-related potentials at FCz for (a) active and (b) observational learning task, separately for each empathy group resulting from the median split on the IRI (Davis, 1980) Perspective Taking score: low, <16; high, ≥16. Zero represents feedback onset applied to both reward and nonreward

Fig. 8 Mean P300 amplitude, pooled over reward and nonreward feedback, in active and observational learning, separately for each empathy group resulting from the median split on the IRI (Davis, 1980) Perspective Taking score: low, <16; high, ≥16, with bars indicating standard errors. *p < .05; **p < .01; ns = not significant

Additional analyses of P200 and N100 effects

Visual inspection of Fig. 7 also suggests a between-group difference in the P200 amplitude, in accordance with the near-significant negative correlation between IRI Perspective Taking and the P200. Apart from the main effect of learning task (see above), a main effect of group indeed emerged, with higher amplitudes for participants scoring low on empathy, F(1, 32) = 4.24, p = .048, partial η² = .117. The interaction between learning task and group, as well as all other effects, did not reach significance (all ps > .100).

Furthermore, an even earlier modulation by IRI Perspective Taking as reflected in the N100 amplitude appeared to emerge. An exploratory analysis of the N100 peak amplitude indeed yielded significant interactions between learning task and group_pt, indicating higher N100 amplitudes in the high group_pt in observational but not active learning, F(1, 32) = 8.418, p = .007, partial η² = .208, and between feedback and group, F(1, 32) = 4.188, p = .049, partial η² = .116, showing that N100 amplitudes were higher in the high-empathy group_pt for reward. Despite these effects, no significant correlations emerged between N100 amplitudes in the four conditions and IRI Perspective Taking or any other trait empathy measure for the whole sample of participants (all ps > .047).

Correlations between ERP measures and learning performance

For observational learning, no significant correlations emerged between overall performance accuracy and oERN, oFRN, or P300 amplitude. Similarly, there were no significant relationships between FRN or P300 amplitude and active learning (all ps > .410).

Discussion

The aim of the present study was to investigate the relationship between trait empathy and feedback processing during active and observational learning, with a particular focus on differences between active and observational learning. In agreement with well-replicated findings, larger amplitudes for both the FRN and the P300 were elicited during active, as compared with observational, learning (Bellebaum et al., 2010; Itagaki & Katayama, 2008; Leng & Zhou, 2010; Ma et al., 2011). A similar pattern was found for the P200, which was also more pronounced for active than for observational learning. Furthermore, larger FRN amplitudes were observed after a monetary loss than after a monetary gain (Bellebaum & Daum, 2008; Hajcak et al., 2007; Holroyd et al., 2003; Yasuda et al., 2004).

The main novel finding our study adds to the existing literature is that higher IRI Fantasy scores were related to poorer performance and higher IRI Perspective Taking scores were associated with smaller P300 amplitudes during observational learning only. Active learning and FRN amplitudes remained unaffected by empathy. For the P200, a weaker relationship with empathy emerged, indicating lower amplitudes in participants with higher IRI Perspective Taking scores. This effect was, however, not specific for observational learning. Importantly, the ERP findings for participants scoring high versus low on IRI Perspective Taking cannot be influenced by performance differences between groups. Correlations between empathy scores and observational learning performance were restricted to the IRI Fantasy subscale. Furthermore, participants with high and low IRI Perspective Taking scores showed very similar overall performance accuracy in both active and observational learning (about 75 % in both groups for both learning tasks). In the subsequent paragraphs, the main findings of the study will be interpreted and integrated with the overall result pattern and the existing literature.

Empathy, learning performance, and ERPs

Learning performance

Concerning our finding of a relationship between IRI Fantasy scores and poorer learning performance, Davis (1983) has related the IRI Fantasy subscale to the construct of introversion, as reflected, for example, by higher social anxiety and shyness. Also, individuals displaying a heightened degree of discomfort and hypervigilance in social situations have been shown to be characterized by higher IRI Fantasy subscores (Larson et al., 2010). Thus, it is plausible that observational learning performance might be disrupted in such individuals. In a previous study from our lab (Kobza et al., 2011), this relationship might not have been detected as the pooled cognitive IRI empathy scores (Fantasy + Perspective Taking) were used for the correlational analyses. On the other hand, Koban et al. (2010) did not find any significant associations between trait empathy and lower performance on a go/no-go task, either for active or for observational learning, although these authors framed the participant’s gains as being dependent on the second participant’s performance in a competitive versus cooperative context in order to encourage empathic perspective taking.

ERPs

Our findings regarding a significant relationship between empathy, active versus observational learning performance, and associated ERPs stand in contrast with some previous findings. For instance, Santesso and Segalowitz (2009) and Larson et al. (2010) reported an association between higher ERN amplitudes during active performance and higher trait empathy scores, interpreting the effect of lower empathy in terms of less vigilance and less concern about outcomes. Few studies have investigated the relationship between the FRN and empathy (or the feedback-locked P300 and empathy), and some of those who postulated a positive association did not explicitly assess empathy (Itagaki & Katayama, 2008; Ma et al., 2011) but assumed this association on the basis of the fact that the oFRN was modulated by the relationship between observer and observed person. Fukushima and Hiraki (2006, 2009), who did use questionnaire measures of empathy, reported positive associations between higher trait empathy and larger oFRN amplitudes. Koban et al. (2012) observed higher P300 amplitudes in association with higher IRI Empathic Concern scores during observational learning and increased P300 and FRN amplitudes in association with higher IRI Perspective Taking during active learning. However, these correlations were mainly driven by participants in a cooperation condition, in contrast to a competitive performance condition. The FRN was not influenced by empathy in our study. In contrast to this, Fukushima and Hiraki (2009) found larger oFRN amplitudes for individuals scoring higher on the IRI Fantasy subscale during the observation of a human player, but not during the observation of computer-simulated players. An explanation for the inconsistency between both studies could be that Fukushima and Hiraki (2009) seated the participants next to each other, which might have increased the effect of social context relative to our study.

Previous studies have already associated the P200 with empathic processing of pain- or distress-related stimuli (Meng et al., 2013; Rodrigo et al., 2011; Sheng, Liu, Zhou, Zhou, & Han, 2013), suggesting that reduced perceived novelty of stimuli (e.g., the expectation of pain-related material) is related to decreased P200 amplitudes. This might explain why, in our study, better perspective taking and, thus, potentially facilitated prediction of the other person’s behavior might have decreased P200 amplitudes in the observational condition. It is more difficult to explain why this effect also emerged for active learning, but thinking about the kind of stimulus–outcome contingencies the experimenter might have come up with may play a role in this regard.

Finally, higher N100 amplitudes in participants with high IRI Perspective Taking scores during observational learning may be related to stronger affective sharing in these individuals (Fan & Han, 2008). This effect needs to be interpreted with caution, however, since no significant correlations were seen between N100 amplitudes and IRI Perspective Taking in the overall sample of participants and the effect emerged only in the analysis based on the median split between participants with low and high IRI Perspective Taking scores. No significant correlations of ERPs and behavioral data with state empathy measures were seen.

Potential explanations for the inconsistencies

The inconsistencies with previous studies investigating error and feedback processing during observational learning may be partly caused by different measures used to assess empathy (e.g., EQ vs. IRI) or subtle differences regarding the underlying ERPs. For instance, it is conceivable that the FRN is more likely than the ERN to be influenced by empathy. Although the ERN and the FRN components are both supposed to be generated in the ACC (Holroyd & Coles, 2002), they are elicited differently. While the ERN has been linked to an automatic internal error detection mechanism, the FRN is elicited following external feedback. Possibly, the externally generated FRN is more sensitive to social context than is the ERN.

Furthermore, the task used in the present study differed in important ways from those employed in previous studies examining the relationship between empathy and feedback processing. Our participants had to actually learn from the observed behavior, transferring the knowledge they acquired by observing stimulus–reward contingencies in other participants to their own performance during active test trials. This might explain why, in contrast to other studies (Newman-Norlund, Ganesh, van Schie, de Bruijn, & Bekkering, 2009; Shane, Stevens, Harenski, & Kiehl, 2009), which gauged the emotional reactions of participants to the errors committed by friends versus foes/strangers, cognitive, rather than affective, empathy components were related to the behavioral and neural correlates of observational learning. We did not find any significant correlations with trait affective empathy components or with state affective empathy measures, probably because the gains and losses of the observed person did not have any emotional significance for the observer: Neither were the monetary rewards of the observer dependent on them, as, for example, in the recent study by Koban et al. (2012), where cooperation (being jointly rewarded for one’s own and the observed partner’s performance) was relevant, nor was there any emotional relationship between observer and observed person. A previous study from our lab (Kobza et al., 2011) showed that higher trait affective empathy was related to poorer performance and higher trait cognitive empathy to smaller oFRN differences (negative − positive feedback), but only when feedback was ambiguous and contingencies difficult to learn (i.e., in the EF condition with 60 % vs. 40 % stimulus–reward contingencies). In the present study, we averaged both performance and FRN/P300 amplitudes over trial types when computing the correlations with empathy to gauge the overarching principles in the association between empathy and performance monitoring. 
According to the present findings, cognitive empathy might be more strongly involved in the process of gaining insight into the global stimulus–reward contingencies, both behaviorally and as reflected by the P300. As in the Kobza et al. study, the correlations between trait empathy and behavioral performance on observational learning were negative, and additionally, P300 amplitudes were inversely associated with trait cognitive empathy. However, it is plausible that in a task like ours, where participants’ own learning performance and monetary gains depend on the insights they gain into stimulus–reward contingencies by observing others, the tendency to adopt the other person’s cognitive and emotional perspective might indeed distract from the task at hand. It is quite remarkable, in fact, that processing of the social context might have priority over stimulus–reward association learning and the prospect of monetary reward. Inspected more closely, IRI Fantasy subscores affected learning behavior most strongly, whereas reduced P300 amplitudes were associated with higher IRI Perspective Taking scores. On the other hand, IRI Perspective Taking scores were not related to IRI Fantasy scores (p = .502), and across conditions, P300 amplitudes did not directly correlate with learning performance on either task (all ps > .233). Therefore, the ERP and the behavioral data appear to have been independently influenced by different aspects of cognitive empathy.

Potential alternative explanations of our findings

It has to be noted that ERP components in learning tasks typically undergo learning-related changes. With learning, the FRN in response to errors typically decreases, and the ERN for performance errors increases. This effect is, however, strongest in tasks involving deterministic action–outcome associations and has, to our knowledge, not been described for observational learning so far. Nevertheless, we conducted further exploratory analyses on the ERPs for the observational learning task (not reported in the Results section) to explore whether (1) an observer ERN (see van Schie, Mars, Coles, & Bekkering, 2004) would emerge and (2) the relationships between ERPs and empathy changed during the course of the experiment. Since clear action–outcome associations could be learned only for reward probabilities of 80 % and 70 %, only trials of those conditions entered the analysis. We found an oERN-like component in observational learning, which was, however, not significantly modulated by experimental phase (first vs. second half). Furthermore, this component correlated neither with state or trait empathy scores nor with oFRN amplitude. With respect to correlations between empathy and ERP components, the overall pattern was comparable for ERPs from the first and second halves. However, at least for the subset of trials analyzed, a clear correlation between IRI Perspective Taking and the P300 emerged only for the first half of the experiment, suggesting that processes related to learning, like reward expectation, may have a stronger influence on this stage of processing in the first than in the second half of the experiment. In future investigations, it might be of interest to systematically explore which effect learning by observing a well-performing versus a poorly performing model has on the relevant ERPs during active and observational learning.

Finally, as was pointed out in the introduction, we wanted to investigate observational, and not imitation, learning in our study. The lack of correlations between observed and own performance during observational learning shows that participants were not just imitating the model from which they learned. However, in future studies, it might be important to disentangle imitation and observational learning by adding a third condition with participants simply watching someone else’s performance without imitating the motor response.

Conclusions

Taken together, our data support the notion of empathy-related processes playing a more prominent role during observation of another person’s performance, as compared with active performance. Also, we have shown that in cases where the observer’s performance has to be guided by what he or she derives about stimulus–reward contingencies from observing someone else perform on a task, a higher tendency to cognitively adopt another person’s emotional point of view might actually disrupt performance instead of improving it.