People are capable of robust and detailed memory for auditory sequences such as speech and music; speakers can accurately recite long passages of prose from memory, and singers can perform long, complex pieces of music from memory. How do performers remember those auditory sequences? Performers’ memory may be encoded in terms of auditory features, motoric features of their productions, or auditory–motor associations that performers have formed. Two lines of research offer predictions for the role of motor learning in auditory memory. One line of research has documented that producing words can improve later recognition of those words (MacLeod, Gopie, Hourihan, Neary, & Ozubko, 2010; Ozubko & MacLeod, 2010). Another line has documented that when performers learn motor tasks, they acquire motor representations that include detailed information about specific sensory outcomes from those actions (Hommel, Müsseler, Aschersleben, & Prinz, 2001; Lahav, Saltzman, & Schlaug, 2007). According to these lines of research, auditory–motor associations acquired while learning to perform may improve later auditory recognition. Here, we compared how motor and auditory learning contribute to auditory memory for music by testing how musicians’ recognition memory is influenced by their own experience producing that music. Auditory and motor information can be manipulated independently of one another in music performance with the use of electronic instruments, and therefore music offers a useful domain in which to compare the effects of auditory and motor learning on memory.

Several studies have demonstrated that production experience improves later recognition of written words (Gathercole & Conway, 1988; MacLeod et al., 2010; Ozubko & MacLeod, 2010). MacLeod et al. showed that people better recognized written words that they had spoken aloud or silently mouthed, as compared with words that they had silently read; the authors called this phenomenon the “production effect” (MacLeod et al., 2010). Other studies suggested that the auditory component of vocalization also enhances memory; words were better remembered when they were heard than when they were silently read at learning (Gathercole & Conway, 1988; MacLeod, 2011). However, spoken words were better remembered than words that were either heard without being produced or silently mouthed (Gathercole & Conway, 1988), suggesting that both auditory and motor experience contribute to later memory of written words. Ozubko and MacLeod suggested that production aids later recognition by making produced items distinctive from nonproduced items at encoding, which can help people identify previously encountered items at test. These authors proposed that distinctiveness is the principal mechanism by which production aids later item recognition. It is possible that motor experience also plays a specific role in helping people encode and later recognize items. Therefore, one aim of the present study was to examine the specific contributions of both motor and auditory experience from music performance to later auditory recognition. A second aim was to examine whether auditory distinctiveness affects recognition, by testing how variability in auditory features influences later auditory recognition.

A specific role for motor learning in recognition memory is supported by sensorimotor integration perspectives that posit that the ways in which people produce motor actions can influence how they perceive sensory stimuli that are mapped to those actions (cf. Hommel et al., 2001). For example, pianists’ perception of ambiguously rising or falling tone sequences can be biased by the direction of their own simultaneously performed finger movements on the piano (Repp & Knoblich, 2007). The common coding theory of perception and action (Hommel et al., 2001) and motor theories of speech perception both explain these findings by proposing shared representations for actions and for the sensory feedback produced by those actions (Elsner & Hommel, 2001; Hommel, 2009; Liberman & Mattingly, 1985; Prinz, 1997). Sensorimotor integration views are supported by neuroimaging studies of speech and music performance, which have demonstrated tight connections between motor and auditory circuits in the brain. Listening to speech sounds engages premotor cortical regions (Wilson, Saygin, Sereno, & Iacoboni, 2004) and facilitates articulator muscle activity (Fadiga, Craighero, Buccino, & Rizzolatti, 2002). Similarly, training in music performance is accompanied by increased premotor response to musical stimuli (Bangert et al., 2006; Baumann et al., 2007; S. Brown & Martinez, 2007; Lahav et al., 2007). On the basis of this evidence for auditory–motor integration in performers, production should influence later recognition memory by tightening the associations between actions and specific sensory outcomes. Thus, a third aim of the present study was to examine how auditory–motor associations at learning influenced later auditory recognition.

The importance of both auditory and motor information at learning for subsequent recall of music is well documented (Finney & Palmer, 2003; Highben & Palmer, 2004); however, the relative roles of auditory and motor learning on subsequent auditory recognition of music are unknown. Some evidence suggests that motor processes may contribute to the recognition of previously produced stimuli. Producers can recognize the sensory output of their own actions, whether that output is visual (Knoblich & Flach, 2001) or auditory (Flach, Knoblich, & Prinz, 2004). Musicians can distinguish recordings of their own performances from other musicians’ performances better than chance (Keller, Knoblich, & Repp, 2007; Repp & Knoblich, 2004). Recognition of self-generated stimuli may be aided by a motor system response to previously produced stimuli (Keller et al., 2007), by long-term auditory memory for performance features, or by both. Thus, an unresolved question is how motor and auditory experience could influence performers’ subsequent recognition of music, beyond recognition of one’s own performances. We examined whether auditory and motor information gained during learning would aid subsequent recognition by comparing recognition following listening only (auditory only), performing without sound (motor only), and both listening and performing (normal performance). Importantly, the melodies heard at the recognition stage were always computerized recordings with metronomic timing and uniform intensity. Therefore, the auditory feedback heard during normal (self-generated) performance was acoustically different from the recordings presented at recognition, allowing us to examine the influences of auditory and motor learning on auditory recognition beyond any influence of schematic knowledge of one’s own performance style.

Auditory and motor feedback are normally coupled temporally in music performance. Studies of sensorimotor integration have indicated that performers are very sensitive to temporal delays between the motor movements they produce and the auditory sequences they hear. For example, pianists alter the timing of motor movements (with increased temporal variability) when auditory feedback is delayed (Pfordresher, 2003, 2006). Despite this evidence, no studies have yet examined how the sensorimotor coupling of perception and action influences how performers encode and later remember music. Learning may be best when pianists’ motor and sensory patterns perfectly match in time and in sounded intensity, as they do in normal piano performance. We contrasted learning conditions in which pianists’ auditory feedback was strongly coupled with their motor performance (normal performance) or was only weakly coupled with their motor performance (performing along with a prerecorded stimulus that contained different temporal and intensity properties). This design allowed us to examine the influence of the auditory–motor coupling at encoding on later auditory recognition.

Performers’ ability to remember music may also depend on mental practice or imagery abilities (Coffman, 1990; Driskell, Copper, & Moran, 1994; Ross, 1985). Performers demonstrate greater performance improvements following a combination of physical and mental practice, as compared with physical practice alone (Coffman, 1990; Ross, 1985). Certain auditory–motor neural systems are also engaged by both mental practice and overt (physical) performance (Baumann et al., 2007; Halpern & Zatorre, 1999; Jeannerod, 2001; Lotze et al., 1999). Moreover, auditory imagery abilities correlate with how accurately musicians perform music from memory that was learned in the absence of auditory feedback (Highben & Palmer, 2004; see also Hubbard, 2010). Auditory or motor imagery abilities may therefore be related to how well musicians access associations between the auditory and motor components of music, particularly when only one component is present. Highben and Palmer suggested that performers skilled in auditory imagery were able to mentally fill in the missing auditory feedback at learning. Musicians’ abilities to engage in auditory or motor imagery may influence how they encode music. We examined whether individual differences in performers’ auditory or motor imagery abilities influence encoding of music when auditory or motor information is missing at learning.

We report two experiments in which auditory and motor information was manipulated as pianists learned unfamiliar melodies, and the pianists subsequently performed an auditory recognition task. Recognition scores were compared with measures of individual auditory and motor imagery abilities. The first experiment examined musicians’ recognition after practicing melodies three or six times each at learning, allowing us to investigate effects of the amount of auditory or motor practice on recognition memory. The second experiment investigated the role of auditory distinctiveness during learning on subsequent recognition memory. Several memory perspectives suggest that increased distinctiveness among related items should improve later memory for those items (G. D. A. Brown, Vousden, & McCormack, 2009; Gathercole & Conway, 1988; Ozubko & MacLeod, 2010). Therefore, for the second experiment we increased the acoustic variation in the presented melodies during the learning conditions. In both experiments, performers were expected to remember music that they had performed (with motor and auditory information present) better than music that they had only heard. The coupling strength between auditory and motor information at learning was also expected to influence auditory recognition: Performers were expected to have better memory for music that they had performed while hearing their own auditory feedback versus music that was not self-produced. Motor learning was expected to influence auditory recognition even with increased acoustic variation at learning. Auditory and motor imagery abilities were expected to compensate for missing information at learning: Performers with high auditory or motor imagery abilities were expected to have better memory for music learned without auditory or motor practice, respectively, as compared to performers with low imagery abilities.

Experiment 1

In Experiment 1, we investigated four potential effects on auditory memory for music: type of practice (auditory/motor), amount of practice (high/low), auditory–motor coupling (strong/weak), and performers’ imagery abilities (auditory/motor). Skilled pianists learned melodies in each of four learning conditions in which the presence and coupling of auditory and motor information were manipulated. Participants learned the melodies by listening alone (auditory-only learning), performing without auditory feedback (motor-only learning), performing with normal auditory feedback (strongly coupled auditory–motor learning), and performing along with computer-generated recordings (weakly coupled auditory–motor learning). They then completed an auditory recognition task in which they heard computer-generated versions of the melodies (acoustically identical to those heard in the weakly coupled and auditory-only conditions), and they completed auditory and motor imagery tasks.

The following predictions were tested: (1) If both motor and auditory experience influence performers’ auditory recognition memory, then recognition of music learned in the two auditory–motor conditions (the strongly and weakly coupled conditions) should be better than recognition of music learned in the auditory-only or motor-only conditions. (2) If the acoustic match of auditory stimuli from learning to recognition influences recognition, then recognition should be better in the weakly coupled and auditory-only conditions than in the strongly coupled condition. (3) If the coupling between auditory and motor information during learning influences recognition, then recognition should be better for music learned in the strongly coupled condition than in the weakly coupled condition. (4) If individual differences in mental imagery abilities influence recognition by compensating for missing information at learning, then imagery abilities should correlate with recognition memory following auditory-only and motor-only learning. (5) Greater amounts of practice (six vs. three practice trials) should enhance recognition.

Method

Participants

A group of 48 adult pianists (28 female, 20 male) were recruited from the Montreal music community. The pianists had an average of 13.44 (range = 8–23) years of private piano instruction and reported having no speaking, hearing, or learning disorders. Ten of the participants reported having absolute (perfect) pitch, and 2 participants reported having perfect pitch on the piano only; these participants had higher auditory imagery scores than did participants with relative pitch only [t(1, 46) = 3.84, p < .05]. The patterns of results reported below were the same with or without participants with absolute pitch.

Equipment

Participants performed on a Roland RD 700 electronic keyboard with weighted keys. They listened to their performances or to computer-generated stimuli presented through Sennheiser HD 280 headphones at a comfortable volume. The stimuli were presented on a Roland Soundcanvas SC-55 tone generator. Auditory feedback from the keyboard was controlled, and all keystroke responses were recorded in MIDI format using the FTAP program (Finney, 2001) on a Dell PC.

Stimulus materials

The musical stimuli were 48 short melodies (two to three measures long) composed for the experiment according to the conventions of Western tonal music. Melodies were designed to be performed on the piano keyboard with the right hand. Each melody was in 4/4 meter and consisted of 12 pitches within a one-octave range; each melody contained a unique pitch sequence. Half of the melodies were in major keys and half in minor keys, and the melodies contained diverse rhythmic patterns. In addition, no two melodies had the same beginning or ending three-note pitch sequence, to further ensure that the melodies were distinguishable.

The 48 melodies were selected on the basis of pilot testing within a larger set of 64 melodies, to ensure that they did not differ in terms of their baseline memorability. A total of 16 musically trained listeners (M = 10.06 years of instrumental instruction, range = 8–14) heard 32 novel melodies (List A: targets) and later recognized those melodies presented among 32 additional novel melodies (List B: foils). Half of the participants heard List A as targets and List B as foils, and the other half of the participants heard the reverse pairing. Melodies were then examined in terms of how accurately participants identified them as targets (hit rate: M = 67.19%, SE = 2.72) and how easily participants rejected them as foils (correct rejection rate: M = 66.77%, SE = 2.70). We then compared the hit rate and correct rejection rate for each melody in order to ensure that the target melodies and foil melodies would be equally well identified as such in the experiments. We therefore calculated the difference between target and foil accuracy for each melody (hits minus correct rejections: M = 0.42%, SE = 4.20); this difference reflected each melody’s likelihood of recognition relative to its likelihood of rejection. Lists A and B did not differ on any of the accuracy scores [hit rate, t(1, 62) = 1.00, p > .05; correct rejection rate, t(1, 62) = 0.71, p > .05; hits minus correct rejections, t(1, 62) = 0.19, p > .05]. Sixteen of the melodies were identified as outliers in these recognition score distributions and were excluded from the experiments.
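To make the selection logic concrete, the difference-score computation can be sketched as follows. The per-melody accuracy arrays and the 2-standard-deviation outlier criterion are illustrative assumptions; the article reports only that outliers in the recognition score distributions were excluded, not the exact rule.

```python
import numpy as np

def select_melodies(hit_rate, cr_rate, sd_criterion=2.0):
    """Flag pilot melodies to keep, given per-melody accuracy (in %).

    hit_rate[i] : percentage of pilot listeners who recognized melody i as a target
    cr_rate[i]  : percentage of pilot listeners who rejected melody i as a foil
    The 2-SD outlier criterion is assumed for illustration only.
    """
    hit_rate, cr_rate = np.asarray(hit_rate, float), np.asarray(cr_rate, float)
    diff = hit_rate - cr_rate  # recognition likelihood relative to rejection likelihood

    def is_outlier(x):
        z = (x - x.mean()) / x.std(ddof=1)
        return np.abs(z) > sd_criterion

    # Keep melodies that are not outliers on any of the three score distributions
    return ~(is_outlier(hit_rate) | is_outlier(cr_rate) | is_outlier(diff))
```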

The remaining 48 stimuli were divided into a set of 28 melodies to be presented at learning (pilot hit rate: M = 66.07%, SE = 3.01) and a set of 20 melodies to be presented at recognition as foils (pilot correct rejection rate: M = 65.63%, SE = 3.61); thus, the hit rate for the 28 targets was similar to the correct rejection rate for the 20 foils (F < 1). The 28 melodies were divided into four subsets with similar hit rate scores (F < 1) and similar distributions of major and minor keys; these four subsets were rotated among the four learning conditions. Finally, 20 melodies were chosen from the set of 28 to be presented as targets at recognition along with the 20 foils, providing a 50% occurrence rate of targets and foils at recognition. This set of 20 target melodies was matched in musical key distribution to the set of 20 foils. The number of melodies presented at learning was higher than the number presented as targets at recognition in order to avoid ceiling effects on recognition scores. The mean hit rate in the pilot experiment was above chance (67%), and we expected the learning manipulations to further improve recognition. Thus, the eight “filler” items presented at learning but not at recognition were the same for all participants, and there were always two fillers per condition. Pilot testing therefore confirmed that the melodies chosen for learning and foils were similarly recognizable (higher than chance) and sufficiently distinguishable from one another.

Melodies were presented in standard musical notation in all learning conditions. Numbers underneath each notated pitch specified the sequence of finger movements that pianists should use, to ensure uniformity of performance across participants and learning conditions. Computer-generated recordings of the melodies with a piano timbre (Soundcanvas: “piano 1”) were presented in the weakly coupled and auditory-only learning conditions and in the recognition task. The tempo was set to a quarter-note (beat) interonset interval (IOI) of 450 ms. Each melody was preceded by four metronome beats set to a drum timbre (Soundcanvas: “deep snare”). The first presentation of each stimulus was preceded by a set of four metronome beats in which the first beat was set to a bell timbre (Soundcanvas: “tube bell”) to signal the beginning of the trial. A silence lasting either three or four beats separated each melody presentation from the next set of metronome beats. During the recognition task, all melodies were presented with the same metronomic timing and uniform intensity as in the auditory-only and weakly coupled conditions. Each trial of the recognition task contained a melody sounded twice after the sound of a warning bell.
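For clarity, the learning-trial structure just described can be sketched as a simple timeline. The event representation and the default number of melody beats below are illustrative assumptions, not parameters taken from the experiment software (FTAP).

```python
# A minimal sketch of one stimulus presentation at learning, with onsets in ms.
BEAT_IOI_MS = 450  # quarter-note (beat) interonset interval

def trial_timeline(n_melody_beats=8, first_trial=False, gap_beats=4):
    """Return (onset_ms, event, timbre) tuples for one melody presentation."""
    events, t = [], 0
    for i in range(4):  # four metronome beats precede each melody
        timbre = "tube bell" if (first_trial and i == 0) else "deep snare"
        events.append((t, "metronome", timbre))
        t += BEAT_IOI_MS
    for _ in range(n_melody_beats):  # melody sounded at the metronome tempo
        events.append((t, "melody beat", "piano 1"))
        t += BEAT_IOI_MS
    t += gap_beats * BEAT_IOI_MS  # 3- or 4-beat silence before the next metronome set
    return events, t
```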

Design

Experiment 1 used a repeated measures mixed design. The within-participants variable was learning condition (four conditions), the between-participants variable was number of practice trials per melody (three or six), and the dependent measures were auditory recognition scores, motor imagery ability, and auditory imagery ability. Participants were assigned randomly to the two practice amount conditions (three or six trials); neither musical training [number of years of private piano instruction: M = 13.44, t(1, 46) < 1] nor weekly practice [number of hours of weekly piano practice: M = 10.17, t(1, 46) < 1] differed between the two groups. Participants learned seven melodies in each of the four conditions. The strongly coupled learning condition required participants to perform seven melodies with normal auditory feedback. The weakly coupled learning condition required participants to perform seven melodies on a keyboard while synchronizing with computerized recordings of the melodies (without hearing their own feedback). The motor-only learning condition required participants to perform seven melodies while hearing no auditory feedback. The auditory-only learning condition required participants to listen to computerized recordings of seven melodies. The trials were blocked by learning condition, and the order of learning conditions and the stimuli used in each condition were counterbalanced across participants in a Latin-square fashion. Forty melodies were presented during the recognition test, half of which were targets (five presented in each learning condition), and half of which were foils. All melodies were presented at recognition in a pseudorandom order, such that successive melodies did not share the same musical key. This pseudorandom order was reversed for half of the participants.

Procedure

After signing a consent form, each participant completed a brief test of their sight-reading ability (their ability to perform novel music accurately from musical notation); only participants who accurately performed the sight-reading test without any pitch errors within two trials were admitted to the experiment.

Participants then completed the four learning conditions. Participants were told that their memory for the melodies would be tested, but they were not told the nature of the memory test. Participants sat at an electronic keyboard while wearing headphones. Prior to each learning condition, participants heard and/or performed a practice sequence (a C major scale) to familiarize themselves with the current learning condition. Four metronome beats preceded each practice trial. In the auditory-only learning condition, participants heard computerized recordings of seven melodies while holding their hands in tight fists to prevent them from moving their fingers. In the motor-only learning condition, participants performed seven melodies without hearing any auditory feedback. The participants heard the first pitch of the melody on each practice trial to provide them with an auditory cue for how the melody would sound. In the strongly coupled learning condition, participants performed seven melodies and heard their own performances (with their own intensity and timing). In the weakly coupled condition, participants performed seven melodies while hearing the computer-generated melodies instead of their own feedback. Participants were instructed to perform in time with the recordings. In the motor-only, strongly coupled, and weakly coupled conditions, participants were instructed to perform at the tempo indicated by the metronome beats preceding each practice trial. Half of the participants practiced each melody three times, and the other half practiced each melody six times within each learning condition.

Participants then spent 3 min completing a music background questionnaire. Following this short delay, participants were tested on their auditory recognition of the melodies that they had learned. Participants heard 40 melodies and were instructed to indicate which melodies they had encountered in the learning conditions by circling “old” or “new” on an answer sheet. Participants also indicated their confidence in their responses on a 3-point scale (1 = unsure, 3 = confident).

Finally, participants completed tests of auditory and motor imagery ability. The auditory imagery test, adapted from Wing’s (1968) battery of aural skills, required participants to detect differences between a notated melody and a sounded melody, presented simultaneously. On each of 12 trials, participants had to report whether the notated melody was the same as or different from the heard melody; on some trials, the notated and heard melodies would differ by a single tone (for further details, see Highben & Palmer, 2004). The motor imagery test, adapted from Highben and Palmer, required participants to detect differences between an imagined and a performed sequence of finger movements. On each of 12 trials, participants first memorized an eight-item finger movement sequence for the left hand from a sequence of pictures; they then performed an eight-note sequence of finger movements with the left hand on a piano keyboard. Participants had to report whether the memorized movement sequence was the same as or different from the performed movement sequence; on some trials, the memorized and performed movement sequences differed only by one finger movement. The entire experiment lasted approximately 80 min, and participants received a nominal fee.

Results

Auditory recognition by learning condition and practice amount

The alpha level was set at .05 for all tests. Sensitivity (d') scores were calculated to index participants’ sensitivity to targets relative to foils (Coombs, Dawes, & Tversky, 1970). These scores were calculated by subtracting the standardized (z-score) false alarm rate (incorrect foil identification) from the standardized hit rate (correct target identification) within each learning condition for each participant, yielding four d' scores per participant. The mean false alarm rate was 22.40% (SE = 1.69); the false alarm rates did not differ by practice amount. A 2 × 4 ANOVA on recognition (d' scores) by amount of practice and learning condition revealed a main effect of learning condition, F(3, 138) = 33.81, MSE = 1.36, p < .05, a main effect of practice amount, F(1, 46) = 6.72, MSE = 1.96, p < .05, and no interaction, F < 1. The mean d' scores are shown by learning condition and practice amount in Fig. 1. Recognition scores were higher following six practice trials (M = 2.11, SE = 0.15) than following three practice trials (M = 1.58, SE = 0.15). Recognition scores for the motor-only learning condition were lower than those for all other learning conditions (LSD = 0.47, p < .05), which did not differ from one another. There was a trend toward higher recognition in the strongly coupled condition than in the auditory-only condition (LSD = 0.46, p = .07). All mean d' values shown in Fig. 1 are significantly greater than zero (ps < .05), with the sole exception of the three-practice-trial motor-only condition, indicating some effects of increased practice with motor-only learning on auditory recognition. The same pattern of results was found when the analyses were conducted on hit rates instead of d' scores.
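In other words, each score was d' = z(hit rate) − z(false-alarm rate). A minimal sketch of this computation is shown below; the log-linear correction for hit or false-alarm rates of 0 or 1 is our assumption for illustration, as the article does not report how extreme proportions were handled.

```python
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Signal-detection sensitivity: z(hit rate) - z(false-alarm rate)."""
    # Log-linear correction so proportions of 0 or 1 do not yield infinite z-scores
    # (assumed here for illustration; not stated in the article).
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Example: one participant, one learning condition (5 targets per condition;
# the 20 foils at recognition are shared across conditions)
print(round(d_prime(hits=4, misses=1, false_alarms=4, correct_rejections=16), 2))
```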

Melody recognition may have depended on performed pitch accuracy during learning. Although overall pitch accuracy during learning was high (trials with no errors: M = 94.68%, SE = 0.56), occasional errors may have influenced recognition. Each participant’s d' scores were recalculated for target melodies they had performed correctly on all learning trials (without pitch errors). A 2 × 4 ANOVA on these scores by practice amount and learning condition indicated a main effect of practice amount, F(1, 46) = 8.54, MSE = 2.34, p < .05, a main effect of learning condition, F(3, 138) = 36.04, MSE = 1.74, p < .05, and no interaction, F < 1. Better recognition followed six practice trials than three practice trials, and worse recognition followed the motor-only condition than all other learning conditions. In addition, recognition in the strongly coupled condition (M = 2.72, SE = 0.20) was greater than recognition in the auditory-only condition (M = 2.10, SE = 0.18; LSD = 0.53, p < .05). Thus, for melodies that participants performed correctly, motor learning enhanced auditory recognition beyond auditory learning alone.

Fig. 1 Recognition scores (mean d' values) by learning condition and amount of practice for Experiment 1. Error bars represent standard errors

Auditory recognition and stimulus/performance features during learning

To test whether recognition was influenced by specific features of the auditory feedback or of the pianists’ motor movements during learning, recognition scores were correlated with specific stimulus and/or performance features in each of the learning conditions. The mean IOIs for each quarter-note beat (mean beat IOI) and the standard deviations (SDs) of the quarter-note IOIs (beat variability) were examined for each practice trial for the target melodies (melodies presented during both learning and recognition, five melodies per condition). Mean intensity (the mean velocity of each keypress) and intensity variability (the SD of keypress velocities) were also examined for each practice trial. Correlations were computed between each participant’s mean values for the stimulus/performance measures and their recognition scores following each learning condition. The mean produced beat IOI in the weakly coupled condition (M = 449.6 ms, SE = 0.35) correlated positively with the recognition scores (N = 48, r = .37, p < .05), indicating that participants who performed more slowly had better memory in this condition. The mean asynchronies in this learning condition were small and negative (M = –12.23 ms, SE = 0.64), indicating that participants successfully synchronized with recordings in this condition and tended to anticipate the stimulus onsets, typical of synchronization with temporally regular stimuli (cf. Aschersleben, 2002). Acoustic and performance properties at learning did not correlate with recognition scores in the other learning conditions, and the timing and intensity measures did not differ by practice amount in any learning condition.
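As a rough illustration of the timing and intensity measures just described, the summaries for a single practice trial could be computed from beat-level keypress data as in the sketch below. The variable names and the asynchrony sign convention (produced onset minus stimulus onset) follow the text, but the function itself is a hypothetical reconstruction, not code from the study.

```python
import numpy as np

def trial_measures(beat_onsets_ms, velocities, stimulus_beat_onsets_ms=None):
    """Timing/intensity summaries of one practice trial (illustrative)."""
    iois = np.diff(beat_onsets_ms)                     # quarter-note interonset intervals
    out = {
        "mean_beat_ioi": iois.mean(),                  # tempo (mean beat IOI)
        "beat_variability": iois.std(ddof=1),          # SD of beat IOIs
        "mean_intensity": np.mean(velocities),         # mean MIDI keypress velocity
        "intensity_variability": np.std(velocities, ddof=1),
    }
    if stimulus_beat_onsets_ms is not None:            # weakly coupled condition only
        asyn = np.asarray(beat_onsets_ms) - np.asarray(stimulus_beat_onsets_ms)
        out["mean_asynchrony"] = asyn.mean()           # negative = anticipating the stimulus
        out["asynchrony_variability"] = asyn.std(ddof=1)
    return out
```

Participant-level means of these measures within each learning condition would then be correlated with the corresponding recognition (d') scores.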

Auditory recognition and imagery scores

To test potential influences of imagery on later recognition, the individual pianists’ recognition scores were correlated with their auditory and motor imagery test scores (calculated as percentages correct out of 12 items). These correlations were calculated separately for participants who received three and six practice trials per melody, in order to compare correlations across different amounts of practice. As is shown in Table 1, auditory imagery scores correlated positively with recognition following the motor-only condition for pianists with three or six practice trials, suggesting that auditory imagery filled in for missing auditory feedback at learning. Auditory imagery scores also correlated negatively with recognition in the weakly coupled condition for pianists who completed three practice trials. Motor imagery scores correlated positively with recognition in the weakly coupled condition for pianists who completed six practice trials. Auditory and motor imagery scores did not correlate with each other (r = .18, p > .05) and did not differ by practice amount (auditory imagery, M = 82.64, SE = 2.85; motor imagery, M = 84.03, SE = 1.55; F < 1).

Table 1 Experiment 1: correlations between imagery scores and auditory recognition scores (d') for three and six practice trials

Discussion

Four main factors influenced performers’ auditory recognition of music that they had heard or performed. First, melodies were recognized better than chance (d' values > 0) following all learning conditions (motor and auditory), except with smaller practice amounts in the motor-only condition. Second, recognition was substantially better for music that was learned with sound (with or without motor movements) than without sound, as expected. Third, increased practice amounts improved recognition in all learning conditions; performers’ auditory recognition of melodies was improved by increases in auditory or motor practice. Finally, melodies that were correctly performed with normal feedback (strongly coupled learning) were better recognized than melodies that were only heard (auditory-only learning). Overall, these findings suggest that both motor and auditory learning enhance auditory recognition of music, but that influences of motor learning on auditory recognition are contingent on strong coupling of auditory and motor information during learning.

Individual differences in performers’ auditory and motor imagery abilities were associated with their recognition. Performers with higher auditory imagery scores had better recognition following motor-only learning, suggesting that auditory imagery abilities aided later recognition by filling in for missing auditory feedback at learning. This finding, which held across different amounts of practice, is consistent with the previous finding that performers with high auditory imagery skills were better at performing melodies from memory that they had learned without auditory feedback (Highben & Palmer, 2004). Among performers who received three practice trials, those with higher auditory imagery scores had worse recognition in the weakly coupled condition, perhaps due to interference between their own imagery and the stimuli in this condition; presumably, their auditory imagery was related to their own motor performances, whose timing and intensity features did not match the metronomic auditory stimuli. Among performers who received six practice trials per melody, those with higher motor imagery scores had better recognition following the weakly coupled condition. Motor imagery abilities may have been most useful in the weakly coupled condition due to the increased challenge of synchronizing with the melodies. This correlation did not appear among performers who received three practice trials, perhaps because this group did not have enough motor practice to effectively engage motor imagery.

Performers demonstrated better recognition following strongly coupled auditory–motor learning than following auditory-only learning, despite the fact that melodies differed acoustically between learning and recognition in the strongly coupled condition only. This effect suggests that performers form a more generalized representation of melodies by the time of auditory recognition. The question remains, however, whether better memory following the strongly coupled condition was due to specific motor experience or to greater acoustic variation in the auditory stimuli; only in this condition did performers hear temporal and intensity variations from one performance to the next. This increased variation could have enhanced the auditory distinctiveness of the melodies in the strongly coupled condition, which might have aided subsequent recognition; this interpretation is consistent with the general-distinctiveness explanation offered by MacLeod et al. (2010) for the production effect on subsequent recognition memory. We tested in Experiment 2 how exposure to acoustically varying (nonmetronomic) performances in all auditory learning conditions (with or without performance) affects subsequent recognition, and whether greater auditory distinctiveness enhances or diminishes the effects of auditory–motor learning.

Experiment 2

In Experiment 2, we investigated how hearing and performing with acoustically varying melodies influences subsequent auditory recognition. Most music that people hear and recognize contains more acoustic variation than did the computer-generated music of Experiment 1; thus, Experiment 2 was a naturalistic extension of Experiment 1. Pianists practiced each novel melody six times in the same learning conditions as in Experiment 1, with the following change: Different acoustically varying recordings of each melody were presented on each learning trial in the weakly coupled and auditory-only learning conditions. As in Experiment 1, pianists were instructed to perform along with the melodies that they heard in the weakly coupled condition, without hearing their own feedback. This learning condition was expected to further decouple motor performance from auditory information, due to the increased challenge of performing along with unfamiliar, temporally varying performances (Keller et al., 2007). As in Experiment 1, strongly coupled auditory–motor learning was expected to improve recognition beyond auditory learning alone. Increased auditory distinctiveness in the weakly coupled condition might enhance later recognition of auditory stimuli, or alternatively, it might diminish recognition due to increased decoupling of the auditory information from the motor performance.

Method

Participants

A group of 24 adult pianists (9 female, 15 male) were recruited from the Montreal community. None of the pianists had participated in Experiment 1. They had a mean of 13.44 (range = 8–25) years of formal, private piano instruction. Participants reported having no speaking, hearing, or learning disorders. Six of the participants reported having absolute (perfect) pitch; these participants had higher auditory imagery scores than did the participants with relative pitch only, t(1, 22) = 2.29, p < .05. The pattern of results described below was the same when participants with absolute pitch were excluded from the analyses, with one exception (see the imagery findings below).

Stimulus materials

The same melodies used in Experiment 1 were presented in Experiment 2, with the following change: Auditory stimuli presented in the weakly coupled and auditory-only conditions were chosen from the note-perfect performances of participants who had practiced melodies six times in the strongly coupled (normal performance) learning condition in Experiment 1. Performances by six different pianists were selected for each melody; therefore, a different human performance of the same melody was heard on each trial in the weakly coupled and auditory-only learning conditions. The mean tempo of the stimuli (454.33 ms) was close to the metronomic tempo of 450 ms, and the mean intensity level of the performances was normalized (mean velocity = 69.0). All other aspects of the performances (the timing of tone onsets and offsets and the difference in intensity between successive tones) were retained in their original forms. Among the 20 stimuli presented at both learning and recognition, the mean beat IOI was 451.0 ms (range: 416–490.9 ms), and beat variability (the mean SD of the beat IOI within each stimulus) was 18.4 ms (range: 7.3–33.3 ms). The mean beat IOI and beat variability did not differ among the four target melody sets that were assigned to each learning condition (ps > .05). The same six performances of each melody were presented to all participants in different learning conditions. The assignment of each set of melodies to each learning condition was counterbalanced across participants, so that the mean beat IOI and mean beat variability were equivalent for the auditory-only and weakly coupled learning conditions.

Equipment, design, and procedure

The equipment, design, and procedure of Experiment 2 were the same as in Experiment 1, except that all participants in Experiment 2 received six practice trials per melody at learning.

Results

Auditory recognition by learning condition

Participants’ responses to targets and foils at recognition were again converted to d' scores. The mean false alarm rate was 26.04% (SE = 2.50). An ANOVA on recognition scores (d' scores) indicated a significant effect of learning condition, F(3, 69) = 17.89, MSE = 1.32, p < .05; as is shown in Fig. 2, recognition following the motor-only condition was worse than recognition in all other conditions (LSD = 0.66, p < .05), which did not differ from one another. All d' values shown in Fig. 2 are significantly larger than zero (ps < .05), indicating some effects of motor-only learning on auditory recognition. The same pattern of results was found when the analyses were run on hit rates instead of d' scores.

Fig. 2 Recognition scores (mean d' values) by learning condition for Experiment 2. Error bars represent standard errors

Recognition was further examined for melodies that had been performed correctly at learning. The overall pitch accuracy during learning was high (mean correct trials = 93.5%, SE = 1.08). Each participant’s d' scores were recalculated using only their responses to target melodies that they had performed without pitch errors on all practice trials. An ANOVA on these recognition scores confirmed a significant effect of learning condition, F(3, 69) = 13.31, MSE = 2.00, p < .05; recognition scores were lower in the motor-only condition than in all other learning conditions (LSD = 0.81, p < .05), which did not differ from one another.

Auditory recognition and stimulus/performance features during learning

Recognition scores were compared with the specific stimulus and/or performance features that were heard or produced in each learning condition: (1) the timing/intensity characteristics of the auditory stimuli (strongly coupled condition, weakly coupled condition, and auditory-only condition) and (2) the timing/intensity characteristics of the pianists’ motor performances (strongly coupled condition, weakly coupled condition, and motor-only condition). Correlations between the stimulus and/or performance features of the melodies (mean beat IOI, SD of beat IOI, and SD of intensity) in each learning condition and recognition scores are shown in Table 2. Analyses are reported by stimulus or by performance, allowing for dissociation of those variables in the weakly coupled condition.

Table 2 Experiment 2: correlations between performance/stimulus characteristics and auditory recognition scores (d')

The findings displayed in Table 2 show that recognition of melodies from the weakly coupled condition correlated with the acoustic stimulus features in this condition. The stimulus tempo (mean beat IOI) correlated positively with recognition, indicating that pianists who performed with slower stimuli in the weakly coupled condition had better recognition. Tempo variability (beat SD) and intensity variability (intensity SD) correlated negatively with recognition; pianists who heard less-variable stimuli in the weakly coupled condition had better recognition. The same tempo and intensity characteristics of participants’ performances in the weakly coupled condition did not correlate with subsequent recognition. Thus, the acoustic features of the sounded stimuli, and not features of the pianists’ own productions, were associated with later recognition.

The stimulus features and performance features were correlated for each melody in the weakly coupled learning condition. The mean beat IOI produced by performers correlated positively with the mean beat IOI of the auditory stimuli (r = .90, p < .05), suggesting that performers closely matched the tempi they heard. The mean asynchrony between performance onsets and stimulus onsets (produced beat onset minus stimulus beat onset, M = –11.37, SE = 0.93) was negative, indicating that performers anticipated the stimulus onsets, as expected. Pianists’ variability in their produced tempo (SD of beat IOI) in the weakly coupled condition increased for the acoustically variable stimuli in Experiment 2 (M = 29.34 ms, SE = 1.41) relative to the temporally regular stimuli in Experiment 1 (M = 22.07 ms, SE = 0.71) [t(1, 70) = 5.14, p < .05]. Pianists’ variability in asynchrony (SD of asynchrony) in the weakly coupled condition also increased in Experiment 2 (M = 26.13, SE = 1.03) relative to Experiment 1 (M = 18.42, SE = 0.57) [t(1, 70) = 7.08, p < .05]. These results document that performance was influenced by the acoustic variations in Experiment 2 and suggest that the temporal variability of the melodies did indeed increase their distinctiveness.

Auditory recognition and imagery scores

Recognition scores were correlated with auditory and motor imagery scores, as in Experiment 1. Neither imagery test correlated with recognition scores. When participants with absolute pitch were removed from the analyses, a positive correlation between motor imagery scores and recognition in the auditory-only condition emerged (r = .40, p < .05), suggesting that imagery filled in for missing information at learning, as had been found in Experiment 1. Auditory and motor imagery scores did not correlate with each other (r = .004, p > .05).

Auditory recognition: comparison between Experiments 1 and 2

Melody recognition following six practice trials in Experiment 1 (n = 24) was compared with recognition in Experiment 2 (N = 24). A two-factor Experiment × Learning Condition ANOVA on d' scores revealed a main effect of learning condition, F(3, 138) = 33.30, MSE = 1.38, p < .05; recognition in the strongly coupled condition was better than that in the auditory-only condition, and recognition in the motor-only condition was worse than that in the other conditions (LSD = 0.48, p < .05). There was no main effect of experiment or any interaction (Fs < 1).

The participants from Experiments 1 (n = 24) and 2 (N = 24) did not differ in their auditory imagery abilities, years of piano instruction, or hours of weekly piano practice (ps > .05). Participants in Experiment 1 scored higher on motor imagery measures (M = 84.72, SE = 2.11) than did participants in Experiment 2 (M = 77.43, SE = 2.63) [F(1, 46) = 4.68, MSE = 136.31, p < .05]. This difference may partially explain the weaker relationship between recognition and imagery scores in Experiment 2, whose participants had lower motor imagery scores.

Discussion

Experiment 2 investigated the influence of auditory distinctiveness on subsequent auditory recognition by having participants hear and perform with acoustically varying melodies during the weakly coupled auditory–motor and auditory-only learning conditions. Recognition was measured after participants listened to or played along with a variety of human performances in the auditory-only and weakly coupled conditions, in contrast to the computer-generated (metronomically regular) acoustic stimuli presented in Experiment 1. Recognition scores yielded a pattern of findings similar to the one found in Experiment 1: Performers were able to recognize melodies better than chance following all learning conditions, and recognition scores following motor-only learning were lower than scores in all of the other learning conditions. The combined results of Experiments 1 and 2 revealed that recognition was higher in the strongly coupled condition than in the auditory-only condition, suggesting an enhanced effect of motor learning on subsequent auditory memory. The acoustic characteristics of the melodies correlated with recognition scores, but only when melodies were accompanied by performance at learning (weakly coupled condition), further suggesting a role of auditory–motor learning on subsequent recognition. Overall, the auditory distinctiveness of the melodies in Experiment 2 did not change the original findings that motor experience can enhance auditory memory for music. Thus, Experiment 2 extended the results of Experiment 1 by documenting that the influence of motor learning on later recognition transcended the specific acoustic characteristics of the music: Memory for music comprises an abstraction of the pitch sequence, and motor learning further enhances this abstract auditory memory.

General discussion

In two experiments, we manipulated the presence of auditory and motor information as pianists learned unfamiliar melodies, and the pianists subsequently performed an auditory recognition task. The coupling between auditory and motor information at learning was also manipulated: Auditory information was strongly coupled with (generated by) or weakly coupled with (presented during, but not generated by) pianists’ movements during learning. In the first experiment, we manipulated the amount of practice during learning, and in the second experiment we increased the acoustic distinctiveness of the melodies presented during learning. Three main factors affected performers’ recognition scores. First, performers recognized the melodies that they had performed (with motor and auditory information present) better than melodies that they had only heard. Second, recognition scores were influenced by the distinctive acoustic features of the melodies during weakly coupled auditory–motor learning: When performers had performed along with the melodies, they better recognized melodies that were slower or more regular in timing and intensity than melodies with greater variability. Finally, performers with high imagery abilities exhibited better recognition for melodies learned with altered or absent auditory or motor information, as compared to performers with low imagery abilities. We discuss each of these influences on auditory recognition in turn.

Auditory–motor learning improved pianists’ auditory recognition beyond recognition from auditory-only learning. This effect was specific to the strongly coupled learning condition (normal performance), suggesting that the influence of motor learning on auditory memory depends on a close coupling or match between auditory and motor information. This effect also depended on accurate performance and sufficient practice at learning. The finding is notable that strongly coupled auditory–motor learning resulted in higher recognition than did the auditory-only condition, because the stimuli at recognition were acoustically identical to those in the auditory-only learning condition, but not to those in the strongly coupled learning condition. These findings are consistent with the “production effect” (MacLeod et al., 2010), since the strongly coupled condition involved production and the auditory-only condition did not. The production effect cannot explain the lack of a difference between the weakly coupled and auditory-only conditions, however, since weakly coupled learning also involved production. We propose that strong auditory–motor coupling aids learning by facilitating the formation of auditory–motor associations. Like MacLeod and colleagues, we propose that production aids subsequent recognition by providing additional retrieval cues; we further propose that motor processes play a specific role in the encoding and subsequent recognition of music, by enhancing the auditory–motor associations that underlie performance. Our proposal is consistent with theories of sensorimotor integration that have posited that actions can shape sensory perception through mental simulation of action plans (Jeannerod, 2001; Keller et al., 2007) or through shared sensory and motor representations (Hommel, 2009; Hommel et al., 2001; Liberman & Mattingly, 1985). Therefore, we suggest that the formation of sensorimotor associations at learning may aid subsequent auditory recognition.

The distinctive auditory features of the melodies in Experiment 2 influenced subsequent auditory recognition. Melodies with slower tempi and more regular timing and intensity patterns were better remembered by pianists when they performed along with those melodies. The acoustic features of the melodies—not features of the pianists’ own (produced) performances—predicted recognition following weakly coupled auditory–motor learning (as is shown in Table 2). Acoustic features alone (auditory-only learning) did not predict later recognition. Therefore, the relationships between the acoustic features at learning and subsequent recognition are best explained not by auditory or motor encoding alone, but rather by joint auditory–motor encoding. Auditory stimuli with particular characteristics such as reduced tempo, greater temporal regularity, and greater intensity regularity may be easier for performers to track with their movements, and therefore easier to learn. Greater acoustic distinctiveness did not improve recognition on average, as the results of Experiment 2 (with variations in acoustic features) did not differ overall from those of Experiment 1 (with no variations in acoustic features). Instead, distinctive acoustic features that were accompanied by performance (motor movements) at learning influenced later recognition.

Individual performers’ imagery abilities influenced recognition of music that was learned with altered or absent auditory or motor information. Performers’ auditory imagery scores correlated positively with their recognition of music that they had learned without auditory feedback (motor-only condition). Skilled auditory imagers may have been able to mentally compensate for missing auditory feedback at learning, as was suggested by previous findings (Highben & Palmer, 2004). When performers received small amounts of practice (three trials), auditory imagery scores were inversely correlated with recognition of music learned when the auditory information did not precisely match their own actions (weakly coupled condition). These performers may have experienced interference when they heard auditory information inconsistent with their own productions, thus disrupting encoding. When performers received larger amounts of practice (six trials), motor imagery scores were positively correlated with recognition of music learned with both auditory and motor components (weakly coupled condition). Experiment 2 documented a correlation between motor imagery and recognition following auditory-only learning, a finding that similarly suggests compensation for missing motor information at learning. We hypothesize that imagery abilities help performers form auditory–motor associations when sensory or motor information is missing at learning; this will be an avenue for future research.

Possible limitations in interpreting these findings include the fact that the auditory and motor imagery measures may have engaged multiple types of imagery or cognitive strategies. Musicians may engage motor and auditory processes when reading musical notation (Brodsky, Kessler, Rubinstein, Ginsborg, & Henik, 2008), as they did in the auditory imagery task reported here. Despite this possibility, a distinction between the motor and auditory imagery tasks is supported by the fact that scores on the two measures did not correlate with one another, and they correlated differentially with recognition following the different learning conditions. Motor learning may also incorporate multiple processes, including efferent components (plans, intentions, or motor commands generated by the central nervous system) and afferent components (proprioceptive, kinesthetic, and tactile feedback to the central nervous system). Measures of cortical and muscle activity or somatosensory feedback manipulations could help to clarify which components of motor learning contribute to auditory recognition.

In sum, performing musicians can use acquired motor information following relatively brief amounts of practice to aid subsequent auditory recognition of that music. Improved auditory recognition was not contingent on a precise auditory match between the stimuli presented at learning and test; instead, musicians’ encoding was aided by motor experience that was strongly coupled to auditory feedback. Performers’ ability to engage in mental imagery influenced their recognition most in learning conditions that limited the amount of auditory or motor information present. Further investigations of auditory–motor learning with functional imaging techniques may help to clarify how production aids auditory recognition and may determine how auditory and motor imagery operate during music learning and retrieval.