Music ensemble playing is a peculiar, complex and naturalistic form of nonverbal joint action that is of scientific interest to researchers in music psychology, performance science and neuroscience (Keller et al., 2014). To achieve a cohesive and expressive performance, trained ensemble musicians in the Western classical tradition coordinate and adjust their tempo, sound and bodily gestures to align with those of their co-performer(s), agree on a shared understanding of the composer’s intentions, predict their co-performers’ individual intentions, and often monitor audience responses.
Musicians’ body motion during ensemble performance is continuous and multifunctional. Some aspects of motion are directly involved in sound production; other aspects, often referred to as “ancillary motion”, support sound-producing motion, facilitate interpersonal communication and interaction, help musicians achieve expressive goals, and provide visual expressive cues to the audience and co-performers (Jensenius et al., 2010). Ancillary motion relates to the performer’s understanding of the structural significance of the piece, to the coordination of musical phrases (Thompson & Luck, 2011), and to expressive intentions (Dahl & Friberg, 2007; Thompson & Luck, 2011). In sum, body motion can help co-performers realise their intended musical interpretation and can ease their coordination.
In this study, we analysed interperformer coordination, as manifested in musicians’ continuous head motion during duo performances, in relation to the musicians’ empathic perspective-taking ability and to the structure of the music performed.
In the following sections, we discuss the current understanding of the role of body motion in ensemble performance and the relationships between interperformer coordination and empathy, review methodologies for the analysis of musicians’ bodily coordination, and then pose hypotheses for the current study.
Body motion in ensemble performance
Musicians’ body motion plays a fundamental role in ensemble playing. Some studies have shown that head movements can reflect the emotionally expressive intentions that musicians aim to convey. For example, the information flow between members of a professional trio, as measured in their anterior-posterior head sway, was found to be higher when they performed with emotional expression than when they performed without it (i.e., mechanically) (Chang et al., 2019). Pianists’ head movement velocities have also been found to differ depending on the pianist’s expressive intent, being higher in the serene expressive condition than in the sad, allegro and overexpressive conditions (Castellano et al., 2008).
Some emotional intentions that musicians aim to convey during solo performances can also be communicated to observers through body motion. In an experiment in which participants rated silent video clips of a marimba player instructed to express different intentions, Dahl and Friberg (2007) found that the intended emotions of happiness, sadness and anger were successfully communicated to participants. Anger was mostly conveyed through jerky movements, happiness through large movements, and sadness through slow, smooth movements.
Musicians’ body motion can also reveal leader-follower relationships between musicians during ensemble playing. Designated leaders in piano duos tend to raise their fingers higher than designated followers (Goebl & Palmer, 2009), and a study of coupling in the body sway of string quartet players showed that assigned leaders influence the others more, and are less influenced by the others, than followers are (Chang et al., 2017). Studies of leadership in string quartets in more ecologically valid situations (i.e., without researchers assigning the role of leader) highlight complex, differentiated patterns of dependency rather than the traditional role allocation that attributes leadership to the first violin (Glowinski et al., 2012; Timmers et al., 2014). These results demonstrate that musicians’ body movements reflect the musical roles of leader and follower, and suggest that the leader-follower relationship can influence the way musicians adapt their behaviour to that of their co-performers during ensemble performance. These findings also imply that leader-follower roles are flexible and may be exchanged back and forth during a piece.
Furthermore, body motion in music ensembles can facilitate interpersonal communication, coordination and interaction. In piano duos synchronizing their entrances at the start of a piece, certain acceleration patterns in leaders’ head gestures, namely the period of deceleration following acceleration peaks, were found to communicate beat position (Bishop & Goebl, 2017). Piano and clarinet duos also showed increased interperformer coordination, more head movement and more explicit cueing gestures during irregularly timed passages than in other parts of a piece, demonstrating a tendency to interact visually during periods of temporal instability (Bishop et al., 2019). Pianists’ head movements in piano duos were found to be more synchronized when auditory feedback was reduced (i.e., both pianists could hear only themselves although they were playing together) than when designated leaders could hear only themselves while followers had full feedback, and than when both pianists had full auditory feedback. These results demonstrate that musicians can adapt their body motion to maintain successful interpersonal coordination when auditory information is incomplete (Goebl & Palmer, 2009).
Taken together, these studies demonstrate that musicians’ body motion can relate to higher-order piece structure, reflect the dynamics of nonverbal information flow and facilitate communication between co-performers during ensemble performances.
Relationships between interperformer interactions and empathy
Joint action activities can enhance the capacity to understand what another person is experiencing, which is generally referred to as “empathy”, a term coined in 1909 by Edward Titchener as a translation of “Einfühlung” (“feeling into”), the word used in nineteenth-century German aesthetic theory (Titchener, 1909). Empirical investigations of joint action activities have recently revealed a bidirectional relationship between interpersonal coordination and empathy, a relationship that represents a fundamental social-psychological component of ensemble playing.
A body of research has analysed whether and how interpersonal interactions in joint music making enhance empathy-related skills in children and adults. Results demonstrate that long-term participation in group-based, interactive musical activities increases emotional empathy scores in school-aged children (Rabinowitch et al., 2012), as well as sympathy and pro-social skills in children who had poor pro-social skills before the musical intervention took place (Schellenberg et al., 2015). Interestingly, the ensemble experience of college music students sampled in the United States and in South Korea has been shown to relate to their empathy skills (Cho, 2019; Cho & Han, 2021).
In addition to studies analysing the impact of ensemble playing on empathy, research has also demonstrated the impact of empathy on joint action. Empathy impacts the three core cognitive-motor skills (i.e., anticipation, adaptation and attention) that underpin interpersonal coordination in expressive ensemble performance (Keller, 2014). Within a multidimensional approach to empathy, the Empathic Perspective Taking (EPT) trait¹, a component of cognitive empathy referring to the individual tendency to adopt the psychological point of view of others (Davis, 1980, 1983), has been the focus of a number of investigations in ensemble performance.
Cognitive neuroscience studies of joint action in “simulated” piano duos (i.e., pianists who believe that they are playing along with a second pianist but are in fact performing with a pre-recorded part) show that higher empathic perspective-taking scores are correlated with greater microtiming adaptation (Novembre et al., 2014). Similar studies have shown that more empathic musicians rely on motor simulation to a greater degree: EPT was positively correlated with neurophysiological measures (i.e., corticospinal excitability recorded by means of electromyography) indicating pianists’ ability to represent their co-performer’s actions in their own motor system (Novembre et al., 2012). A recent study further investigated the role of EPT in a joint music-making task, showing that EPT promotes interpersonal synchronization accuracy at the level of low-order, note-to-note synchronization, and that designated followers with high EPT scores show greater predictive skills than low-EPT followers, as they lag behind leaders to a smaller degree (Novembre et al., 2019).
In summary, these results demonstrate that empathy improves synchronization skills. However, they have not yet been corroborated in the context of ecologically valid ensemble performances. Empathy may also facilitate coordination at a deeper expressive level, affecting the unintentional coordination of musicians’ head motion in ensembles. Empathy might also influence leader-follower relationships by enhancing followers’ ability to anticipate leaders’ behaviour, through strengthened predictive skills and simulation mechanisms. The goal of our study was to shed more light on this aspect by seeking evidence of the impact of EPT on the interperformer coordination of piano-singing duos, operationalised in terms of musicians’ head motion.
Measures of interpersonal coordination characterise the synchrony of two time series of sensor readings, one from each person. The analysis can be performed in the time domain, the frequency domain, or both (Issartel et al., 2015). Time-lagged cross-correlation methods may allow adequate measurement of synchrony when applied to event timing data. However, cross-correlation methods are prone to producing spurious results when applied to sensor readings of musicians’ body movements (Dean & Dunsmuir, 2016): because the sensor data stream changes smoothly as the musical score unfolds, it is autocorrelated as well as non-stationary (i.e., its statistical properties change over time). Frequency-domain analyses of time-series data are of particular interest to studies of ensemble performance, in which musicians’ expressive movements often reflect the hierarchical structure of the music, defined by nested subdivisions: subdivisions within beats, beats within bars, bars within phrases, phrases within sections, and sections within pieces (Demos & Chaffin, 2017; Demos et al., 2017). A well-known mathematical method for spectral analysis is the Fourier transform, which decomposes a signal into sinusoidal components and computes the power of each. This method is efficient for interactions based on frequencies that are stable over time, as it assumes that the processes are stationary (Issartel et al., 2006). However, it has a practical limitation in the case of movement interactions in ensemble performance, where a dominant rhythm cannot always be identified and does not readily translate into fixed-frequency body oscillations, as faster and slower bodily oscillations often occur.
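To illustrate why autocorrelation and non-stationarity undermine cross-correlation, the following minimal Python sketch (our own illustration, not part of the study’s analysis pipeline) correlates pairs of independently generated random walks, a simple stand-in for smoothly drifting sensor data. Although the two series in each pair share no underlying relationship, their correlation coefficients are typically large. After first-differencing, used here only as a simple illustrative remedy (more principled approaches such as prewhitening exist), the spurious correlation largely disappears. Series length, number of pairs and the random seed are arbitrary assumptions.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def random_walk(rng, n):
    """Cumulative sum of Gaussian steps: smooth, autocorrelated and non-stationary."""
    level, walk = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        walk.append(level)
    return walk

def first_difference(x):
    """Successive differences x[i+1] - x[i]."""
    return [b - a for a, b in zip(x, x[1:])]

def mean_abs_correlation(n_pairs=200, n=500, seed=1):
    """Mean |r| between independent walks, before and after differencing."""
    rng = random.Random(seed)
    raw, diffed = 0.0, 0.0
    for _ in range(n_pairs):
        x, y = random_walk(rng, n), random_walk(rng, n)  # no shared signal
        raw += abs(pearson(x, y))
        diffed += abs(pearson(first_difference(x), first_difference(y)))
    return raw / n_pairs, diffed / n_pairs

raw_r, diff_r = mean_abs_correlation()
print(f"mean |r| of raw random walks:      {raw_r:.2f}")
print(f"mean |r| after first differencing: {diff_r:.2f}")
```

Because each differenced series is just the independent Gaussian steps, its correlations shrink towards zero as the series lengthen, whereas the correlations between the raw walks do not.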
An alternative method that circumvents the rigid assumptions of Fourier analysis is wavelet analysis. It allows frequencies to vary over time and can thus also capture intermittent oscillations in the time-frequency domain (Torrence & Compo, 1998), as well as nested rhythmic structures (Schmidt et al., 2012; Washburn et al., 2014). When applied to a single time series, the wavelet transform provides information about the time-frequency structure of the series, that is, which frequencies are important at which times.
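A minimal sketch of the wavelet transform, assuming a complex Morlet wavelet with ω0 = 6 (the choice used by Torrence & Compo, 1998) and a direct, unoptimised convolution. The synthetic signal oscillates at 0.5 Hz for the first 20 s and at 1 Hz for the last 20 s. A Fourier spectrum of the whole signal would show both peaks without indicating when each occurs, whereas the wavelet coefficients identify the dominant frequency at each moment. The function name, sampling rate, frequencies and durations are illustrative assumptions, not values from the study.

```python
import cmath
import math

def morlet_cwt(x, dt, freqs, w0=6.0):
    """Continuous wavelet transform of x with a complex Morlet wavelet.

    Returns one row of complex coefficients per frequency in freqs.
    The scale s = w0 / (2*pi*f) makes the wavelet oscillate at f Hz.
    """
    n = len(x)
    rows = []
    for f in freqs:
        s = w0 / (2 * math.pi * f)
        half = math.ceil(3 * s / dt)  # truncate the Gaussian envelope at ~3 scales
        # conjugated, normalised Morlet wavelet sampled at offsets k*dt
        kernel = [
            math.pi ** -0.25 * math.sqrt(dt / s)
            * math.exp(-0.5 * (k * dt / s) ** 2)
            * cmath.exp(-1j * w0 * k * dt / s)
            for k in range(-half, half + 1)
        ]
        row = []
        for i in range(n):
            acc = 0j
            for k in range(-half, half + 1):
                if 0 <= i + k < n:  # implicit zero padding beyond the recording
                    acc += x[i + k] * kernel[k + half]
            row.append(acc)
        rows.append(row)
    return rows

# Synthetic "head sway": 0.5 Hz for 20 s, then 1 Hz for 20 s, sampled at 10 Hz.
dt = 0.1
t = [i * dt for i in range(400)]
x = [math.sin(2 * math.pi * (0.5 if ti < 20 else 1.0) * ti) for ti in t]

freqs = [0.25, 0.5, 1.0, 2.0]
power = [[abs(c) ** 2 for c in row] for row in morlet_cwt(x, dt, freqs)]

for probe_time in (10.0, 30.0):
    i = round(probe_time / dt)
    best = max(range(len(freqs)), key=lambda j: power[j][i])
    print(f"t = {probe_time:4.1f} s: dominant frequency {freqs[best]} Hz")
```

With these settings the sketch reports 0.5 Hz at t = 10 s and 1 Hz at t = 30 s. Near the start and end of the recording the zero padding distorts the coefficients; dedicated implementations mark this region as the cone of influence.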
The joint wavelet transformation of two time series yields the so-called cross-wavelet transform (CWT), which provides information about which frequencies are important constituents of both series. Applied to the time series of two musicians’ sensor readings, CWT analysis therefore gives insight into the strength, or power, of shared frequencies in their bodily oscillations. In addition, oscillations at a shared frequency can be in phase (e.g., both musicians moving forward and backward in sync) or out of phase (e.g., one musician moving forward while the other moves backward). At any point in time, CWT analysis allows the degree of synchronization of shared-frequency oscillations to be measured by means of the phase difference, or phase shift. The phase difference indicates not only whether the two oscillations are in phase or out of phase, but also which oscillation is leading (i.e., reaches its peak or trough first within a cycle). CWT analysis thus permits the identification of patterns of coordination between two musicians and also indicates each musician’s tendency to lead or follow. The CWT has proved useful in a variety of disciplines, including geophysics (Grinsted et al., 2004), electroencephalographic studies (De Carli et al., 2004), and social psychology, for measuring interpersonal interactions in dance settings (Washburn et al., 2014) and between co-actors during joke-telling tasks (Schmidt et al., 2014).
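The cross-wavelet quantities described above can be sketched in a few lines, again assuming a complex Morlet wavelet (ω0 = 6) evaluated at a single frequency by direct convolution. Two synthetic 0.5 Hz oscillations stand in for two musicians’ head sway, with the second delayed by 0.4 s. The cross-wavelet coefficients are W_xy = W_x · conj(W_y): their magnitude gives the cross-wavelet power and their angle the phase difference. With the sign convention chosen for this example, a positive phase means that the first series leads. The function names, signal parameters and the averaging over interior samples are illustrative assumptions, not the procedure used in the study.

```python
import cmath
import math

def morlet_coeffs(x, dt, freq, w0=6.0):
    """Complex Morlet wavelet coefficients of x at a single frequency (Hz).

    Also returns the kernel half-width in samples, so that callers can
    discard edge samples affected by zero padding.
    """
    s = w0 / (2 * math.pi * freq)
    half = math.ceil(3 * s / dt)
    kernel = [
        math.pi ** -0.25 * math.sqrt(dt / s)
        * math.exp(-0.5 * (k * dt / s) ** 2)
        * cmath.exp(-1j * w0 * k * dt / s)
        for k in range(-half, half + 1)
    ]
    n = len(x)
    coeffs = []
    for i in range(n):
        acc = 0j
        for k in range(-half, half + 1):
            if 0 <= i + k < n:
                acc += x[i + k] * kernel[k + half]
        coeffs.append(acc)
    return coeffs, half

def cross_wavelet_summary(x, y, dt, freq):
    """Mean cross-wavelet power, phase difference (rad) and lag (s) of x relative to y."""
    wx, half = morlet_coeffs(x, dt, freq)
    wy, _ = morlet_coeffs(y, dt, freq)
    # W_xy = W_x * conj(W_y), keeping only samples unaffected by the edges
    wxy = [a * b.conjugate() for a, b in zip(wx[half:-half], wy[half:-half])]
    power = sum(abs(c) for c in wxy) / len(wxy)
    phase = cmath.phase(sum(wxy) / len(wxy))  # in (-pi, pi]; > 0 means x leads
    lag = phase / (2 * math.pi * freq)        # unambiguous only within half a period
    return power, phase, lag

dt, f = 0.1, 0.5  # 10 Hz sampling, 0.5 Hz (2 s period) sway
t = [i * dt for i in range(600)]
musician_a = [math.sin(2 * math.pi * f * ti) for ti in t]
musician_b = [math.sin(2 * math.pi * f * (ti - 0.4)) for ti in t]  # 0.4 s behind A

power, phase, lag = cross_wavelet_summary(musician_a, musician_b, dt, f)
print(f"power {power:.2f}, phase {math.degrees(phase):.1f} deg, A leads B by {lag:.2f} s")
```

The recovered phase difference is about 72°, i.e., 0.4 s out of a 2 s period; swapping the two arguments flips its sign, so musician B would then appear to lead by -0.4 s. Published cross-wavelet toolboxes, such as the MATLAB package accompanying Grinsted et al. (2004), additionally provide significance testing and the cone of influence, which this sketch omits.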
A few studies have already demonstrated the potential of CWT in the context of music ensemble performance. Walton et al. (2015) showed that different patterns of coordination emerged in the lateral forearm movements of pianists as a function of musical context, when piano duos improvised, and when they played in synchrony, with an ostinato backing track. Specifically, coordination between pianists was stronger when playing in synchrony than when improvising, and synchronization power was strongest at a period of four seconds, corresponding to the melodic phrase of four ascending chords that repeated every four seconds. The relationship between musical context and periodic movement was further investigated by Eerola et al. (2018) using wavelet and cross-wavelet analysis combined with computer vision tools. The researchers found a range of periodic behaviours in each performer, with frequency peaks that differed between non-pulsed and pulsed jazz duo performances: peaks were at higher frequencies (0.75 and 0.40 Hz, i.e., faster movements) for the former and at lower frequencies (0.50 and 0.33 Hz, i.e., slower movements) for the latter. The strength of interperformer interaction in the non-pulsed music, as measured by CWT power, predicted audience perception of communicative interaction between co-performers. In a later study, Jakubowski et al. (2020) found that synchrony judgments of jazz duo performances were related to the regularity of the musical pulse, and that synchrony ratings increased when musicians’ periodic movements were in similar frequency bands.
CWT has also been used in the context of Indian classical instrumental music; the results show that interpersonal coordination of movement was greater at metrical boundaries and more strongly related to cadential than to other metrical positions (Clayton et al., 2019). Furthermore, Dotov et al. (2021) found that interpersonal synchrony between audience members, as manifested in their head movements, was tighter for music higher in groove and when audience members could see each other than when they could not, suggesting that social context and musical features affect how music is embodied. Taken together, these studies demonstrate the successful application of CWT analysis to understanding interpersonal interaction in ensemble playing, offering new insights into the correspondence between interpersonal coordination and musical features such as structural hierarchy and musical style.
Adopting a plurifrequential approach to motion analysis, we investigated interpersonal movement coordination in piano-singing duos by means of cross-wavelet analysis. More specifically, we operationalised the strength of coordination as the power of common periodic oscillations in musicians’ head motion, and the tendency to lead or follow a co-performer as the phase difference between these periodic oscillations. We expected that power and phase difference at nested time scales, from the micro-structural level (i.e., half a bar) to the macro-structural level (i.e., four bars and formal sections), would reveal the hierarchical nature of head motion, in line with the hierarchical structure of the music, as well as the dynamic aspects of leadership.
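Mapping nested musical time scales onto the oscillation periods and frequencies examined by the wavelet analysis requires only the tempo and metre. The sketch below assumes a hypothetical tempo of 120 beats per minute in 4/4 time; the function name and these values are illustrative and are not taken from the pieces used in the study.

```python
def timescale_periods(bpm, beats_per_bar, bar_multiples=(0.5, 1, 2, 4)):
    """Return {number of bars: (period in s, frequency in Hz)} for each time scale."""
    beat_s = 60.0 / bpm             # duration of one beat in seconds
    bar_s = beats_per_bar * beat_s  # duration of one bar in seconds
    return {
        n_bars: (n_bars * bar_s, 1.0 / (n_bars * bar_s))
        for n_bars in bar_multiples
    }

# Hypothetical example: 120 beats per minute in 4/4 time.
scales = timescale_periods(bpm=120, beats_per_bar=4)
for n_bars, (period, freq) in scales.items():
    print(f"{n_bars:>4} bar(s): period {period:5.2f} s, frequency {freq:.3f} Hz")
```

At this assumed tempo, half a bar corresponds to a 1 s period (1 Hz) and four bars to an 8 s period (0.125 Hz), so the analysed frequency band would need to span at least this range, with longer periods still for formal sections.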