Present study
The main aim of our study was to investigate how musicians’ body behaviour contributes to the perception of the overall quality of music performance. Specifically, we wanted to understand how the preferred motion degrees differed depending on the musical repertoire and musical expertise, and whether observers relied more on the quantity or quality of motion for their evaluation. Previous research demonstrated that increased amounts of movement are associated with higher performance ratings (e.g. Bugaj et al., 2019; Davidson, 1993; Nusseck & Wanderley, 2009; see Introduction for more details). However, given that most studies use two (Moura et al., 2023; Nápoles et al., 2022) or three motion styles (Silveira, 2014; Van Zijl & Luck, 2013), it is natural that participants tend to favour the condition with larger movement over the one with less. To better understand this phenomenon, we expanded the number of motion degrees included in the design (five for each of the two musical excerpts). These were carefully selected to represent graded levels of global quantity of motion (QoM), calculated from the motion data. Hence, the first motion degree, D1, corresponded to minimal movement, and the last, D5, to exaggerated movement. To assess the influence of motion quality, the in-between motion degrees (D2–D4) included performances presenting predominant gesture types (e.g. flap, head nod).
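The exact QoM formula is not given in this section, so the following is only a minimal sketch under an assumed definition: the summed Euclidean displacement of all markers between consecutive frames, averaged over frame transitions. The function name `quantity_of_motion` and the toy recordings are hypothetical and serve only to show how a single global value per performance could be derived from motion data.

```python
import math


def quantity_of_motion(frames):
    """Mean summed marker displacement per frame transition.

    frames: list of frames; each frame is a list of (x, y, z) marker positions.
    ASSUMPTION: this definition is illustrative, not the study's own formula.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    total = 0.0
    for prev, curr in zip(frames, frames[1:]):
        # Sum the Euclidean displacement of every marker between two frames.
        total += sum(math.dist(p, c) for p, c in zip(prev, curr))
    return total / (len(frames) - 1)


# Hypothetical toy recordings: one marker, three frames each.
still = [[(0.0, 0.0, 0.0)], [(0.0, 0.0, 0.0)], [(0.0, 0.0, 0.0)]]
moving = [[(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)], [(0.0, 0.0, 0.0)]]

qom_still = quantity_of_motion(still)
qom_moving = quantity_of_motion(moving)
```

Under this assumed definition, a performance with restricted movement yields a value near zero, and larger or faster gestures raise the value, which is the property needed to order recordings from D1 to D5.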
We hypothesised that, if observers relied more on the quantity of motion, ratings would increase as a function of the QoM values (Hypothesis 1a; H1a). Conversely, if observers relied more on the quality of motion, the ratings of one or more in-between motion degrees presenting a prominent gesture type (D2–D4) would surpass those of the exaggerated degree (D5) (H1b).
Regarding musical style, we used two contrasting pieces representing positive and negative valence. In Western classical music, fast tempo and major modes are associated with positive valence (i.e. happiness, joy), and slow tempo and minor modes with negative valence (i.e. sadness) (Husain et al., 2002; Schellenberg et al., 2008; Thompson et al., 2001; Webster & Weir, 2005). Therefore, following previous findings (Trevor & Huron, 2018), we hypothesised that the “optimal motion degree” would differ according to the musical style: increased motion would be preferred for the energetic, joyous excerpt, and average to low motion would be preferred for the slow-paced melancholic excerpt (H2). We further expected that ratings would differ between musician and non-musician groups, with the latter associating exaggerated motion conditions with better performance (H3).
Second, we wanted to revisit the role of the visual component in music performance perception to verify whether body behaviour is, in fact, fundamental for audience engagement. To that end, based on previous research (e.g. Coutinho & Scherer, 2017; Davidson, 1993; Lange et al., 2022), we presented the set of ten stimuli in audio-only (A), audio–visual (AV) and visual-only (V) conditions. Here, we hypothesised that patterns of visual dominance would emerge in both excerpts (H4). Regarding expertise, given the existing contradictory findings, we considered three alternative outcomes: non-musicians would make more use of visual cues than musicians (Davidson, 1995; Davidson & Correia, 2002; Huang & Krumhansl, 2011) (H5a), the inverse (Lange et al., 2022) (H5b), or there would be no significant differences between groups (Tsay, 2013, 2014) (H5c). In addition, we expected the AV condition to receive the highest ratings due to its richer multisensory nature, given that congruent audio–visual stimuli enhance perception compared to unimodal conditions (Id et al., 2019; Thompson et al., 2005).
Discussion
In this study, we investigated the role of musicians’ body behaviour in the observers’ perception of music performance quality. It is well established that body movements influence how the audience evaluates (e.g. Broughton & Stevens, 2009), feels (e.g. Coutinho & Scherer, 2017) and even hears (e.g. Schutz & Lipscomb, 2007) performances. We built on previous research by increasing the number of motion degrees included in our design, as suggested previously (Silveira, 2014), which allowed us to examine more closely how quantity and quality of motion contribute to performance evaluation. In addition, we used stimuli presenting two contrasting musical excerpts to understand whether the motion degrees perceived as optimal vary depending on musical style. Finally, we compared ratings in audio-only (A), audio–visual (AV) and visual-only (V) conditions to test for sensory dominance. To analyse the potential effects of musical expertise, both musician and non-musician participants took part in the experiment.
Our first set of hypotheses (H1–H3) focussed on body behaviour in AV perception, the condition closest to real performance settings. We found that musical style has a decisive effect on whether observers build their evaluation based on the quantity or quality of motion. For the positive-valenced Creston excerpt (fast, energetic, major mode), the ratings went up as the quantity of motion increased (D1 < D2 < D3 < D4 < D5), whereas for the negative-valenced Debussy excerpt (slow, melancholic, harmonically complex), the quality of motion was more important than the quantity (D1 < D3 < D2 < D5 < D4). These results suggest that positive-valenced music is perceived as matching high motion profiles, and negative-valenced music yields a different music-movement match perception, in which specific gestural behaviours can be perceived as more adequate. In this sense, our findings partially align with other studies in which higher performance ratings were recorded in conditions with increased amounts of movement (Broughton & Stevens, 2009; Bugaj et al., 2019; Burger & Wöllner, 2023; Davidson, 1993; Grady & Gilliam, 2020; Juchniewicz, 2008; Moura et al., 2023; Nápoles et al., 2022; Nusseck & Wanderley, 2009; Silveira, 2014; Trevor & Huron, 2018; Van Zijl & Luck, 2013). However, we expand on these studies by demonstrating that this observation applies only to happy, energetic, and harmonically familiar music. Relatedly, in the study by Trevor and Huron (2018), when asked to adjust the movements of musicians’ stick figures to create the best performances, participants favoured augmented movement for fast technical passages and only slightly above normal movement for slower lyrical passages. Yet, here, in the Debussy, the flap motion degree (D4) was perceived as the most adequate, and it presented the second-highest QoM value. Also, anterior–posterior sway (D2) scored higher than mediolateral sway (D3), although it had a lower QoM. Based on these results, we infer that, for negative-valenced music, isolated motion types are perceived as more adequate than overall exaggerated motion, possibly due to the way they are executed in association with the music (e.g. based on visual inspection, the performer’s flaps reflected phrasing intentions). Further research is needed to better understand this effect. Regardless, in both excerpts, the minimal movement degree (D1) received the lowest scores, suggesting that, regardless of musical style, performances with restricted motion are perceived as less expressive, less professional and worse overall. This result directly aligns with previous studies in which low movement conditions were the lowest rated (Davidson, 1993; Moura et al., 2023; Weiss et al., 2018). Hence, if listening to moving musical forms whilst watching static body motion is perceived as poorer music performance, we conclude, in line with previous work (Id et al., 2019; Thompson et al., 2005), that a certain level of congruency between sound and visuals is required to enhance perceptual experiences.
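As an illustration only, and not the analysis used in the study, the two rating orders reported above can be compared with the QoM order using a Spearman rank correlation. The sketch below assumes that D1 to D5 are ranked 1 to 5 by QoM (as implied by the design) and takes the rating ranks from the reported orderings; the function `spearman_rho` is a hypothetical helper that implements the standard formula for untied ranks.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation for two equally long lists of untied ranks."""
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("rank lists must have equal length of at least 2")
    # Sum of squared rank differences, then the closed-form untied formula.
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Index order is D1, D2, D3, D4, D5.
# ASSUMPTION: QoM rank follows the degree number by design.
qom_rank = [1, 2, 3, 4, 5]
# Creston ratings: D1 < D2 < D3 < D4 < D5.
creston_rating_rank = [1, 2, 3, 4, 5]
# Debussy ratings: D1 < D3 < D2 < D5 < D4, so D2 ranks 3rd and D4 ranks 5th.
debussy_rating_rank = [1, 3, 2, 5, 4]

rho_creston = spearman_rho(qom_rank, creston_rating_rank)
rho_debussy = spearman_rho(qom_rank, debussy_rating_rank)
```

Under these assumptions, the Creston order matches the QoM order exactly (rho = 1), whereas the two adjacent swaps in the Debussy order give a sum of squared rank differences of 4 and hence rho = 1 - 24/120 = 0.8. With only five degrees this index is purely descriptive, but it makes the departure from a strictly quantity-driven pattern explicit.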
Musical expertise did not have a significant effect on the AV motion degree ratings, suggesting that musicians and non-musicians share similar conceptions regarding the match between music and body behaviour. Nevertheless, it had a significant effect on the ratings of the musical excerpts. Whilst non-musicians gave significantly higher ratings to Creston in comparison to Debussy, musicians gave similar ratings to both, indicating that they can equally appreciate performances of analogous quality levels (here, expert performances) independently of their personal musical preference. Studies have demonstrated that musical training translates into a greater preference for musical complexity (Matthews et al., 2019; North & Hargreaves, 1995; Witek et al., 2023). Furthermore, experts develop distancing mechanisms that attenuate preliminary emotional reactions in art appreciation (Leder & Schwarz, 2017; Leder et al., 2014), allowing them to focus on aesthetic qualities related to stylistic and formal aspects (Scherer, 2005). This phenomenon enables experts, for example, to appreciate negatively valenced art more than non-experts (Leder et al., 2014), as they are able to detach from the emotional impact of the artwork. These findings help explain why musicians gave similar ratings across excerpts, even though the Debussy presented a more complex harmonic and rhythmic structure. On the other hand, in Western classical music, research has shown that listeners typically associate fast tempo and major modes with happiness, whereas music perceived as sad is usually slow and minor (Schellenberg et al., 2008). Up-tempo, major excerpts receive higher ratings (Burger & Wöllner, 2023; Husain et al., 2002; Schellenberg et al., 2008; Webster & Weir, 2005) and provoke greater levels of post-listening enjoyment (Thompson et al., 2001). In this sense, non-musicians followed a natural tendency to prefer joyous, energetic music rather than the complex Debussian excerpt, whose compositional style departs from traditional diatonicity and can ultimately sound unfamiliar to inexperienced listeners (Laneve et al., 2023).
To investigate our second set of hypotheses (H4, H5), we analysed the differences between motion degrees in A, AV and V conditions. The main finding deriving from the interaction analysis was that, whereas both musicians and non-musicians displayed patterns of visual dominance for the Creston excerpt, for the Debussy, musicians shifted to auditory dominance. Again, the musical style had a significant effect on multisensory perception. In the Creston, our results align with studies demonstrating that expert musicians rely on vision as strongly as novices (Tsay, 2013, 2014). Furthermore, the pattern of visual dominance found in both groups aligns with several studies showing that visual cues outweigh auditory ones in music performance evaluation (Broughton & Stevens, 2009; Coutinho & Scherer, 2017; Lange et al., 2022; Pope, 2019; Schutz & Lipscomb, 2007; Wapnick et al., 2004). In the Debussy, by contrast, we observed that non-musicians relied more on the visuals than musicians, as suggested by another group of studies (Davidson, 1995; Davidson & Correia, 2002; Huang & Krumhansl, 2011). Repertoire following harmonically innovative musical systems, such as the Debussy, can potentially induce more ambiguous listening experiences (Laneve et al., 2023). In this sense, we hypothesise that, due to the superior compositional and emotional complexity of the Debussy, musicians were absorbed by the auditory information, focussing their attention on the sound. In the study by Kawase and Obata (2016), observers’ visual attention was directed to the main melodic parts, hence shaped by sound. Accordingly, sonic information has been shown to prevail in emotion-related perceptual tasks (Shoda & Adachi, 2016; Vines et al., 2011), specifically in identifying music with negative valence (Van Zijl & Luck, 2013). Our results also reinforce the idea that visual and auditory cues interact in complex ways depending on the context (Coutinho & Scherer, 2017; Li et al., 2021; Vuoskoski et al., 2014), highlighting the need to replicate such findings with an expanded repertoire. For example, in Li and colleagues (2021), the visual condition was effective in communicating tense and relaxed timbres, but other timbres were heavily dependent on the performer. Accordingly, we believe that sensory dominance shifts dynamically depending on the musical style.
The influence of body movement on evaluation is further supported, in some cases, by shifts in the rank order of motion degrees between sensory conditions. For example, in the Creston excerpt, D2 was rated first (musicians) and second (non-musicians) in the A condition, falling drastically to fourth in AV (both groups). Furthermore, in the AV condition, the ratings of both groups increased as a function of the QoM. This reinforces that performances of positively valenced music are penalised when combined with constrained motion profiles and, conversely, enhanced when combined with exaggerated amounts of motion. However, in contrast to what we had initially predicted, the A condition received the highest overall ratings, followed by the AV and V conditions, in that order. Because the AV condition provided multimodal input, we expected it to be more appealing, in line with previous views of the visual dimension as a driving force for audiences to attend live performances rather than consume recordings (Bergeron & Lopes, 2009; Cook, 2008; Platz & Kopiez, 2012). Yet, live performances also involve a dimension of human interaction which is not possible to account for in studies of this kind. The decrease in AV and V scores may be related to the use of an avatar to represent human performers, which can translate into a less natural way of watching performances. Although some of the studies we refer to used regular video (e.g. Lange et al., 2022; Nápoles et al., 2022; Silveira, 2014; Tsay, 2013, 2014), video does not allow for control of confounders such as physical appearance or dress style (e.g. Griffiths, 2008; Urbaniak & Mitchell, 2022). Hence, we followed the methodology of using kinematic, de-characterised displays (e.g. Davidson, 1993; Vinken & Heinen, 2022; Weiss et al., 2018) to homogenise the set of stimuli. In the future, experiments using humanoid avatars or even in loco experiments involving live performances (e.g. Coutinho & Scherer, 2017) would be desirable to follow up on this result.
One limitation of our study relates to the fact that the musical excerpts used, retrieved from emblematic saxophone works, combine multiple components, including tempo and rhythm, tonality, or motivic structure. Based on the associations between musical aspects and valence reported in previous research (Husain et al., 2002; Scherer, 2005; Thompson et al., 2001; Webster & Weir, 2005), we treated musical style as a global concept encompassing these various features. Consequently, it was not possible to determine whether the effect of the music was due to the interaction of factors together or whether certain factors had stronger isolated contributions. For example, Burnham and colleagues (2021) found that pitch direction mediated the ability of participants to identify major and minor modes, ultimately associated with positive and negative constructs. In this sense, it would be interesting to conduct follow-up research under a manipulation paradigm, thus controlling the stimuli, for instance, by presenting them at graded tempi or transposing the excerpts to other modes. Nevertheless, as previously stated (Battcock & Schutz, 2019, 2022), we highlight the importance of using renowned repertoire in perceptual studies. First, it allows for a better understanding of the real, everyday concert experience, involving exposure to complex music with its multiple layers acting cumulatively as planned by its composer. Second, using original repertoire allows for the preservation of the natural expressive behaviour of the performers and the creation of knowledge that is directly applicable to their instrumental practice.
On this basis, we conclude by emphasising that body behaviour has a strong impact on musical performance communication, particularly considering its interaction with musical style. Positively and negatively valenced musical excerpts result in distinct conceptions of optimal motion amongst observers. Furthermore, we confirm that non-musicians prioritise visuals over sound independently of musical context, whereas musicians turn to sound in contexts with increased complexity. We strongly encourage further research, including other musical excerpts, genres and instruments, to better define the optimal motion degrees according to music categories. Our findings can be applied to music performance pedagogy, allowing performers to adapt their motion style to the repertoire being performed. The resulting knowledge can also contribute to the development of applications for performance analysis and real-time motion monitoring for musicians and music students. More broadly, these findings provide insights for the design of concert models integrating multimodality as a means for enhancing listening experiences and promoting musical understanding, engagement, and emotionality.