Movement sonification and the guidance hypothesis in perceptual-motor learning
Concurrent augmented feedback is perceptual feedback about a movement that is presented live, during motor performance. It has been used successfully to enhance acquisition and learning in a wide range of motor tasks (Sigrist, Rauter, Riener, & Wolf,
2013a). However, learners typically become dependent on augmented information and performance declines when it is withdrawn (Park, Shea, & Wright,
2000; Schmidt & Wulf,
1997; Schmidt,
1991; Sigrist, Rauter, Riener, & Wolf,
2013b; Vander Linden, Cauraugh, & Greene,
1993). The high level of performance seen in the presence of concurrent feedback rarely persists into no-feedback retention tests, which constitute a truer test of learning (Salmoni, Schmidt, & Walter,
1984). The explanation offered for this, known as the ‘guidance hypothesis’, is that learners come to rely too heavily on the augmented information provided by concurrent feedback and ignore task-intrinsic sources of sensory feedback (Adams,
1971). Once augmented feedback is removed, the learner must rely on comparatively unfamiliar sources of intrinsic feedback (e.g. proprioception) and performance declines as a result of impaired performance-monitoring ability (Anderson, Magill, Sekiya, & Ryan,
2005). Intrinsic sources of sensory feedback may be unattended when augmented feedback is available for two possible reasons. The feedback display may simply distract attention from otherwise available intrinsic information, or it may provide performance information which is much easier to use than intrinsic sources.
Emerging evidence suggests, however, that the guidance hypothesis is not a general principle of feedback as had previously been assumed (Danna et al.,
2015; Mononen, Viitasalo, Konttinen, & Era,
2003; van Vugt & Tillmann,
2015; for a review, see Dyer, Stapleton, & Rodger,
2015). Experiments using concurrent feedback in the auditory modality have shown that speed of acquisition can be enhanced using sound without impairing performance on subsequent no-feedback retention tests (Kennedy, Boyle, & Shea,
2013; Ronsse et al.,
2011). Digitally transforming human movement data into sound (termed ‘sonification’ of movement) has long been practiced in the field of Sonic Arts as a method of musical expression (Hermann, Hunt, & Neuhoff,
2011; Medeiros & Wanderley,
2014). Recently, sonification of movement has emerged in the motor skill learning literature as a viable alternative to visual display for the presentation of concurrent augmented feedback, occasionally overcoming the limitations associated with feedback presented in the visual modality (Effenberg,
2005; Sigrist et al.,
2013a).
For example, Mononen, Viitasalo, Konttinen, and Era (
2003) sonified one-dimensional aiming error in rifle training by mapping positional error of the gun barrel to sonic pitch. Their participants, therefore, had access to an additional layer of performance-relevant information through sound and performance was improved as a result. Unlike concurrent feedback experiments in the visual modality, no decline in performance was observed following the removal of augmented feedback. The enhancement effect of feedback was maintained on no-feedback retention tests, even several days later.
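The kind of error-to-pitch mapping described above can be sketched in a few lines. This is an illustrative sketch only and not the mapping used by Mononen et al.: the function name `error_to_pitch`, the 440 Hz reference tone, the scaling of one semitone per millimetre, the two-octave clamp, and the direction of the mapping are all assumptions made for the example.

```python
def error_to_pitch(error_mm, base_hz=440.0, semitones_per_mm=1.0, max_semitones=24.0):
    """Map a signed one-dimensional aiming error to a tone frequency in Hz.

    Hypothetical mapping: zero error gives the reference tone, and each
    millimetre of error shifts the pitch by `semitones_per_mm` semitones,
    clamped to +/- `max_semitones` so the tone stays in an audible range.
    """
    semitones = error_mm * semitones_per_mm
    # Clamp the shift so that very large errors do not produce extreme pitches.
    semitones = max(-max_semitones, min(max_semitones, semitones))
    # Equal-tempered scaling: 12 semitones correspond to a doubling of frequency.
    return base_hz * 2.0 ** (semitones / 12.0)


if __name__ == "__main__":
    for err in (-12.0, 0.0, 3.0, 12.0):
        print(f"error {err:+.1f} mm -> {error_to_pitch(err):.1f} Hz")
```

Under these assumptions the learner hears the reference tone only when the barrel is on target, so reducing error is equivalent to steering the pitch back towards that tone.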
Ronsse et al. (
2011) report a similar result and provide a rare example of visual and auditory concurrent augmented feedback contrasted on the same experimental task (90° out-of-phase bimanual flexion/extension). Concurrent visual feedback was provided in the form of a Lissajous figure (which traces a circle when a 90° phase relationship is performed perfectly), and auditory feedback was provided via sonification of changes in wrist direction, which results in a ‘galloping rhythm’ when movements are performed accurately. They found that although visual feedback allowed learners to reach optimal performance more quickly than auditory feedback, this high level of performance was maintained only by the auditory group in no-feedback retention. A typical guidance effect was found following the removal of visual feedback, but not auditory feedback. Heitger et al. (
2012) replicated the behavioural findings of Ronsse et al. using the same bimanual task.
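To see why a Lissajous display yields a circle only at a 90° phase relationship, the figure can be generated numerically. The sketch below is a simplification under stated assumptions: both limbs are idealised as unit-amplitude sinusoids of equal frequency, and the function names `lissajous_points` and `radius_spread` are hypothetical helpers introduced for illustration, not part of any cited study.

```python
import math


def lissajous_points(relative_phase_deg, n_samples=100):
    """Return (x, y) cursor positions for two sinusoidal limb movements.

    Assumption: one limb drives the x-axis as sin(theta), and the other
    drives the y-axis as sin(theta + relative phase), sampled over one cycle.
    """
    phi = math.radians(relative_phase_deg)
    points = []
    for i in range(n_samples):
        theta = 2.0 * math.pi * i / n_samples
        points.append((math.sin(theta), math.sin(theta + phi)))
    return points


def radius_spread(points):
    """Difference between the largest and smallest distance from the origin.

    A value near zero means the trace is (numerically) a circle.
    """
    radii = [math.hypot(x, y) for x, y in points]
    return max(radii) - min(radii)


if __name__ == "__main__":
    for phase in (0, 45, 90):
        spread = radius_spread(lissajous_points(phase))
        print(f"relative phase {phase:3d} deg: radius spread = {spread:.3f}")
```

At 90° the two signals reduce to sin(theta) and cos(theta), so every point lies at distance 1 from the origin; at 0° the trace collapses onto a diagonal line. Any deviation from the target phase therefore appears to the learner as a distortion of the circle.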
These findings represent a slight challenge to traditional interpretations of the guidance effect, which assume that feedback presented 100% of the time during acquisition will lead to decline when it is withdrawn because intrinsic proprioceptive feedback has been attentionally neglected (Anderson et al.,
2005; Sigrist et al.,
2013a). However, these results make good sense from a broad ecological perspective. A possible explanation for the apparent advantage of sonification will be elaborated in the following sections.
An ecological perspective on the guidance effect in bimanual tasks
If we consider motor control and learning to be a purely perception–action phenomenon (Fowler & Turvey,
1978; Gibson,
1969), the difference between visual concurrent feedback and sonification becomes clearer. The perceptual information about performance available to a learner during acquisition of a novel motor skill has broad implications for performance and retention. From an ecological perspective, attaining a skilful or accomplished level of performance in a given task is characterised by perceptual refinement (Michaels & Carello,
1981), wherein an individual gradually tunes into (and acts to produce) perceptual information within a range which specifies good motor performance. Concurrent feedback enhances motor performance by making such task-relevant perceptual information more salient or accessible (Wilson, Collins, & Bingham,
2005). The challenge for a learner is to learn how to use this information in the context of the task goals.
Bimanual coordination tasks are an ideal vehicle to probe these processes, as level of task difficulty is clearly defined in terms of either phase relationship (Kelso, Scholz, & Schoner,
1986) or polyrhythmic timing ratio (Summers, Rosenbaum, Burns, & Ford,
1993). In bimanual coordination tasks, the perceptual information associated with good performance (i.e. phase relationship or polyrhythmic ratio) is not clearly specified through intrinsic feedback alone, making these tasks extremely difficult to learn without concurrent feedback to make the information more available—typically via a visual Lissajous plot (Kovacs, Buchanan, & Shea,
2009; Kovacs & Shea,
2011; Wang, Kennedy, Boyle, & Shea,
2013). The effects of concurrent feedback on bimanual coordination tasks are, therefore, very strong (Kovacs, Buchanan, & Shea,
2010).
Motor learning in bimanual coordination tasks is clearly perceptually based (Franz, Zelaznik, Swinnen, & Walter,
2001; Mechsner, Kerzel, Knoblich, & Prinz,
2001; Wilson, Snapp-Childs, Coats, & Bingham,
2010). Bimanual coordination performance is so difficult to perceive intrinsically that learner attention is occupied entirely by controlling the feedback display; this is by far the most valuable information that the environment offers in the context of the task—and guidance effects are the norm (Kovacs et al.,
2009; Kovacs & Shea,
2011). In this situation, the learner does not actually learn to produce the bimanual task; he/she learns how to manipulate the Lissajous display. This is demonstrated by Kovacs et al. (
2010) who found that removing vision of the limbs allowed participants to very quickly learn to produce a 5:3 bimanual ratio—a feat previously thought to be impossible without extensive practice. Removing vision of the limbs may have helped because it streamlined/refined the perception–action loop to a single stream: perception of the dot’s movement and control over that action. As far as the learner was concerned, removing vision of the limbs relegated them to a plane of total non-existence, as the brain effectively adopted direct control over the movement of the dot (Swinnen & Wenderoth,
2004). It is very difficult to perceive useful information about bimanual coordination from the limbs themselves, and in fact any such information may actually conflict with the Lissajous information, as argued by Kovacs et al.
The guidance effect then comes as no surprise. In the case of visual feedback, the display is the task. This fact is not of great concern if one’s goal is to push the limits of perceptual control of action (Kovacs et al.,
2010), but it is a real problem if the aim is to produce learning which transfers outside the lab. If the only way (or, the most effective way) for the learner to perceive their performance is through an augmented feedback display, then he/she will not be able to perform the task in its absence. In the next section, movement sonification will be examined from the same perspective.
Noisy events, perceptual unification and sonification
Sonification is (or rather, can be) more than just another method for abstract display of symbolic movement data (Roddy & Furlong,
2014). There are distinct perceptual and phenomenological qualities of sound perception which may make it a more appropriate modality for meaningful concurrent feedback than a visual display (Dyer et al.,
2015). These qualities can explain sonification’s potential immunity to the guidance effect.
Sound is intrinsically linked to movement (Leman,
2008; Repp,
1993; Sievers, Polansky, Casey, & Wheatley,
2013). In everyday life, sounds automatically become part of multimodal event perception (Gaver,
1993). Thanks to our extensive interactive experience with a noisy environment, we can perceive a surprising amount of action-relevant information from an auditory event (Giordano & McAdams,
2006; Houben, Kohlrausch, & Hermes,
2004; van Dinther & Patterson,
2006; Young, Rodger, & Craig,
2013). In the case of sounds produced by action, fMRI studies during passive listening have recorded neural activations similar to those observed during previous action performance (Kohler et al.,
2002; Lahav, Saltzman, & Schlaug,
2007). Behavioural effects are especially strong for extensively practiced noisy actions, for example instrumental performance (Taylor & Witt,
2015). Additionally, specific actions can even be identified from their sonified velocity profile alone (Vinken et al.,
2013). In summary, sound and movement are ecologically coupled. Sound is inherently meaningful to the moving individual, and if it were employed as concurrent augmented feedback in a motor skill learning study, the link between participant movement and feedback could potentially be much tighter, and feedback less of an abstraction. In other words, sound as feedback is more closely coupled to fundamental task kinematics than a visual display. The use of sound can perhaps more explicitly include the body in the perception–action loop.
As shown by Ronsse et al. (
2011) and Kennedy, Boyle, and Shea (
2013), auditory models/demonstrations of bimanual task performance along with sonification as feedback are effective for training complex coordination tasks. Making perceptual information about bimanual task performance more salient or perceivable leads to reduced variability in associated action, as shown by Wilson, Collins, and Bingham (
2005). This seems to be a general perceptual effect which also applies to sound information and unimanual tasks. van Vugt and Tillmann (
2015) found that accurate sonic feedback improved tapping accuracy in a learned motor task to a greater degree than jittered feedback. Interestingly, improved performance in the sonification group persisted into no-feedback retention and transfer tests. The temporal resolution of the auditory system is known to be much finer than that of the somatosensory system (Hirsh & Watson,
1996; Tinazzi et al.,
2002), so one would expect more accurate temporal perception of any event paired with sound. Following an ecological approach to motor learning (Gibson,
1969), and assuming that perception never happens in isolation from action, it stands to reason that enhanced perceptual acuity for action’s consequences (i.e. feedback) will necessarily result in better control of action.
Ronsse et al. show that, although slightly slower, sonification is as effective for teaching a novel coordination pattern as the more commonly used Lissajous figure. Lissajous feedback works through perceptual unification, a transformation wherein a difficult bimanual task is consolidated and abstracted to create a new, more coherent and unitary percept (for the effect of perceptual unification on other bimanual tasks without Lissajous feedback, see Franz et al.,
2001; Mechsner et al.,
2001). Unification makes relevant perceptual information about the higher-order variable of relative phase/timing ratio more available, which allows effective and stable action production. We argue that a demonstration through sound functionally does the same thing; it consolidates a dual-task into a rhythm, which can be perceived and reproduced as a single action.
The potential advantage of sonification over Lissajous as concurrent feedback lies in the degree of abstraction, or transformation. As argued earlier, and presupposing good sound design, sonification of bimanual coordination does not entail the same degree of transformation as does feedback displayed as a Lissajous figure, the Gestalt form of which differs substantially from the underlying kinematics of bimanual coordination. By contrast, sonification is layered on top of and can be used to emphasise relevant task kinematics. This can allow direct perception of phase relationship or timing ratio without subsuming the main motor task, as recommended by Wilson et al. (
2010). Information about the higher-order relationship between the hands is present in task-intrinsic proprioceptive feedback; we should be able to use sound to train participants to perceive it directly—eliminating the guidance effect of concurrent feedback.
Our aim in this paper is twofold. First, we aim to further scientific understanding of the guidance effect of concurrent feedback, specifically how it relates to sonification. Second, we aim to separate the effects of perceptual unification from feedback to test whether unification of the task goals (through adding sound to the demonstration) is sufficient to enhance learning, or whether there is a distinct advantage of sonification as concurrent feedback. At this point, it is not yet clear whether the effects of sound on learning in Kennedy et al. (
2013) are due to either perceptual unification through a sonic demonstration, or concurrent movement sonification. Performance in bimanual coordination is improved by perceptual unification alone (Franz & McCormick,
2010; Franz et al.,
2001), and it will be important to establish this difference going forward. After all, one need not provide online sonification of movement during practice at all if performance can be enhanced to the same degree using a pre-recorded, sonified demonstration.
To this end we have designed a novel bimanual shape-tracing apparatus to teach participants to produce a 4:3 rhythmic coordination pattern, a task previously shown to be difficult to learn (Summers et al.,
1993).
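The timing structure of a 4:3 pattern can be made concrete with a short calculation. The sketch below is illustrative only and does not describe the apparatus used here: it assumes that each hand produces evenly spaced discrete events within a shared cycle, and the function name `polyrhythm_onsets` and the use of a unit-length cycle are assumptions made for the example.

```python
from fractions import Fraction


def polyrhythm_onsets(fast=4, slow=3, cycle=Fraction(1)):
    """Return event onsets for each hand and the merged rhythm for one cycle.

    Assumption: one hand produces `fast` evenly spaced events per cycle and
    the other produces `slow`, with both hands starting together at time 0.
    Exact fractions are used so that coinciding onsets compare equal.
    """
    fast_onsets = [cycle * k / fast for k in range(fast)]
    slow_onsets = [cycle * k / slow for k in range(slow)]
    # Merging the two streams gives the single rhythm heard by a listener.
    merged = sorted(set(fast_onsets) | set(slow_onsets))
    return fast_onsets, slow_onsets, merged


if __name__ == "__main__":
    fast_onsets, slow_onsets, merged = polyrhythm_onsets()
    print("fast hand:", [str(t) for t in fast_onsets])
    print("slow hand:", [str(t) for t in slow_onsets])
    print("merged   :", [str(t) for t in merged])
```

Under these assumptions the two hands coincide only at the start of each cycle, and the merged stream contains six unevenly spaced onsets per cycle. This uneven composite is the single rhythm that a sonified demonstration would present in place of two separate limb movements.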
We hypothesise that the use of sonification as auditory feedback will not lead to a guidance effect relative to no-sound control. Like Lissajous feedback, sonification represents a method to perceptually unify a bimanual task; however, it does not rely on a transformation and abstraction of the fundamental task kinematics. For this reason, we expect both enhanced performance of the sonification group during practice, and maintenance of this enhanced performance into retention-without-feedback.
We additionally hypothesise that performance in the condition in which the demonstration alone is sonified (hereafter referred to as the ‘sound-demo condition’) will benefit from the use of sound to perceptually unify the task demands, which will manifest as enhanced performance during practice and into retention relative to no-sound control.
We will also compare the sound-demo condition with sonification as concurrent feedback. Both conditions perceptually unify the task demands; however, live sonification may confer a relative advantage in the acquisition stage by enhancing online temporal perception of performance. Improved perceptual acuity through sound should, in general, manifest as better performance (Fowler & Turvey,
1978), and we expect to see as much in this task, in which good performance depends at least partly on fine temporal control.