Different parts of our brain code the perceptual features and actions related to an object, causing a binding problem: how does the brain discriminate the information of a particular event from the features of other events? Hommel (1998) suggested the event file concept: an episodic memory trace binding perceptual and motor information pertaining to an object. By adapting Hommel’s paradigm to emotional faces in a previous study (Coll & Grandjean, 2016), we demonstrated that emotion could take part in an event file with motor responses. We postulated that such binding also occurs with emotional prosodies, because automatic reactions to such events are equally important. However, contrary to static emotional expressions, prosodies develop through time, and temporal dynamics may influence the integration of these stimuli. To investigate this effect, we conducted three studies with task-relevant and task-irrelevant emotional prosodies. Our results showed that emotion could interact with motor responses when it was task relevant. When it was task irrelevant, this integration was also observed, but only when participants were led to focus on the details of the voices, that is, in a loudness task. No such binding was observed when participants performed a location task, in which emotion could be ignored. These results indicate that emotional binding is not restricted to visual information but is a general phenomenon allowing organisms to integrate emotion and action in an efficient and adaptive way. We discuss the influence of temporal dynamics in the emotion–action binding and the implication of Hommel’s paradigm.