Brain and Cognition
Volume 69, Issue 1, February 2009, Pages 108-115

Lateralization of visuospatial attention across face regions varies with emotional prosody

https://doi.org/10.1016/j.bandc.2008.06.002

Abstract

It is well-established that linguistic processing is primarily a left-hemisphere activity, while emotional prosody processing is lateralized to the right hemisphere. Does attention, directed at different regions of the talker’s face, reflect this pattern of lateralization? We investigated visuospatial attention across a talker’s face with a dual-task paradigm, using dot detection and language comprehension measures. A static image of a talker was shown while participants listened to speeches spoken in two prosodic formats, emotional or neutral. A single dot was superimposed on the speaker’s face in one of four facial regions on half of the trials. Dot detection effects depended on the emotion condition: in the neutral condition, discriminability was greater for the right side than for the left side of the face image, and at the mouth compared to the eye region. The opposite effects occurred in the emotional prosody condition. The results support a model wherein visuospatial attention used during language comprehension is directed by the left hemisphere given neutral emotional prosody, and by the right hemisphere given primarily negative emotional prosodic cues.

Introduction

Despite nearly 60 years of research on how human attention functions when processing electronic displays of text and graphic information or when understanding auditory speech, comparatively little is known about human attention as it applies to auditory-visual speech comprehension. The purpose of the present study is to investigate how a visuospatial attention mechanism used to process a talker’s facial information is affected by the nature of the prosodic information heard during a language comprehension task.

In face-to-face communication, information arrives to our language understanding systems from the talker’s mouth, in the form of auditory linguistic and prosodic information, and from the face, in the form of visual articulatory cues (visible speech), facial expressions, eye gaze, and head movements. This smorgasbord of information available to our mind and senses is modulated by an attention system that can be flexibly adapted to suit the circumstances of the particular listening and viewing situation. Further, the amount of influence of visible speech depends on contextual and perceiver characteristics (e.g., Jordan and Sergeant, 2000, Sekiyama and Tohkura, 1993). Several studies conducted in our laboratory have yielded adult age differences in the influence of visible speech during auditory-visual language processing. Compared to younger adults, older adults are usually found to be more reliant on visible speech (Thompson, 1995, Thompson and Malloy, 2004), although the age effect reverses during extremely attention-demanding task situations, such as a shadowing task, in which younger adults’ shadowing performance is improved by the presence of visible speech while older adults’ performance is not (Thompson & Guzman, 1999).

Two hypotheses for older adults’ greater reliance on visible speech were tested (Thompson & Malloy, 2004). One hypothesis was that older adults have built up more finely tuned visible speech representations that are used during auditory-visible speech processing. This hypothesis was not supported. The hypothesis that was supported was that older adults focus more of their limited attention resources away from the eyes and toward the mouth area of the face during active speech processing. Greater focus on the mouth region compared to the eye region did not occur for either age group during silent video presentations of the same speech stimuli, suggesting that attention is not automatically directed to the mouth area as a consequence of movement in that spatial region. These findings suggest that the listener is choosing not to direct attention towards the mouth region in the absence of audible speech, even though the motion cues are present and available for processing in silent video conditions.

Attention has long been argued to be both a data-driven process (a sequence of increasingly sophisticated analyses that begins with the incoming sensory data) and a conceptually-driven process (high-level expectations about experience that are further refined by analyses based upon the particular context) (Norman, 1976). One way to understand how the attention system functions during auditory-visual language understanding is to presume that it works simultaneously in a data-driven and a conceptually-driven manner. In a conceptually-driven manner, attention could be directed towards regions of the face that normally provide the maximal amount of sensory information to supplement the comprehension process. If attention is strategically directed towards regions of the talker’s face using an attention mechanism that responds to the signal strength of the talker’s facial display, attention should be greatest at the side of the face presenting the greatest amount of information.

There is evidence supporting this prediction. In human adults, greater movement (Wylie & Goodale, 1988) and greater amplitude of motion (Holowka & Pettito, 2002), and, in nonhuman primates, earlier onset in the timing of movement (Hauser & Akre, 2001), appear on the left side of the face (viewer’s right) during the production of emotional facial expressions. Importantly, the left side of photographic face images is perceived to be more expressive of facial emotion (Nicholls, Wolfgang, Clode, & Lindell, 2002). Research also shows a morphological asymmetry between the right and left sides of the mouth during the production of speech, such that the right side of the face, controlled by the adult talker’s language (left) hemisphere, displays greater amplitude and velocity of motion than does the left side (Graves et al., 1982, Wolf and Goodale, 1987). Human infants also show a right-side-wider effect when babbling (Holowka & Pettito, 2002). These researchers argue that this asymmetrical pattern of facial displays reflects an underlying asymmetry in activation of the two cerebral hemispheres, specifically, primary activation of the left hemisphere for linguistic productions and of the right hemisphere for emotional expressions.

One study investigated visuospatial attention during auditory-visual language understanding using syllables as stimuli. Nicholls, Searle, and Bradshaw (2004) investigated asymmetries in the perception of auditory-visual speech with the McGurk effect (McGurk & MacDonald, 1976), an illusion produced when incongruent speech and visible speech are presented for syllable classification (“ga” mouth movements combined with an auditory “ba” result in the report of “da”). They covered either the right or the left side of the talker’s face, and found that McGurk-type errors were lower when the side of the face with the most movement (the right side) was covered, suggesting that the perceiver’s attention is normally directed toward the left hemifield. Perhaps even more interesting, when the faces were mirror reversed, there were fewer McGurk-type errors when neither side of the face was covered. This suggests that participants were directing their attention to the side of the face that they expected would convey the most information, i.e., the right side of the talker’s face, which was actually the left side when mirror reversed. This finding implies that it is not the presence or absence of the facial cues that always elicits attention in a bottom-up fashion; instead, the expectation of the cues directs attention, at least in some circumstances, supporting a top-down or conceptually-driven model of attention.

While the Nicholls et al. (2004) results support our assumption of the conceptually-driven nature of visuospatial attention during auditory-visual speech understanding (Thompson & Malloy, 2004), research conducted in our laboratory shows the opposite effect for the region of the face attended to—specifically, our research shows a visuospatial attention bias towards the talker’s left side (right-side-of-face image) during visual-spoken language understanding (Thompson and Malloy, 2004, Thompson et al., 2004). Using a novel methodology, we previously investigated the distribution of visuospatial attention across a talker’s face during the comprehension of lengthy speech passages (Thompson et al., 2004). Dots were superimposed on the talker’s face for 16 ms. Participants completed a comprehension task (the primary task), and the secondary task of dot detection. Dot detection performance for different areas of the face (right, left, above eyes, below mouth) indicated the degree of visuospatial attention on those areas. Our results showed greater attention was distributed on the right side of the talking face image, compared to the left side. Smeele, Massaro, Cohen, and Sittig (1998) also found a right visual field advantage in visible speech syllable and nonspeech (nonemotional) mouth movement identification, indicating primary involvement of the left hemisphere during the encoding of both types of mouth movements.

The discrepant right- versus left-side-of-face-image attention results could be partially explained by the nature of the tasks. Syllable classification of visual-spoken utterances containing often incongruent auditory and visual cues (Nicholls et al., 2004) is quite different from our dual-task paradigm using lengthy speech passages of congruent auditory-visual stimuli. In syllable classification tasks, the perceiver may employ an atypical strategy of concentrating visuospatial attention on the most informative facial region for making a perceptual judgment, while thinking rather minimalist linguistic thoughts. In contrast, the dual-task comprehension paradigm is more similar to natural language understanding wherein all levels of linguistic and semantic analysis occur, albeit with the added secondary task of detecting a dot on someone’s face. Thus, in our methodological paradigm the language hemisphere should be highly activated and the visuospatial attention system highly engaged. Regardless, results from our laboratory using lengthy speech stimuli in a comprehension task do not support the notion that a visuospatial attention mechanism is conceptually-driven to focus on the side of the face with the most informative visible speech cues, while the Nicholls et al. (2004) results using McGurk-type stimuli and a syllable identification task do support it.

In addition to the right-side-of-face-image effect, we found that greater attention was paid to the mouth than to the eye area (Thompson and Malloy, 2004, Thompson et al., 2004). We also showed a reversal of this effect, greater attention to the eye area than to the mouth area, based on differences in the emotional expressiveness of the talker (Thompson et al., 2004). Perceivers might devote more attention to the eye area in an attempt to discern emotion cues from the face (Ekman, 1997, Kimble et al., 1981).

How can a single model of the distribution of attention account for these differences in attention biases across facial regions during auditory-visual discourse processing under varying conditions of emotion interpretation?

There are many factors to take into consideration. To begin, it is customarily assumed that both hemispheres of the brain are actively engaged in processing information at all times. However, there exists a division of labor between the two coordinated brain hemispheres. One way in which this division of labor functions is in dividing the visual input from the left and right visual fields for initial processing of the stimulus. In their investigation of visuospatial attention, Reuter-Lorenz, Kinsbourne, and Moscovitch (1990) used a line bisection task paradigm in which participants viewed a tachistoscopically-presented horizontal line and judged the location of a vertical intersect. Results showed that attention was biased in the direction contralateral to the activated brain hemisphere; the authors referred to this general attention asymmetry as the activation-orienting hypothesis. The rightward and leftward attentional biases were also found not to depend on task relevance, suggesting a data-driven or automatic component to the lateralized asymmetry of attention. Moreover, the leftward and rightward attention asymmetries were found not to result from the hemispatial position of the stimulus (orienting to an absolute region of space); rather, the results supported the notion of orienting in a direction of space. Thus far, the activation-orienting hypothesis has not been tested in the context of visuospatial attention during auditory-visual discourse processing. The results from Nicholls et al. (2004) in the non-mirror-reversal condition are consistent with the activation-orienting hypothesis, assuming that greater dynamic motion produces a greater level of activation in the corresponding hemisphere and that attention is directed contralateral to the more highly activated hemisphere. However, the leftward bias produced even in the mirror-reversal condition highlights the fact that attention is modulated by a mechanism that is not purely data-driven.

Since the task in the present study incorporates a static face stimulus in conjunction with auditory speech, some consideration must be given to attention biases that might occur as a result of face processing. Many studies show a left visual field (right hemisphere) advantage for perceptual processing of static images of faces using tasks that do not involve language processing, but do involve emotion processing or making attractiveness judgments (e.g., Burt and Perrett, 1997, Klein et al., 1976, Moscovitch et al., 1976). Topographic mapping of the brain shows that the right hemisphere activates when interpreting facial expressions of emotion (Gunning-Dixon et al., 2003, Levy et al., 1990, Mikhailova et al., 1996). Finally, Prodan, Orbelo, Testa, and Ross (2001) found that the right hemisphere was more highly activated when processing the upper region of the face, while the left hemisphere was more highly activated when processing the lower region of the face. One explanation for this finding is that greater activation of the right hemisphere produces greater attention toward the pickup of emotion cues from the eye region. Likewise, differential activation of the left hemisphere could imply greater attention to the mouth for visible speech encoding.

Yet, since the task in the present study is primarily a language comprehension task, greater consideration should be given to the patterns of attention that should be produced given varying types of auditory language input. Research going back to the 1960s using dichotic listening tasks has demonstrated a consistent right ear advantage (REA) in reporting words using both auditory-alone (e.g., Broadbent and Gregory, 1964, Hiscock et al., 1999) and auditory-visual (Thompson et al., 2007, Thompson and Guzman, 1999) speech stimuli, reflecting left hemisphere specialization for linguistic processing. Grimshaw, Kwasny, Covell, and Johnson (2003) recently reported evidence in a dichotic listening paradigm favoring a direct access interpretation; that is, each hemisphere was found capable of linguistic processing, but the left hemisphere processed the auditory stimuli faster and more accurately. Topographic brain mapping studies have also shown primary activation in the left hemisphere during silent lipreading (Calvert et al., 1997) and visible speech processing (Campbell, De Gelder, & De Haan, 1996).

When auditory speech understanding includes a greater emphasis on the interpretation of emotional prosody, however, a different pattern of hemispheric activation emerges. While prosody can convey both linguistic and emotional information, prosody is referred to here as the intonational pattern of the voice that has emotive connotations for the listener. Dichotic listening tasks in normal participants almost universally show a left ear advantage (LEA) for comprehension of emotional prosody (e.g., Bryden and MacRae, 1989, Erhan et al., 1998, Grimshaw, 1998, Grimshaw et al., 2003, Stirling et al., 2000). Brain imaging and event-related brain potential (ERP) studies find activation for emotional prosody primarily in right-hemisphere brain regions (Buchanan et al., 2000, George et al., 1996, Erhan et al., 1998, Shapiro and Danly, 1985, Shipley-Brown et al., 1988). Recent functional-anatomic findings from stroke patients support the claim that emotional prosody is a dominant and lateralized right-hemisphere function (Ross & Monnot, 2008). Finally, when the same spoken messages are presented in different emotional tones of voice, an REA is observed when participants attend to the linguistic content, but an LEA is observed when they attend to emotional prosody (Ley & Bryden, 1982). These studies all support right hemisphere specialization for emotional prosody processing, and, further, the Ley and Bryden (1982) study affirms that the processing of emotional prosody contains a conceptually-driven component.

In summary, the evidence overwhelmingly supports a model of attention orienting during face-to-face discourse understanding that presumes that attention is, at least in part, a conceptually-driven process (Ley and Bryden, 1982, Nicholls et al., 2004, Thompson and Malloy, 2004). Taking this one step further, a model of visuospatial attention during auditory-visual discourse processing predicts that attention is directed to different regions of the talker’s face, depending on the extent to which the listener is actively engaged in emotion processing. Specifically, absent the goal of attending to the emotional content of the message, the left hemisphere should be more highly engaged for language processing than the right hemisphere. Consequently, owing to the combined findings that the left hemisphere is primarily responsible for linguistic processing (Grimshaw et al., 2003) and that spatial attention is oriented contralateral to the more activated hemisphere (Reuter-Lorenz et al., 1990), the right side of the face image should receive a greater focus of attention than the left side. Furthermore, greater attention to the mouth than to the eye region should also occur with speech stimuli spoken with neutral prosody (Prodan et al., 2001), even when that region of the face is devoid of visible speech cues, due to the habit of looking at that region and expecting to find relevant information to encode. However, the relative degree of activation of the right hemisphere should increase in the presence of emotional content presented in the face and voice (e.g., Ross & Monnot, 2008). It follows that, in a condition with a high degree of emotional prosody, attention should be directed to the region of the face contralateral to the right hemisphere, that is, the left side of the face image. Because the eye region is particularly rich with cues to the emotional content of the speaker’s meaning (Ekman, 1997), and is associated with right hemisphere activation (Prodan et al., 2001), attention should also be directed more to this region than to the mouth region when speech contains a generous degree of emotional prosody.

In the present study, we explore the question of where visuospatial attention is directed on the speaker’s face when the emotional prosody conveyed in the voice is either high or low. To this end, we conducted a dual-task experiment in which participants listened to speeches as their primary task and responded to the presence or absence of a dot presented in one of four pre-cued locations on the talker’s face. To rule out attention capture by motion cues at the eye or mouth regions, a static image of the talker was presented. Unlike in our earlier experiments, head and eye movements were controlled in this experiment through the use of a chin rest and 8 ms dot presentations, in order to more precisely determine the pattern of attention distribution across regions of the talker’s face. A signal detection theory (SDT) paradigm was also used in assessing visuospatial attention, to tease apart sensitivity and response bias effects (Green & Swets, 1966). Consistent with the requirements of SDT, attention was pre-cued to one of four regions on the face. This design parameter ensured that there was ambiguity only in the presence versus absence of the signal, and not also in the location of the signal. The semantic and emotional content of the speeches was controlled by presenting the same speeches in emotional and neutral prosodic formats to different participants. Two-thirds of the speeches were presented under noisy conditions with the intent of motivating the perceiver to devote a large amount of attention resources to the primary task of language comprehension. Comprehension was assessed with multiple-choice questions presented as text and dispersed throughout eight segments within each speech.
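For readers less familiar with SDT, the standard quantities behind this sensitivity/bias separation can be stated explicitly. The definitions below follow the textbook treatment (Green & Swets, 1966) rather than any computation reported here; the choice of bias index (likelihood ratio β versus criterion location c) is an illustrative assumption.

```latex
% Standard SDT quantities (Green & Swets, 1966): H = hit rate, F = false-alarm rate,
% z(.) = inverse of the standard normal CDF, \phi(.) = standard normal density.
\begin{align}
  d'    &= z(H) - z(F)                                        && \text{sensitivity (discriminability)} \\
  \beta &= \frac{\phi\!\left(z(H)\right)}{\phi\!\left(z(F)\right)} && \text{likelihood-ratio response bias} \\
  c     &= -\tfrac{1}{2}\left[\, z(H) + z(F) \,\right]         && \text{criterion location (alternative bias index)}
\end{align}
```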

Support for the lateralization of visuospatial attention model would be seen: (a) in greater dot detection discriminability (higher d′ values) for the right side of the face image than for the left side, and for the mouth region than for the eye region, in the neutral prosodic expression condition; and (b) in greater dot detection discriminability for the left side of the face image than for the right side, and for the eye region than for the mouth region, in the emotional prosodic expression condition.

Section snippets

Participants

Participants were 14 adults (M age = 23.5 years, range 19–29; 6 male, 8 female), recruited from introductory psychology courses. All participants were right-handed, as determined using the Edinburgh Handedness Inventory.

Stimuli and apparatus

Stimuli included six videotaped speeches presented by one 22-year-old female of Hispanic ethnicity. Speech content was edited from published versions to an approximate length of fifteen minutes. Topics included “College women and body image” (Dobkin & Sippy, 2002); “Take back the

Dot detection analyses

Hits and false alarms were converted to d′ scores and submitted to a repeated measures analysis of variance (ANOVA) using Emotion (2 levels) and Dot Position (4 levels) as within-subjects variables. In the initial ANOVA there was a significant Emotion × Dot Position interaction (F(3, 39) = 9.20, MSE = 2.04, p < .0001); consequently, two ANOVAs were run on the d′ data in the Neutral and Emotional Prosodic Expression conditions separately to clarify the pattern of results in these two conditions. The
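As an illustration of this analysis pipeline, the sketch below converts per-cell hit and false-alarm counts to d′ and submits the scores to a 2 (Emotion) × 4 (Dot Position) repeated-measures ANOVA. It is a reconstruction under stated assumptions, not the authors’ analysis script: the file name, column layout, and log-linear correction for extreme proportions are hypothetical, and statsmodels’ AnovaRM stands in for whichever statistics package was actually used.

```python
# Illustrative sketch (not the authors' analysis script).
# Assumes a long-format table with one row per participant x Emotion x Dot Position cell,
# containing raw counts of hits, misses, false alarms, and correct rejections.
import pandas as pd
from scipy.stats import norm
from statsmodels.stats.anova import AnovaRM

def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate).
    A log-linear correction (add 0.5 to each count) keeps rates of exactly
    0 or 1 from producing infinite z-scores (an assumed, common correction)."""
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Hypothetical file and column names.
df = pd.read_csv("dot_detection_counts.csv")
df["dprime"] = d_prime(df["hits"], df["misses"],
                       df["false_alarms"], df["correct_rejections"])

# Omnibus 2 (Emotion) x 4 (Dot Position) repeated-measures ANOVA on d'.
omnibus = AnovaRM(df, depvar="dprime", subject="participant",
                  within=["emotion", "position"]).fit()
print(omnibus)

# Follow-up: separate one-way ANOVAs on Dot Position within each Emotion condition,
# mirroring the breakdown of the significant Emotion x Dot Position interaction.
for emo, sub in df.groupby("emotion"):
    print(emo, AnovaRM(sub, depvar="dprime", subject="participant",
                       within=["position"]).fit())
```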

Discussion

Using a modification of the dot detection paradigm (Thompson et al., 2004), we investigated differential patterns of visuospatial attention occurring during the comprehension of discourse presented with neutral and emotional prosodic expression. Our findings were: (a) for neutral prosodic expression conditions, visuospatial attention performance was better for the right side of the face image than for the left side, and for the mouth, compared to the eye area; (b) conversely, for emotional

Acknowledgments

The research was funded by an NIH NIGMS Grant (#3S06 GM008136) and the NIH MBRS RISE Grant Program (GM61222).

References (57)

  • Klein, D., et al. (1976). Attentional mechanisms and perceptual asymmetries in tachistoscopic recognition of words and faces. Neuropsychologia.
  • Levy, J., et al. (1990). The previous visual field: Effects of lateralization and response accuracy on current performance. Neuropsychology.
  • Ley, R. G., et al. (1982). A dissociation of right and left hemispheric effects for recognizing emotional tone and verbal content. Brain and Cognition.
  • Nicholls, M. E. R., et al. (2002). The effect of left and right poses on the expression of facial emotion. Neuropsychologia.
  • Reuter-Lorenz, P. A., et al. (1990). Hemispheric control of spatial attention. Brain and Cognition.
  • Ross, E. D., et al. (2008). Neurology of affective prosody and its functional-anatomic organization in the right hemisphere. Brain and Language.
  • Sekiyama, K., et al. (1993). Inter-language differences in the influence of visual cues in speech perception. Journal of Phonetics.
  • Shapiro, B. E., et al. (1985). The role of the right hemisphere in the control of speech prosody in prepositional and affective contexts. Brain and Language.
  • Shipley-Brown, F., et al. (1988). Hemispheric processing of affective and linguistic intonation contours in normal subjects. Brain and Language.
  • Wolf, M. E., et al. (1987). Oral asymmetries during verbal and non-verbal movements of the mouth. Neuropsychologia.
  • Wylie, D. R., et al. (1988). Left-sided oral asymmetries in spontaneous but not posed smiles. Neuropsychologia.
  • Broadbent, D. E., et al. (1964). Accuracy of recognition for speech presentation to the right and left ears. Quarterly Journal of Experimental Psychology.
  • Bryden, M. P., et al. (1989). Dichotic laterality effects obtained with emotional words. Neuropsychiatry, Neuropsychology, and Behavioral Neurology.
  • Calvert, G. A., et al. (1997). Activation of auditory cortex during silent lipreading. Science.
  • Canli, T., et al. (1998). Hemispheric asymmetry for emotional stimuli detected with fMRI. Neuroreport: An International Journal for the Rapid Communication of Research in Neuroscience.
  • Clinton, H. (1995). Women’s rights are human rights. Retrieved January 23, 2003. Available from:...
  • Davidson, R. J. (1992). Emotion and affective style: Hemispheric substrates. Psychological Science.
  • Dobkin, R., & Sippy, S. (2002). The college women’s handbook. Retrieved January 23, 2003. Available from:...