ABSTRACT
In spite of interest in developing robust open-microphone engagement techniques for mobile use and natural field contexts, no reliable techniques are currently available. One problem is the lack of empirically grounded models for distinguishing how users' audio-visual activity actually differs when they address a computer versus a human partner. In particular, existing techniques have not been designed to handle high levels of user self talk as a source of "noise," and they typically assume that a user is addressing the system only when facing it while speaking. In the present research, data were collected during two related studies in which adults aged 18-89 interacted multimodally using speech and pen with a simulated map system. Results revealed that people engaged in self talk prior to addressing the system over 30% of the time, with no decrease in younger adults' rate of self talk compared with elders. Speakers' amplitude was lower during 96% of their self talk, with a substantial 26 dBr amplitude separation observed between self- and system-directed speech. The magnitude of speakers' amplitude separation ranged from approximately 10-60 dBr and diminished with age, with 79% of the variance predictable simply by knowing a person's age. In contrast to the clear differentiation of intended addressee revealed by amplitude separation, gaze at the system was not a reliable indicator of speech directed to the system: users looked at the system over 98% of the time during both self- and system-directed speech. Results of this research have implications for the design of more effective open-microphone engagement for mobile and pervasive systems.
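The amplitude finding suggests a simple engagement heuristic: an utterance well below a speaker's typical system-directed level is likely self talk and can be ignored by the open microphone. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' method; `rms_db`, `classify_addressee`, and the 13 dB default threshold are all assumptions for illustration (a midpoint motivated by the ~26 dBr average separation reported above), and any real system would need per-speaker and per-age calibration, since the separation diminishes with age.

```python
import math

def rms_db(samples):
    """Root-mean-square level of an audio frame in dB relative to full scale.

    `samples` are floats in [-1.0, 1.0]; silence maps to -inf.
    """
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")

def classify_addressee(utterance_samples, speaker_baseline_db, threshold_db=13.0):
    """Label an utterance self- vs system-directed by amplitude alone.

    `speaker_baseline_db` is the speaker's calibrated system-directed level;
    utterances more than `threshold_db` below it are treated as self talk.
    Both names and the default threshold are hypothetical, for illustration.
    """
    level = rms_db(utterance_samples)
    return "self" if level < speaker_baseline_db - threshold_db else "system"

# A loud utterance near the speaker's baseline is kept; a quiet one is dropped.
loud = [0.5, -0.5] * 100    # ~ -6 dBFS
quiet = [0.01, -0.01] * 100  # ~ -40 dBFS
print(classify_addressee(loud, speaker_baseline_db=-6.0))   # system
print(classify_addressee(quiet, speaker_baseline_db=-6.0))  # self
```

Note that amplitude is the only feature used here; per the results above, gaze would add little, since users looked at the system over 98% of the time regardless of addressee.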