DOI: 10.1145/1088463.1088494

Audio-visual cues distinguishing self- from system-directed speech in younger and older adults

Published: 04 October 2005

ABSTRACT

In spite of interest in developing robust open-microphone engagement techniques for mobile use and natural field contexts, there currently are no reliable techniques available. One problem is the lack of empirically grounded models as guidance for distinguishing how users' audio-visual activity actually differs systematically when addressing a computer versus a human partner. In particular, existing techniques have not been designed to handle high levels of user self talk as a source of "noise," and they typically assume that a user is addressing the system only when facing it while speaking. In the present research, data were collected during two related studies in which adults aged 18-89 interacted multimodally using speech and pen with a simulated map system. Results revealed that people engaged in self talk prior to addressing the system over 30% of the time, with no decrease in younger adults' rate of self talk compared with elders. Speakers' amplitude was lower during 96% of their self talk, with a substantial 26 dBr amplitude separation observed between self- and system-directed speech. The magnitude of speakers' amplitude separation ranged from approximately 10-60 dBr and diminished with age, with 79% of the variance predictable simply by knowing a person's age. In contrast to the clear differentiation of intended addressee revealed by amplitude separation, gaze at the system was not a reliable indicator of speech directed to the system, with users looking at the system over 98% of the time during both self- and system-directed speech. Results of this research have implications for the design of more effective open-microphone engagement for mobile and pervasive systems.
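To make the notion of amplitude separation concrete, the following is a minimal sketch, not the authors' method: the abstract does not say how amplitude was measured. It assumes utterances are available as lists of samples normalized to [-1, 1], computes each utterance's RMS level in dB, and takes the difference between system-directed and self-directed speech. The function names and the 13 dB margin (half of the reported 26 dBr mean separation) are hypothetical choices for illustration.

```python
import math


def rms_level_db(samples, reference=1.0):
    """Return the RMS level of `samples` in dB relative to `reference`."""
    if not samples:
        raise ValueError("samples must be non-empty")
    mean_square = sum(s * s for s in samples) / len(samples)
    rms = math.sqrt(mean_square)
    if rms == 0.0:
        return float("-inf")  # digital silence has no finite level
    return 20.0 * math.log10(rms / reference)


def amplitude_separation_db(self_samples, system_samples):
    """Level of system-directed speech minus level of self talk, in dB."""
    return rms_level_db(system_samples) - rms_level_db(self_samples)


def looks_system_directed(utterance_db, speaker_system_db, margin_db=13.0):
    """Hypothetical rule: treat an utterance as system-directed if it is
    within `margin_db` of the speaker's typical system-directed level.
    The 13 dB default is an assumption, not a value from the paper."""
    return utterance_db >= speaker_system_db - margin_db


# Synthetic example: system-directed speech at ten times the amplitude.
system = [0.5, -0.5] * 100
self_talk = [0.05, -0.05] * 100
separation = amplitude_separation_db(self_talk, system)
```

Because the abstract reports that separation ranged from roughly 10 to 60 dBr and diminished with age, any fixed margin of this kind would likely need per-speaker calibration.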

Published in

ICMI '05: Proceedings of the 7th international conference on Multimodal interfaces
October 2005
344 pages
ISBN: 1595930280
DOI: 10.1145/1088463

Copyright © 2005 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Article

Acceptance Rates

Overall Acceptance Rate: 453 of 1,080 submissions, 42%
