ABSTRACT
In spite of interest in developing robust open-microphone engagement techniques for mobile use and natural field contexts, no reliable techniques are currently available. One problem is the lack of empirically grounded models for distinguishing how users' audio-visual activity actually differs when they address a computer versus a human partner. In particular, existing techniques have not been designed to handle high levels of user self talk as a source of "noise," and they typically assume that a user is addressing the system only when facing it while speaking. In the present research, data were collected during two related studies in which adults aged 18-89 interacted multimodally using speech and pen with a simulated map system. Results revealed that people engaged in self talk prior to addressing the system over 30% of the time, with no decrease in younger adults' rate of self talk compared with elders. Speakers' amplitude was lower during 96% of their self talk, with a substantial 26 dBr amplitude separation observed between self- and system-directed speech. The magnitude of speakers' amplitude separation ranged from approximately 10-60 dBr and diminished with age, with 79% of the variance predictable simply by knowing a person's age. In contrast to the clear differentiation of intended addressee revealed by amplitude separation, gaze at the system was not a reliable indicator of speech directed to the system: users looked at the system over 98% of the time during both self- and system-directed speech. Results of this research have implications for the design of more effective open-microphone engagement for mobile and pervasive systems.
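The amplitude finding suggests a simple engagement heuristic: an utterance well below a speaker's typical system-directed level is likely self talk and can be ignored by the open microphone. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' method; `rms_db`, `classify_addressee`, and the 13 dB default threshold are all assumptions for illustration (a midpoint motivated by the ~26 dBr average separation reported above), and any real system would need per-speaker and per-age calibration, since the separation diminishes with age.

```python
import math

def rms_db(samples):
    """Root-mean-square level of an audio frame in dB relative to full scale.

    `samples` are floats in [-1.0, 1.0]; silence maps to -inf.
    """
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")

def classify_addressee(utterance_samples, speaker_baseline_db, threshold_db=13.0):
    """Label an utterance self- vs system-directed by amplitude alone.

    `speaker_baseline_db` is the speaker's calibrated system-directed level;
    utterances more than `threshold_db` below it are treated as self talk.
    Both names and the default threshold are hypothetical, for illustration.
    """
    level = rms_db(utterance_samples)
    return "self" if level < speaker_baseline_db - threshold_db else "system"

# A loud utterance near the speaker's baseline is kept; a quiet one is dropped.
loud = [0.5, -0.5] * 100    # ~ -6 dBFS
quiet = [0.01, -0.01] * 100  # ~ -40 dBFS
print(classify_addressee(loud, speaker_baseline_db=-6.0))   # system
print(classify_addressee(quiet, speaker_baseline_db=-6.0))  # self
```

Note that amplitude is the only feature used here; per the results above, gaze would add little, since users looked at the system over 98% of the time regardless of addressee.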