DOI: 10.1145/2141622.2141646
research-article

Audio visual speech recognition in noisy visual environments

Published: 25 May 2011

ABSTRACT

Speech recognition is a natural means of interaction between a human and a smart assistive environment. For this interaction to be effective, such a system should attain a high recognition rate even under adverse conditions. Audio-visual speech recognition (AVSR) can help in such environments, especially in the presence of audio noise. However, the impact of visual noise on its performance has not been studied sufficiently in the literature. In this paper, we examine the effects of visual noise on AVSR, reporting experiments on the relatively simple task of connected-digit recognition under moderate acoustic noise and a variety of types of visual noise. The latter can be caused by either faulty sensors or video signal transmission problems, both of which can occur in smart assistive environments. Our AVSR system exhibits higher accuracy than an audio-only recognizer and robust performance in most of the noisy video conditions considered.
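
To make the notion of visual noise concrete, the following is a minimal sketch of how grayscale video frames might be corrupted before visual feature extraction. The abstract does not list the noise types used in the experiments, so the three corruptions here (Gaussian pixel noise, salt-and-pepper noise, and a blanked block loosely imitating a transmission loss) are illustrative assumptions, and all function names and parameters are hypothetical.

```python
import random


def add_gaussian_noise(frame, sigma, rng):
    """Add zero-mean Gaussian noise to every pixel, clipping to [0, 255].

    Assumed stand-in for sensor noise; not taken from the paper.
    """
    return [
        [min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row]
        for row in frame
    ]


def add_salt_and_pepper(frame, prob, rng):
    """With probability `prob`, replace a pixel by 0 or 255 (chosen at random)."""
    noisy = []
    for row in frame:
        new_row = []
        for p in row:
            if rng.random() < prob:
                new_row.append(rng.choice((0, 255)))
            else:
                new_row.append(p)
        noisy.append(new_row)
    return noisy


def drop_block(frame, top, left, height, width, fill=0):
    """Blank a rectangular block, a rough imitation of a lost video region."""
    noisy = [row[:] for row in frame]  # copy so the input frame is untouched
    for r in range(top, min(top + height, len(noisy))):
        for c in range(left, min(left + width, len(noisy[r]))):
            noisy[r][c] = fill
    return noisy


if __name__ == "__main__":
    rng = random.Random(0)                 # fixed seed for repeatability
    frame = [[128] * 8 for _ in range(8)]  # hypothetical 8x8 mouth-region frame
    corrupted = drop_block(add_gaussian_noise(frame, 10.0, rng), 2, 2, 3, 3)
    for row in corrupted:
        print(row)
```

In an evaluation of the kind the abstract describes, corruptions like these would be applied to the video stream only, leaving the audio stream with its own moderate acoustic noise, so that any change in recognition accuracy can be attributed to the visual channel.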

Published in

PETRA '11: Proceedings of the 4th International Conference on PErvasive Technologies Related to Assistive Environments
May 2011
401 pages
ISBN: 9781450307727
DOI: 10.1145/2141622

Copyright © 2011 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery, New York, NY, United States
