Abstract
Recent studies show that visual information contained in visual speech can be helpful for the performance enhancement of audio-only blind source separation (BSS) algorithms. Such information is exploited through the statistical characterisation of the coherence between the audio and visual speech using, e.g. a Gaussian mixture model (GMM). In this paper, we present two new contributions. An adapted expectation maximization (AEM) algorithm is proposed in the training process to model the audio-visual coherence upon the extracted features. The coherence is exploited to solve the permutation problem in the frequency domain using a new sorting scheme. We test our algorithm on the XM2VTS multimodal database. The experimental results show that our proposed algorithm outperforms traditional audio-only BSS.
This work was supported by the Engineering and Physical Sciences Research Council (EPSRC) (Grant number EP/H012842/1) and the MOD University Defence Research Centre on Signal Processing (UDRC).
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Jutten, C., Herault, J.: Blind Separation of Sources, Part I: An Adaptive Algorithm Based on Neuromimetic Architecture. Signal Process. 24(1), 1–10 (1991)
Comon, P.: Independent Component Analysis, a New Concept? Signal Process. 36(3), 287–314 (1994)
Cardoso, J.F., Souloumiac, A.: Blind Beamforming for Non-Gaussian Signals. IEEE Proc.-F 140(6), 362–370 (1993)
Hyvärinen, A., Karhunen, J., Oja, E.: Independent Component Analysis. John Wiley & Sons, New York (2001)
Sodoyer, D., Schwartz, J.L., Girin, L., Klinkisch, J., Jutten, C.: Separation of Audio-Visual Speech Sources: a New Approach Exploiting the Audio-Visual Coherence of Speech Stimuli. EURASIP J. Appl. Signal Process. 11, 1165–1173 (2002)
Wang, W., Cosker, D., Hicks, Y., Sanei, S., Chambers, J.: Video Assisted Speech Source Separation. In: Proc. IEEE ICASSP, pp. 425–428 (2005)
Rivet, B., Girin, L., Jutten, C.: Mixing Audiovisual Apeech Processing and Blind Source Separation for the Extraction of Speech Signals from Convolutive Mixtures. IEEE Trans. Audio Speech Lang. Process. 15(1), 96–108 (2009)
Anemüller, J., Kollmeier, B.: Amplitude Modulation Decorrelation for Convolutive Blind Source Separation. In: Proc. ICA, pp. 215–220 (2000)
Ikram, M.Z., Morgan, D.R.: A Beamforming Approach to Permutation Alignment for Multichannel Frequency-Domain Blind Speech Separation. In: Proc. IEEE ICASSP, pp. 881–884 (2002)
Matsuoka, K., Nakashima, S.: Minimal Distortion Principle for Blind Source Separation. In: Proc. ICA, pp. 722–727 (2001)
Thomas, J., Deville, Y., Hosseini, S.: Time-Domain Fast Fixed-Point Algorithms for Convolutive ICA. IEEE Signal Process. Lett. 13(4), 228–231 (2006)
Messer, K., Matas, J., Kittler, J., Luettin, J., Maitre, G.: XM2VTSDB: The Extended M2VTS Database. In: AVBPA (1999), http://www.ee.surrey.ac.uk/CVSSP/xm2vtsdb/
Westner, A.: Room Impulse Responses (1998), http://alumni.media.mit.edu/~westner/papers/ica99/node2.html
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liu, Q., Wang, W., Jackson, P. (2010). Use of Bimodal Coherence to Resolve Spectral Indeterminacy in Convolutive BSS. In: Vigneron, V., Zarzoso, V., Moreau, E., Gribonval, R., Vincent, E. (eds) Latent Variable Analysis and Signal Separation. LVA/ICA 2010. Lecture Notes in Computer Science, vol 6365. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15995-4_17
Download citation
DOI: https://doi.org/10.1007/978-3-642-15995-4_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15994-7
Online ISBN: 978-3-642-15995-4
eBook Packages: Computer ScienceComputer Science (R0)