Text-dependent speaker verification using classical LBG, adaptive LBG and FCM vector quantization

Soni, Badal; Debnath, Saswati; Das, Pradip K.

doi:10.1007/s10772-016-9346-4

Published: 31 May 2016

Volume 19, pages 525–536, (2016)

International Journal of Speech Technology

Badal Soni¹,
Saswati Debnath¹ &
Pradip K. Das²

Abstract

An important task in speaker verification is to generate speaker-specific models and to match an input speaker's utterance against these models. This paper compares the performance of a text-dependent speaker verification system that uses Mel Frequency Cepstral Coefficient (MFCC) features with different Vector Quantization (VQ) based speaker modelling techniques. Speaker-specific information is mainly represented by spectral features, and from these features we build the model that is used to decide whether the speaker's claimed identity is accepted. In the modelling stage, we use Linde, Buzo, Gray (LBG) VQ, a proposed adaptive LBG VQ, and Fuzzy C-Means (FCM) VQ to generate the speaker-specific models. Experiments performed on a microphone speech database show that accuracy depends significantly on the codebook size in all VQ techniques, and that in FCM VQ accuracy also depends on the value of the learning parameter of the objective function. The experimental results show how the accuracy of the speaker verification system depends on the representation of the codebook, the codebook size in the VQ modelling techniques, and the learning parameter in FCM VQ.
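
The abstract describes VQ-based verification only at a high level. As a rough illustration, and not the authors' implementation, the sketch below trains an LBG codebook by binary splitting, refines a codebook with fuzzy C-means updates, and accepts a claimed identity when the average quantization distortion of the test frames falls below a threshold. It assumes feature vectors (for example MFCC frames) have already been extracted as lists of floats. The function names, the additive split perturbation `eps`, the squared Euclidean distance, the threshold rule, and the reading of the "learning parameter" as the FCM fuzziness exponent `m` are all assumptions made here for illustration. The proposed adaptive LBG variant is not sketched because the abstract does not specify it.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(vec, codebook):
    """Index of the codeword closest to vec."""
    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def avg_distortion(vectors, codebook):
    """Average squared distance from each vector to its nearest codeword."""
    return sum(sq_dist(v, codebook[nearest(v, codebook)]) for v in vectors) / len(vectors)

def lbg(vectors, size, eps=0.01, tol=1e-4, max_iter=50):
    """LBG by binary splitting; size is assumed to be a power of two."""
    codebook = [mean(vectors)]
    while len(codebook) < size:
        # Split every codeword into two slightly perturbed copies (assumed additive).
        codebook = [[c + s * eps for c in cw] for cw in codebook for s in (1, -1)]
        prev = float("inf")
        for _ in range(max_iter):
            # Nearest-neighbour partition, then centroid update.
            cells = [[] for _ in codebook]
            for v in vectors:
                cells[nearest(v, codebook)].append(v)
            # Empty cells keep their old codeword.
            codebook = [mean(cell) if cell else cw for cell, cw in zip(cells, codebook)]
            d = avg_distortion(vectors, codebook)
            if prev - d <= tol * max(d, 1e-12):
                break
            prev = d
    return codebook

def fcm(vectors, init_codebook, m=2.0, iters=30):
    """Fuzzy C-means refinement of a codebook; m > 1 is the fuzziness exponent."""
    codebook = [list(cw) for cw in init_codebook]
    size, dim = len(codebook), len(vectors[0])
    for _ in range(iters):
        num = [[0.0] * dim for _ in range(size)]
        den = [0.0] * size
        for v in vectors:
            d = [max(sq_dist(v, cw), 1e-12) for cw in codebook]
            for i in range(size):
                # Membership of v in cluster i (distances are already squared).
                u = 1.0 / sum((d[i] / d[k]) ** (1.0 / (m - 1)) for k in range(size))
                w = u ** m
                den[i] += w
                for j, x in enumerate(v):
                    num[i][j] += w * x
        codebook = [[n / den[i] for n in num[i]] for i in range(size)]
    return codebook

def verify(test_vectors, claimed_codebook, threshold):
    """Accept the claim if the test frames quantize well with the claimed codebook."""
    return avg_distortion(test_vectors, claimed_codebook) <= threshold
```

In this sketch the codebook size passed to `lbg` and the exponent `m` passed to `fcm` are the two quantities the abstract reports as affecting accuracy; the threshold would have to be tuned on held-out data.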


Author information

Authors and Affiliations

Department of Computer Science & Engineering, National Institute of Technology, Silchar, India
Badal Soni & Saswati Debnath
Department of Computer Science & Engineering, IIT Guwahati, Assam, India
Pradip K. Das

Corresponding author

Correspondence to Badal Soni.

About this article

Cite this article

Soni, B., Debnath, S. & Das, P.K. Text-dependent speaker verification using classical LBG, adaptive LBG and FCM vector quantization. Int J Speech Technol 19, 525–536 (2016). https://doi.org/10.1007/s10772-016-9346-4

Received: 13 November 2015
Accepted: 23 May 2016
Published: 31 May 2016
Issue Date: September 2016
DOI: https://doi.org/10.1007/s10772-016-9346-4
