Journal of Autism and Developmental Disorders

12 July 2022 | S.I.: Impact of Assistive Technology in Special Education

RETRACTED ARTICLE: Audio-Visual Automatic Speech Recognition Towards Education for Disabilities

Authors: Saswati Debnath, Pinki Roy, Suyel Namasudra, Ruben Gonzalez Crespo

Published in: Journal of Autism and Developmental Disorders | Issue 9/2023

Abstract

Education is a fundamental right that enriches everyone’s life. However, physically challenged people are often debarred from the general and advanced education system. A system based on Audio-Visual Automatic Speech Recognition (AV-ASR) can help improve the education of physically challenged people by providing hands-free computing: they can communicate with the learning system through AV-ASR. However, tracing the lips correctly for the visual modality is challenging. This paper therefore combines an appearance-based visual feature with a co-occurrence statistical measure for visual speech recognition. Local Binary Pattern-Three Orthogonal Planes (LBP-TOP) and the Grey-Level Co-occurrence Matrix (GLCM) are proposed for extracting visual speech information. The experimental results show that the proposed system achieves 76.60% accuracy for visual speech recognition and 96.00% accuracy for audio speech recognition.
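The abstract names the Grey-Level Co-occurrence Matrix as its co-occurrence statistical measure but does not describe the computation. The sketch below shows the generic, textbook GLCM for a single pixel offset, together with three commonly used texture statistics. It is not the authors' implementation: the function names `glcm` and `glcm_features`, the choice of offset, the number of grey levels, and the `demo` image are all assumptions made for illustration.

```python
import numpy as np


def glcm(image, dy=0, dx=1, levels=4):
    """Normalised grey-level co-occurrence matrix for one pixel offset (dy, dx).

    `image` must hold integer grey levels in the range [0, levels).
    Hypothetical helper for illustration; not taken from the paper.
    """
    img = np.asarray(image)
    h, w = img.shape
    # Reference pixels and their neighbours at the offset, cropped so that
    # both views stay inside the image.
    ref = img[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    nbr = img[max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)]
    counts = np.zeros((levels, levels), dtype=np.float64)
    # Unbuffered accumulation so repeated (i, j) pairs are all counted.
    np.add.at(counts, (ref.ravel(), nbr.ravel()), 1)
    return counts / counts.sum()


def glcm_features(p):
    """Contrast, energy and homogeneity of a normalised GLCM `p`."""
    i, j = np.indices(p.shape)
    return {
        "contrast": float(np.sum((i - j) ** 2 * p)),
        "energy": float(np.sum(p ** 2)),
        "homogeneity": float(np.sum(p / (1.0 + (i - j) ** 2))),
    }


# Assumed toy image with four grey levels (0..3), standing in for a
# quantised lip region.
demo = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
])

p = glcm(demo, dy=0, dx=1, levels=4)  # horizontal neighbour to the right
features = glcm_features(p)
```

In a visual speech pipeline, statistics like these would typically be computed per frame on the mouth region and concatenated with other descriptors; how the retracted paper combined them with LBP-TOP is not stated in this abstract.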



Title: RETRACTED ARTICLE: Audio-Visual Automatic Speech Recognition Towards Education for Disabilities
Authors: Saswati Debnath
Pinki Roy
Suyel Namasudra
Ruben Gonzalez Crespo
Publication date: 12 July 2022
Publisher: Springer US
Published in: Journal of Autism and Developmental Disorders / Issue 9/2023
Print ISSN: 0162-3257
Electronic ISSN: 1573-3432
DOI: https://doi.org/10.1007/s10803-022-05654-4

