Speech plays a significant role in communication, carrying emotion as well as meaning. Speech emotion recognition (SER) has numerous applications in areas such as robotics, mobile services, and psychological testing, and its potential in human-computer interaction, affective computing, and psychology has made it a significant research area in recent years. Although deep learning has advanced the recognition of human emotions from speech, SER research still faces challenges, including scarce data and limited model accuracy. In SER, emotional traits frequently appear as distinct energy patterns in spectrograms. This study investigates how to extract features from voice signals and identify emotions using Mel-frequency cepstral coefficients (MFCCs) and convolutional neural networks (CNNs). In the proposed approach, voice data is preprocessed to reduce background noise, and pertinent features such as MFCCs are extracted. The data, obtained from the open-source RAVDESS dataset on Kaggle, are reduced in dimension using a feature selection technique. Data augmentation is then applied to enlarge the dataset and improve the robustness of the model. The CNN is chosen as the classifier because of its versatility and success in a variety of classification problems. The results show that the proposed approach achieves high accuracy and outperforms other state-of-the-art techniques. The proposed system illustrates the value of data preprocessing, feature extraction, and classification techniques in achieving high performance on this task, and demonstrates the feasibility of using MFCCs and CNNs for speech emotion recognition.
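The front end of the pipeline described above, framing a waveform, computing MFCC features, and applying waveform-level augmentation, can be sketched in plain NumPy. This is a minimal illustration, not the paper's exact configuration: the frame length, hop size, filterbank size, and augmentation parameters below are assumed values chosen for the demo, and a synthetic tone stands in for a RAVDESS clip.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_mfcc=13):
    """Minimal MFCC: pre-emphasis, framing, power spectrum, mel filterbank, log, DCT-II."""
    # Pre-emphasis boosts the high frequencies that carry consonant energy.
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(sig) - n_fft) // hop
    frames = np.stack([sig[i * hop : i * hop + n_fft] for i in range(n_frames)])
    frames = frames * np.hamming(n_fft)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filterbank (points spaced evenly on the mel scale).
    def hz_to_mel(hz): return 2595.0 * np.log10(1.0 + hz / 700.0)
    def mel_to_hz(mel): return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)
        fbank[m - 1, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)
    log_mel = np.log(power @ fbank.T + 1e-10)
    # DCT-II decorrelates log-mel energies; keep the first n_mfcc coefficients.
    k = np.arange(n_mels)
    basis = np.cos(np.pi * np.outer(np.arange(n_mfcc), (2 * k + 1) / (2 * n_mels)))
    return log_mel @ basis.T  # shape: (n_frames, n_mfcc)

def augment(signal, rng, noise_level=0.005, max_shift=800):
    """Simple augmentation: additive Gaussian noise plus a random time shift."""
    noisy = signal + noise_level * rng.standard_normal(len(signal))
    return np.roll(noisy, rng.integers(-max_shift, max_shift))

# Demo on a synthetic 1-second tone standing in for a speech clip.
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
clip = 0.5 * np.sin(2 * np.pi * 220 * t)
feats = mfcc(clip, sr)
aug_feats = mfcc(augment(clip, np.random.default_rng(0)), sr)
print(feats.shape)  # (97, 13) for this 1 s clip
```

The resulting (frames x coefficients) matrix can be treated as a single-channel image and fed to a 2-D CNN classifier; augmented copies of each clip are featurized the same way to enlarge the training set.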