Authors :
Dwarika; Sudhanshu Shekhar Dadsena; Nikita Rawat
Volume/Issue :
Volume 10 - 2025, Issue 5 - May
Google Scholar :
https://tinyurl.com/yhhkmwna
DOI :
https://doi.org/10.38124/ijisrt/25may2249
Abstract :
Speech Emotion Recognition (SER) is a growing field in artificial intelligence focused on identifying human
emotions from vocal input. This research presents a web-based system that processes speech signals, extracts meaningful
acoustic features, and applies a machine learning model to classify emotions such as happy, sad, angry, and neutral. The
study utilises the RAVDESS dataset [1], with acoustic features extracted using Librosa [2]. A Multi-Layer
Perceptron (MLP) classifier trained on these features achieves a test accuracy of 72.22% [3]. A Flask-based web
application lets users upload audio files for emotion analysis. The system has potential applications in mental health
monitoring, interactive assistants, and human-computer communication.
Keywords :
Speech Emotion Recognition, MFCC, Chroma, Mel Spectrogram, MLP Classifier, Flask, RAVDESS Dataset.
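The Flask front end mentioned above could expose a single upload endpoint along these lines. The route name, form field, and placeholder response are assumptions for illustration, not the paper's actual implementation; the real app would pass the saved file through the feature extractor and trained MLP.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)
EMOTIONS = ["happy", "sad", "angry", "neutral"]

@app.route("/predict", methods=["POST"])
def predict():
    # Expect an audio file in the "audio" form field
    f = request.files.get("audio")
    if f is None:
        return jsonify(error="no audio file uploaded"), 400
    # Real app: save f, extract Librosa features, run the trained MLP.
    # Placeholder response standing in for the model's prediction:
    return jsonify(emotion=EMOTIONS[0])
```

A client would POST a WAV file as multipart form data, for example `curl -F audio=@clip.wav http://localhost:5000/predict`.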
References :
- Livingstone, S. R., & Russo, F. A. (2018). The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). https://doi.org/10.5281/zenodo.1188976
- McFee, B., et al. (2015). librosa: Audio and Music Signal Analysis in Python. Proceedings of the 14th Python in Science Conference (SciPy 2015). https://librosa.org/
- Géron, A. (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow. O'Reilly Media.
- El Ayadi, M., Kamel, M. S., & Karray, F. (2011). Survey on speech emotion recognition: Features, classification schemes, and databases. Pattern Recognition, 44(3).
- Schuller, B., Steidl, S., & Batliner, A. (2009). The INTERSPEECH 2009 Emotion Challenge. Proceedings of INTERSPEECH 2009.
- Fayek, H. M., Lech, M., & Cavedon, L. (2017). Evaluating deep learning architectures for speech emotion recognition. Neural Networks, 92.
- Zhao, J., Mao, X., & Chen, L. (2019). Speech emotion recognition using deep CNN-LSTM networks. Biomedical Signal Processing and Control.
- Latif, S., Rana, R., Khalifa, S., et al. (2020). Survey of Deep Learning Techniques for Speech Emotion Recognition. IEEE Access.
- Cho, K., et al. (2014). Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. Proceedings of EMNLP 2014.
- Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
- Creswell, J. W. (2014). Research Design: Qualitative, Quantitative, and Mixed Methods Approaches. SAGE Publications.
- Pedregosa, F., et al. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12. https://scikit-learn.org/