Authors :
Dwarika; Sudhanshu Shekhar Dadsena; Nikita Rawat
Volume/Issue :
Volume 10 - 2025, Issue 5 - May
Google Scholar :
https://tinyurl.com/yhhkmwna
DOI :
https://doi.org/10.38124/ijisrt/25may2249
Abstract :
Speech Emotion Recognition (SER) is a growing field in artificial intelligence focused on identifying human
emotions from vocal input. This research presents a web-based system that processes speech signals, extracts meaningful
acoustic features, and applies a machine learning model to classify emotions such as happy, sad, angry, and neutral. The
study utilises the RAVDESS dataset [1], with acoustic features extracted using Librosa [2]. A Multi-Layer
Perceptron (MLP) classifier trained on these features achieves a test accuracy of 72.22% [3]. A Flask-based web
application lets users upload audio files for emotion analysis. The system has potential applications in mental health
monitoring, interactive assistants, and human-computer communication.
Keywords :
Speech Emotion Recognition, MFCC, Chroma, Mel Spectrogram, MLP Classifier, Flask, RAVDESS Dataset.
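The Flask front end mentioned above could expose a single upload endpoint along these lines. The route name, form field, and placeholder response are assumptions for illustration, not the paper's actual implementation; the real app would pass the saved file through the feature extractor and trained MLP.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)
EMOTIONS = ["happy", "sad", "angry", "neutral"]

@app.route("/predict", methods=["POST"])
def predict():
    # Expect an audio file in the "audio" form field
    f = request.files.get("audio")
    if f is None:
        return jsonify(error="no audio file uploaded"), 400
    # Real app: save f, extract Librosa features, run the trained MLP.
    # Placeholder response standing in for the model's prediction:
    return jsonify(emotion=EMOTIONS[0])
```

A client would POST a WAV file as multipart form data, for example `curl -F audio=@clip.wav http://localhost:5000/predict`.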
References :
- Livingstone, S. R., & Russo, F. A. (2018). The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). https://doi.org/10.5281/zenodo.1188976
- McFee, B., et al. (2015). librosa: Audio and Music Signal Analysis in Python. Proceedings of the 14th Python in Science Conference (SciPy 2015). https://librosa.org/
- Géron, A. (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow. O'Reilly Media.
- El Ayadi, M., Kamel, M. S., & Karray, F. (2011). Survey on speech emotion recognition: Features, classification schemes, and databases. Pattern Recognition, 44(3).
- Schuller, B., Steidl, S., & Batliner, A. (2009). The INTERSPEECH 2009 Emotion Challenge. Proceedings of INTERSPEECH 2009.
- Fayek, H. M., Lech, M., & Cavedon, L. (2017). Evaluating deep learning architectures for speech emotion recognition. Neural Networks, 92.
- Zhao, J., Mao, X., & Chen, L. (2019). Speech emotion recognition using deep CNN-LSTM networks. Biomedical Signal Processing and Control.
- Latif, S., Rana, R., Khalifa, S., et al. (2020). Survey of Deep Learning Techniques for Speech Emotion Recognition. IEEE Access.
- Cho, K., et al. (2014). Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. Proceedings of EMNLP 2014.
- Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
- Creswell, J. W. (2014). Research Design: Qualitative, Quantitative, and Mixed Methods Approaches. SAGE Publications.
- Pedregosa, F., et al. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12. https://scikit-learn.org/