Emotion Recognition using Speech Processing


Authors : Dwarika; Sudhanshu Shekhar Dadsena; Nikita Rawat

Volume/Issue : Volume 10 - 2025, Issue 5 - May


Google Scholar : https://tinyurl.com/yhhkmwna

DOI : https://doi.org/10.38124/ijisrt/25may2249



Abstract : Speech Emotion Recognition (SER) is a growing field in artificial intelligence focused on identifying human emotions from vocal input. This research presents a web-based system that processes speech signals, extracts meaningful acoustic features, and applies a machine learning model to classify emotions such as happy, sad, angry, and neutral. The study utilises the RAVDESS dataset [1], and features are extracted using Librosa [2]. The classifier, a Multi-Layer Perceptron (MLP), achieves a test accuracy of 72.22% [3]. A Flask-based web application is built so that emotions can be analysed from uploaded audio files. The system has potential applications in mental health monitoring, interactive assistants, and human-computer communication.
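
The pipeline summarised above can be sketched as follows. This is a minimal illustration, assuming RAVDESS .wav files and their emotion labels are already collected in files and labels; the feature settings and MLP hyperparameters shown here are assumptions for illustration, not the exact configuration reported in the paper.

import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def extract_features(path):
    # Load the audio and pool MFCC, chroma, and mel-spectrogram frames into one vector.
    y, sr = librosa.load(path, sr=None)
    stft = np.abs(librosa.stft(y))
    mfcc = np.mean(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40), axis=1)
    chroma = np.mean(librosa.feature.chroma_stft(S=stft, sr=sr), axis=1)
    mel = np.mean(librosa.feature.melspectrogram(y=y, sr=sr), axis=1)
    return np.hstack([mfcc, chroma, mel])

# files / labels: placeholders for the RAVDESS audio paths and their emotion tags.
X = np.array([extract_features(f) for f in files])
y_labels = np.array(labels)

X_train, X_test, y_train, y_test = train_test_split(X, y_labels, test_size=0.25, random_state=42)
clf = MLPClassifier(hidden_layer_sizes=(300,), alpha=0.01, batch_size=256,
                    learning_rate='adaptive', max_iter=500)
clf.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, clf.predict(X_test)))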

Keywords : Speech Emotion Recognition, MFCC, Chroma, Mel Spectrogram, MLP Classifier, Flask, RAVDESS Dataset.
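
The Flask front end mentioned in the abstract can be approximated with a single upload-and-predict route. The route name, the saved-model file name, and the reuse of the extract_features helper from the sketch above are assumptions for illustration; the paper's actual application may differ.

import os
import pickle
from flask import Flask, request, jsonify

app = Flask(__name__)
model = pickle.load(open("ser_mlp_model.pkl", "rb"))  # assumed pre-trained MLP classifier
os.makedirs("uploads", exist_ok=True)

@app.route("/predict", methods=["POST"])
def predict():
    # Save the uploaded audio file, extract acoustic features, and return the predicted emotion.
    audio = request.files["audio"]
    path = os.path.join("uploads", audio.filename)
    audio.save(path)
    features = extract_features(path).reshape(1, -1)  # feature extractor as sketched above
    emotion = model.predict(features)[0]
    return jsonify({"emotion": emotion})

if __name__ == "__main__":
    app.run(debug=True)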

References :

  1. Livingstone, S. R., & Russo, F. A. (2018). RAVDESS: Ryerson Audio-Visual Database of Emotional Speech and Song. https://doi.org/10.5281/zenodo.1188976
  2. McFee, B., et al. (2015). Librosa: Audio and Music Signal Analysis in Python. https://librosa.org/
  3. Géron, A. (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow. O’Reilly Media.
  4. El Ayadi, M., Kamel, M. S., & Karray, F. (2011). Survey on speech emotion recognition: Features and methods.
  5. Schuller, B., Steidl, S., & Batliner, A. (2009). The INTERSPEECH 2009 Emotion Challenge.
  6. Fayek, H. M., Lech, M., & Cavedon, L. (2017). Evaluating deep learning architectures for speech emotion recognition. Neural Networks, 92, 60–68.
  7. Zhao, J., Mao, X., & Chen, L. (2019). Speech emotion recognition using deep CNN-LSTM networks.
  8. Latif, S., Rana, R., Khalifa, S., et al. (2020). Survey of Deep Learning Techniques for SER. IEEE Access.
  9. Cho, K., et al. (2014). Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation.
  10. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
  11. Creswell, J. W. (2014). Research Design: Qualitative, Quantitative, and Mixed Methods Approaches.
  12. Pedregosa, F., et al. (2011). Scikit-learn: Machine Learning in Python. https://scikit-learn.org/

