Authors :
Diwakar Mainali; Saraswoti Shrestha; Umesh Thapa; Sanjib Nepali
Volume/Issue :
Volume 9 - 2024, Issue 10 - October
Google Scholar :
https://tinyurl.com/mrxsexjv
Scribd :
https://tinyurl.com/586c35ex
DOI :
https://doi.org/10.38124/ijisrt/IJISRT24OCT1478
Abstract :
Emotion recognition from text and speech has become a critical area of research in artificial intelligence (AI), enhancing human-computer interaction across various sectors. This paper explores the methodologies used in emotion recognition, focusing on Natural Language Processing (NLP) for text and acoustic analysis for speech. It reviews key machine learning and deep learning models, including Support Vector Machines (SVM), neural networks, and transformers, and highlights the datasets commonly used in emotion detection studies. The paper also addresses challenges such as multimodal integration, data ambiguity, and ethical considerations like privacy concerns and bias in models. Applications in customer service, healthcare, education, and entertainment are discussed, showcasing the growing importance of emotion recognition in AI-driven systems. Future research directions, including advancements in deep learning, multimodal systems, and real-time processing, are also explored to address existing limitations.
Keywords :
Emotion Recognition, Natural Language Processing (NLP), Acoustic Analysis, Machine Learning, Deep Learning, Speech Emotion Recognition, Text Emotion