Authors :
ZEWAR Shah; SHAN Zhiyong; Adnan
Volume/Issue :
Volume 9 - 2024, Issue 4 - April
Google Scholar :
https://tinyurl.com/2z5f7929
Scribd :
https://tinyurl.com/y6h42sju
DOI :
https://doi.org/10.38124/ijisrt/IJISRT24APR872
Abstract :
Speech is essential to human communication for expressing and understanding emotion. Emotional speech processing faces challenges in expert data sampling, dataset organization, and the computational cost of large-scale analysis. This study introduces a new speech emotion recognition system that reduces data redundancy and high dimensionality: it employs a Diffusion Map for dimensionality reduction together with an ensemble of Decision Tree and K-Nearest Neighbors (KNN) classifiers, strategies proposed to increase recognition accuracy. Speech emotion recognition is gaining popularity in affective computing for use in medicine, industry, and academia. This project aims to provide an efficient and robust real-time emotion identification framework. To identify emotions with supervised machine learning models, the work uses paralinguistic features such as intensity, pitch, and MFCCs. The experimental analysis integrates prosodic and spectral information and classifies it with Random Forest, Multilayer Perceptron (MLP), SVM, KNN, and Gaussian Naïve Bayes. Fast training times make these models well suited to real-time applications. SVM and MLP achieve the highest accuracies, at 70.86% and 79.52% respectively, and comparisons with benchmarks show significant improvements over earlier models.
Keywords :
Feature Extraction, KNN, Speech Emotions, Diffusion Map, MFCC, and Feature Engineering.
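The classification stage described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it trains the SVM, MLP, and KNN classifiers named above with scikit-learn on synthetic stand-ins for the paralinguistic feature vectors (e.g. stacked MFCC means plus pitch and intensity statistics); the class count, feature dimension, and data are assumptions for demonstration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for extracted speech features:
# 4 emotion classes, 60 utterances each, 40-dimensional feature vectors
# (in the real system these would be MFCC/pitch/intensity statistics).
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(60, 40)) for c in range(4)])
y = np.repeat(np.arange(4), 60)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Each model is wrapped in a pipeline so features are standardized first.
models = {
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "MLP": make_pipeline(StandardScaler(), MLPClassifier(max_iter=500, random_state=0)),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = model.score(X_te, y_te)

print(scores)
```

In practice the features would come from an audio front end (e.g. MFCC extraction over frames, then summary statistics per utterance) rather than random draws, and hyperparameters would be tuned per dataset.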