A Comparative Study of Some Selected Classifiers on an Imbalanced Dataset for Sentiment Analysis


Authors : Mohammed Ali Kawo; Dr. Garba Muhammad; Dr. Danlami Gabi; Dr. Musa Sule Argungu

Volume/Issue : Volume 9 - 2024, Issue 5 - May

Google Scholar : https://tinyurl.com/yhzh9ma4

Scribd : https://tinyurl.com/2yumt3by

DOI : https://doi.org/10.38124/ijisrt/IJISRT24MAY1751

Abstract : Sentiment analysis makes it considerably easier to extract subjective information from online user-generated text. In a classification task, different algorithms applied to the same review dataset can behave very differently: most classifiers produce accurate results, while others produce limited or inaccurate predictions. Determining which machine learning model performs best for sentiment analysis therefore remains a recurring question. This research evaluates four machine learning algorithms on the same dataset: Naive Bayes, Support Vector Machine, K-Nearest Neighbor, and Decision Tree. Our primary goal is to identify the most effective of these classifiers for sentiment analysis of English texts. Their robustness will be tested on an imbalanced dataset obtained from Kaggle.com, a machine learning repository. The dataset will first undergo preprocessing to enable analysis, followed by feature extraction for the base classifiers; all experiments will be carried out in a Jupyter notebook from Anaconda. The performance of each algorithm will be evaluated using the confusion matrix, F1-score, precision, and recall.

Keywords : Machine Learning Algorithms, Sentiment Analysis, Imbalanced, Confusion Matrix.
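
The evaluation metrics named in the abstract (confusion matrix, precision, recall, F1-score) can be illustrated with a minimal sketch. It is not the authors' code: the function names and the toy labels below are assumptions made for illustration, and only the Python standard library is used. The example shows why accuracy alone can mislead on an imbalanced dataset: two sets of predictions reach the same accuracy, but only one of them ever detects the minority class.

```python
def confusion_counts(y_true, y_pred, positive):
    """Return (tp, fp, fn, tn), treating `positive` as the class of interest."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1, returning 0.0 when a ratio is undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical imbalanced toy data: nine positive reviews, one negative review.
y_true = ["pos"] * 9 + ["neg"]

# Classifier A (hypothetical): always predicts the majority class.
pred_a = ["pos"] * 10
# Classifier B (hypothetical): finds the negative review but mislabels one positive.
pred_b = ["pos"] * 8 + ["neg", "neg"]

for name, pred in (("A", pred_a), ("B", pred_b)):
    tp, fp, fn, tn = confusion_counts(y_true, pred, positive="neg")
    p, r, f1 = precision_recall_f1(tp, fp, fn)
    print(f"{name}: accuracy={accuracy(y_true, pred):.2f} "
          f"tp={tp} fp={fp} fn={fn} tn={tn} "
          f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Both hypothetical classifiers score 0.90 accuracy, yet classifier A has an F1-score of 0 on the minority class, which is why per-class precision, recall, and F1 are reported alongside the confusion matrix when the classes are imbalanced.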


