Sentiment Analysis of IMDb Movie Reviews


Authors : S.M. Yousuf Iqbal Tomal

Volume/Issue : Volume 9 - 2024, Issue 5 - May

Google Scholar : https://tinyurl.com/5mjr6peb

Scribd : https://tinyurl.com/mreuf7kn

DOI : https://doi.org/10.38124/ijisrt/IJISRT24MAY1625

Abstract : This paper presents a sentiment analysis project that classifies IMDb movie reviews as positive or negative based on their textual content. Using a dataset of 50,000 IMDb movie reviews sourced from Kaggle, the study treats the task as binary classification and preprocesses the text with techniques such as TF-IDF vectorization. The dataset is split into training and testing sets, with models trained on the former and evaluated on the latter. Three machine learning algorithms (Logistic Regression, Random Forest, and Decision Tree) are implemented and compared using precision, recall, and F1-score. Results indicate that Logistic Regression outperforms the other two models on this task. The paper concludes by summarizing the project's contributions and suggesting avenues for future research, in particular extending the set of sentiment classes beyond positive and negative and enlarging the dataset.
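To make the two quantitative ingredients named in the abstract concrete, the following is a minimal, standard-library-only sketch of TF-IDF weighting and of precision, recall, and F1-score for the positive class. It is an illustration under stated assumptions, not the author's implementation: the abstract does not say which TF-IDF variant or library was used, so the sketch assumes the plain form (term frequency times log(N / document frequency)), whitespace tokenization, and three invented toy reviews. The function names `tfidf_vectors` and `precision_recall_f1` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    The paper does not specify its variant; library implementations
    often add smoothing and normalization on top of this.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def precision_recall_f1(y_true, y_pred, positive="positive"):
    """Compute precision, recall, and F1-score for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Invented toy reviews, for illustration only (not from the IMDb dataset).
reviews = [
    "a great movie",
    "a terrible movie",
    "great acting in this movie",
]
vectors = tfidf_vectors(reviews)

# Invented labels and predictions: 2 true positives, 0 false positives,
# 1 false negative.
y_true = ["positive", "positive", "positive", "negative", "negative"]
y_pred = ["positive", "positive", "negative", "negative", "negative"]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

In this toy corpus, "movie" occurs in every review, so its weight is zero, while "terrible" occurs in only one review and receives the largest weight in that review. This is the property that makes TF-IDF features useful to a downstream classifier such as Logistic Regression: terms that discriminate between reviews are emphasized over terms common to all of them.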


