Machine Learning Algorithm Based Email Spam Detection


Authors : N. Bhavana; Avulapalli Sowmya

Volume/Issue : Volume 10 - 2025, Issue 5 - May


Google Scholar : https://tinyurl.com/mr7kcknb

DOI : https://doi.org/10.38124/ijisrt/25may1564



Abstract : Email has become the main channel for both personal and professional communication as digital communication has expanded rapidly. Its wide use has brought a corresponding increase in spam, which ranges from innocuous advertisements to malicious phishing attempts. Beyond cluttering users' inboxes, spam emails pose serious security risks. Conventional rule-based spam filters have not kept pace with the changing nature of spam. Because machine learning (ML) algorithms learn from data and can improve as more data becomes available, they are increasingly being incorporated into spam detection systems. In this paper, we apply several machine learning algorithms, including Random Forest, Support Vector Machine (SVM), and Naive Bayes, to email spam detection. Using a publicly accessible dataset, we assess these models on accuracy, precision, recall, and F1-score. Our findings indicate that machine learning models, in particular ensemble approaches, provide reliable and scalable spam detection systems.

Keywords : Machine Learning, Spam, Emails.
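
The abstract names four evaluation measures (accuracy, precision, recall, and F1-score) without stating how they are computed. The following is a minimal sketch of the standard definitions, not the authors' evaluation code. It assumes binary labels with "spam" as the positive class; the names `Metrics` and `evaluate` and the sample labels are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def evaluate(y_true, y_pred, positive="spam"):
    """Compute the four metrics from true and predicted labels.

    Assumption: a binary task in which `positive` marks the spam class
    and every other label counts as negative (legitimate mail).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # spam correctly flagged
        elif pred == positive:
            fp += 1  # legitimate mail wrongly flagged as spam
        elif truth == positive:
            fn += 1  # spam that slipped through
        else:
            tn += 1  # legitimate mail correctly passed

    total = tp + fp + fn + tn
    # Each ratio falls back to 0.0 when its denominator is zero.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return Metrics(accuracy, precision, recall, f1)


if __name__ == "__main__":
    # Made-up labels for illustration only; not data from the paper.
    truth = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
    preds = ["spam", "spam", "ham", "spam", "ham", "ham", "ham", "ham"]
    print(evaluate(truth, preds))
```

Reporting precision and recall alongside accuracy matters for spam filtering because the two error types have different costs: a false positive hides a legitimate message, while a false negative lets spam through.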

