E-Mail Spam Detection Using Machine Learning Naive Bayes Theorem | International Journal of Innovative Science and Research Technology

E-Mail Spam Detection Using Machine Learning Naive Bayes Theorem

Authors : Karra.NAGA SIVA SURYA DILEEP; Krovvidi.KARTHIK SAI SRI RAMA RAJU; Karella.JOHNY; Kombathula.VENKAT; P.SRINU VASA RAO

Volume/Issue : Volume 9 - 2024, Issue 2 - February

Google Scholar : http://tinyurl.com/ys2thp5s

Scribd : http://tinyurl.com/2umktn79

DOI : https://doi.org/10.5281/zenodo.10725536

Abstract : Spam, also called junk email, is unsolicited email that is typically sent to large lists of recipients. Although real individuals can send spam, it is often sent by botnets (networks of infected computers controlled by an attacker, sometimes called a "bot herder"). While most people view spam as a nuisance, they tend to accept it as an unavoidable side effect of email communication. In addition to being annoying, spam can clog email inboxes if it is not filtered properly and deleted regularly, and it can be dangerous: spammers frequently change their methods and content to trick victims into downloading malware, sharing personal information, or sending money. Most spam is commercial in nature and financially motivated. Spammers attempt to deceive recipients by making false claims, selling questionable products, and promoting false information. Unwanted emails, such as phishing and spam, cost businesses and individuals billions of dollars each year. Many models and techniques for automatic spam detection have been introduced and developed, but none has yet achieved 100% accuracy. Among these approaches, machine learning and deep learning algorithms have been the most successful, and natural language processing (NLP) techniques further improve model accuracy. This study presents the effectiveness of word embedding in spam classification. The pre-trained transformer model BERT (Bidirectional Encoder Representations from Transformers) is fine-tuned for the task of distinguishing spam from non-spam (ham) emails. BERT uses attention layers to take the context of the text into account. The results were compared with a baseline DNN (Deep Neural Network) model consisting of a BiLSTM (Bidirectional Long Short-Term Memory) layer and two dense layers. Some of the most popular spam topics are pharmaceuticals, financial services, work-from-home offers, pornography, online courses, and cryptocurrency.

Keywords : Machine Learning, Natural Language Processing, Spam, Ham, Email, Naive Bayes, Logistic Regression.
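
The title and keywords name Naive Bayes as a classifier, but the abstract gives no implementation details. The following is a minimal sketch of how such a spam/ham classifier might look, assuming a bag-of-words multinomial Naive Bayes model with Laplace (add-one) smoothing. The training messages, the whitespace tokenizer, and the function names `tokenize`, `train`, and `predict` are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical toy training set; the paper's actual dataset is not given here.
TRAIN = [
    ("win free money now", "spam"),
    ("cheap pills free offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch with the project team", "ham"),
]


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, very simple tokenizer)."""
    return text.lower().split()


def train(samples):
    """Count documents and word occurrences per class."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in samples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the class with the highest log-posterior score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        # Log prior P(class); assumes every class has at least one document.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(counts.values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen during training
            # Laplace-smoothed log likelihood P(word | class).
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAIN)
print(predict(model, "free money offer"))
print(predict(model, "project meeting on monday"))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied, and the add-one smoothing keeps a word that was seen in only one class from forcing the other class's probability to zero.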
