A Hybrid Machine Learning Approach for Fake Job Posting Detection: Integrating Naive Bayes and Logistic Regression Models


Authors : Shubham Sonkar; Shreyash Yadav; Dr. Sunder R

Volume/Issue : Volume 10 - 2025, Issue 6 - June


Google Scholar : https://tinyurl.com/3pjfj2cz

DOI : https://doi.org/10.38124/ijisrt/25jun458



Abstract : The digital transformation of the recruitment sector has made job hunting easier and increased the use of online job portals, but it has also allowed malicious actors to post fraudulent job listings on these sites. Such listings can expose prospective employees to scams, data breaches, and even emotional trauma. To address this growing threat, this study presents a hybrid system that detects fake job postings more precisely and reliably. The system combines the probabilistic modeling of Naive Bayes, which is well suited to text-intensive data, with the predictive ability of Logistic Regression, which performs well on binary classification problems. Used together, the two techniques capture the linguistic traits typical of fake job postings and improve classification performance. The study uses a real-world dataset of heterogeneous job-related features, including unstructured text fields such as job titles and company information, and structured attributes such as employment type, location, and business type. The unstructured text is preprocessed with tokenization, stopword removal, stemming, and TF-IDF vectorization. Feature scaling and feature selection are also applied to reduce the dimensionality of the data and improve model efficiency. The hybrid system is evaluated with several classification metrics: precision, recall, F1-score, and ROC-AUC. The experiments show the hybrid system to be comparatively stronger, particularly in precision and recall, including on postings that carry only subtle indicators of fraud.
This study offers a practical and scalable way to make online recruitment more secure by enabling early detection of fraudulent job postings. The system is expected to integrate with existing job portal systems, improving safety and trust for the millions of job seekers who use them.
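
The abstract does not say how the two models are combined, so the following is only a minimal sketch of one plausible arrangement, not the authors' implementation. It assumes a stacking design: a multinomial Naive Bayes model scores the posting text, and its log-odds score is fed, together with one structured attribute, into a Logistic Regression model. The toy postings, the `has_company_profile` attribute, and all function names are hypothetical. For brevity the sketch uses raw word counts rather than TF-IDF, and skips stopword removal and stemming.

```python
import math
from collections import Counter

# Hypothetical toy data: (posting text, has_company_profile, label); label 1 = fake.
POSTINGS = [
    ("earn money fast from home no experience needed", 0, 1),
    ("wire transfer fee required to start earning money today", 0, 1),
    ("quick cash work from home send bank details", 0, 1),
    ("senior software engineer python backend team", 1, 0),
    ("registered nurse full time hospital position", 1, 0),
    ("marketing manager with five years experience", 1, 0),
]

def tokenize(text):
    """Lowercase, split on whitespace, keep alphabetic tokens only."""
    return [tok for tok in text.lower().split() if tok.isalpha()]

def train_naive_bayes(texts, labels):
    """Multinomial Naive Bayes with Laplace smoothing.
    Returns a function mapping text to log-odds of the fake class."""
    counts = {0: Counter(), 1: Counter()}
    for text, label in zip(texts, labels):
        counts[label].update(tokenize(text))
    vocab = set(counts[0]) | set(counts[1])
    totals = {c: sum(counts[c].values()) + len(vocab) for c in (0, 1)}
    prior = math.log(labels.count(1) / labels.count(0))

    def log_odds(text):
        score = prior
        for tok in tokenize(text):
            if tok in vocab:  # tokens unseen in training are ignored
                score += math.log((counts[1][tok] + 1) / totals[1])
                score -= math.log((counts[0][tok] + 1) / totals[0])
        return score

    return log_odds

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(rows, labels, rate=0.1, epochs=500):
    """Logistic Regression fitted by plain stochastic gradient descent."""
    weights, bias = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        for row, label in zip(rows, labels):
            error = sigmoid(sum(w * x for w, x in zip(weights, row)) + bias) - label
            weights = [w - rate * error * x for w, x in zip(weights, row)]
            bias -= rate * error
    return weights, bias

texts = [p[0] for p in POSTINGS]
labels = [p[2] for p in POSTINGS]

# Stage 1: Naive Bayes turns each posting's text into a single score.
nb_log_odds = train_naive_bayes(texts, labels)

# Stage 2: Logistic Regression combines that score with a structured attribute.
rows = [[nb_log_odds(text), float(profile)] for text, profile, _ in POSTINGS]
weights, bias = train_logistic_regression(rows, labels)

def fake_probability(text, has_company_profile):
    row = [nb_log_odds(text), float(has_company_profile)]
    return sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)

def precision_recall_f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

predictions = [int(fake_probability(t, prof) >= 0.5) for t, prof, _ in POSTINGS]
print(precision_recall_f1(labels, predictions))
```

The metrics here are computed on the training postings themselves, so they only confirm that the sketch runs. A realistic evaluation would use held-out data, and a stacked model would normally be fitted on out-of-fold Naive Bayes scores rather than in-sample ones.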

Keywords : Fake Job Posting, Fraud Detection, Machine Learning, Hybrid Model, Naive Bayes, Logistic Regression, Text Classification, Cybersecurity, Employment Scam, Online Recruitment, Natural Language Processing (NLP), Feature Extraction, Binary Classification, Job Portal Security, Scam Detection.


