A Comparative Study of Malicious URL Detection Model: CNN vs. Logistic Regression and Gated Recurrent Units


Authors : Dr. Umejuru Danie; Dr. Ugbari Augustine

Volume/Issue : Volume 10 - 2025, Issue 7 - July


Google Scholar : https://tinyurl.com/bddctb33

Scribd : https://tinyurl.com/yukc9mez

DOI : https://doi.org/10.38124/ijisrt/25jul1306



Abstract : Malicious URLs pose a serious threat to web users worldwide, and the problems they cause are numerous and of real concern to internet users. This has motivated the development of newer detection and notification models in the cyber security space, which aim to close existing gaps and limit the harm caused by unknowingly clicking or following a malicious URL. This study develops a novel malicious URL detection and notification model based on a CNN, and further adds a penalty term on the kernel, weights and bias of the CNN in order to increase the model's detection accuracy, reduce time complexity, and address misclassification and poor prediction accuracy. The CNN with penalty term is evaluated against Logistic Regression (LR) and Gated Recurrent Units (RNN-GRU), which increased the resilience of the proposed model and improved its classification predictions. The evaluation metrics used for the proposed model are accuracy, confusion matrix, precision, recall, F1 score, and AUC-ROC. The method identifies malicious URLs using features obtained primarily from phishing and legitimate URL addresses. A temporal tokenizer was built for URL text processing; it scans the URL and recognizes characters, symbols and redundant tokens. This makes it easier to separate specific features from the URL address and return them as a list, while also identifying directories, keyword arguments, and extensions. The experimental results show that the proposed CNN with penalty term achieved a prediction accuracy of 98.2%, outperforming the LR and RNN-GRU approaches, which achieved 89.85% and 91.5% respectively.

Keywords : CNN, Logistic Regression, Gated-Recurrent Network, URL, Penalty Term.
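
The abstract describes a tokenizer that scans a URL, drops redundant tokens, and returns the remaining features as a list while identifying directories, keyword arguments, and extensions. The paper's own tokenizer is not shown on this page, so the following is only a minimal sketch of that idea using Python's standard library. The function name `tokenize_url`, the `REDUNDANT_TOKENS` set, and the shape of the returned dictionary are all assumptions made for illustration.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Hypothetical list of tokens treated as redundant; the paper does not publish its own.
REDUNDANT_TOKENS = {"http", "https", "www", "com"}


def tokenize_url(url):
    """Split a URL into a token list plus directories, query arguments, and extension."""
    # urlsplit only recognizes the host when a scheme or a leading "//" is present.
    parts = urlsplit(url if "://" in url else "//" + url)

    # Non-empty path segments; all but the last are treated as directories.
    segments = [seg for seg in parts.path.split("/") if seg]
    directories = segments[:-1]
    filename = segments[-1] if segments else ""
    extension = filename.rsplit(".", 1)[1] if "." in filename else ""

    # Query string parsed into key/value pairs ("keyword arguments").
    arguments = dict(parse_qsl(parts.query, keep_blank_values=True))

    # Split on any run of non-alphanumeric characters, then drop redundant tokens.
    raw_tokens = re.split(r"[^A-Za-z0-9]+", url.lower())
    tokens = [t for t in raw_tokens if t and t not in REDUNDANT_TOKENS]

    return {
        "tokens": tokens,
        "directories": directories,
        "arguments": arguments,
        "extension": extension,
    }


if __name__ == "__main__":
    example = "http://www.example.com/login/secure/update.php?user=bob&id=7"
    print(tokenize_url(example))
```

In a pipeline like the one the abstract outlines, a token list of this kind would typically be mapped to integer indices and padded before being passed to the CNN; that step is not shown here.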


