Authors :
Dr. Umejuru Danie; Dr. Ugbari Augustine
Volume/Issue :
Volume 10 - 2025, Issue 7 - July
Google Scholar :
https://tinyurl.com/bddctb33
Scribd :
https://tinyurl.com/yukc9mez
DOI :
https://doi.org/10.38124/ijisrt/25jul1306
Abstract :
Malicious URLs pose a serious threat to web users worldwide, and the problems they cause are numerous and worrying.
This has driven the development of newer detection and notification models in the cybersecurity space, which aim to
close existing gaps and curb the harm caused by unknowingly clicking or otherwise using a malicious URL. This study
develops a novel malicious URL detection and notification model based on a CNN, and further incorporates a penalty
term on the kernel, weights, and bias in order to increase the model's detection accuracy, reduce time complexity,
and address misclassification and poor prediction accuracy. The CNN with penalty term was evaluated against Logistic
Regression (LR) and a gated recurrent unit network (RNN-GRU), which the study reports increased the resilience of the
proposed model and improved its classification predictions. The proposed model was assessed using accuracy, the
confusion matrix, precision, recall, F1 score, and AUC-ROC. The study outlines a novel method for identifying
malicious URLs using features obtained primarily from phishing and legitimate URL addresses. A temporal tokenizer was
built for URL text processing; it scans the URL and recognizes characters, symbols, and redundant tokens. This makes
it easier to separate specific features from the URL address and return them as a list, while also identifying
directories, keyword arguments, and extensions. The experimental results show that the proposed CNN with penalty term
(98.2%) outperformed the LR and RNN-GRU approaches, which achieved prediction accuracies of 89.85% and 91.5%,
respectively.
Keywords :
CNN, Logistic Regression, Gated-Recurrent Network, URL, Penalty Term.
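The URL tokenization step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' tokenizer: the function name `tokenize_url`, the `REDUNDANT_TOKENS` list, and the rule that a final path segment containing a dot is a file name are all assumptions made for illustration. It only shows one plausible way to split a URL into a token list while identifying directories, keyword arguments, and the extension.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Hypothetical list of low-signal tokens; the paper does not specify one.
REDUNDANT_TOKENS = {"http", "https", "www", "com"}


def tokenize_url(url):
    """Return tokens, directories, query arguments, and extension for a URL."""
    # urlsplit needs a scheme to locate the host, so add one if it is missing.
    parts = urlsplit(url if "://" in url else "http://" + url)

    # Non-empty path segments, e.g. "/a/b/c.php" -> ["a", "b", "c.php"].
    segments = [seg for seg in parts.path.split("/") if seg]

    # Assumption: a final segment containing a dot is a file name.
    if segments and "." in segments[-1]:
        directories = segments[:-1]
        extension = segments[-1].rsplit(".", 1)[1].lower()
    else:
        directories = segments
        extension = ""

    # Keyword arguments from the query string, e.g. "?id=7" -> {"id": "7"}.
    query_args = dict(parse_qsl(parts.query, keep_blank_values=True))

    # Split on every non-alphanumeric symbol, then drop empty and redundant tokens.
    raw_tokens = re.split(r"[^A-Za-z0-9]+", url.lower())
    tokens = [t for t in raw_tokens if t and t not in REDUNDANT_TOKENS]

    return {
        "host": parts.hostname or "",
        "tokens": tokens,
        "directories": directories,
        "query_args": query_args,
        "extension": extension,
    }


if __name__ == "__main__":
    sample = "http://www.example.com/login/secure/update.php?user=admin&id=7"
    for key, value in tokenize_url(sample).items():
        print(f"{key}: {value}")
```

The resulting token list could then be mapped to integer indices and padded before being fed to a CNN, although the paper's actual encoding is not described in the abstract.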