This research paper outlines a methodology for determining URL legitimacy and detecting phishing attempts. Python
modules such as whois, socket, re, ipaddress, and BeautifulSoup are used to extract features including the presence of an
IP address in the URL, URL length, domain name, subdomains, and favicon presence. The extracted values for each URL are
stored as a list and used to train classifiers.
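The feature-extraction step can be sketched as follows. This is a minimal illustration, not the paper's code: the function names, the feature order, and the hyphen and "@" checks are assumptions. It uses the standard-library ipaddress module for the IP check and swaps in the standard-library html.parser in place of BeautifulSoup for favicon detection. The whois and socket lookups are omitted because they require network access.

```python
import ipaddress
from html.parser import HTMLParser
from urllib.parse import urlparse


class FaviconFinder(HTMLParser):
    """Sets `found` if the page declares a <link rel="icon"> (or "shortcut icon")."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "link":
            rel = (dict(attrs).get("rel") or "").lower().split()
            if "icon" in rel:
                self.found = True


def host_is_ip(host):
    """True if the host part of the URL is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def extract_features(url, html=""):
    """Return one URL's features as a list, in a fixed (hypothetical) order."""
    host = urlparse(url).hostname or ""
    is_ip = host_is_ip(host)
    finder = FaviconFinder()
    finder.feed(html)
    # Rough subdomain count: labels beyond "name.tld". Ignores multi-part
    # public suffixes such as ".co.uk".
    subdomains = 0 if is_ip else max(host.count(".") - 1, 0)
    return [
        int(is_ip),          # raw IP address used instead of a domain name
        len(url),            # URL length in characters
        int("-" in host),    # hyphen in the domain name (assumed feature)
        subdomains,          # number of subdomain labels
        int(finder.found),   # favicon <link> present in the page HTML
        int("@" in url),     # '@' in the URL (assumed feature)
    ]
```

For example, extract_features("http://192.168.0.1/login") flags the raw IP address and reports zero subdomains. Returning a plain list matches the description above, where each URL's values are stored as a list before being passed to the classifiers.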
Kernel SVM, KNN, Random Forest, and decision tree classifiers are implemented.
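A possible training setup for the four classifiers, using scikit-learn, is sketched below. Only kernel="rbf" for the SVM and criterion="entropy" for the decision tree come from the text. The StandardScaler placed in front of the SVM and KNN, the train/test split, and every other hyperparameter are illustrative assumptions, and the helper names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_classifiers(seed=0):
    """Return the four model families named in the text (assumed settings)."""
    return {
        # SVC and KNN are sensitive to feature scale (URL length vs 0/1 flags),
        # so they are wrapped with a scaler; this is an added assumption.
        "Kernel SVM": make_pipeline(
            StandardScaler(), SVC(kernel="rbf", probability=True, random_state=seed)
        ),
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        # Splits chosen by information gain (entropy), as stated in the text.
        "Decision Tree": DecisionTreeClassifier(criterion="entropy", random_state=seed),
        # Ensemble of decision trees trained on bootstrap samples.
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }


def train_and_score(X, y, seed=0):
    """Fit each classifier on a 75/25 split and return its test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    scores = {}
    for name, model in build_classifiers(seed).items():
        model.fit(X_train, y_train)
        scores[name] = model.score(X_test, y_test)
    return scores


if __name__ == "__main__":
    # Synthetic stand-in data (1 = legitimate, 0 = phishing); not real results.
    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(200, 4)).astype(float)
    y = (X[:, 0] + X[:, 1] >= 1).astype(int)
    for name, acc in train_and_score(X, y).items():
        print(f"{name}: {acc:.2f}")
```

One design detail worth noting: scikit-learn's RandomForestClassifier combines its trees by averaging their predicted class probabilities rather than by counting hard votes. The two rules usually agree, but they are not identical.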
The kernel SVM classifier (sklearn.svm.SVC) uses the "rbf" kernel to handle data that is not linearly separable. The
decision tree classifier from the sklearn.tree module uses the "entropy" criterion to choose its splits. Random Forest
combines multiple decision trees and bases the final classification on majority voting.
The paper also presents a user-friendly UI design for phishing-detection websites. These websites use machine learning
algorithms to assess URL authenticity and report the outcome to the user. Frameworks such as Bootstrap and Particles.js
are integrated to improve visual appeal and user experience.
The machine learning algorithm analyzes website content, structure, and other factors to determine legitimacy, and
presents the result as a legitimacy percentage together with the associated risks.
The study also explores Flask, a flexible Python web framework for the rapid development of web applications. Flask
provides built-in routing and templating, and a trained machine learning model can be called directly from its request
handlers, so users can submit a URL and retrieve the result. It also simplifies deploying machine learning models as web
services exposed through APIs, which makes them easy to integrate with other applications.
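A Flask endpoint of this kind might look like the sketch below. The factory name create_app, the /predict route, and the JSON field names are hypothetical. It assumes the model exposes scikit-learn's predict_proba and classes_, that class 1 means "legitimate", and that featurize turns a URL into the same feature list used for training. The legitimacy percentage is taken as the predicted probability of the legitimate class, which is one plausible reading of the text rather than the paper's confirmed method.

```python
from flask import Flask, jsonify, request


def create_app(model, featurize):
    """Wrap a fitted classifier as a small JSON web service (hypothetical API)."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        data = request.get_json(silent=True) or {}
        url = data.get("url")
        if not url:
            return jsonify(error="missing 'url'"), 400
        proba = model.predict_proba([featurize(url)])[0]
        # Look up the column of the "legitimate" class (assumed label 1)
        # instead of assuming it is always the second column.
        legit_col = list(model.classes_).index(1)
        percent = round(100 * float(proba[legit_col]), 1)
        return jsonify(url=url, legitimacy_percent=percent, phishing=percent < 50)

    return app
```

A client would POST a body such as {"url": "https://example.com"} to /predict and receive the percentage in the JSON response. Passing the model and the feature function into create_app, rather than loading them at import time, keeps the service testable with Flask's built-in test client.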
Additionally, the research emphasizes Anaconda as an essential tool for data science and machine learning projects.
Its package manager, conda, simplifies installing, removing, and updating the required libraries. Anaconda provides a
comprehensive set of tools for the complete data science workflow, including data exploration, cleaning, model
construction, and deployment, and its integration with Jupyter Notebook further extends these capabilities.
In conclusion, this research paper presents a comprehensive approach to URL legitimacy assessment and phishing
detection, combining Python modules, machine learning classifiers, a user-friendly UI design, the Flask framework, and
the benefits of Anaconda for data science and machine learning projects.