ISSN: 2182-2069 (printed) / ISSN: 2182-2077 (online)
Next-Gen Phishing Defense: Enhancing Detection with Machine Learning and Expert Whitelisting/Blacklisting
Machine learning is now widely used across industries to extract insights from data. This research applies machine learning to the identification of phishing websites, and evaluates and contrasts how effectively different algorithms classify malicious sites. By exposing the risks of phishing, the study aims to develop reliable systems for detecting fake websites. The results show how automated threat intelligence from machine learning can augment cybersecurity. Phishing uses social engineering to disguise malicious links as trusted entities, tricking victims into revealing sensitive information. This work investigates phishing detection that combines curated lists with machine learning for adaptive defense. Whitelists of legitimate sites and blacklists of known threats establish a baseline for classification. The most discriminating website features are distilled to train machine learning models on datasets with over 11,000 examples. Several learning algorithms are assessed, including k-nearest neighbors, decision trees, Naive Bayes, logistic regression, support vector machines, and random forests. Feature selection methods reduce the input space to improve prediction. Models are evaluated on AUC, F1-score, precision, recall, and the Matthews correlation coefficient (MCC); random forest performs best, with accuracy exceeding 97%. Integrating expert knowledge, in the form of whitelisting and blacklisting, with machine learning provides an agile framework for identifying fraudulent websites. The study demonstrates the potential of machine learning to amplify human expertise in addressing cybersecurity problems such as phishing. The techniques developed could aid security analysts and help end users defend against dynamic digital threats through automation.
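As an illustration of how curated lists and a learned classifier might be combined, the following is a minimal sketch rather than the implementation used in this study. It assumes that list lookups are done on the URL's hostname, that the blacklist is consulted before the whitelist, and that only unlisted hosts are passed to the model; none of these choices is specified above. The names `hostname`, `classify`, `toy_model`, and `binary_metrics` are hypothetical, and `toy_model` is a trivial placeholder standing in for a trained classifier such as a random forest. The second function computes precision, recall, F1-score, and MCC from binary labels using their standard definitions.

```python
import math
from typing import Callable, Dict, Sequence, Set
from urllib.parse import urlsplit


def hostname(url: str) -> str:
    """Return the lowercased host of a URL, or '' if none can be parsed.

    Assumes the URL carries a scheme (e.g. 'https://'); without one,
    urlsplit does not recognise a host.
    """
    return (urlsplit(url).hostname or "").lower()


def classify(
    url: str,
    whitelist: Set[str],
    blacklist: Set[str],
    model: Callable[[str], bool],
) -> str:
    """Label a URL using the lists first and the model as a fallback.

    Assumption: the blacklist takes priority over the whitelist.
    `model` returns True when it predicts phishing.
    """
    host = hostname(url)
    if host in blacklist:
        return "phishing"
    if host in whitelist:
        return "legitimate"
    return "phishing" if model(url) else "legitimate"


def toy_model(url: str) -> bool:
    """Placeholder for a trained classifier: flags URLs containing '@'."""
    return "@" in url


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute precision, recall, F1 and MCC, treating 1 as 'phishing'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)),
    # reported as 0.0 when the denominator is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


if __name__ == "__main__":
    whitelist = {"example.com"}
    blacklist = {"bad.example.net"}
    for url in (
        "https://example.com/login",
        "http://bad.example.net/x",
        "http://user@unknown.example.org/",
    ):
        print(url, "->", classify(url, whitelist, blacklist, toy_model))
    print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
```

Checking the lists before the model keeps known sites from being misclassified by the learned component and reserves the model for hosts the lists have not seen. A deployed system would likely also need to normalise hostnames (for example, by registered domain) before lookup, which this sketch omits.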