Learning to Detect Phishing Webpages – Journal of Internet Services and Information Security

Volume 4 - Issue 3

Ram B. Basnet, Colorado Mesa University, Grand Junction, Colorado, USA
rbasnet@coloradomesa.edu
Andrew H. Sung, University of Southern Mississippi, Hattiesburg, Mississippi, USA
andrew.sung@usm.edu

DOI: 10.22667/JISIS.2014.08.31.021

Keywords: phishing attack, phishing webpages, content-based approach, batch learning, online learning

Abstract

Phishing has become a lucrative business for cyber criminals, whose victims range from end users to large corporations and government organizations. Although Internet users are generally becoming more aware of phishing websites, cyber scammers keep devising novel schemes that circumvent phishing filters and often succeed in fooling even savvy users. Recent studies that detect phishing and malicious webpages using features from URLs alone show promise. That approach, however, may not be reliable and robust enough to detect evolving, sophisticated phishing webpages. For example, phishers can use URL shortening services to disguise their phishing URLs, or use compromised legitimate websites to host their phishing campaigns. Along with features from URLs, we propose many novel content-based features and apply cutting-edge machine learning techniques to demonstrate that our approach can detect phishing webpages with error rates of 0.04-0.44%, and false positive and false negative rates of 0.0-0.30% and 0.06-0.73%, respectively, on real-world data sets using a Random Forests classifier, thereby improving on previous results for the important problem of phishing detection.
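
To make the three reported measures concrete, the following minimal sketch shows how an error rate, a false positive rate, and a false negative rate are conventionally computed from a classifier's predictions. It assumes the usual convention that phishing is the positive class (label 1) and legitimate pages are the negative class (label 0); the function name `detection_rates` and the toy labels are illustrative only and are not taken from the paper.

```python
def detection_rates(y_true, y_pred):
    """Return (error_rate, false_positive_rate, false_negative_rate) in percent.

    Assumed labels: 1 = phishing (positive), 0 = legitimate (negative).
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1  # phishing page correctly flagged
        elif truth == 0 and pred == 1:
            fp += 1  # legitimate page wrongly flagged
        elif truth == 0 and pred == 0:
            tn += 1  # legitimate page correctly passed
        elif truth == 1 and pred == 0:
            fn += 1  # phishing page missed
        else:
            raise ValueError("labels must be 0 or 1")

    # Error rate: share of all pages that were misclassified.
    error_rate = 100.0 * (fp + fn) / len(y_true)
    # False positive rate: share of legitimate pages flagged as phishing.
    fpr = 100.0 * fp / (fp + tn) if (fp + tn) else 0.0
    # False negative rate: share of phishing pages that were missed.
    fnr = 100.0 * fn / (fn + tp) if (fn + tp) else 0.0
    return error_rate, fpr, fnr


# Toy example: 4 phishing pages and 6 legitimate pages.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
preds = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
error_rate, fpr, fnr = detection_rates(truth, preds)
```

Because the two rates use different denominators (legitimate pages versus phishing pages), a classifier can have a low overall error rate while still missing a noticeable share of phishing pages, which is why reporting all three figures is informative.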

Date

August 2014

Page Number

21-39