Volume 11 - Issue 2
Preventing Data Loss by Harnessing Semantic Similarity and Relevance
- Hanan Alhindi
King Saud University, Riyadh, Saudi Arabia
halhindi@ksu.edu.sa
- Issa Traore
University of Victoria, Victoria, BC, Canada
itraore@ece.uvic.ca
- Isaac Woungang
Ryerson University, Toronto, ON, Canada
iwoungan@cs.ryerson.ca
Keywords: Data loss prevention, Threat actors, Malicious insiders, Similarities, Data leakage, Detection rate
Abstract
Malicious insiders are considered among the most dangerous threat actors faced by organizations
that maintain security-sensitive data. Data loss prevention (DLP) systems are designed primarily to
detect and/or prevent any illicit data loss or leakage out of the organization by both authorized and
unauthorized users. However, existing DLP systems face several challenges related to performance
and efficiency, especially when skillful malicious insiders transfer critical data after altering it syntactically
but not semantically. In this paper, we propose a new approach for matching and detecting
similarities between monitored and transferred data by employing conceptual and relational
semantics, including the extraction of explicit relationships and the inference of implicit ones. In our
novel approach, we effectively detect leakage of altered sensitive data by combining semantic similarity
and semantic relevance metrics, which are based on an ontology. Our experimental results show that
our system achieves, on average, a relatively high detection rate (DR) and a low false positive rate (FPR).