Preventing Data Loss by Harnessing Semantic Similarity and Relevance – Journal of Internet Services and Information Security

Volume 11 - Issue 2

Hanan Alhindi, King Saud University, Riyadh, Saudi Arabia
halhindi@ksu.edu.sa
Issa Traore, University of Victoria, Victoria, BC, Canada
itraore@ece.uvic.ca
Isaac Woungang, Ryerson University, Toronto, ON, Canada
iwoungan@cs.ryerson.ca

DOI: 10.22667/JISIS.2021.05.31.078

Keywords: Data loss prevention, Threat actors, Malicious insiders, Similarities, Data leakage, Detection rate

Abstract

Malicious insiders are among the most dangerous threat actors faced by organizations that maintain security-sensitive data. Data loss prevention (DLP) systems are designed primarily to detect and/or prevent illicit data loss or leakage out of the organization by both authorized and unauthorized users. However, existing DLP systems face several challenges related to performance and efficiency, especially when skillful malicious insiders transfer critical data after altering it syntactically but not semantically. In this paper, we propose a new approach for matching and detecting similarities between monitored and transferred data by employing conceptual and relational semantics, including extracting explicit relationships and inferring implicit relationships. Our approach detects leakage of altered sensitive data effectively by combining semantic similarity and semantic relevance metrics, which are based on an ontology. Our experimental results show that our system achieves, on average, a relatively high detection rate (DR) and a low false positive rate (FPR).

Date

May 2021

Page Number

78-99