Volume 2 - Issue 3 – 4
Detecting Information Leakage via a HTTP Request Based on the Edit Distance
- Kazuki Chiba
Institute of Systems, Information Technologies and Nanotechnologies / Kyushu University Fukuoka, Japan
chiba@itslab.inf.kyushu-u.ac.jp
- Yoshiaki Hori
Institute of Systems, Information Technologies and Nanotechnologies / Kyushu University Fukuoka, Japan
hori@itslab.inf.kyushu-u.ac.jp
- Kouichi Sakurai
Institute of Systems, Information Technologies and Nanotechnologies / Kyushu University Fukuoka, Japan
sakurai@csce.kyushu-u.ac.jp
Keywords: HTTP, information leakage, edit distance, behavior based detection
Abstract
Information leakage has recently become a frequent problem. Among the many routes of leakage, leakage via the Internet accounts for approximately half of all leakage victims. Leakage via the Internet is caused either by human action or by malware such as spyware. For example, it occurs when a person writes information on a bulletin board, or when spyware runs on a user's machine. A technical countermeasure against spyware is especially needed. In any event, countermeasures against information leakage via the Internet cannot be trusted completely.
When a web browser communicates with a server, it sends an HTTP request, and the server replies with the information specified in that request. Some spyware takes advantage of this mechanism: once installed, it collects the user's information, embeds it in an HTTP request, and sends it to an attacker's server. Filtering packets by TCP or UDP port number is not a good countermeasure, because HTTP is a main communication protocol and its traffic cannot simply be blocked. A signature-based technique is often used against such spyware: if the data of some software matches a signature stored in a database, the software is regarded as spyware. This technique has the advantage that it can detect most spyware whose data is stored in the database; however, it loses its effect against spyware whose data is not stored.
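To make the signature-based approach and its limitation concrete, here is a minimal Python sketch. The signature names and byte patterns in `SIGNATURES` and the helper `match_signatures` are hypothetical illustrations, not any real antivirus database or API; real products use far richer matching than a plain substring test.

```python
from typing import Dict, List

# Hypothetical signature database: name -> byte pattern assumed to appear
# in the data of a known spyware sample (invented for illustration).
SIGNATURES: Dict[str, bytes] = {
    "ExampleSpy.A": b"spy_beacon_v1",
    "ExampleSpy.B": b"collect_user_info",
}


def match_signatures(data: bytes, signatures: Dict[str, bytes]) -> List[str]:
    """Return the names of all signatures whose pattern occurs in data."""
    return [name for name, pattern in signatures.items() if pattern in data]


known_sample = b"...call collect_user_info then upload..."
new_variant = b"...call gather_profile then upload..."  # pattern not stored

print(match_signatures(known_sample, SIGNATURES))  # ['ExampleSpy.B']
print(match_signatures(new_variant, SIGNATURES))   # []
```

Software whose pattern is in the database is caught, while a variant with no stored pattern passes unnoticed, which is exactly the weakness that motivates a database-independent approach.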
We therefore propose a leakage detection system that is independent of such a database. The system focuses on leakage caused by both human action and malware. In existing research, the edit distance between the previous HTTP request and the new HTTP request is calculated. Because many HTTP requests share common characters, this edit distance is much smaller than the number of characters in the request. Leakage can therefore be detected easily: information that is sent repeatedly is disregarded, while new information that is sent suddenly is quantified, and its value stands out.
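The edit distance here is the Levenshtein distance: the minimum number of single-character insertions, deletions, and substitutions that turn one string into another. The following Python sketch computes it with the standard dynamic-programming recurrence and applies it to three hypothetical requests (`req1`, `req2`, `req3`; the URLs and the embedded address and password are invented). It assumes the whole request text is compared as a plain string, which may differ from the preprocessing used in the existing research.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between strings a and b.

    At the start of pass i, prev[j] holds the distance between the first
    i-1 characters of a and the first j characters of b; each pass builds
    the next row of the dynamic-programming table.
    """
    prev = list(range(len(b) + 1))        # "" -> b[:j] needs j insertions
    for i, ca in enumerate(a, start=1):
        curr = [i]                        # a[:i] -> "" needs i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,              # delete ca
                curr[j - 1] + 1,          # insert cb
                prev[j - 1] + (ca != cb), # substitute, free if characters match
            ))
        prev = curr
    return prev[-1]


# Hypothetical consecutive requests from one browser.
req1 = "GET /search?q=weather HTTP/1.1\r\nHost: example.com\r\nUser-Agent: Browser/1.0\r\n"
req2 = "GET /search?q=news HTTP/1.1\r\nHost: example.com\r\nUser-Agent: Browser/1.0\r\n"
# req3 repeats req2 but appends a parameter carrying invented personal data.
req3 = "GET /search?q=news&d=alice%40example.com%3Apassw0rd HTTP/1.1\r\nHost: example.com\r\nUser-Agent: Browser/1.0\r\n"

print(len(req2), edit_distance(req1, req2))  # small distance vs. request length
print(edit_distance(req2, req3))             # 33: one per inserted character
```

The two ordinary searches differ only in the query term, so their distance stays far below the request length, whereas the request carrying the appended `d=` parameter scores exactly the number of inserted characters, so the new information stands out.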
We propose and evaluate a technique that uses not only the immediately preceding HTTP request but also earlier HTTP requests, so that even more unnecessary information is ignored. Furthermore, we propose a system that raises an alert when there is a risk of information leakage. When an abnormal value is detected in the continuous sequence of numerical values, the system judges that leakage may have occurred. Assuming that a certain quantity of information is leaked, some of the detection rates are higher than 90%.
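The abstract does not specify how earlier requests are combined or how an abnormal value is defined, so the sketch below rests on two assumptions of our own: a request's score is its smallest edit distance to any of the last `history` requests, and a score is abnormal when it exceeds the mean plus `k` standard deviations of the earlier scores. `LeakageMonitor`, `observe`, `history`, `k`, and `warmup` are hypothetical names; this illustrates the idea and is not the authors' system.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Optional, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row of the DP table at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


class LeakageMonitor:
    """Hypothetical monitor: score = smallest edit distance to any of the last
    `history` requests; alert when the score exceeds mean + k * stdev of the
    earlier scores (an assumed anomaly rule)."""

    def __init__(self, history: int = 3, k: float = 3.0, warmup: int = 5):
        self.recent = deque(maxlen=history)  # the last `history` requests
        self.scores = []                     # every score computed so far
        self.k = k
        self.warmup = warmup                 # scores required before alerting

    def observe(self, request: str) -> Tuple[Optional[int], bool]:
        if not self.recent:                  # first request: nothing to compare
            self.recent.append(request)
            return None, False
        score = min(edit_distance(old, request) for old in self.recent)
        alert = False
        if len(self.scores) >= self.warmup:
            # Floor the spread at 1.0 so a perfectly flat baseline does not
            # turn every one-character change into an alert.
            spread = max(pstdev(self.scores), 1.0)
            alert = score > mean(self.scores) + self.k * spread
        self.scores.append(score)
        self.recent.append(request)
        return score, alert


# Normal browsing: consecutive requests differ in a single character.
template = "GET /page?id={} HTTP/1.1\r\nHost: example.com\r\n"
monitor = LeakageMonitor()
for i in range(1, 9):
    print(monitor.observe(template.format(i)))  # (None, False), then (1, False)
# A request that suddenly carries extra data.
print(monitor.observe(template.format("9&u=alice%40example.com")))  # alert is True

# Alternating between two pages: history=1 vs. history=3.
page_a = "GET /mail/inbox HTTP/1.1\r\nHost: example.com\r\n"
page_b = "GET /news/top HTTP/1.1\r\nHost: example.com\r\n"
short, longer = LeakageMonitor(history=1), LeakageMonitor(history=3)
for req in [page_a, page_b, page_a, page_b]:
    s1, _ = short.observe(req)
    s3, _ = longer.observe(req)
print(s1, s3)  # s1 > 0, while s3 == 0
```

With `history=1` only the immediately preceding request is used, so a user alternating between two pages produces a large score on every request; with `history=3` a returning page scores 0 because it matches a request two steps back. This is one way in which looking further back can ignore more unnecessary information. The flat-baseline floor and the choice of `k=3.0` are illustrative settings, not values taken from the evaluation.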