ISSN: 2182-2069 (printed) / ISSN: 2182-2077 (online)
Detecting Information Leakage via a HTTP Request Based on the Edit Distance
Information leakage has become a frequent problem. Among the many routes of leakage, the Internet accounts for approximately half of all leakage victims. Leakage via the Internet is caused either by human action or by malware such as spyware; for example, it occurs when a person posts on a bulletin board or when spyware is running. A technical countermeasure against spyware is especially needed, and in any event existing countermeasures against information leakage via the Internet cannot be trusted completely. When a web browser communicates with a server, it sends an HTTP request, and the server replies with the information specified in that request. Some spyware takes advantage of this: once installed, it collects the user's information, embeds it in an HTTP request, and sends it to an attacker's server. Filtering packets by TCP or UDP port number is not an effective defense, because HTTP is a main communication protocol. Signature-based techniques are often used as a countermeasure against such spyware: if the data of some software matches a signature stored in a database, the software is regarded as spyware. This technique can detect most spyware whose signature is stored, but it is ineffective against spyware whose signature is not. We therefore propose a leakage detection system that does not depend on a database. The system targets leakage caused by both human action and malware. In existing research, the edit distance between the last HTTP request and the new HTTP request is calculated. This edit distance is much smaller than the number of characters in the request, because many HTTP requests share common characters. Leakage is then easy to detect, because information that is sent repeatedly is disregarded, while new information that is sent suddenly is quantified and its value stands out.
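To illustrate the idea of scoring a new HTTP request against the last one, the following is a minimal sketch. It assumes the edit distance is the Levenshtein distance with unit costs (the text above does not specify the variant), and the two request strings are made-up examples, not data from the study.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs (an assumed variant):
    the minimum number of single-character insertions, deletions
    and substitutions needed to turn `a` into `b`."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


# Hypothetical consecutive requests: the second adds "+tokyo" to the query.
last_request = "GET /search?q=weather HTTP/1.1\r\nHost: example.com\r\n\r\n"
new_request = "GET /search?q=weather+tokyo HTTP/1.1\r\nHost: example.com\r\n\r\n"

distance = edit_distance(last_request, new_request)  # 6: only "+tokyo" is new
```

In this example the shared request line and headers contribute nothing to the score, so the distance reflects only the six newly sent characters and is far smaller than the length of the request.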
We propose and evaluate a technique that uses not only the immediately preceding HTTP request but also earlier HTTP requests, so that more unnecessary information is ignored. Furthermore, we propose a system that raises an alert when there is a danger of information leakage: when an abnormal value is detected in the continuous series of numerical values, the system judges that leakage may have occurred. Assuming that a certain quantity of information is leaked, the detection rate exceeds 90% in some cases.
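One possible reading of this proposal can be sketched as follows. Several details are assumptions made only for illustration, since the text does not state them: the earlier requests are combined by taking the minimum edit distance over a fixed-size history, and an "abnormal value" is a score more than a few standard deviations above the mean of past scores. The class name `LeakageMonitor`, its parameters, and the sample requests are all hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs (assumed variant)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


class LeakageMonitor:
    """Hypothetical monitor: scores each request by its minimum edit
    distance to the last `history_size` requests and flags outliers."""

    def __init__(self, history_size: int = 5, sigma: float = 3.0):
        self.requests = deque(maxlen=history_size)  # recent requests
        self.scores = []                            # past scores
        self.sigma = sigma                          # assumed alert threshold

    def observe(self, request: str):
        # Minimum over the history; the first request has nothing to
        # compare against and is scored 0 as a baseline (an assumption).
        score = min((edit_distance(old, request) for old in self.requests),
                    default=0)
        alert = False
        if len(self.scores) >= 3:  # wait for a few scores before judging
            spread = max(pstdev(self.scores), 1.0)  # floor avoids zero spread
            alert = score > mean(self.scores) + self.sigma * spread
        self.requests.append(request)
        self.scores.append(score)
        return score, alert


# Hypothetical traffic: two pages visited alternately, then a request
# that suddenly carries extra data in its query string.
page_a = "GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"
page_b = "GET /news.html HTTP/1.1\r\nHost: example.com\r\n\r\n"
leak = ("GET /index.html?d=alice%40example.org+4111111111111111 "
        "HTTP/1.1\r\nHost: example.com\r\n\r\n")

monitor = LeakageMonitor()
results = [monitor.observe(r)
           for r in [page_a, page_b, page_a, page_b, page_a, page_b, leak]]
alerts = [alert for _, alert in results]
```

With only the immediately preceding request as reference, alternating between the two pages would keep producing a nonzero score; taking the minimum over the history drives those scores to zero once both pages have been seen, so only the final request stands out and triggers the alert.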