Implementation and Performance of Distributed Text Processing System Using Hadoop for e-Discovery Cloud Service – Journal of Internet Services and Information Security

Volume 4 - Issue 1

Taerim Lee Pukyong National University, Busan, Republic of Korea
taeri@pknu.ac.kr
Hun Kim Pukyong National University, Busan, Republic of Korea
mybreathing@pknu.ac.kr
Kyung Hyune Rhee Pukyong National University, Busan, Republic of Korea
khrhee@pknu.ac.kr
Sang Uk Shin Pukyong National University, Busan, Republic of Korea
shinsu@pknu.ac.kr

DOI: 10.22667/JISIS.2014.02.31.012

Keywords: Electronic Discovery, e-Discovery, Digital Forensics, Evidence Search, Hadoop Performance, MapReduce Programming, Distributed Text Processing

Abstract

Big Data brings new challenges to e-Discovery and digital forensics, most of them connected to the methods used for data processing. Because time and cost are the most important factors in determining the success or failure of a digital investigation, the first priority is developing search methods that find relevant evidence in Big Data more quickly and accurately. This paper therefore introduces DTPS, a Distributed Text Processing System based on Hadoop, and explains how DTPS differs from similar research in order to show why it is needed. It also describes experimental results aimed at finding the best architecture and implementation strategy for using Hadoop MapReduce as a major part of a future e-Discovery cloud service.

Date

February 2014

Page Number

12-24