Volume 4 - Issue 1
Implementation and Performance of Distributed Text Processing System Using Hadoop for e-Discovery Cloud Service
- Taerim Lee
Pukyoung National University, Busan, Republic of Korea
taeri@pknu.ac.kr
- Hun Kim
Pukyoung National University, Busan, Republic of Korea
mybreathing@pknu.ac.kr
- Kyung Hyune Rhee
Pukyoung National University, Busan, Republic of Korea
khrhee@pknu.ac.kr
- Sang Uk Shin
Pukyoung National University, Busan, Republic of Korea
shinsu@pknu.ac.kr
Keywords: Electronic Discovery, e-Discovery, Digital Forensics, Evidence Search, Hadoop Performance, MapReduce Programming, Distributed Text Processing
Abstract
Big Data brings new challenges to the fields of e-Discovery and digital forensics, and these challenges
are mostly connected to the methods used for data processing. Because time and cost are the most important
factors in determining the success or failure of a digital investigation, the first priority is to develop
a search method that finds relevant evidence in Big Data more quickly and accurately. This
paper therefore introduces a Hadoop-based Distributed Text Processing System, called DTPS,
and explains how DTPS differs from similar research in order to show why it is
needed. In addition, this paper describes experimental results aimed at finding the best architecture
and implementation strategy for using Hadoop MapReduce as a major part of a future e-Discovery
cloud service.