ISSN: 2182-2069 (printed) / ISSN: 2182-2077 (online)
Multi-Constraints Weighted Matrix Driven Outlier Removal Assisted Fractional Evolutionary Swarm Intelligence Model for K-Means Clustering Based Big Data Analytics
The last few years have witnessed an exponential rise in software, internet technologies and allied application environments, which generate enormous volumes of data to serve varied real-time analytical services. In big data, where data heterogeneity, diversity and non-linear distribution are unavoidable, the presence of outliers in the form of redundant instances or malicious entities cannot be ruled out. Moreover, despite its robustness relative to other machine learning models, the K-Means clustering algorithm can produce false-positive or false-negative outputs because of such outliers. Although numerous efforts have been made towards outlier detection in big data analytics, most state-of-the-art methods apply distance, density or mass information as a standalone decision criterion, which does not guarantee the generalizability of the solution. Additionally, none of these methods addresses outlier detection and K-Means clustering optimization as a coupled problem. Motivated by this gap, this paper proposes a multi-constraints weighted matrix driven outlier removal assisted fractional evolutionary swarm intelligence (MCIMO-FESI) driven K-Means clustering model for big data analytics. The MCIMO-FESI model first constructs a multi-constraints weighted matrix that applies multiple spatio-relational and instance-wise rank parameters, such as distance, inter-instance correlation and rank-sum information, for outlier detection. After dropping the outlier instances, it executes the fractional evolutionary swarm intelligence (FESI) model, which encompasses an adaptive genetic algorithm (AGA) and particle swarm optimization (PSO), to perform centroid update and allied cluster optimization. More specifically, AGA is applied in conjunction with a multi-objective optimization (MOO) cost function to update the centroids, while PSO is applied to improve clustering and the allied instance mapping.
The use of inter-instance distance, stability, DB-Index and Sym-Index values as the MOO cost function helps FESI optimize clustering. Simulation of the proposed MCIMO-FESI driven K-Means algorithm yielded an outlier detection and allied clustering accuracy of 98.25%, precision of 97.30%, recall of 96.82% and F-measure of 0.971, which is superior to major state-of-the-art outlier detection models. The overall performance confirms its suitability for highly accurate analytics driven by unsupervised learning.
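The two-stage idea summarized above (score and drop outliers first, then cluster and evaluate) can be illustrated with a minimal sketch. It is not the MCIMO-FESI implementation. The three criteria are assumed stand-ins for distance, inter-instance correlation and rank-sum information, namely mean distance to the nearest neighbours, one minus the mean absolute Pearson correlation, and the rank sum of per-feature deviations. The weights, the neighbourhood size `k` and the `contamination` fraction are hypothetical. Plain Lloyd's K-Means with farthest-point seeding is swapped in for the AGA/PSO-driven centroid search, and only the DB-Index (taken here to be the Davies-Bouldin index) of the cost terms is computed.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def minmax(v):
    """Rescale a list of values to [0, 1] (all zeros if the values are constant)."""
    lo, hi = min(v), max(v)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in v]

def outlier_scores(X, weights=(0.5, 0.2, 0.3), k=3):
    """Weighted sum of three normalised criteria; higher means more outlying.
    The weights and k are illustrative assumptions, not values from the paper."""
    n, m = len(X), len(X[0])
    dist = [[math.dist(a, b) for b in X] for a in X]
    # Criterion 1: mean distance to the k nearest neighbours (index 0 is the point itself).
    knn = [sum(sorted(row)[1:k + 1]) / k for row in dist]
    # Criterion 2: one minus the mean absolute correlation with every other instance.
    decor = [1 - sum(abs(pearson(X[i], X[j])) for j in range(n) if j != i) / (n - 1)
             for i in range(n)]
    # Criterion 3: rank sum of per-feature absolute deviations from the (upper) median.
    rank_sum = [0] * n
    for f in range(m):
        col = [x[f] for x in X]
        med = sorted(col)[n // 2]
        order = sorted(range(n), key=lambda i: abs(col[i] - med))
        for rank, i in enumerate(order):
            rank_sum[i] += rank
    parts = [minmax(knn), minmax(decor), minmax(rank_sum)]
    return [sum(w * p[i] for w, p in zip(weights, parts)) for i in range(n)]

def remove_outliers(X, contamination=0.1, **kw):
    """Drop the `contamination` fraction of instances with the highest scores."""
    scores = outlier_scores(X, **kw)
    n_drop = int(round(contamination * len(X)))
    keep = sorted(range(len(X)), key=lambda i: scores[i])[:len(X) - n_drop]
    return [X[i] for i in sorted(keep)]  # preserve the original ordering

def kmeans(X, k, iters=50):
    """Plain Lloyd's K-Means with deterministic farthest-point seeding."""
    cents = [list(X[0])]
    while len(cents) < k:
        cents.append(list(max(X, key=lambda x: min(math.dist(x, c) for c in cents))))
    labels = [0] * len(X)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(x, cents[c])) for x in X]
        for c in range(k):
            members = [x for x, l in zip(X, labels) if l == c]
            if members:  # keep the old centroid if a cluster becomes empty
                cents[c] = [sum(col) / len(members) for col in zip(*members)]
    return cents, labels

def db_index(X, cents, labels):
    """Davies-Bouldin index (lower is better); assumes k >= 2 and distinct centroids."""
    k = len(cents)
    scatter = []
    for c in range(k):
        d = [math.dist(x, cents[c]) for x, l in zip(X, labels) if l == c]
        scatter.append(sum(d) / len(d) if d else 0.0)
    return sum(max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                   for j in range(k) if j != i) for i in range(k)) / k
```

A typical call sequence would be `clean = remove_outliers(data)`, then `cents, labels = kmeans(clean, k)`, then `db_index(clean, cents, labels)`. In a search-based variant such as the one the paper describes, a quantity like this index would be one term of the cost that candidate centroid sets are compared on.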