ISSN: 2182-2069 (printed) / ISSN: 2182-2077 (online)
IoT-Traffic Networks Effective Features Based on NSGA-II Technique
Applying the crowding distance between samples, as used in the Non-dominated Sorting Genetic Algorithm II (NSGA-II), can gradually improve a feature subset by selecting the most relevant features according to their distance values. For the applied dataset, the number of features was reduced from 41 to 29 while accuracy was maintained after applying machine learning techniques. With the features minimized by the proposed model, the prediction accuracy of K-NN, SVM, and DT improved from 80.81, 76.6, and 86.7 to 81.9, 81.6, and 87.6, respectively. In addition, the level and basic types of IoT features in the applied dataset were studied for their effect on prediction accuracy, and a value was specified for each of these feature types. This study was also carried out after applying the crowding distance together with information gain to the first 10 of the 29 selected features, and it includes a comparison with four feature selection methods. Among all IoT features, the packet-level type and the basic type had the most dominant effect on prediction accuracy. The accuracy of all the machine learning methods used was maintained or even increased when only these types were considered, especially for SVM. Connection-level features and traffic-type features (such as same service or same host) had less effect on the overall prediction accuracy. The study was conducted on the NSL-KDD dataset, which provides a reasonable number of records with more variety, to protect the machine learning methods from being affected by frequently repeated records. In addition to accuracy, other parameters were evaluated, such as Precision, Recall, ROC area, and PRC area, to demonstrate the benefits of feature minimization. The study was compared with related studies, both with and without feature specification, using accuracy evaluations based on state-of-the-art methods. Packet-level features had an effective ratio of 3.8% on overall SVM accuracy, while basic-type features had an effective ratio of 4.5% on the same accuracy.
Comparisons with previously used methods are provided to demonstrate the validity of these results.
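As an illustration of the crowding distance mentioned above, the following is a minimal sketch of the standard NSGA-II crowding-distance computation over one non-dominated front. It is not the authors' implementation: the function name `crowding_distance`, the choice of two objectives (classification error and number of selected features), and the sample values are assumptions made only for this example.

```python
import numpy as np

def crowding_distance(objectives):
    """Crowding distance for one non-dominated front.

    objectives: array-like of shape (n_candidates, n_objectives);
    each row holds the objective values of one candidate feature subset.
    """
    objectives = np.asarray(objectives, dtype=float)
    n, m = objectives.shape
    distance = np.zeros(n)
    if n <= 2:
        # With one or two candidates, every candidate is a boundary point.
        distance[:] = np.inf
        return distance
    for j in range(m):
        order = np.argsort(objectives[:, j], kind="stable")
        col = objectives[order, j]
        span = col[-1] - col[0]
        # Boundary candidates are always kept, so they get infinite distance.
        distance[order[0]] = np.inf
        distance[order[-1]] = np.inf
        if span == 0:
            continue  # all candidates are equal on this objective
        # Interior candidates: normalized gap between their two neighbours.
        distance[order[1:-1]] += (col[2:] - col[:-2]) / span
    return distance

# Hypothetical front: (classification error, number of features) per candidate.
front = [[0, 4], [1, 3], [2, 1], [4, 0]]
print(crowding_distance(front))
```

Candidates with a larger distance lie in less crowded regions of the front, so preferring them during selection keeps the retained feature subsets diverse.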