ISSN: 2182-2069 (printed) / ISSN: 2182-2077 (online)
Enhanced Web Server Log Anomaly Detection Using Hybrid Clustering and Machine Learning for Time Series Data
Web server logs are generated by everyday online activity, including social media, online platforms, messaging services, search engine queries, mobile apps, online banking, and streaming services, and they provide insights into online communication patterns. Regularly monitoring and analyzing web server logs helps to promptly identify and address anomalies, that is, unexpected or irregular patterns or events. Early detection of anomalies is important for maintaining a secure, reliable, and high-performance web presence. The main objective of this research is to build a real-time anomaly detection system that detects, and potentially predicts, issues within monitored servers, with offline analysis prompting the necessary system adjustments. The study employs offline analysis of historical data and uses a hybrid clustering algorithm based on GMM, K-means, and hierarchical clustering to detect new or unknown issues in web server logs that earlier automated log analysis tools were unable to detect. The web server log data are then classified using various machine learning classifiers. A comparison of their performance shows that Logistic Regression and Multi-layer Perceptron outperform the others, namely Decision Tree, Naive Bayes variants, K-Nearest Neighbors, and ensemble methods. The effectiveness of the methodology is demonstrated on a web server dataset collected with the Wireshark tool. The proposed approach achieves 100% Accuracy and Recall; however, Precision and Sensitivity are comparatively lower, at 1%.
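To make the idea of a hybrid of GMM, K-means, and hierarchical clustering more concrete, the following is a minimal sketch using scikit-learn. The abstract does not state how the three algorithms are combined, so the majority-vote rule here is purely an assumption for illustration, as are the function names, the synthetic data, and the two feature columns (requests per minute and mean response size). It should not be read as the authors' method.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler


def smallest_cluster_mask(labels):
    """Mark the members of the least populated cluster as anomaly candidates."""
    ids, counts = np.unique(labels, return_counts=True)
    return labels == ids[np.argmin(counts)]


def hybrid_anomaly_flags(features, n_clusters=2, min_votes=2, seed=0):
    """Flag rows that at least `min_votes` of the three clusterers isolate.

    ASSUMPTION: the voting rule is illustrative only; the source does not
    describe how GMM, K-means, and hierarchical clustering are combined.
    """
    scaled = StandardScaler().fit_transform(features)
    label_sets = [
        KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(scaled),
        GaussianMixture(n_components=n_clusters, random_state=seed).fit_predict(scaled),
        AgglomerativeClustering(n_clusters=n_clusters).fit_predict(scaled),
    ]
    # Each clusterer casts one vote per row that falls in its smallest cluster.
    votes = sum(smallest_cluster_mask(labels).astype(int) for labels in label_sets)
    return votes >= min_votes


# Synthetic stand-in for per-interval log features (hypothetical columns:
# requests per minute, mean response size in KB). Not data from the study.
rng = np.random.default_rng(0)
normal = rng.normal(loc=[100.0, 20.0], scale=[5.0, 2.0], size=(200, 2))
bursts = rng.normal(loc=[400.0, 80.0], scale=[5.0, 2.0], size=(5, 2))
features = np.vstack([normal, bursts])

flags = hybrid_anomaly_flags(features)
print("flagged rows:", np.flatnonzero(flags))
```

Treating the smallest cluster as anomalous only makes sense when anomalies are rare and well separated from normal traffic. In practice the resulting flags could serve as labels for the downstream classifiers, although that link is also an assumption here.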