M.S. Applied Data Science - Capstone Chronicles 2025
The performance gap between the gradient boosting methods and the linear models underscores the importance of capturing non-linear interactions in network traffic patterns. For example, DDOS and DOS attacks may share similar packet-rate characteristics but differ in target-distribution patterns, a non-linear relationship that decision trees and boosted ensembles can model through sequential splits, whereas linear models struggle to represent such structure with a single separating hyperplane.

Class-specific performance was broadly consistent across models. All top-tier ensembles achieved exceptional results on MIRAI attacks, with F1-scores exceeding 99%, indicating that MIRAI botnet traffic is highly distinctive and easily separable from other classes. DDOS attacks were also reliably detected, with F1-scores around 81–82% for the strongest models. BENIGN traffic, RECON, and SPOOFING attacks exhibited moderate to strong performance, with F1-scores typically between the mid-60s and high 80s depending on the model.

The critical challenge emerged with the minority classes. WEB attacks achieved F1-scores ranging from 4.54% (logistic regression) to 13.43% (LightGBM), while BRUTE_FORCE attacks reached F1-scores between 2.13% (logistic regression) and 9.98% (LightGBM). Even the best-performing ensembles struggled with these rare attack types, suggesting that 8,764 training examples per minority class were insufficient to support robust generalization when evaluated against test sets containing millions of flows from other families.

Based on the comprehensive evaluation, LightGBM was selected as the final production model. It achieved the highest macro F1-score (63.29%) among all algorithms, indicating superior balanced performance across attack families. LightGBM also demonstrated strong computational efficiency during both cross-validation and hyperparameter tuning, completing experiments substantially faster than XGBoost and CatBoost while delivering nearly identical performance. In addition, LightGBM exhibited low cross-validation variance, suggesting stable generalization, and its relatively modest memory footprint makes it attractive for deployment in resource-constrained network monitoring environments where real-time inference and periodic retraining are required as attack patterns evolve.

Although XGBoost achieved a nearly identical macro F1-score (63.01%) and may be favored in some settings because of its widespread adoption and extensive documentation, LightGBM's superior speed-to-performance ratio made it the more practical choice for operational deployment in this context. Its ability to process large volumes of network traffic with low latency while maintaining high detection accuracy for most attack types positions it well for real-world intrusion detection scenarios.

At the same time, the persistent difficulty in detecting WEB and BRUTE_FORCE attacks indicates that further methodological improvements are needed. Future work could explore alternative sampling strategies, such as the Synthetic Minority Oversampling Technique (SMOTE); cost-sensitive learning approaches that penalize minority-class errors more heavily; or ensemble methods specifically designed for extreme class imbalance. Incorporating temporal features or sequence-based models may also help capture multi-step attack patterns not fully expressed in single-flow features.

4.4.4.1 Test design, i.e. training and validation datasets.

The data partitioning strategy was designed to address class imbalance while ensuring realistic model evaluation. The full dataset of 21,035,896 observations was first split
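The reliance on macro F1 for model selection can be made concrete with a short sketch. Macro F1 averages the per-class F1 scores with equal weight, so a rare class such as WEB counts as much as a dominant class such as MIRAI. The counts below are illustrative only, not the study's actual confusion matrix; they are chosen to mirror the reported pattern (near-perfect MIRAI, ~81% DDOS, very low WEB).

```python
# Macro F1: unweighted mean of per-class F1 scores, so minority classes
# are not drowned out by majority-class performance.

def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Per-class F1 from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical per-class counts (tp, fp, fn), for illustration only.
per_class = {
    "MIRAI": (9900, 50, 40),    # highly separable -> F1 above 0.99
    "DDOS": (8100, 1800, 1900), # solid but imperfect -> F1 around 0.81
    "WEB": (120, 900, 800),     # rare class: low F1 drags the macro down
}

f1s = {cls: f1_from_counts(*counts) for cls, counts in per_class.items()}
macro_f1 = sum(f1s.values()) / len(f1s)
```

Because every class contributes 1/k of the macro average, a model that ignores WEB and BRUTE_FORCE pays a visible penalty, which is exactly why macro F1 (rather than accuracy or micro F1) separates the ensembles in this study.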
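A stratified split of the kind the partitioning strategy implies can be sketched as follows. This is a minimal stdlib-only illustration, not the study's actual pipeline: indices are grouped by label and each class contributes the same fraction to the test set, so even a 2% minority class keeps its proportion.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_frac=0.2, seed=42):
    """Split row indices so each class keeps ~test_frac of its rows in the test set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for label, idxs in by_class.items():
        rng.shuffle(idxs)                       # randomize within each class
        cut = int(round(len(idxs) * test_frac)) # per-class test allocation
        test_idx.extend(idxs[:cut])
        train_idx.extend(idxs[cut:])
    return sorted(train_idx), sorted(test_idx)

# Toy labels with heavy imbalance, mimicking the flow dataset's skew.
labels = ["MIRAI"] * 900 + ["DDOS"] * 80 + ["WEB"] * 20
train_idx, test_idx = stratified_split(labels)
test_counts = Counter(labels[i] for i in test_idx)
```

A plain random split on data this skewed could leave a minority class with almost no test examples; stratifying guarantees every attack family is represented in both partitions, which is what makes the reported per-class F1 scores meaningful.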