M.S. Applied Data Science - Capstone Chronicles 2025
6 Discussion

This study evaluated how seven supervised models performed on the CIC-IoT 2023 dataset. This section describes what the results mean for IoT intrusion detection.

One early insight was that all four tree-based ensemble algorithms used in this study (LightGBM, XGBoost, CatBoost, and Random Forest) outperformed the linear models, with consistent and reliable results on a large IoT traffic dataset. As shown in Table 1, the spread in scores across these four models is extremely narrow, suggesting tree-based models are well suited to this kind of data, while the other models lagged behind by 7-22%. LightGBM maintained a competitive accuracy score (78%) and achieved the highest macro F1-score (63.29%) on the imbalanced test set, suggesting it is an efficient and scalable alternative to linear models.

Even though the training dataset was well balanced, with over 8,000 examples of each class, all the models struggled to detect minority attack types such as WEB and BRUTE_FORCE. This indicates more examples are needed for the models to learn to correctly identify the more obscure attacks. When the models were applied to a real-world imbalanced dataset, they all produced low F1-scores on these classes: LightGBM, the top performer, achieved only 13.4% on WEB and 9.9% on BRUTE_FORCE. In contrast, high-volume attack families such as MIRAI, DOS, and DDOS yielded the highest F1-scores, as shown in Figure 3, because they occur more frequently and have clearer signatures that are easier to identify.

Lastly, the models' performance on the validation and test sets indicated that, under the undersampling training strategy, they generalized well and did not overfit. This approach demonstrated the data was handled
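The gap between accuracy and macro F1-score discussed in this section can be illustrated with a minimal sketch. The labels below are made-up toy data, not results from this study, and the helper names (`per_class_f1`, `macro_f1`, `accuracy`) are hypothetical. The sketch only shows why a model can score well on accuracy while a rare class such as WEB drags the macro average down.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, computed one-vs-rest for every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy imbalanced test set (illustrative only): 95 DDOS flows, 5 WEB flows.
y_true = ["DDOS"] * 95 + ["WEB"] * 5
# A hypothetical classifier that catches only one of the five WEB flows.
y_pred = ["DDOS"] * 95 + ["DDOS"] * 4 + ["WEB"]

acc = accuracy(y_true, y_pred)
f1_by_class = per_class_f1(y_true, y_pred)
macro = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.2f}, macro F1={macro:.2f}, WEB F1={f1_by_class['WEB']:.2f}")
```

In this toy case the majority class dominates accuracy, while macro F1 weights the poorly detected minority class equally, which is why the two metrics can diverge on an imbalanced test set.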
helping to discriminate between benign and malicious traffic.

5.2.2 Class Level Feature Importance

While the global mean SHAP values identified which features contribute most to the LightGBM decision process overall, family-specific SHAP beeswarm visualizations (Figures 16-23) provided a more granular view of how specific features influenced the prediction for each attack family. These visualizations reveal both the direction of a feature's impact on the family-level prediction and the consistency of that effect, by showing how spread out the SHAP values are along the x-axis and how feature values (blue = low, red = high) cluster on either side of zero. When high-value (red) points fall to the right of zero, high values of the feature increase the likelihood of the model predicting that family. Conversely, if high-value points appear to the left, the feature suppresses classification into that family. Features with wide SHAP distributions across both sides indicate context-dependent influence rather than a consistent directional effect.

Across all eight classification families, TCP flag fields and overall packet rate consistently emerged as high-impact features, with effects that varied by family. For benign traffic, high SYN activity typically pushed predictions away from the benign class, while moderate header length occasionally favored it. The Brute Force, DDOS, and DOS families were dominated by packet rate, indicating that high rates increased the probability of classification into these attack families. By contrast, Spoofing was most strongly associated with high FIN flag counts, and Recon attacks showed intermittent positive influence from Reset flags and Acknowledgement flags. The MIRAI and Web attack families exhibited weaker directional separation but broader variance across features such as header length and packet rate. This suggests more dynamic and heterogeneous behavior within these attack families.
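The beeswarm reading rule described above can be approximated numerically. The following is a minimal sketch, not the procedure used in this study: it assumes that, for one class, a list of feature values and the matching per-sample SHAP values are already available, and it uses the Pearson correlation between the two as a rough proxy for direction. The function names, the 0.5 threshold, and the toy numbers are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def direction(feature_values, shap_values, threshold=0.5):
    """Summarize how a feature relates to one class's SHAP values.

    The threshold is an arbitrary illustrative cutoff, not a standard value.
    """
    r = pearson(feature_values, shap_values)
    if r >= threshold:
        return "high values push toward the class"
    if r <= -threshold:
        return "high values push away from the class"
    return "context-dependent"


# Made-up values for five samples of a single class (illustrative only).
feature = [1, 2, 3, 4, 5]
shap_rising = [-0.2, -0.1, 0.0, 0.1, 0.3]   # red points sit right of zero
shap_falling = [0.3, 0.1, 0.0, -0.1, -0.2]  # red points sit left of zero
shap_mixed = [0.2, -0.2, 0.0, 0.2, -0.2]    # no consistent direction

for name, values in [("rising", shap_rising), ("falling", shap_falling), ("mixed", shap_mixed)]:
    print(name, "->", direction(feature, values))
```

A linear correlation misses non-monotonic effects, so a summary like this would complement the beeswarm plots rather than replace them.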