M.S. Applied Data Science - Capstone Chronicles 2025
appropriately for this project and objectives in the context of real-world scenarios.

6.1 Conclusion

This study evaluated seven supervised machine learning models for intrusion detection on the CIC-IoT2023 dataset, with an emphasis on the LightGBM algorithm as a lightweight, scalable alternative to more computationally intensive deep learning architectures. Using a cleaned and deduplicated version of the dataset, the models were trained on a class-balanced subset produced by undersampling the majority classes, so that they could learn minority-class characteristics, and were then evaluated on stratified but highly imbalanced validation and test sets, with the goal of approximating real-world IoT traffic conditions. The objective was not solely to maximize accuracy, but also to understand how well the models generalize across minority attack families and to assess whether LightGBM offers a practical balance between performance, efficiency, and interpretability for deployment in anomaly-based intrusion detection system frameworks.

Across all seven algorithms, the tree-based ensemble models outperformed the linear baselines, as summarized in Table 1. LightGBM, XGBoost, CatBoost, and Random Forest formed the core of the best models, achieving very similar test-set performance metrics. These models achieved test accuracies of around 78%, along with macro F1 scores between 62.86% and 63.29%. The LightGBM model edged out all other models with the highest macro F1 score (63.29%) while also maintaining a weighted F1 score of 79.63%. These results indicate that the LightGBM model provided the best-balanced performance across the eight families when both majority and minority classes are considered in their stratified representation. The Linear SVC offered reasonable mid-tier baseline performance,
outperforming the Logistic Regression model but still falling short of the ensemble models.

An important distinction between this work and prior CIC-IoT2023 studies lies in the evaluation procedure. Previous papers commonly reported results on training and test sets in which the minority and majority classes were balanced, with no discussion of how the balancing was performed. Such setups, while useful for controlled comparison, tend to inflate performance metrics and understate the difficulty of detecting minority attack families in production environments. In contrast, this study deliberately evaluated all models on stratified, imbalanced validation and test sets that preserve the class distribution of the original dataset. This yielded more conservative, but also more representative, estimates of how these models are likely to perform in real deployments, where detecting minority classes remains a priority. In addition, over half of the raw CSV files were found to contain duplicate observations; removing them prior to modeling ensured that the present study evaluated models on a more diverse and challenging subset of unique flows.

One key finding of this study is the contrast between threshold-based classification metrics and ranking-based discrimination metrics. On threshold-based metrics, all models, including the top-performing LightGBM model, showed low F1 scores for the minority families despite the balanced training data. The multiclass ROC analysis, on the other hand, showed that the tuned LightGBM model achieved strong probabilistic separation, with a macro-AUC of 0.977 and a micro-AUC of 0.984. The MIRAI attack family was even perfectly separable (AUC = 1.000), and the minority classes also achieved strong AUC values despite their lower F1 scores. This finding indicates the LightGBM model learned meaningful signals for the