M.S. Applied Data Science - Capstone Chronicles 2025
information, service outages and almost anything they can conceive. The average financial cost to a company of a successful cyberattack in the United States in 2025 is $10.22 million (Hill, 2023), which does not account for the additional loss of consumer trust that can cripple a company’s future business prospects. Because these environments are so interconnected, IoT networks present themselves as a perfect target for cybercriminals and require the utmost attention when it comes to developing adaptive and efficient processes to secure them.

In the past, Intrusion Detection Systems (IDS) acted as the backbone of network security monitoring. They relied on rule-based detection, in which security professionals handcraft rules that capture the signatures, patterns, and heuristics of common attack types. However, in an ever-evolving landscape, modern attacks can often bypass these rule-based systems, especially when the rules are not maintained or updated to reflect the latest trends in cybersecurity. Rule-based IDS fail to adapt to new and evolving threats, which puts undue pressure on security researchers to keep the rules updated while simultaneously responding to threats in real time. They also often produce substantial numbers of false positives, which take security professionals’ time away from responding to actual malicious network traffic (Hero et al., 2023). Advancements in both machine learning (ML) and deep learning (DL) models, however, have led to a shift in detection capabilities, enabling IDS to automatically detect malicious network traffic with both known and unknown signatures and allowing security professionals to respond to threats more quickly and with lower false positive rates. This data-driven approach is imperative within the cybersecurity industry, where threats
are constantly evolving and network traffic is abundant and only expected to grow as the number of devices connected to the internet continues to increase.

Another issue that arises when dealing with network traffic is that it is inherently private and sensitive, especially in a corporate setting. The difficulty in developing both rule-based and ML/DL algorithms lies in the data: traffic is unique to each entity, and these entities do not openly share their most sensitive information with outsiders to support the development of models that aid in identifying threats. This limitation makes the development of effective and generalizable intrusion detection models challenging, as researchers must rely on publicly available and sometimes even synthetically generated datasets to train and evaluate their models, with no guarantee that the models will be applicable in a real-world scenario.

To address this need, the Canadian Institute for Cybersecurity (CIC) developed the CIC-IoT2023 dataset, a large-scale, realistic benchmark dataset that captures network traffic from 105 different IoT devices and includes 33 distinct attack types along with normal network traffic. These attack types fall into 7 broad categories. The dataset was made public so that researchers can develop and test intrusion detection algorithms with high confidence within a diverse and dynamic environment that mirrors real-world IoT networks.

2 Background

Cybersecurity is an integral pillar for corporations and personal IoT networks in maintaining digital resilience and safeguarding data as the number of IoT devices continues to rise. This vastly interconnected network of data transmissions has provided society with many benefits. Specifically, regarding automation and efficiency, while