AAI_2025_Capstone_Chronicles_Combined


bounding-box localization to support clinician decision-making. Introduced by Ren et al. (2015), Faster R-CNN is a two-stage object detection framework widely used in both medical imaging research and practical applications. Its main innovation is separating the detection pipeline into two stages: in the first, a Region Proposal Network (RPN) generates candidate bounding boxes; in the second, those proposed regions are classified (fracture vs. background) and their coordinates are refined. This two-stage design yields high localization accuracy and sensitivity to small or subtle objects, which is essential for detecting the fine cortical disruptions that characterize many cervical fractures (Paik et al., 2024). The bounding boxes it produces give medical professionals meaningful clinical localization and interpretability. Faster R-CNN has an extensive precedent in the medical field, with successful applications in vertebral fracture detection, pulmonary nodule detection (Zhao et al., 2022), and intracranial hemorrhage localization (Le et al., 2020); these examples support its use for complex trauma detection tasks. Automated identification of spinal fractures faces the historical challenges of manual diagnosis, which demands high levels of expertise and time (Liu et al., 2022). AI methods usually fall into two categories: traditional segmentation or classification methods and modern object detection models (Jeong & Lee, 2025), with vision transformers often outperforming traditional methods. The YOLO architecture suits clinical practice because its real-time speed makes it feasible to scan large-scale hospital image datasets. The newest iteration, YOLOv11 (He et al., 2024), retains this speed by performing object localization and
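To make the two-stage flow concrete, the following is a toy NumPy sketch of the Faster R-CNN pipeline described above. The network heads are replaced with random stand-ins (this is an illustration of the data flow, not the real model): stage 1 tiles anchors over the image and keeps the highest-scoring proposals, and stage 2 classifies each proposal (fracture vs. background) and refines its coordinates with regression deltas. All function names and sizes here are illustrative choices, not part of any library API.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_anchors(grid=8, stride=32, scales=(16, 32, 64)):
    """Tile candidate boxes (x1, y1, x2, y2) over an image grid,
    mimicking the anchor layout an RPN would score."""
    anchors = []
    for i in range(grid):
        for j in range(grid):
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride
            for s in scales:
                anchors.append((cx - s / 2, cy - s / 2, cx + s / 2, cy + s / 2))
    return np.asarray(anchors)

def apply_deltas(boxes, deltas):
    """Stage-2 box regression: shift centers and rescale widths/heights
    by predicted (dx, dy, dw, dh), following the parameterization
    in Ren et al. (2015)."""
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    cx = boxes[:, 0] + 0.5 * w + deltas[:, 0] * w
    cy = boxes[:, 1] + 0.5 * h + deltas[:, 1] * h
    w = w * np.exp(deltas[:, 2])
    h = h * np.exp(deltas[:, 3])
    return np.stack([cx - 0.5 * w, cy - 0.5 * h,
                     cx + 0.5 * w, cy + 0.5 * h], axis=1)

# Stage 1: the RPN scores every anchor for "objectness" and keeps the top-k.
anchors = generate_anchors()
objectness = rng.random(len(anchors))        # stand-in for the RPN head
top_k = 10
proposals = anchors[np.argsort(objectness)[::-1][:top_k]]

# Stage 2: each surviving proposal is classified (fracture vs. background)
# and its coordinates are refined by small regression deltas.
class_logits = rng.normal(size=(top_k, 2))   # stand-in for the detection head
deltas = rng.normal(scale=0.05, size=(top_k, 4))
refined = apply_deltas(proposals, deltas)
labels = class_logits.argmax(axis=1)         # 0 = background, 1 = fracture
```

In the real model both stages share a convolutional backbone and are trained jointly; the sketch only shows why the two-stage split gives precise boxes, since every final detection is a refined version of an explicitly scored proposal.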

3 Literature Review

Several machine learning approaches attempt to address the same problem of fracture identification in CT and X-ray data. Early work used binary classifiers, particularly convolutional neural networks (CNNs), to label 2D images as “fracture” or “normal” (Kim et al., 2020). While computationally simple, these approaches provide no spatial localization, which limits clinical interpretability. We chose CNNs as our baseline because this type of model remains foundational in computer vision research: it is straightforward, stable during training, and effective at learning lower-level spatial features (LeCun, Bengio, & Hinton, 2015). On the other hand, Zech et al. (2018) demonstrated that CNNs trained for pneumonia detection performed well internally but degraded sharply when tested on external hospital data. Their work revealed that models sometimes latch onto acquisition-specific cues rather than true clinical features, which can cause reliability issues when the deployment environment differs from the training environment. More recent systems employ object detection architectures, including YOLO (You Only Look Once) (Redmon et al., 2016), RetinaNet (Lin et al., 2017), SSD (Liu et al., 2016), transformer-based models such as DETR (Carion et al., 2020), and two-stage detectors such as Faster R-CNN (Ren et al., 2015). These models have been applied successfully to a variety of medical tasks, including rib fracture detection (Yao et al., 2021), vertebral compression fracture localization (Paik et al., 2024), pulmonary nodule detection (Zhao et al., 2022), and hemorrhage detection (Le et al., 2020). Commercial computer-aided detection (CAD) tools in radiology often build on variations of these same algorithms, which emphasizes
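The CNN-as-binary-classifier baseline described above can be sketched as a minimal forward pass: convolutional filters extract local spatial features, pooling summarizes them, and a linear head emits two logits ("normal" vs. "fracture"). This toy NumPy version uses untrained random filters and weights purely to show the data flow; all names and shapes are illustrative assumptions, not the project's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation: the basic CNN building block
    that learns local, lower-level spatial features."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def forward(image, kernels, weights):
    """Tiny CNN forward pass: conv -> ReLU -> global average pool ->
    linear head producing two class logits."""
    feats = [np.maximum(conv2d(image, k), 0).mean() for k in kernels]
    return np.asarray(feats) @ weights  # logits, shape (2,)

image = rng.random((16, 16))            # stand-in for a 2D radiograph patch
kernels = rng.normal(size=(4, 3, 3))    # 4 untrained 3x3 filters
weights = rng.normal(size=(4, 2))       # untrained linear head
logits = forward(image, kernels, weights)
pred = int(logits.argmax())             # 0 = "normal", 1 = "fracture"
```

Note that the global pooling step discards all location information, which is exactly the limitation the paragraph above identifies: the classifier can say *whether* a fracture is present but not *where*, motivating the move to detection architectures.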

