AAI_2025_Capstone_Chronicles_Combined

validation performance continued to improve throughout, which justified the extended fine-tuning.

After applying per-class threshold tuning (based on validation-set F1 scores), the model achieved a macro-averaged F1 of 0.43, a micro-averaged F1 of 0.47, and a micro-averaged recall of 0.70 on the final test set, substantially outperforming the untuned baseline (macro-F1 near 0.23). These metrics, shown in Figure 8, suggest the model learned clinically meaningful patterns while maintaining generalization.
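The per-class threshold tuning described above can be sketched as a grid search that, for each class independently, picks the cutoff maximizing validation F1. This is an illustrative sketch, not the project's actual code; the function names and the 0.05-step grid are assumptions:

```python
import numpy as np

def f1(y_true, y_pred):
    """Binary F1 for one class from 0/1 arrays."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def tune_thresholds(y_val, probs, grid=np.arange(0.05, 0.95, 0.05)):
    """For each class, choose the threshold with the best validation F1.

    y_val, probs: (samples x classes) arrays of labels and predicted
    probabilities. Returns one threshold per class.
    """
    n_classes = y_val.shape[1]
    best = np.empty(n_classes)
    for c in range(n_classes):
        scores = [f1(y_val[:, c], (probs[:, c] >= t).astype(int))
                  for t in grid]
        best[c] = grid[int(np.argmax(scores))]  # first threshold at the max
    return best
```

Because each class is tuned independently, rare classes can receive a much lower cutoff than common ones, which is how recall on minority labels is traded against precision.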

Fig. 5 EfficientNetB0 fine-tuning results

Performance was strongest for categories with larger class support:

●​ Fluid Related Issues: F1 = 0.60 at threshold 0.40
●​ Lung Structure Issues: F1 = 0.57 at threshold 0.40
●​ Infection/Infiltration: F1 = 0.46 at threshold 0.35
●​ No Finding: F1 = 0.56 at threshold 0.40
●​ Nodule/Mass: F1 = 0.42 at threshold 0.35
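The gap between the macro (0.43) and micro (0.47) scores reflects how the two averages treat class support: macro F1 weights every class equally, so low-support classes like Hernia pull it down, while micro F1 pools counts across classes and is dominated by the high-support categories above. A minimal NumPy sketch of the two averages (not the project's evaluation code, which would more likely use a library such as scikit-learn):

```python
import numpy as np

def macro_micro_f1(y_true, y_pred):
    """Macro and micro F1 over multi-label (samples x classes) 0/1 arrays."""
    tp = np.sum((y_true == 1) & (y_pred == 1), axis=0)
    fp = np.sum((y_true == 0) & (y_pred == 1), axis=0)
    fn = np.sum((y_true == 1) & (y_pred == 0), axis=0)
    # Macro: average per-class F1, so rare classes count as much as common ones.
    denom = 2 * tp + fp + fn
    per_class = np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)
    macro = float(per_class.mean())
    # Micro: pool the counts first, so high-support classes dominate.
    TP, FP, FN = tp.sum(), fp.sum(), fn.sum()
    micro = 2 * TP / (2 * TP + FP + FN) if (2 * TP + FP + FN) else 0.0
    return macro, float(micro)
```

In a toy case where one well-predicted class has triple the support of a poorly predicted one, micro F1 exceeds macro F1, mirroring the pattern in the results above.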

While underrepresented conditions like Hernia yielded very low precision (0.01), threshold tuning helped raise recall to 0.57. This tradeoff reflects our deliberate design choice to

