AAI_2025_Capstone_Chronicles_Combined


Final performance metrics (see Figure 7) reflected this distinction. The single-task models achieved strong recall, often above 0.80, for well-represented categories such as Fluid Related Issues (F1 = 0.52), Infection/Infiltration (F1 = 0.43), and Lung Structure Issues (F1 = 0.40). For low-support classes such as Cardiac Issues and Hernia, recall remained high but precision dropped sharply (to as low as 0.00), producing high false-positive rates. This tradeoff was consistent with our recall-first objective: in clinical applications, it is often safer to raise a false alarm than to miss a true pathology. One notable limitation of the hybrid model was that it could not enforce mutual exclusivity for the No Finding label, which occasionally co-occurred with other diagnoses despite being intended as a stand-alone class. Still, the system proved effective as a sensitive screening mechanism, surfacing even subtle or borderline findings for clinical review.
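The gap between high recall and modest F1 can be made concrete by solving the F1 definition, F1 = 2PR / (P + R), for precision. The sketch below is illustrative only. It assumes a recall of exactly 0.80 for each class, whereas the text above reports only that recall was "often above 0.80". The helper name `implied_precision` is hypothetical and not part of the project code. Because implied precision falls as recall rises for a fixed F1, the printed values are best read as rough upper bounds under that assumption.

```python
def implied_precision(f1: float, recall: float) -> float:
    """Solve F1 = 2*P*R / (P + R) for P, given F1 and recall R."""
    denom = 2 * recall - f1
    if denom <= 0:
        raise ValueError("inconsistent inputs: F1 must be below 2 * recall")
    return f1 * recall / denom


# F1 scores as reported in the text for the well-represented categories.
reported_f1 = {
    "Fluid Related Issues": 0.52,
    "Infection/Infiltration": 0.43,
    "Lung Structure Issues": 0.40,
}

# ASSUMPTION: recall fixed at 0.80 for illustration; per-class recall
# values are not given in the text.
ASSUMED_RECALL = 0.80

implied = {
    name: implied_precision(f1, ASSUMED_RECALL)
    for name, f1 in reported_f1.items()
}

for name, precision in implied.items():
    print(f"{name}: implied precision ~ {precision:.2f}")
```

Under this assumption, implied precision is roughly 0.39, 0.29, and 0.27 for the three categories, which matches the recall-first behavior described above: most true cases are caught, at the cost of many false alarms.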

Fig. 7. Multitask classifier loss
