AAI_2025_Capstone_Chronicles_Combined
generalization helped reduce sparsity, balance class frequencies, and improve learnability. The categories are as follows:
● Cardiac Issues: Cardiomegaly
● Fluid-Related Issues: Edema, Effusion, Pleural Thickening
● Hernia: Hernia
● Infection/Infiltration: Pneumonia, Consolidation, Infiltration
● Lung Structure Issues: Atelectasis, Pneumothorax, Fibrosis, Emphysema
● Nodule/Mass: Nodule, Mass
● No Finding: No Finding
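The grouping above can be expressed as a simple lookup table that turns each image's fine-grained labels into a 7-element multi-hot target. The sketch below is illustrative, not the project's actual code. It assumes labels arrive as a pipe-separated string (for example "Effusion|Infiltration") and that some labels may use underscores (for example "Pleural_Thickening"). The names `CATEGORY_MAP`, `CATEGORIES`, and `to_multi_hot` are hypothetical.

```python
from typing import List

# Fine-grained label -> grouped category, following the 7-class mapping above.
CATEGORY_MAP = {
    "Cardiomegaly": "Cardiac Issues",
    "Edema": "Fluid-Related Issues",
    "Effusion": "Fluid-Related Issues",
    "Pleural Thickening": "Fluid-Related Issues",
    "Hernia": "Hernia",
    "Pneumonia": "Infection/Infiltration",
    "Consolidation": "Infection/Infiltration",
    "Infiltration": "Infection/Infiltration",
    "Atelectasis": "Lung Structure Issues",
    "Pneumothorax": "Lung Structure Issues",
    "Fibrosis": "Lung Structure Issues",
    "Emphysema": "Lung Structure Issues",
    "Nodule": "Nodule/Mass",
    "Mass": "Nodule/Mass",
    "No Finding": "No Finding",
}

# Fixed column order for the multi-hot target vector (hypothetical ordering).
CATEGORIES = [
    "Cardiac Issues",
    "Fluid-Related Issues",
    "Hernia",
    "Infection/Infiltration",
    "Lung Structure Issues",
    "Nodule/Mass",
    "No Finding",
]


def to_multi_hot(finding_labels: str) -> List[int]:
    """Convert a pipe-separated label string (assumed input format)
    into a 7-element multi-hot vector over CATEGORIES."""
    vector = [0] * len(CATEGORIES)
    for raw in finding_labels.split("|"):
        # Normalize spellings such as "Pleural_Thickening" -> "Pleural Thickening".
        label = raw.strip().replace("_", " ")
        category = CATEGORY_MAP[label]  # raises KeyError for unexpected labels
        vector[CATEGORIES.index(category)] = 1
    return vector


print(to_multi_hot("Effusion|Pleural_Thickening|Mass"))  # [0, 1, 0, 0, 0, 1, 0]
```

Note how Effusion and Pleural Thickening collapse into a single positive for Fluid-Related Issues. This is the mechanism by which grouping reduces label sparsity.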
This 7-class mapping was informed by both clinical rationale and co-occurrence trends in the data. For example, effusion and pleural thickening often appear together and are both associated with fluid accumulation in the chest cavity, which justifies grouping them. Our co-occurrence matrix (Fig. 4) confirmed that several label pairs commonly overlap, underscoring the need for a multi-label prediction strategy.
In terms of data quality, we identified and removed thousands of exact or near-duplicate image entries to avoid inflating model performance. We tolerated limited missingness in metadata fields such as age and gender, since these variables were not critical for our image-based models. Demographic variables were used primarily in our hybrid CNN (see Experimental Methods), which could optionally incorporate tabular input.
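Two of the checks described in this section can be sketched with NumPy and the standard library: a label co-occurrence matrix like the one summarized in Fig. 4, and exact-duplicate detection. This is a minimal sketch, not the project's pipeline. It assumes labels are already in a 0/1 multi-hot matrix and that raw image bytes are available. The helper names and toy inputs are hypothetical and are not data from the study. Only byte-identical duplicates are detected here; catching near-duplicates (re-encoded or resized copies) would need a different technique, such as perceptual hashing, which is not shown.

```python
import hashlib
from typing import Dict, List

import numpy as np


def co_occurrence(labels: np.ndarray) -> np.ndarray:
    """labels: (n_images, n_classes) 0/1 multi-hot matrix.
    Entry [i, j] counts images labeled with both class i and class j;
    the diagonal holds per-class frequencies."""
    y = labels.astype(np.int64)
    return y.T @ y


def exact_duplicate_groups(images: Dict[str, bytes]) -> List[List[str]]:
    """Group image IDs whose raw bytes are identical (SHA-256 match).
    Near-duplicates are NOT caught by this check."""
    groups: Dict[str, List[str]] = {}
    for image_id, data in images.items():
        digest = hashlib.sha256(data).hexdigest()
        groups.setdefault(digest, []).append(image_id)
    return [ids for ids in groups.values() if len(ids) > 1]


# Toy multi-hot labels over the 7 grouped classes (illustrative only).
Y = np.array([
    [0, 1, 0, 0, 0, 1, 0],  # Fluid-Related + Nodule/Mass
    [0, 1, 0, 1, 0, 0, 0],  # Fluid-Related + Infection/Infiltration
    [0, 0, 0, 0, 0, 0, 1],  # No Finding
])
C = co_occurrence(Y)
print(C[1, 1], C[1, 5], C[1, 3])  # 2 1 1
print(exact_duplicate_groups({"a.png": b"xyz", "b.png": b"xyz", "c.png": b"abc"}))
```

Because the matrix is computed as `Y.T @ Y`, it is symmetric. Normalizing each row by its diagonal entry would give conditional co-occurrence rates, which can make overlapping pairs easier to compare across classes of very different frequency.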
We observed wide variability in image resolution, with most images far exceeding 1024×1024 pixels (see Fig. 3). To ensure consistent training speed and adequate compute