AAI_2025_Capstone_Chronicles_Combined
Data Issues and Preprocessing Considerations
Several characteristics observed in the dataset could negatively impact CNN
classification performance. The class imbalance, with most images labeled as pneumonia, can
bias models toward the majority class, leading to poor generalization and lower recall on the
minority "Normal" class, as normal images are misclassified as pneumonia (Valova et al., 2020).
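The effect of majority-class bias can be illustrated with a small worked example. The sketch below is illustrative only: the 75/25 split, the label strings, and the helper name `per_class_recall` are assumptions made for the example, not figures or code from this project. It shows how a model that always predicts the majority class can report reasonable accuracy while never identifying a minority-class image.

```python
def per_class_recall(y_true, y_pred, labels):
    """Return recall for each label: correct predictions / true instances."""
    recalls = {}
    for label in labels:
        total = sum(1 for t in y_true if t == label)
        correct = sum(
            1 for t, p in zip(y_true, y_pred) if t == label and p == label
        )
        recalls[label] = correct / total if total else float("nan")
    return recalls


# Hypothetical imbalanced test set (assumed counts, not the real dataset).
y_true = ["pneumonia"] * 75 + ["normal"] * 25

# A degenerate model that has collapsed onto the majority class.
y_pred = ["pneumonia"] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recalls = per_class_recall(y_true, y_pred, ["pneumonia", "normal"])

print(accuracy)  # 0.75
print(recalls)   # {'pneumonia': 1.0, 'normal': 0.0}
```

Because accuracy alone hides this failure, per-class recall is the more informative metric to report when the classes are imbalanced.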
Inconsistent image resolutions and sizes further affect CNN performance by creating
discrepancies in the scale and detail of features the model learns (Sabottke et al., 2019).
Feeding images of differing resolutions without proper standardization can impair pattern
recognition and reduce both accuracy and generalization. Similarly, variations in image contrast
negatively impact classification accuracy, as extremely bright or dark images can obscure
discriminative texture and edge information essential for CNNs (Akbarinia & Gegenfurtner,
2019). Variations in patient pose, body size, image zoom, and chest cavity size may also
introduce bias if insufficient diversity exists in the training set (Valova et al., 2020).
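The resolution and contrast variability described above can be quantified before training. The following is a minimal sketch, assuming the images are available as 8-bit grayscale NumPy arrays; the function name `summarize_images` and the synthetic arrays are hypothetical stand-ins for the real X-rays. Contrast is measured here as RMS contrast, the standard deviation of pixel intensities scaled to [0, 1].

```python
import numpy as np


def summarize_images(images):
    """Report the distinct image shapes and the range of RMS contrast."""
    shapes = sorted({img.shape for img in images})
    # RMS contrast: standard deviation of intensities scaled to [0, 1].
    contrasts = [float(np.std(img.astype(np.float64) / 255.0)) for img in images]
    return {
        "distinct_shapes": shapes,
        "min_contrast": min(contrasts),
        "max_contrast": max(contrasts),
    }


# Synthetic stand-ins for X-rays (assumed data, for illustration only).
rng = np.random.default_rng(0)
low_contrast = rng.integers(120, 136, size=(512, 640), dtype=np.uint8)
high_contrast = rng.integers(0, 256, size=(1024, 768), dtype=np.uint8)

summary = summarize_images([low_contrast, high_contrast])
print(summary["distinct_shapes"])  # [(512, 640), (1024, 768)]
print(summary["min_contrast"] < summary["max_contrast"])  # True
```

A wide spread in either measure suggests that standardization is needed before the images are fed to the CNN.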
The disparity in image sizes will be addressed by resizing all X-ray images to a
standardized resolution of 128 × 128 pixels during preprocessing. Resizing ensures
consistent input dimensions and reduces scale variability, while aiming to retain the
structural details needed for accurate classification.
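The resizing step can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual preprocessing code: it assumes 2-D grayscale NumPy arrays and uses nearest-neighbor sampling to stay dependency-free, whereas an image library with interpolated resizing would likely be used in practice. The names `resize_nearest` and `build_batch` are hypothetical.

```python
import numpy as np

TARGET_SIZE = (128, 128)  # (height, width), as stated in the text


def resize_nearest(image, size=TARGET_SIZE):
    """Resize a 2-D grayscale array to `size` with nearest-neighbor sampling."""
    src_h, src_w = image.shape
    dst_h, dst_w = size
    # Map each output row and column back to a source index.
    rows = np.arange(dst_h) * src_h // dst_h
    cols = np.arange(dst_w) * src_w // dst_w
    return image[rows[:, None], cols]


def build_batch(images, size=TARGET_SIZE):
    """Resize every image and stack the results into one (N, H, W) array."""
    return np.stack([resize_nearest(img, size) for img in images])


# Synthetic images of differing resolutions (assumed data, for illustration).
images = [
    np.zeros((1024, 768), dtype=np.uint8),
    np.ones((512, 640), dtype=np.uint8),
]
batch = build_batch(images)
print(batch.shape)  # (2, 128, 128)
```

After this step every image has the same dimensions, so the set can be stacked into a single array and passed to the CNN as a batch.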
A GAN model will be used to generate additional images to augment the CNN training
dataset. GAN-generated images can help correct class imbalance by supplementing the minority
"Normal" class without requiring new patient data. Moreover, synthetic images introduce