AAI_2025_Capstone_Chronicles_Combined

Data Issues and Preprocessing Considerations

Several characteristics observed in the dataset could negatively impact CNN classification performance. The class imbalance, with most images labeled as pneumonia, can bias models toward the majority class. This leads to poor generalization and lower recall on the minority "Normal" class, because normal images tend to be misclassified as pneumonia and become false positives (Valova et al., 2020).
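
The sketch below makes the imbalance effect concrete by scoring a degenerate classifier that always predicts the majority class. The label counts (300 pneumonia, 100 normal) are hypothetical and chosen only for illustration; they are not taken from this project's dataset. The helper name `per_class_recall` is also illustrative. Overall accuracy looks acceptable, but recall on the "Normal" class drops to zero because every normal image is predicted as pneumonia.

```python
# Hypothetical, illustrative label counts only (not this project's data):
# 300 "Pneumonia" images and 100 "Normal" images.
y_true = ["Pneumonia"] * 300 + ["Normal"] * 100

# A degenerate classifier biased entirely toward the majority class.
y_pred = ["Pneumonia"] * len(y_true)


def per_class_recall(y_true, y_pred, label):
    """Recall for one class, TP / (TP + FN), treating `label` as positive."""
    predictions_for_label = [p for t, p in zip(y_true, y_pred) if t == label]
    true_positives = sum(1 for p in predictions_for_label if p == label)
    return true_positives / len(predictions_for_label)


accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy:         {accuracy:.2f}")  # prints 0.75
print(f"recall Pneumonia: {per_class_recall(y_true, y_pred, 'Pneumonia'):.2f}")  # prints 1.00
print(f"recall Normal:    {per_class_recall(y_true, y_pred, 'Normal'):.2f}")  # prints 0.00
```

For this reason, per-class recall, or a class-balanced average of it, is a more informative metric than raw accuracy on an imbalanced dataset.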

Inconsistent image resolutions and sizes further affect CNN performance by creating discrepancies in the scale and detail of the features the model learns (Sabottke et al., 2019). Feeding images of differing resolutions to the model without proper standardization can impair pattern recognition and reduce both accuracy and generalization. Similarly, variations in image contrast can reduce classification accuracy, because extremely bright or dark images can obscure the discriminative texture and edge information that CNNs rely on (Akbarinia & Gegenfurtner, 2019). Variations in patient pose, body size, image zoom, and chest cavity size may also introduce bias if the training set does not contain sufficient diversity in these factors (Valova et al., 2020).
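
The effect of extreme brightness or darkness on edge information can be illustrated with a simplified one-dimensional intensity profile. This is a hypothetical sketch that assumes 8-bit pixel values which are shifted and then clipped to the range [0, 255]. Real radiographs involve acquisition and windowing steps that are not modeled here. The helper names `adjust_brightness` and `max_edge_strength` are illustrative. Once intensities saturate at either end of the range, the intensity difference across a boundary shrinks, and that difference is exactly what edge-sensitive convolutional filters respond to.

```python
def adjust_brightness(row, offset):
    """Shift 8-bit intensities by `offset`, then clip to the valid [0, 255] range."""
    return [min(255, max(0, v + offset)) for v in row]


def max_edge_strength(row):
    """Largest absolute intensity difference between neighboring pixels."""
    return max(abs(b - a) for a, b in zip(row, row[1:]))


# Hypothetical 1-D intensity profile across a tissue boundary.
normal = [100, 100, 200, 200]
bright = adjust_brightness(normal, 100)  # upper side saturates at 255
dark = adjust_brightness(normal, -150)   # lower side saturates at 0

for name, row in [("normal", normal), ("bright", bright), ("dark", dark)]:
    print(name, row, max_edge_strength(row))
# normal [100, 100, 200, 200] 100
# bright [200, 200, 255, 255] 55
# dark [0, 0, 50, 50] 50
```

Clipped values cannot be recovered. Later contrast normalization can rescale the remaining range, but it cannot restore the lost detail.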

The disparity in image sizes will be addressed by resizing all X-ray images to a standardized resolution of 128 × 128 pixels during preprocessing. Resizing ensures consistent input dimensions and reduces variability, while aiming to preserve the structural details necessary for accurate classification.
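
Below is a minimal sketch of the resizing step. It assumes each X-ray has already been loaded as a grayscale image represented as a list of pixel rows. The document does not specify an interpolation method. Nearest-neighbor sampling is used here only because it fits in a few lines of dependency-free Python, and the function name `resize_nearest` is hypothetical. In practice, an image library's resize routine with an averaging interpolation, such as bilinear or area, is usually preferred for downsampling, because nearest-neighbor simply discards pixels.

```python
TARGET_SIZE = 128  # 128 x 128, as stated in the preprocessing plan


def resize_nearest(image, out_h=TARGET_SIZE, out_w=TARGET_SIZE):
    """Resize a grayscale image (list of rows) using nearest-neighbor sampling.

    Each output pixel copies the input pixel at the proportionally mapped
    index, so the function handles both downsampling and upsampling.
    The aspect ratio is not preserved: non-square inputs are stretched.
    """
    in_h, in_w = len(image), len(image[0])
    return [
        [image[(r * in_h) // out_h][(c * in_w) // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


# Two hypothetical X-rays of different sizes (each pixel value = its row index).
large = [[r] * 200 for r in range(256)]  # 256 rows x 200 columns
small = [[r] * 64 for r in range(64)]    # 64 rows x 64 columns

for img in (large, small):
    out = resize_nearest(img)
    print(len(img), "x", len(img[0]), "->", len(out), "x", len(out[0]))
# 256 x 200 -> 128 x 128
# 64 x 64 -> 128 x 128
```

Resizing directly to a square also stretches non-square images. When preserving the aspect ratio matters, a common alternative is to pad each image to a square before resizing.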

A GAN model will be used to generate additional images to augment the CNN training dataset. GAN-generated images can help correct class imbalance by supplementing the minority "Normal" class without requiring new patient data. Moreover, synthetic images introduce
