AAI_2025_Capstone_Chronicles_Combined

standard ImageNet practices (Deng et al., 2009). Training was performed with the AdamW optimizer (Loshchilov & Hutter, 2019), using Cross-Entropy Loss, as it is the most appropriate choice for classification tasks. Other hyper-parameters, such as the learning rate and the AdamW weight decay, were tuned separately for each model.

First, I performed a separate ablation test for each of the proposed architectures, with the goal of running a grid search to obtain the best combination of hyper-parameters. For each possible combination, I ran a brief 5-epoch training on a small random subset of the training split and measured accuracy on the evaluation split. For the CNN model, the parameters searched were the initial channel size, the dropout probability, the learning rate, and the AdamW weight decay; for the ViT model, they were the hidden layer dimension, the number of layers, the learning rate, and the AdamW weight decay. After checking every hyper-parameter combination, I selected the best configuration for each architecture, defined as the one that achieved the highest accuracy on the evaluation split, and moved on to the full training process. The number of epochs and the size of the ablation training subset were chosen to fit the performance limitations of the NVIDIA T4 GPU available for model training in Google Colab.

The best CNN configuration uses 48 base channels in the first convolutional layer, no dropout, a learning rate of 10⁻⁴, and a weight decay of 10⁻⁴. This model achieved an 85% accuracy on the evaluation split in the ablation test. Regarding the ViT, the best configuration uses image patches of 16 by 16 pixels, 256 neurons in
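The grid-search procedure described above can be sketched as follows. This is a minimal, runnable outline, not the report's actual code: the search space shown covers only the learning rate and weight decay (the report also varied, e.g., channel size and dropout for the CNN), and `short_train_and_eval` is a hypothetical placeholder standing in for the 5-epoch training run plus evaluation-split accuracy measurement.

```python
from itertools import product

# Hypothetical search space mirroring part of the report's ablation:
# learning rate and AdamW weight decay values.
grid = {
    "lr": [1e-3, 1e-4, 1e-5],
    "weight_decay": [1e-2, 1e-4],
}

def short_train_and_eval(lr, weight_decay):
    """Placeholder for a brief 5-epoch training on a small random
    subset of the training split, followed by measuring accuracy on
    the evaluation split. Returns a synthetic score here so that the
    sketch is self-contained and runnable."""
    return 1.0 / (1 + abs(lr - 1e-4)) + 1.0 / (1 + abs(weight_decay - 1e-4))

def grid_search(grid):
    """Try every hyper-parameter combination and keep the one with
    the highest evaluation score."""
    best_score, best_cfg = float("-inf"), None
    for values in product(*grid.values()):
        cfg = dict(zip(grid.keys(), values))
        score = short_train_and_eval(**cfg)
        if score > best_score:
            best_score, best_cfg = score, cfg
    return best_cfg, best_score

best_cfg, best_score = grid_search(grid)
print(best_cfg)  # with this synthetic scorer: {'lr': 0.0001, 'weight_decay': 0.0001}
```

In the real setting, the winning configuration returned here would then be retrained in full on the entire training split.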

