M.S. AAI Capstone Chronicles 2024
The training process for the ViT model shows that the best classification performance is
obtained with 3 epochs, a batch size of 3, a learning rate of 0.0002, and the Adam optimizer. Table 2
summarizes the final training performance of the model after optimization. Neither the training loss
nor the testing accuracy changes significantly during training: the loss moves from 0.642 to 0.639 and
the accuracy stays at 0.67, which indicates that the model is not learning the desired relationships.
Table 2
ViT model training performance
Epoch    Training loss    Testing accuracy
1        0.642            0.67
2        0.572            0.67
3        0.639            0.67
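The flat trend reported in Table 2 can be checked with a few lines of arithmetic. The following is a minimal Python sketch that uses only the values from Table 2; the helper name `relative_change` and the 5% threshold are illustrative assumptions and are not part of the original study.

```python
# Values copied from Table 2 (ViT model training performance).
epochs = [1, 2, 3]
training_loss = [0.642, 0.572, 0.639]
testing_accuracy = [0.67, 0.67, 0.67]


def relative_change(values):
    """Return (last - first) / first for a sequence of metric values."""
    first, last = values[0], values[-1]
    return (last - first) / first


# Net change in training loss between the first and last epoch.
loss_change = relative_change(training_loss)

# Difference between the best and worst testing accuracy across epochs.
accuracy_spread = max(testing_accuracy) - min(testing_accuracy)

# Hypothetical threshold: treat a net change under 5% as "no meaningful change".
THRESHOLD = 0.05
loss_is_flat = abs(loss_change) < THRESHOLD
accuracy_is_flat = accuracy_spread == 0.0

print(f"Net change in training loss, epoch 1 to 3: {loss_change:.2%}")
print(f"Spread in testing accuracy: {accuracy_spread:.2f}")
print(f"Loss flat: {loss_is_flat}, accuracy flat: {accuracy_is_flat}")
```

Under these assumptions, the net change in training loss is under half a percent, and the testing accuracy does not move at all. The loss does dip at epoch 2 (0.572) but returns to roughly its starting value at epoch 3, so the dip does not reflect sustained learning.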
Despite the poor loss and accuracy results, the fine-tuned ViT model produces better object
detection results than the initial model. Figure 8 shows the expected bounding box for an input image
on the left and the model's prediction after optimization on the right. The model learned from the
training set and is able to successfully locate the object in the image. Combining the classification
and object detection outputs of the fine-tuned model yields the predictions shown in Figure 9.
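The visual comparison between an expected and a predicted bounding box can also be expressed as a number. One common choice is intersection over union (IoU); the source does not state which localization metric, if any, was used, so the Python sketch below is an assumption offered for illustration. The box format `(x_min, y_min, x_max, y_max)` and the coordinates are hypothetical and are not taken from Figure 8.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlapping region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


# Hypothetical pixel coordinates, for illustration only.
expected_box = (50, 40, 150, 140)
predicted_box = (60, 50, 160, 150)

score = iou(expected_box, predicted_box)
print(f"IoU between expected and predicted box: {score:.2f}")
```

An IoU of 1.0 means the two boxes coincide and 0.0 means they do not overlap. For the hypothetical boxes above the score is about 0.68; applying the same function to the boxes behind Figure 8 would quantify how closely the fine-tuned model localizes the object.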