M.S. AAI Capstone Chronicles 2024


The training process for the ViT model reveals that the best classification performance is obtained with 3 epochs, a batch size of 3, a learning rate of 0.0002, and the Adam optimizer. Table 2 summarizes the final training performance of the model after optimization. Neither the training loss nor the testing accuracy changes significantly during training: the loss ends at 0.639, close to its starting value of 0.642, and the testing accuracy stays at 0.67 for all three epochs. This indicates that the model is not learning the desired relationships.

Table 2

ViT model training performance

Epoch    Training loss    Testing accuracy
1        0.642            0.67
2        0.572            0.67
3        0.639            0.67
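The lack of progress in Table 2 can be expressed as numbers. The following is a minimal sketch in Python that uses only the values reported in the table. The helper names `relative_change` and `spread` are illustrative and are not part of the original training code.

```python
# Values copied from Table 2 (epochs 1, 2, 3).
training_loss = [0.642, 0.572, 0.639]
testing_accuracy = [0.67, 0.67, 0.67]


def relative_change(values):
    """Relative change between the first and last value of a series."""
    first, last = values[0], values[-1]
    return (last - first) / first


def spread(values):
    """Difference between the largest and smallest value of a series."""
    return max(values) - min(values)


loss_change = relative_change(training_loss)
accuracy_spread = spread(testing_accuracy)

print(f"Relative loss change, epoch 1 to 3: {loss_change:.1%}")
print(f"Testing accuracy spread: {accuracy_spread:.2f}")
```

A loss change of well under one percent between the first and last epoch, together with a testing accuracy that does not move at all, is consistent with the conclusion that the classifier is not learning.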

Despite the poor loss and accuracy results, fine-tuning on the training dataset improves the ViT model's object detection results relative to the initial model. Figure 8 shows the expected bounding box for an input image on the left and the model's prediction after optimization on the right. The model learned from the training set and is able to successfully locate the object in the image. Combining the classification and object detection outputs of the fine-tuned model produces the predictions shown in Figure 9.
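The comparison between the expected and predicted bounding boxes is made visually here. One common way to quantify such a comparison is intersection over union (IoU). The following is a minimal sketch in Python, not the evaluation used in this project. It assumes boxes are given as (x_min, y_min, x_max, y_max), and the example coordinates are made up for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region; zero if the boxes do not overlap.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


# Hypothetical coordinates, not taken from Figure 8.
expected_box = (10, 10, 50, 50)
predicted_box = (20, 20, 60, 60)
print(f"IoU: {iou(expected_box, predicted_box):.3f}")
```

An IoU of 1 means the predicted box matches the expected box exactly, and an IoU of 0 means the two boxes do not overlap.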
