M.S. AAI Capstone Chronicles 2024


To optimize the model, several aspects are modified and tested. Both the size and the number of layers in the model architecture are varied. Because a simpler model requires less computation, a model with a few larger layers is tested before a model with multiple hidden layers. During training, the number of epochs and the batch size are adjusted according to the results observed. If the model appears to be overfitting, that is, learning patterns specific to the training dataset, the number of epochs is reduced. Conversely, if the model does not appear to be learning the relationship between the images and the labels, the number of epochs is increased.
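The epoch adjustment rule described here can be illustrated with a minimal sketch. It assumes that per-epoch training and validation losses are recorded, which the text does not state. The function name `suggest_epochs` and the thresholds `patience` and `min_delta` are hypothetical choices made for illustration, not the procedure actually used in this project.

```python
from typing import List, Tuple


def suggest_epochs(
    train_loss: List[float],
    val_loss: List[float],
    patience: int = 3,       # assumed: epochs without validation improvement tolerated
    min_delta: float = 1e-3,  # assumed: smallest loss drop counted as "still learning"
) -> Tuple[str, int]:
    """Suggest whether to reduce, increase, or keep the number of epochs.

    Illustrative heuristic only: returns an action and a suggested epoch count.
    """
    if not val_loss or len(train_loss) != len(val_loss):
        raise ValueError("loss histories must be non-empty and of equal length")

    n = len(val_loss)
    # Epoch (1-indexed) at which the validation loss was lowest.
    best_epoch = min(range(n), key=val_loss.__getitem__) + 1
    epochs_since_best = n - best_epoch

    # Overfitting signal: validation loss stopped improving while the
    # training loss kept falling after the best validation epoch.
    if epochs_since_best >= patience and train_loss[-1] < train_loss[best_epoch - 1]:
        return "reduce", best_epoch

    # Underfitting signal: both losses were still falling in the last epoch.
    train_falling = n >= 2 and train_loss[-2] - train_loss[-1] > min_delta
    val_falling = n >= 2 and val_loss[-2] - val_loss[-1] > min_delta
    if train_falling and val_falling:
        return "increase", n + patience

    return "keep", n


if __name__ == "__main__":
    # Made-up loss curves purely for demonstration.
    print(suggest_epochs([1.0, 0.8, 0.6, 0.4, 0.3, 0.2],
                         [1.0, 0.9, 0.85, 0.9, 0.95, 1.0]))
```

With the made-up curves above, the validation loss is lowest at the third epoch while the training loss keeps falling, so the sketch suggests reducing the run to three epochs. In practice the same signal is often read directly from plotted loss curves.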

ViT

The pretrained ViT model is designed for video classification, detection, and segmentation and takes an image as its input (“Getting started with transforms v2”, n.d.). Figure 6 shows the bounding boxes that the model generates for object detection. The model has not been trained to detect objects in images taken by a UAV for the SAA task: the expected result is the object detection and bounding box shown in Figure 1, but the model instead incorrectly detects clouds as one of the objects.

Figure 6

ViT object detection and segmentation
