M.S. AAI Capstone Chronicles 2024
To optimize the developed model, several aspects are modified and tested. Both the size and the
number of layers in the model architecture are varied. A simpler model is preferred over a more
complex one because of its lower computational cost; as a result, a model with a few larger layers is
tested before a model with multiple hidden layers. During training, the number of epochs and the
batch size are adjusted based on the observed results. If the model appears to be overfitting, that is,
learning patterns specific to the training dataset, the number of epochs is reduced. Conversely, if the
model appears not to be learning the relationship between the images and the labels, the number of
epochs is increased.
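The epoch-adjustment rule described above can be expressed as a small decision function over per-epoch loss histories. The sketch below is illustrative only and is not the project's code. It assumes that overfitting shows up as validation loss rising after its minimum while training loss keeps falling, and that "not learning yet" shows up as training loss still dropping at the last epoch. The function name `suggest_epochs`, the tolerance `tol`, and the increment `step` are all hypothetical, and batch-size tuning is not modeled.

```python
from typing import Sequence


def suggest_epochs(train_losses: Sequence[float],
                   val_losses: Sequence[float],
                   step: int = 5,
                   tol: float = 1e-3) -> int:
    """Hypothetical heuristic: suggest an epoch count from per-epoch losses."""
    if len(train_losses) != len(val_losses) or len(val_losses) < 2:
        raise ValueError("need per-epoch train/val losses for at least 2 epochs")
    epochs = len(val_losses)
    # 1-based epoch at which validation loss was lowest (first one on ties).
    best = min(range(epochs), key=lambda i: val_losses[i]) + 1
    # Overfitting (assumed signal): validation loss rose by more than tol after
    # its minimum while training loss kept falling, so stop at the best epoch.
    if (val_losses[-1] - val_losses[best - 1] > tol
            and train_losses[-1] < train_losses[best - 1]):
        return best
    # Not converged yet (assumed signal): training loss still dropped by more
    # than tol in the final epoch, so allow `step` more epochs.
    if train_losses[-2] - train_losses[-1] > tol:
        return epochs + step
    # Otherwise keep the current number of epochs.
    return epochs
```

For example, a five-epoch run whose validation loss bottoms out at epoch 3 and then climbs while training loss keeps falling would yield a suggestion of 3 epochs. A run whose training loss is still clearly decreasing at the end would yield a longer schedule.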
ViT
The pretrained ViT (Vision Transformer) model is designed for video classification, detection, and
segmentation tasks and takes an image as its input (“Getting started with transforms v2”, n.d.). Figure 6
shows the bounding boxes the model generates for object detection, and they indicate that the model
was not trained to detect objects in images captured by a UAV for the SAA task. The expected result is
the object detection and bounding box shown in Figure 1; however, the model incorrectly detects
clouds as one of the objects.
Figure 6
ViT object detection and segmentation
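The mismatch visible in Figure 6 could also be quantified rather than judged only by eye. A common measure for comparing a predicted bounding box with a ground-truth box is intersection over union (IoU). The sketch below is not part of the original pipeline. It assumes boxes in `(x_min, y_min, x_max, y_max)` pixel format, the helper names `iou` and `is_correct_detection` are hypothetical, and the 0.5 threshold is a widely used convention (e.g. PASCAL VOC) rather than a value taken from this project.

```python
from typing import Sequence, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the intersection rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_correct_detection(predicted: Sequence[Box], expected: Box,
                         threshold: float = 0.5) -> bool:
    """True if any predicted box overlaps the expected box by >= threshold IoU."""
    return any(iou(p, expected) >= threshold for p in predicted)
```

With a hypothetical expected aircraft box of `(100, 100, 200, 200)`, a predicted box placed on a cloud at `(300, 50, 500, 150)` has an IoU of 0.0, so a detection set containing only cloud boxes would be scored as a miss.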