with the tool Label Studio (Label Studio, n.d.). The automatically generated images were
created in Unity: an alpha channel was added to each drawing so that its white portions became
a transparency mask, and C# code then placed a randomized number of drawings at randomized
positions and scales over a randomized background. Start and end drawings, in this case
helicopters and hospitals, were prioritized, with all other obstacle classes randomized.
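The compositing step described above can be illustrated with a minimal sketch. The project performed this step in Unity with C#; the Python below is only a stand-in for the idea, not the project's code. Every name in it (`white_to_alpha`, `scale_nearest`, `compose_scene`, `WHITE_THRESHOLD`), the grayscale list-of-rows image format, the whiteness cutoff, the scale range, and the returned boxes are assumptions made for illustration.

```python
import random

WHITE_THRESHOLD = 250  # assumed cutoff: values at or above this count as white paper


def white_to_alpha(gray):
    """Convert rows of 0-255 gray values into (value, alpha) pixels.

    White pixels become fully transparent (alpha 0); ink stays opaque (alpha 255).
    """
    return [[(v, 0 if v >= WHITE_THRESHOLD else 255) for v in row] for row in gray]


def scale_nearest(pixels, factor):
    """Rescale a 2D pixel grid by `factor` using nearest-neighbor sampling."""
    h, w = len(pixels), len(pixels[0])
    new_h, new_w = max(1, round(h * factor)), max(1, round(w * factor))
    return [
        [pixels[min(h - 1, int(y / factor))][min(w - 1, int(x / factor))]
         for x in range(new_w)]
        for y in range(new_h)
    ]


def compose_scene(background, drawings, rng, scale_range=(0.5, 1.5)):
    """Paste (label, gray_drawing) pairs onto a copy of `background`.

    Each drawing gets a random scale and position. Returns the new scene and
    one (label, x, y, width, height) box per pasted drawing.
    """
    scene = [row[:] for row in background]
    scene_h, scene_w = len(scene), len(scene[0])
    boxes = []
    for label, gray in drawings:
        sprite = scale_nearest(white_to_alpha(gray), rng.uniform(*scale_range))
        h, w = len(sprite), len(sprite[0])
        if h > scene_h or w > scene_w:
            continue  # skip drawings that cannot fit at this scale
        x0, y0 = rng.randint(0, scene_w - w), rng.randint(0, scene_h - h)
        for dy, sprite_row in enumerate(sprite):
            for dx, (value, alpha) in enumerate(sprite_row):
                if alpha:  # transparent pixels leave the background visible
                    scene[y0 + dy][x0 + dx] = value
        boxes.append((label, x0, y0, w, h))
    return scene, boxes


# Toy example: a 4x4 drawing with a white border around a 2x2 block of black ink.
drawing = [[255, 255, 255, 255],
           [255, 0, 0, 255],
           [255, 0, 0, 255],
           [255, 255, 255, 255]]
background = [[128] * 20 for _ in range(20)]
rng = random.Random(0)  # fixed seed so the toy scene is reproducible
scene, boxes = compose_scene(
    background, [("helicopter", drawing), ("hospital", drawing)], rng)
```

Because the white paper is masked out before pasting, only the ink strokes overwrite the background, which is what lets the same drawings be reused over arbitrary backgrounds. The boxes here cover the whole pasted sprite, including its transparent margin; a tighter box around the ink would be a straightforward refinement.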
Drawings were selected at random from the first 2,200 example drawings of each class, with
the remaining 800 reserved for later testing. A total of 10,000 images were generated for
training in approximately 30 minutes, and an additional 1,000 images were created for
validation and testing. The pipeline would support swapping class types or adding classes
with minimal modification. Whether training images or live images are passed to the model,
they are resized to the expected input size of 640x640 and then converted to tensors. Further
preprocessing was deliberately avoided to prioritize processing speed.
Initial modeling approaches used convolutional neural networks (CNNs) built with TorchVision
on the original single-class images. Two CNNs were developed. The first used two
convolutional layers with Rectified Linear Unit (ReLU) activations and 2x2 max pooling
layers, followed by a flatten step and a linear layer that predicts one of ten classes. It
achieved an overall F1 score of 0.94 with generally even performance across the classes. The
second model used twice as many convolutional layers, together with batch normalization,
dropout, and Leaky ReLU activations, on twelve classes and achieved a validation accuracy of
89%. Neither CNN supported bounding-box predictions; however, the classification results
indicate that the selected classes are learnable from Quick, Draw! images. In preliminary
tests on hand-drawn inputs, the models correctly classified most samples, motivating the
transition to detection architectures for localization.
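The first CNN described above can be sketched in PyTorch roughly as follows. Only the overall shape comes from the text: two convolutional layers, ReLU activations, 2x2 max pooling, and a flattened linear output over ten classes. The class name `SmallSketchCNN`, the channel widths (16 and 32), the 3x3 kernels, and the 28x28 single-channel input are assumptions made for illustration.

```python
import torch
from torch import nn


class SmallSketchCNN(nn.Module):
    """Two conv blocks (conv -> ReLU -> 2x2 max pool), then flatten -> linear."""

    def __init__(self, num_classes=10, in_channels=1, image_size=28):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1),  # keeps H x W
            nn.ReLU(),
            nn.MaxPool2d(2),                                       # halves H and W
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        side = image_size // 4  # spatial size after two 2x2 poolings
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * side * side, num_classes),  # one logit per class
        )

    def forward(self, x):
        return self.classifier(self.features(x))


model = SmallSketchCNN()
dummy_batch = torch.zeros(4, 1, 28, 28)  # four blank single-channel drawings
logits = model(dummy_batch)              # shape (4, 10)
predicted = logits.argmax(dim=1)         # predicted class index per drawing
```

A network like this outputs one class label per image and no coordinates, which is why it cannot localize several drawings in a composed scene.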