AAI_2025_Capstone_Chronicles_Combined

Draw, Detect, Navigate

with the tool Label Studio (Label Studio, n.d.). The automatically generated images from Unity

were created by adding an alpha channel that turns the white portions of each drawing into a

transparency mask, then using C# to place a randomized number of drawings at randomized

positions and scales over a randomized background. Start and end drawings, in

this case helicopters and hospitals, were prioritized, with all other obstacle classes randomized.

Drawings were selected at random from the first 2,200 example drawings of

each class, with the remaining 800 preserved for later testing. A total of 10,000 images were generated

for training in approximately 30 minutes, and an additional 1,000 images were

created for validation and testing. The pipeline supports swapping class types or

increasing the number of classes with minimal modification. When either training images or live images

are processed and passed to the model, they are resized to the expected input size of 640x640

and then converted to tensors. Further preprocessing was deliberately avoided to

prioritize processing speed.
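The compositing step described above can be illustrated with a minimal Python sketch. It is not the Unity C# implementation used in the project: the function names, the near-white threshold of 250, the scale range, and the returned bounding boxes are all assumptions made for illustration, and images are represented as plain nested lists of grayscale values.

```python
import random

def white_to_alpha(gray, threshold=250):
    """Alpha mask for a grayscale drawing: near-white pixels become transparent (0)."""
    return [[0 if px >= threshold else 255 for px in row] for row in gray]

def scale_nearest(img, factor):
    """Nearest-neighbour rescale of a 2D list by a positive factor."""
    h, w = len(img), len(img[0])
    new_h, new_w = max(1, round(h * factor)), max(1, round(w * factor))
    return [[img[min(h - 1, int(r / factor))][min(w - 1, int(c / factor))]
             for c in range(new_w)] for r in range(new_h)]

def composite(background, drawings, rng, scale_range=(0.5, 2.0)):
    """Paste each (label, drawing) pair at a random position and scale.

    Returns the composed image and a list of (label, left, top, right, bottom).
    """
    canvas = [row[:] for row in background]  # leave the background untouched
    bg_h, bg_w = len(canvas), len(canvas[0])
    boxes = []
    for label, gray in drawings:
        scaled = scale_nearest(gray, rng.uniform(*scale_range))
        alpha = white_to_alpha(scaled)
        h, w = len(scaled), len(scaled[0])
        if h > bg_h or w > bg_w:
            continue  # skip drawings that no longer fit on the background
        top, left = rng.randint(0, bg_h - h), rng.randint(0, bg_w - w)
        for r in range(h):
            for c in range(w):
                if alpha[r][c]:  # copy only the opaque (non-white) pixels
                    canvas[top + r][left + c] = scaled[r][c]
        boxes.append((label, left, top, left + w, top + h))
    return canvas, boxes

# Hypothetical usage: the start and end classes are listed first.
rng = random.Random(42)
background = [[200] * 64 for _ in range(64)]
sketch = [[255, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
image, boxes = composite(
    background, [("helicopter", sketch), ("hospital", sketch)], rng
)
```

Each returned box uses pixel coordinates on the composed image, which is one plausible way to produce detection labels alongside the generated images.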

Initial modeling approaches used convolutional neural networks (CNNs) built with

TorchVision on the original single-class images. Two CNNs were developed. The first

used two convolutional layers with Rectified Linear Unit (ReLU) activations and 2x2 max pooling

layers, then flattened the output into a linear layer to predict one of ten classes. An

overall F1 score of 0.94 was achieved with generally even performance across the classes. The

second model used twice as many convolutional layers, added batch normalization, dropout, and

Leaky ReLU activations, and was trained on twelve classes, achieving a validation accuracy of 89%. Neither CNN

supported bounding-box predictions; however, the classification results validate that the

selected classes are learnable from Quick, Draw! images. In preliminary tests on hand-drawn

inputs, the models correctly classified most samples, motivating the transition to detection

architectures for localization.
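As a rough illustration of the first CNN's structure, the following Python sketch traces feature-map sizes through two convolution, ReLU, and 2x2 max pooling blocks to find the length of the flattened vector that feeds the linear layer. The 28x28 input size, the 3x3 kernels with padding of 1, and the channel counts of 16 and 32 are assumptions for illustration; the source does not state these values.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial size after a convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2):
    """Spatial size after non-overlapping max pooling."""
    return size // kernel

def flattened_features(input_size, channels):
    """Trace conv -> ReLU -> 2x2 max pool blocks and return the flattened length."""
    size = input_size
    for _ in channels:
        size = pool_out(conv_out(size))  # ReLU leaves the shape unchanged
    return channels[-1] * size * size

# Assumed configuration: 28x28 input, two blocks with 16 and 32 output channels.
features = flattened_features(28, channels=[16, 32])
print(features)  # 1568 = 32 * 7 * 7
```

Under these assumptions, the final linear layer would map 1,568 inputs to the ten class scores.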
