AAI_2025_Capstone_Chronicles_Combined


Draw, Detect, Navigate

Data Summary

To train models to recognize human-drawn symbolic representations, the Google Quick, Draw! doodle dataset was used (Jongejan et al.). The dataset consists of human drawings of word prompts, produced under a 20-second time limit that forces simplicity, with 3,000 samples per class. Only samples that were correctly identified were included, and each was closely cropped to the area of the drawing. Each labeled grayscale .png image was accompanied by the country code of the user who drew it, a vectorized version of the image preserving stroke data, and a unique identifier. The dataset was created through self-selected user participation in the Quick, Draw! game, where users volunteered their drawings as training data (Jongejan et al.). The dataset contained no missing values except for country codes. The source of the dataset does not disclose why these codes are missing, but because the code is detected automatically from the user's internet protocol (IP) address, several technical causes are plausible, such as detection failures or users connecting through a virtual private network (VPN). In the chosen modeling approach, the country code was discarded as non-relevant data.
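As an illustration of the filtering described above — keeping only correctly identified drawings, capping each class at 3,000 samples, and discarding the country code — the following sketch assumes the publicly documented ndjson form of the Quick, Draw! data (fields such as `word`, `countrycode`, `recognized`, `key_id`, and the stroke `drawing` array). The inline records are stand-ins, not real samples, and the function name is illustrative rather than the project's actual pipeline.

```python
import json
from collections import defaultdict

def select_samples(ndjson_lines, per_class_cap=3000):
    """Keep only recognized drawings, drop the country code,
    and cap each class at per_class_cap samples."""
    kept = defaultdict(list)
    for line in ndjson_lines:
        record = json.loads(line)
        if not record.get("recognized"):
            continue  # keep only correctly identified drawings
        label = record["word"]
        if len(kept[label]) >= per_class_cap:
            continue  # class is already at its sample cap
        kept[label].append({
            "key_id": record["key_id"],    # unique identifier
            "drawing": record["drawing"],  # stroke data
            # "countrycode" is intentionally discarded as non-relevant
        })
    return kept

# Inline stand-in records mimicking the ndjson layout
lines = [
    json.dumps({"word": "bat", "countrycode": "US", "recognized": True,
                "key_id": "1", "drawing": [[[0, 5], [0, 5]]]}),
    json.dumps({"word": "bat", "countrycode": "", "recognized": False,
                "key_id": "2", "drawing": [[[1, 2], [3, 4]]]}),
]
samples = select_samples(lines)
print(len(samples["bat"]))  # the unrecognized sample was filtered out
```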

The dataset's 345 image classes cover a wide assortment of subjects familiar to a general audience, ranging from "bat" to "The Eiffel Tower" to "diving board". Images that were not already square were stretched into squares, with an even 36 pixels of white-space padding on each side. To manage computational resources and scope, this project uses twelve of the 345 categories present in the Quick, Draw! dataset; the number of classes needed in a business context would likely depend heavily on the domain and usage. Data augmentation was performed to further pad the images for bounding-box prediction, providing the model with bounding boxes that shifted from image to image, but this was found to be insufficient to help all models attempting to learn accurate box predictions.
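A minimal sketch of this shift-style padding augmentation, assuming white-background (255) grayscale arrays with dark strokes; the function name, offset scheme, and padding amount are illustrative assumptions, not the project's actual implementation:

```python
import numpy as np

def shift_pad(img, pad=36, rng=None):
    """Place img on a white canvas padded by `pad` pixels per side,
    at a random offset, and return the canvas plus the drawing's
    (x_min, y_min, x_max, y_max) bounding box on that canvas."""
    rng = rng or np.random.default_rng(0)
    h, w = img.shape
    canvas = np.full((h + 2 * pad, w + 2 * pad), 255, dtype=np.uint8)
    # Random placement: the total padding budget (2 * pad) is split
    # between the two sides, so the box shifts from image to image.
    dy, dx = rng.integers(0, 2 * pad + 1, size=2)
    canvas[dy:dy + h, dx:dx + w] = img
    ys, xs = np.nonzero(canvas < 255)  # non-white pixels = strokes
    box = (xs.min(), ys.min(), xs.max(), ys.max())
    return canvas, box

# Toy 4x4 "drawing": a single dark pixel in the corner
img = np.full((4, 4), 255, dtype=np.uint8)
img[0, 0] = 0
canvas, box = shift_pad(img)
print(canvas.shape, box)
```

Because the offset is random, the ground-truth box differs on every call, which is the property the augmentation was meant to provide for box-prediction training.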
