Draw, Detect, Navigate
the current bounding boxes are used to mathematically raycast to the spatial plane determined
by the detected angle of the ArUco marker, and a 3D model corresponding to the detected class
is anchored there relative to the tracking marker. Upon an additional button press, the space
above the detected plane is decomposed into an orthogonal grid of nodes, and a
three-dimensional implementation of A* using a Euclidean-distance heuristic determines the most
efficient route from the helicopter to the hospital and visualizes it. While this particular
implementation of the constituent components is fairly simplistic, it demonstrates that they
integrate successfully and paves the way for more complex AI-enabled interactions, training
scenarios, or 3D simulations driven by live hand-drawn maps and pictograms.
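The node decomposition and route search described above can be sketched as a grid-based 3D A*. The sketch below is a minimal stand-alone Python illustration, not the project's Unity code: the grid dimensions, the 6-connected neighborhood, and the uniform edge cost are all assumptions made for the example.

```python
import heapq
import math

def astar_3d(start, goal, blocked, dims):
    """A* over an orthogonal 3D node grid with a Euclidean-distance heuristic.

    start, goal: (x, y, z) integer grid nodes
    blocked:     set of nodes that cannot be traversed
    dims:        (nx, ny, nz) grid extents
    Returns the node path from start to goal, or None if unreachable.
    """
    def h(node):
        # Euclidean distance to the goal; admissible for unit step costs
        return math.dist(node, goal)

    # 6-connected neighborhood: one step along each axis
    moves = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    g = {start: 0.0}          # best known cost from start
    parent = {start: None}    # for path reconstruction
    open_heap = [(h(start), start)]
    closed = set()

    while open_heap:
        _, node = heapq.heappop(open_heap)
        if node in closed:
            continue
        closed.add(node)
        if node == goal:
            # walk parents back from the goal to recover the route
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for dx, dy, dz in moves:
            nxt = (node[0] + dx, node[1] + dy, node[2] + dz)
            if nxt in closed or nxt in blocked:
                continue
            if not all(0 <= c < d for c, d in zip(nxt, dims)):
                continue
            tentative = g[node] + 1.0  # uniform cost between adjacent nodes
            if tentative < g.get(nxt, math.inf):
                g[nxt] = tentative
                parent[nxt] = node
                heapq.heappush(open_heap, (tentative + h(nxt), nxt))
    return None
```

In the application described here, the start and goal would be the grid nodes nearest the anchored helicopter and hospital models, and the blocked set would come from the decomposed space above the detected plane.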
Conclusion
When developing the optimal model for the project, multiple concerns needed to be
addressed in determining its base architecture. The frontrunner pretrained models best suited
to the task were Torchvision's Faster R-CNN and the Ultralytics YOLOv8 nano and small editions.
Training the two models and comparing them on performance metrics, training time, and inference
speed highlighted differences that warranted consideration in choosing the final model.
One of the core considerations for this project's application is real-time inference
speed, which demands not only accurate models but fast ones as well. Both the nano and small
editions have fewer parameters and faster inference than their larger counterparts. The nano's
efficiency also lends itself well to AR devices that rely on edge computers with small batteries
and limited processing power. The small model, double the size of the nano, would likely be
better suited to inference on connected devices, where the extra computational cost associated
with its accuracy gain can be justified, rather than at the edge.
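The inference-speed comparison driving this choice can be reproduced with a simple timing harness. The sketch below is illustrative: the two stub functions are hypothetical stand-ins for the models' forward passes, but the same loop applies to any inference callable (for example, a loaded Ultralytics YOLO model).

```python
import time

def measure_fps(infer, frames, warmup=3):
    """Average frames-per-second of an inference callable over a list of frames."""
    for f in frames[:warmup]:
        infer(f)  # warm-up iterations, excluded from timing
    start = time.perf_counter()
    for f in frames:
        infer(f)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed

# Hypothetical stand-ins for the nano and small models' per-frame latency.
def fake_nano(frame):
    time.sleep(0.002)   # pretend ~2 ms per frame

def fake_small(frame):
    time.sleep(0.008)   # pretend ~8 ms per frame

frames = list(range(50))
print(f"nano : {measure_fps(fake_nano, frames):7.1f} FPS")
print(f"small: {measure_fps(fake_small, frames):7.1f} FPS")
```

Run against the real fine-tuned models on the target hardware, a harness like this gives the frames-per-second numbers needed to decide between edge and connected deployment.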
When deployed in real time on previously unseen, unprocessed inputs, the fine-tuned
YOLOv8 model accurately identified both the class and location of each doodle. Testing with a
fine-tuned Faster R-CNN model was also attempted but was not supported by Unity. This hurdle