


the current bounding boxes are mathematically raycast onto the spatial plane determined
by the detected angle of the ArUco marker, and a 3D model corresponding to the detected class
is anchored there relative to the tracking marker. Upon an additional button press, the space
above the detected plane is decomposed into orthogonally organized nodes, and a
3-dimensional implementation of A* using a Euclidean distance heuristic determines the most
efficient route from the helicopter to the hospital and visualizes it. While each constituent
component is implemented fairly simplistically, the system demonstrates that the components
integrate successfully and that the approach is viable, paving the way for more complex
AI-enabled interactions, training, or 3D simulations from live hand-drawn maps and pictograms.
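The route-finding step described above can be illustrated with a minimal Python sketch. It is not the project's Unity implementation: it assumes an orthogonal grid of integer nodes, six axis-aligned moves of unit cost, and a hypothetical set of blocked nodes. The function name `astar_3d` and all coordinates are illustrative only.

```python
import heapq
import math


def astar_3d(start, goal, blocked, size):
    """A* on a 3D orthogonal grid with a Euclidean distance heuristic.

    start, goal: (x, y, z) integer node coordinates.
    blocked: set of nodes that cannot be entered (hypothetical obstacles).
    size: (nx, ny, nz) grid dimensions.
    Returns the list of nodes from start to goal, or None if unreachable.
    """
    # Assumption: six axis-aligned moves, each with cost 1.
    moves = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

    def heuristic(node):
        # Straight-line distance never overestimates a unit-step grid path.
        return math.dist(node, goal)

    open_heap = [(heuristic(start), 0.0, start)]  # entries are (f, g, node)
    came_from = {start: None}
    g_cost = {start: 0.0}

    while open_heap:
        _, g, node = heapq.heappop(open_heap)
        if node == goal:
            # Walk the parent links back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        if g > g_cost[node]:
            continue  # stale heap entry; a cheaper route was already found
        for dx, dy, dz in moves:
            nxt = (node[0] + dx, node[1] + dy, node[2] + dz)
            if not all(0 <= c < s for c, s in zip(nxt, size)):
                continue  # outside the grid
            if nxt in blocked:
                continue
            new_g = g + 1.0
            if new_g < g_cost.get(nxt, math.inf):
                g_cost[nxt] = new_g
                came_from[nxt] = node
                heapq.heappush(open_heap, (new_g + heuristic(nxt), new_g, nxt))
    return None


# Hypothetical scene: a wall of ground-level nodes forces the route upward.
helicopter, hospital = (0, 0, 0), (3, 0, 0)
wall = {(1, y, 0) for y in range(4)}
route = astar_3d(helicopter, hospital, wall, size=(4, 4, 3))
print(route)
```

Because the Euclidean distance is never larger than the true step count on such a grid, the heuristic is admissible and the returned route is a shortest one under these assumptions.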

Conclusion

When developing the optimal model for the project, multiple concerns needed to
be addressed in determining the base architecture. The frontrunner pretrained
models that appeared best suited to the task were Torchvision’s Faster R-CNN and
the nano and small editions of Ultralytics YOLOv8. Training these models and comparing
them on performance metrics, training time, and inference speed highlighted differences
that warranted consideration in choosing the final model.

One of the core considerations for this project’s application is real-time inference,
which requires models that are not only accurate but also fast.
Both nano and small have fewer parameters and faster inference than their larger
counterparts. Nano’s speed also lends itself well to AR devices that rely on edge
computers with small batteries and limited processing power. The small
model, with roughly three and a half times as many parameters as the nano (about 11.2
million versus 3.2 million), would likely be better suited to inference on connected devices
than at the edge, since there the extra cost associated with the gain in accuracy can be justified.
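To make the real-time constraint concrete, the sketch below times an arbitrary inference callable and checks the mean latency against a per-frame budget. It is a hedged illustration rather than the benchmark used in this project: `measure_latency_ms` and `fits_frame_budget` are hypothetical helpers, the 30 FPS target is an assumed figure, and `fake_infer` merely stands in for a real detector call.

```python
import statistics
import time


def measure_latency_ms(infer, sample, warmup=3, runs=20):
    """Return (mean, worst) wall-clock latency in milliseconds for infer(sample)."""
    for _ in range(warmup):
        infer(sample)  # warm-up calls are executed but not timed
    timings = []
    for _ in range(runs):
        t0 = time.perf_counter()
        infer(sample)
        timings.append((time.perf_counter() - t0) * 1000.0)
    return statistics.mean(timings), max(timings)


def fits_frame_budget(mean_ms, target_fps):
    """True if the mean latency fits within one frame at target_fps."""
    return mean_ms <= 1000.0 / target_fps


# Stand-in for a detector; a real model call would replace this function.
def fake_infer(frame):
    return sum(frame)


mean_ms, worst_ms = measure_latency_ms(fake_infer, list(range(1000)))
print(f"mean {mean_ms:.3f} ms, worst {worst_ms:.3f} ms, "
      f"fits 30 FPS budget: {fits_frame_budget(mean_ms, 30)}")
```

Running the same helper over each candidate model on the target device would give directly comparable latency figures; the frame budget at an assumed 30 FPS is 1000 / 30, or about 33.3 ms.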

When deployed in real time on previously unseen, unprocessed inputs, the fine-tuned
YOLOv8 model accurately identified both the class and location of each doodle. Testing with a
fine-tuned Faster R-CNN model was also attempted, but the model was not supported by Unity. This hurdle
