AAI_2025_Capstone_Chronicles_Combined
1.) Introduction
In many of my academic and personal machine learning projects, I have frequently encountered a common inefficiency: modern GPUs are often underutilized when training small neural networks. Although increasing the batch size can improve hardware utilization, excessively large batches can degrade model performance, especially in tasks sensitive to data ordering or gradient diversity. This inefficiency is particularly noticeable when working with lightweight models that do not saturate the GPU's available compute capacity.

A similar challenge arises in game development, where thousands of autonomous agents must be simulated in real time. While rendering is typically the initial bottleneck, optimizations often shift the performance burden to evaluating the behavioral models of these entities. Traditionally, these behaviors are governed by hand-crafted rules or decision trees, but lightweight neural networks offer a promising alternative: they enable more individualized behavior without sacrificing performance, provided they can be evaluated in parallel.

Together, these two use cases, training many small networks and simulating many autonomous agents, highlight a broader limitation in mainstream AI frameworks. Most deep learning libraries, including PyTorch and TensorFlow, are optimized for training large, monolithic models efficiently; they are not well suited to applications where running or training thousands of small neural networks in parallel would be more effective. This project aims to address that gap by developing and benchmarking a framework that scales small neural networks efficiently to fully utilize GPU hardware. The ability to parallelize training and inference across many small models has wide-reaching implications: areas such as evolutionary algorithms, swarm intelligence, and multi-agent systems could benefit significantly from frameworks optimized for this paradigm.
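The core idea of evaluating many small networks in parallel can be sketched with batched tensor operations, where the leading axis indexes independent networks rather than samples of one network. The sketch below is illustrative only: the network sizes, the use of NumPy, and the two-layer MLP architecture are assumptions for demonstration, not the framework this project develops.

```python
import numpy as np

# Hypothetical sizes: N independent tiny MLPs, one per simulated agent.
N, d_in, d_hidden, d_out = 1000, 8, 16, 4

rng = np.random.default_rng(0)
W1 = rng.standard_normal((N, d_in, d_hidden))  # per-network first-layer weights
W2 = rng.standard_normal((N, d_hidden, d_out)) # per-network second-layer weights
x = rng.standard_normal((N, d_in))             # one input vector per network

# Batched forward pass: einsum contracts each network's input with its own
# weights in a single kernel launch, instead of looping over N forward passes.
h = np.maximum(np.einsum('ni,nih->nh', x, W1), 0.0)  # ReLU hidden activations
y = np.einsum('nh,nho->no', h, W2)

print(y.shape)  # (1000, 4): one output vector per network
```

On a GPU-backed array library, the same batched contraction keeps the hardware busy with one large operation rather than thousands of tiny ones, which is the utilization gap described above.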
Traditional optimization techniques like stochastic gradient descent are designed for large models with massive parameter spaces. In contrast,