AAI_2025_Capstone_Chronicles_Combined

approaches like genetic algorithms and other population-based methods are naturally well suited to ensembles of small networks. However, their scalability is currently constrained by computational overhead. By enabling the training of tens or even hundreds of thousands of small neural networks in parallel, this project opens the door to new strategies in scalable AI model development.

The primary goal of this research is to demonstrate the performance benefits of the proposed approach compared to standard training procedures in PyTorch. While the long-term vision includes developing a reusable library, the current work focuses on proving the viability of the method and quantifying its impact. The MNIST digit classification dataset is used for benchmarking, as it provides a well-known baseline and is simple enough to support high-volume parallel training. Ultimately, this work targets AI researchers and engineers interested in evolutionary computing, large-scale simulations, and parallel model training. If successful, it could lay the foundation for future tools that make large-scale multi-agent learning and population-based optimization practical and efficient on modern hardware.

2.) Background Information

There are several existing methods and technologies aimed at addressing the problem of training multiple neural networks efficiently. Traditional deep learning frameworks like TensorFlow and PyTorch offer distributed training capabilities, but these are generally optimized for training large individual models across multiple GPUs or TPUs rather than for training many small models simultaneously. One notable example of an approach to parallel training is the work done by Will Whitney in a blog post, where he demonstrated how to use the JAX library to train multiple networks at the same time. Whitney's method, however, relies on vectorization rather than true parallelization, leveraging JAX's vectorized mapping (vmap) functionality to efficiently apply operations across multiple networks. By using vmap, calculations can be structured to better utilize GPU memory, effectively filling the GPU's available capacity.
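To make the vmap pattern concrete, the sketch below vectorizes the forward pass of many small one-hidden-layer MLPs over a leading "network" axis of the parameter arrays. The network sizes, initialization, and population count here are illustrative assumptions, not values from the project itself; the point is only that a single `jax.vmap` call turns a per-network function into one that evaluates an entire population in a single batched GPU computation.

```python
import jax
import jax.numpy as jnp

def forward(params, x):
    # Forward pass of ONE small MLP: x -> tanh hidden layer -> linear output.
    w1, b1, w2, b2 = params
    h = jnp.tanh(x @ w1 + b1)
    return h @ w2 + b2

# Illustrative sizes: 100 networks, MNIST-shaped inputs (784), 32 hidden units, 10 classes.
n_nets, in_dim, hidden, out_dim = 100, 784, 32, 10
k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)

# Stack every parameter tensor along a leading "network" axis.
params = (
    jax.random.normal(k1, (n_nets, in_dim, hidden)) * 0.01,
    jnp.zeros((n_nets, hidden)),
    jax.random.normal(k2, (n_nets, hidden, out_dim)) * 0.01,
    jnp.zeros((n_nets, out_dim)),
)
batch = jax.random.normal(k3, (64, in_dim))  # one shared input batch

# vmap over axis 0 of the parameters (one slice per network),
# while broadcasting the same input batch to every network (None).
batched_forward = jax.vmap(forward, in_axes=(0, None))
out = batched_forward(params, batch)
print(out.shape)  # (100, 64, 10): per-network, per-example logits
```

Because `vmap` merely rewrites the computation into batched array operations, the GPU sees one large matrix multiply per layer instead of 100 small ones, which is exactly the memory-filling effect described above.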

