AAI_2025_Capstone_Chronicles_Combined

To first demonstrate the functionality of the TurbaNet library, Whitney's implementation was replicated to verify TurbaNet's ability to train (Whitney, 2021). This step ensured that the framework operated as expected before conducting further experiments.

To assess the computational efficiency of TurbaNet, key model parameters were systematically varied and their effect on training time was measured. The primary variables were the number of hidden nodes, the number of hidden layers, and the size of the swarm trained in parallel; each was adjusted independently in its own experiment. In the first experiment, the number of hidden nodes was varied from 8 to 512 in increments of two, with a fixed swarm size of 128 and a single hidden layer. The second experiment varied the number of hidden layers from one to ten in single-step increments, with the number of hidden nodes fixed at 128. The third experiment explored the impact of swarm size, testing values from one to 128: the count was first doubled from one to two and then increased in increments of two. To better understand how network complexity affected performance, this third experiment was run in two versions, one using a small network (16 hidden nodes and a single hidden layer) and the other using a larger network (256 hidden nodes and two hidden layers).

The expectation was that PyTorch would exhibit a linear increase in training time proportional to the number of networks being trained, so that doubling the number of networks would double the required compute time. This expectation aligns with previous findings on deep learning scaling laws, in which computational cost grows linearly with model size and training iterations (Kaplan et al., 2020). In contrast, TurbaNet was expected to scale more efficiently, with training time increasing at a much lower rate.
Specifically, it was anticipated that training two networks would take slightly longer than training one, but not twice as long, because the framework leverages parallel computation. That performance gain should diminish once memory limits are reached, at which point training efficiency would degrade, eventually falling back to a state in which only one network can be processed at a time. To ensure statistical accuracy, each trial was repeated ten times, with batch size and epoch count held constant across all runs. The total runtime for each training session was recorded along with the conditions under which it ran.
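TurbaNet's internals are not shown in this section, so the following is only a minimal sketch of the idea behind swarm training, written in plain NumPy rather than TurbaNet's actual machinery; every function name here is invented for illustration. Each network's weights are stacked along a leading swarm axis, so a single set of batched array operations advances every member of the swarm at once, which is why training two networks is expected to cost much less than twice the time of training one:

```python
import numpy as np

def init_swarm(swarm, n_in, n_hidden, n_out, rng):
    # Stack each member's weights along a leading swarm axis so one
    # einsum advances the whole swarm in a single call.
    return {
        "W1": rng.normal(0.0, 0.1, (swarm, n_in, n_hidden)),
        "W2": rng.normal(0.0, 0.1, (swarm, n_hidden, n_out)),
    }

def forward(p, x):
    # x: (batch, n_in), shared by every swarm member.
    h = np.tanh(np.einsum("bi,sij->sbj", x, p["W1"]))
    return np.einsum("sbj,sjk->sbk", h, p["W2"])  # (swarm, batch, n_out)

def train_step(p, x, y, lr=1e-2):
    # One gradient-descent step on mean-squared error for every
    # member at once; gradients are derived by hand for this tiny MLP.
    h = np.tanh(np.einsum("bi,sij->sbj", x, p["W1"]))
    out = np.einsum("sbj,sjk->sbk", h, p["W2"])
    err = out - y[None]                                  # broadcast targets to the swarm
    g_w2 = np.einsum("sbj,sbk->sjk", h, err)             # dL/dW2
    dh = np.einsum("sbk,sjk->sbj", err, p["W2"]) * (1 - h**2)
    g_w1 = np.einsum("bi,sbj->sij", x, dh)               # dL/dW1
    p["W1"] -= lr * g_w1 / len(x)
    p["W2"] -= lr * g_w2 / len(x)
    return p
```

Because the swarm axis is just another array dimension, growing the swarm mostly widens existing operations instead of adding new ones, until memory becomes the bottleneck.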
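The repeated-trial timing protocol can be sketched as a small helper; the name `time_trials` is hypothetical and not part of TurbaNet. It runs a training session a fixed number of times at constant batch size and epoch count and reports the mean and standard deviation of the wall-clock runtimes:

```python
import time
import statistics

def time_trials(train_fn, n_trials=10):
    """Run train_fn n_trials times; return (mean, stdev) wall-clock seconds."""
    runtimes = []
    for _ in range(n_trials):
        start = time.perf_counter()
        train_fn()  # one full training session under fixed conditions
        runtimes.append(time.perf_counter() - start)
    return statistics.mean(runtimes), statistics.stdev(runtimes)
```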

