
capacity when training small networks. By instead training many small networks at once, we can increase utilization and improve overall efficiency. Training multiple models in a loop with PyTorch still leads to inefficient GPU usage, as each model is processed sequentially with repeated overhead at every step (middle of Figure 2.1). A more efficient approach batches the models together and parallelizes their computation, better leveraging the GPU hardware, reducing runtime costs, and leading to faster convergence times for certain machine learning workflows (right side of Figure 2.1).
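The contrast between the sequential loop and the batched approach can be sketched with a batched matrix multiply over stacked weights, a NumPy analogue of vectorizing a model over a stacked parameter dimension (as PyTorch's vmap does). All names and shapes below are illustrative assumptions, not from the report:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 32 small single-layer "models", each a 16x8 weight
# matrix, all applied to the same batch of 64 inputs of dimension 16.
n_models, batch, d_in, d_out = 32, 64, 16, 8
weights = rng.standard_normal((n_models, d_in, d_out))
x = rng.standard_normal((batch, d_in))

# Sequential approach: one forward pass per model (the inefficient loop).
seq_out = np.stack([x @ w for w in weights])    # shape (n_models, batch, d_out)

# Vectorized approach: stack the weights and run a single batched contraction,
# so one kernel launch covers all models at once.
vec_out = np.einsum('bi,mio->mbo', x, weights)  # same result, one operation

assert np.allclose(seq_out, vec_out)
```

On a GPU, the single batched operation replaces n_models separate kernel launches, which is where the utilization gain in Figure 2.1 comes from.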

Figure 2.1: Levels of GPU Utilization from Model Vectorization

One common method for improving GPU utilization is increasing the batch size. Larger batch sizes allow for better parallelization, as more computations can be performed simultaneously, keeping the GPU fully occupied. However, excessively large batch sizes yield diminishing returns or even degraded model performance. When the batch size becomes too large, models may generalize poorly: updates become less frequent per epoch, and averaging gradients over more samples reduces the noise in each update, removing the stochasticity that helps the model escape sharp local minima. Additionally, very large batches can saturate GPU memory, causing inefficiencies from memory swapping or requiring gradient-accumulation techniques to manage memory constraints (Keskar, Mudigere, Nocedal, Smelyanskiy, & Tang, 2017).
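The noise-reduction effect of larger batches can be demonstrated with a toy simulation (an illustrative sketch, not from the report): treating per-sample gradients as noisy estimates of a true gradient, the variance of the mini-batch gradient shrinks as the batch size grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: per-sample gradients are the true gradient (1.0) plus unit noise.
true_grad = 1.0
per_sample_grads = true_grad + rng.standard_normal(100_000)

def batch_grad_std(batch_size, n_batches=1000):
    # Standard deviation of the mini-batch gradient estimate: averaging over
    # a larger batch averages away more of the per-sample noise.
    idx = rng.integers(0, per_sample_grads.size, (n_batches, batch_size))
    return per_sample_grads[idx].mean(axis=1).std()

small, large = batch_grad_std(8), batch_grad_std(512)
assert small > large  # larger batches give lower-variance, less noisy updates
```

This lower variance is precisely the reduced gradient noise that, per Keskar et al. (2017), makes very large batches prone to settling into sharp minima.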

