AAI_2025_Capstone_Chronicles_Combined
Evaluating Deep Learning Model Convergence in Chess via Nash Equilibria
implemented to estimate the Maxent Nash using the win-rate matrix computed from the round-robin tournament (Ortiz, 2007). The results in Figure 9 show models steadily underperforming as training continues. Aside from a few models that underperformed dramatically relative to the rest of the sampled model snapshots, there is a steady trend of models losing robustness against earlier versions. The ideal Maxent Nash would be a degenerate distribution concentrated on the final model; a more favorable result would at least be a Maxent Nash that assigns higher probabilities to models later in training. The Maxent Nash observed in this experiment is the opposite: earlier models are the more robust performers at depth-1 play.

In most machine learning and deep learning settings, hundreds of millions of high-quality examples are often thought to be sufficient for a model to generalize over the problem space. This experiment, however, showcases the important relationship between a training set and the out-of-distribution universe that exists outside of it. Even with over 300 million training examples drawn from games between 1800+ Elo players, the dataset space proves hard to generalize over, and the universe of states even more so. For deep learning in large state spaces, high-quality examples do not necessarily mean positions produced by the upper 90th percentile of human players; rather, the quality of a dataset relates to how instructive it is for the model. That performance degrades even while metrics on the test set remain unchanged stresses the importance of training on examples that are relevant but out of distribution. Future chess engines and zero-sum learning agents may benefit from dynamic evaluation schemes such as Maxent Nash, particularly in domains with expansive state spaces. I hypothesize

Conclusion
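The evaluation described above treats the round-robin results as a symmetric zero-sum game: the win-rate matrix induces a payoff matrix, and a Nash equilibrium over model snapshots indicates which versions are most robust. The sketch below is a minimal illustration of that idea, not the paper's actual implementation: it solves the standard linear program for one Nash equilibrium of the induced game with `scipy.optimize.linprog`. The function name `nash_strategy` and the toy win-rate matrix are illustrative assumptions; recovering the specific maximum-entropy equilibrium (when equilibria are non-unique) would require an additional entropy-maximization step not shown here.

```python
import numpy as np
from scipy.optimize import linprog

def nash_strategy(winrate):
    """One Nash equilibrium strategy for the symmetric zero-sum game
    induced by a win-rate matrix W, where W[i, j] = P(model i beats model j).

    Illustrative sketch only; the max-entropy equilibrium would need a
    further entropy-maximization step over the equilibrium set.
    """
    W = np.asarray(winrate, dtype=float)
    A = W - W.T                      # antisymmetric payoff matrix
    n = A.shape[0]
    # Variables: x (n strategy weights) and v (game value).
    # Maximise v subject to (A.T @ x)_j >= v for all j, sum(x) = 1, x >= 0.
    c = np.zeros(n + 1)
    c[-1] = -1.0                                # linprog minimises, so use -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])   # v - (A.T x)_j <= 0
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n + [(None, None)]   # x >= 0, v unbounded
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n]

# Toy rock-paper-scissors-style win rates among three hypothetical snapshots:
# each one reliably beats exactly one of the others.
W = np.array([[0.5, 0.0, 1.0],
              [1.0, 0.5, 0.0],
              [0.0, 1.0, 0.5]])
p = nash_strategy(W)   # unique equilibrium here: uniform weights
```

In this cyclic example the equilibrium is the uniform distribution over all three snapshots, mirroring the paper's observation: when no snapshot dominates, the Nash distribution spreads mass across versions rather than concentrating on the final model.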