AAI_2025_Capstone_Chronicles_Combined


Evaluating Deep Learning Model Convergence in Chess via Nash Equilibria

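The Maxent Nash idea used in Figure 9 can be made concrete with rock paper scissors. The following is a minimal Python sketch that measures how exploitable a meta-strategy is. It assumes a winrate matrix is turned into a zero-sum payoff by subtracting its transpose. The helper names `payoff_from_winrates` and `exploitability` are hypothetical and do not come from the paper. The sketch does not reproduce the paper's gradient ascent solver. It only checks whether a given distribution can be beaten by a best response.

```python
from typing import List

Matrix = List[List[float]]


def payoff_from_winrates(winrates: Matrix) -> Matrix:
    """Convert a winrate matrix into an antisymmetric zero-sum payoff matrix.

    Assumption (not from the paper): payoff[i][j] = winrate[i][j] - winrate[j][i],
    so always beating the opponent scores +1 and an even matchup scores 0.
    """
    n = len(winrates)
    return [[winrates[i][j] - winrates[j][i] for j in range(n)] for i in range(n)]


def exploitability(payoff: Matrix, mix: List[float]) -> float:
    """Best payoff any single pure strategy earns against the mixture `mix`.

    In a symmetric zero-sum game the value is 0, so a result of 0 means `mix`
    is a Nash equilibrium and a positive result means it can be exploited.
    """
    n = len(payoff)
    return max(sum(payoff[j][i] * mix[i] for i in range(n)) for j in range(n))


# Rock paper scissors written as a winrate matrix (row vs column);
# a mirror match counts as a 0.5 winrate.
rps_winrates = [
    [0.5, 0.0, 1.0],  # Rock vs (Rock, Paper, Scissors)
    [1.0, 0.5, 0.0],  # Paper
    [0.0, 1.0, 0.5],  # Scissors
]
rps_payoff = payoff_from_winrates(rps_winrates)

uniform = [1 / 3, 1 / 3, 1 / 3]  # the Maxent Nash of rock paper scissors
skewed = [0.5, 0.25, 0.25]       # plays Rock too often

print(exploitability(rps_payoff, uniform))  # prints 0.0
print(exploitability(rps_payoff, skewed))   # prints 0.25 (Paper exploits it)
```

A mixture with exploitability 0 is a Nash equilibrium of the meta-game. The same helpers could in principle be applied to a 10 × 10 winrate matrix of sampled models. This section does not state the paper's exact payoff convention, however.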
Figure 9: Estimated Maximum Entropy Nash Equilibrium (Maxent Nash) among 10 sampled models. The winrate matrix among all 10 sampled models was used in a gradient ascent implementation to compute the Maxent Nash. The Maxent Nash is a least-exploitable meta-strategy over models. No distribution over models earns a higher payoff against it. Any distribution that is not itself a Nash equilibrium has a lower worst-case payoff in the meta-game. The figure shows that although training performance improved and test performance remained stable, the robust playing strength of the model decreased with training.

A Nash equilibrium is a game theory term for a strategy, or a meta-strategy in the case of this paper, from which no player can gain by unilaterally deviating. In a two-player zero-sum game, an equilibrium strategy maximizes the worst-case payoff against all opposing strategies. The maximum entropy Nash equilibrium (Maxent Nash) is the Nash equilibrium of the game with the highest entropy. In rock paper scissors, the Maxent Nash assigns probability ⅓ to each of the three actions. Any distribution over actions that deviates from it can be exploited by a best response, and therefore has a lower worst-case payoff than the Maxent Nash (Balduzzi et al., 2020). A gradient ascent algorithm was
