AAI_2025_Capstone_Chronicles_Combined
iterative DEC training process then optimized the model using KL divergence as the loss function and a learning rate of 0.001. Training was run for 100 epochs with a batch size of 32.

LSTM-DEC Model Optimization

During model optimization, the LSTM-based autoencoder efficiently learned to reconstruct the original text embeddings. The subsequent LSTM-DEC clustering phase, however, presented a critical challenge. The model converged very rapidly (in approximately 10 epochs) to a low loss, but all data points collapsed into a single cluster. Consequently, standard clustering evaluation metrics (e.g., Silhouette Score, Davies-Bouldin Index) could not be calculated, and we were unable to bring the LSTM model to a state where it produced distinct, usable clusters. The complete collapse of the LSTM-DEC model was an unexpected result, suggesting a limitation in our design's ability to support clustering for this task.

Final Model Optimization

For the final model, hyperparameter tuning was performed against the base DEC model using the Optuna library (Akiba et al., 2019), which uses a Bayesian tree-structured Parzen estimator (TPE) sampler. Tuning was performed in two phases: autoencoder tuning and DEC tuning. The autoencoder phase targeted only the initial autoencoder training and tuned the hidden-layer dimensions, the encoder output dimension, batch size, learning rate, dropout rate, and weight decay, using minimization of reconstruction loss as the evaluation metric. The second phase held the optimized autoencoder hyperparameters constant and focused on the clustering portion of the algorithm, tuning the learning rate, the update interval for the target distribution, the alpha parameter for soft assignments, and the convergence threshold.
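The DEC objective described above, KL divergence between soft cluster assignments and a sharpened target distribution, can be sketched as follows. This is a minimal PyTorch illustration of the standard DEC formulation, not the project's code: the embeddings and centroids are synthetic stand-ins, and `alpha` corresponds to the soft-assignment parameter mentioned in the tuning phase.

```python
import torch
import torch.nn.functional as F

def soft_assign(z, centroids, alpha=1.0):
    """Student's t soft assignment q_ij between embeddings z and cluster centers."""
    dist_sq = torch.cdist(z, centroids) ** 2          # squared distances, shape (n, k)
    q = (1.0 + dist_sq / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(dim=1, keepdim=True)             # rows sum to 1

def target_distribution(q):
    """Sharpened target p_ij that emphasizes high-confidence assignments."""
    weight = q ** 2 / q.sum(dim=0)                    # square, normalize by cluster frequency
    return (weight.t() / weight.sum(dim=1)).t()       # renormalize rows to sum to 1

# One illustrative training step minimizing KL(P || Q) at learning rate 0.001.
# In full DEC the encoder weights are updated; here we optimize the embeddings
# and centroids directly to keep the sketch self-contained.
z = torch.randn(32, 10, requires_grad=True)           # batch of 32 synthetic embeddings
centroids = torch.randn(4, 10, requires_grad=True)    # 4 cluster centers
opt = torch.optim.Adam([z, centroids], lr=0.001)

q = soft_assign(z, centroids)
p = target_distribution(q).detach()                   # target is held fixed between updates
loss = F.kl_div(q.log(), p, reduction="batchmean")    # KL(P || Q)
loss.backward()
opt.step()
```

Periodically recomputing `p` (rather than every step) corresponds to the "update interval for the target distribution" tuned in the second phase.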
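The first, autoencoder-only tuning phase can be sketched with Optuna's TPE sampler as below. The search ranges and the `train_autoencoder` surrogate are hypothetical stand-ins for the project's real training loop; only the set of tuned hyperparameters and the minimization of reconstruction loss come from the text.

```python
import optuna

def train_autoencoder(hidden_dim, latent_dim, batch_size, lr, dropout, weight_decay):
    # Placeholder surrogate for the real training loop: returns a synthetic
    # "reconstruction loss" so the sketch runs end to end.
    return (lr - 1e-3) ** 2 + 0.1 * dropout + 1.0 / hidden_dim

def objective(trial):
    # The six hyperparameters tuned in phase one (ranges are illustrative).
    hidden_dim = trial.suggest_int("hidden_dim", 64, 512)
    latent_dim = trial.suggest_int("latent_dim", 8, 64)
    batch_size = trial.suggest_categorical("batch_size", [16, 32, 64])
    lr = trial.suggest_float("lr", 1e-4, 1e-2, log=True)
    dropout = trial.suggest_float("dropout", 0.0, 0.5)
    weight_decay = trial.suggest_float("weight_decay", 1e-6, 1e-3, log=True)
    return train_autoencoder(hidden_dim, latent_dim, batch_size,
                             lr, dropout, weight_decay)

optuna.logging.set_verbosity(optuna.logging.WARNING)
study = optuna.create_study(direction="minimize",
                            sampler=optuna.samplers.TPESampler(seed=0))
study.optimize(objective, n_trials=20)
```

The second phase would reuse the same pattern with `study.best_params` frozen and a new objective over the clustering hyperparameters (learning rate, target-distribution update interval, alpha, convergence threshold).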
In early model iterations, we found that the clusters frequently collapsed, resulting in all conversations being assigned to a single cluster. To address this, we incorporated reconstruction loss into the clustering loss function, based on findings by Guo et al. (2017) that it helped prevent