distortion in the embedding space in the original DEC algorithm. Although this stabilized the clusters somewhat, training still frequently collapsed to only one or two clusters. As an additional incentive to balance cluster sizes, we added an entropy term to the loss calculation, yielding the final loss function:
\[ \mathcal{L} \;=\; \mathcal{L}_{\text{KL}} \;+\; \Gamma \cdot \mathcal{L}_{\text{reconstruction}} \;+\; \kappa \cdot \mathcal{L}_{\text{entropy}} \]
where Γ and κ are tunable constants.
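As a concrete illustration, the following PyTorch sketch implements a loss of this form. The helper functions follow the standard DEC soft-assignment and target-distribution formulas (taking the Alpha Parameter in Table 1 to be the degrees-of-freedom α in DEC's Student's-t kernel); the function names, the placeholder Γ/κ values, and the sign convention of the entropy term are illustrative assumptions rather than the project's exact implementation.

```python
import torch
import torch.nn.functional as F

def soft_assignments(z, centroids, alpha=0.12):
    """DEC soft assignments q_ij from a Student's t kernel.

    alpha corresponds to the Alpha Parameter row of Table 1.
    """
    dist_sq = torch.cdist(z, centroids).pow(2)            # (batch, n_clusters)
    q = (1.0 + dist_sq / alpha).pow(-(alpha + 1.0) / 2.0)
    return q / q.sum(dim=1, keepdim=True)

def target_distribution(q):
    """Standard DEC target P: sharpen q and renormalize, held fixed
    between updates (hence the detach)."""
    weight = q.pow(2) / q.sum(dim=0)
    return (weight / weight.sum(dim=1, keepdim=True)).detach()

def clustering_loss(q, p, x, x_recon, gamma=0.1, kappa=0.1):
    """KL Divergence + Γ·Reconstruction Loss + κ·Entropy.

    gamma and kappa here are placeholders, not the tuned values.
    """
    kl = F.kl_div(q.log(), p, reduction="batchmean")      # KL(P || Q)
    recon = F.mse_loss(x_recon, x)                        # anchors the latent space
    # Entropy of the batch-mean assignment: high entropy means balanced
    # cluster usage, so the term added to the loss is its negative
    # (the sign convention is an assumption; the text only names the
    # term "Entropy").
    mean_q = q.mean(dim=0)
    neg_entropy = (mean_q * (mean_q + 1e-10).log()).sum()
    return kl + gamma * recon + kappa * neg_entropy
```

With κ weighting a negative-entropy term on the batch-mean assignments, the optimizer is rewarded for spreading probability mass across clusters, which directly counteracts the one-or-two-cluster collapse described above.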
Table 1 shows the final configuration of the tuned model.

Table 1: DEC Model Configuration Comparison

Phase 1: Autoencoder Pre-training

| Parameter           | Original DEC Model                                             | Final Model             |
|---------------------|----------------------------------------------------------------|-------------------------|
| Text Embedding      | TF-IDF                                                         | text-embedding-004      |
| Hidden Layers       | [500, 500, 2000]                                               | [350, 750, 1200]        |
| Latent Dimension    | 10                                                             | 15                      |
| Activation Function | ReLU                                                           | ELU                     |
| Epochs              | 50,000 for layerwise pre-training; 100,000 for end-to-end fine-tuning | 160 with early stopping |
| Batch Size          | 256                                                            | 64                      |
| Loss Function       | MSE                                                            | MSE                     |
| Optimizer           | SGD                                                            | Adam                    |
| Learning Rate       | 0.1, divided by 10 every 20,000 iterations                     | 0.0005                  |
| Dropout             | 0.2                                                            | 0.35                    |
| Weight Decay        | 0                                                              | 0.009                   |

Phase 2: Clustering

| Parameter                           | Original DEC Model | Final Model                                   |
|-------------------------------------|--------------------|-----------------------------------------------|
| Convergence Threshold               | 0.1%               | 10%                                           |
| Batch Size                          | 256                | 64                                            |
| Loss Function                       | KL Divergence      | KL Divergence + Reconstruction Loss + Entropy |
| Optimizer                           | SGD                | Adam                                          |
| Learning Rate                       | 0.01               | 0.0015                                        |
| Target Distribution Update Interval | 1 epoch            | 1 epoch                                       |
| Alpha Parameter                     | 1                  | 0.12                                          |
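For reference, the Phase 1 column of the final model translates into roughly the following PyTorch architecture. The dropout placement, the symmetric decoder, and the 768-dimensional input (text-embedding-004's default output size) are assumptions beyond what the table specifies.

```python
import torch
import torch.nn as nn

class DECAutoencoder(nn.Module):
    """Autoencoder matching the "Final Model" column of Table 1:
    hidden layers [350, 750, 1200], ELU activations, 15-dim latent,
    dropout 0.35. Decoder symmetry and dropout placement are assumed."""

    def __init__(self, input_dim=768, hidden=(350, 750, 1200),
                 latent_dim=15, dropout=0.35):
        super().__init__()
        dims = [input_dim, *hidden]
        enc = []
        for d_in, d_out in zip(dims, dims[1:]):
            enc += [nn.Linear(d_in, d_out), nn.ELU(), nn.Dropout(dropout)]
        enc.append(nn.Linear(dims[-1], latent_dim))   # 15-dim embedding
        self.encoder = nn.Sequential(*enc)

        rev = [latent_dim, *hidden[::-1]]
        dec = []
        for d_in, d_out in zip(rev, rev[1:]):
            dec += [nn.Linear(d_in, d_out), nn.ELU(), nn.Dropout(dropout)]
        dec.append(nn.Linear(rev[-1], input_dim))
        self.decoder = nn.Sequential(*dec)

    def forward(self, x):
        z = self.encoder(x)
        return z, self.decoder(z)

# Pre-training setup from Table 1: MSE loss, Adam at 0.0005 with
# weight decay 0.009 (batch size 64, early stopping around 160 epochs).
model = DECAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4, weight_decay=9e-3)
criterion = nn.MSELoss()
```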
In addition to hyperparameter tuning, we modified the initial text embedding process to