AAI_2025_Capstone_Chronicles_Combined


distortion in the embedding space in the original DEC algorithm. Although this stabilized the clusters somewhat, the models still frequently collapsed to only one or two clusters. As an additional incentive toward balanced cluster sizes, we added an entropy term to the loss calculation, resulting in the final loss function:

Loss = [KL Divergence] + Γ · [Reconstruction Loss] + κ · [Entropy]

where Γ and κ are tunable constants. Table 1 shows the final configuration of the tuned model.

Table 1. DEC Model Configuration Comparison

Phase 1: Autoencoder Pre-training

  Parameter            Original DEC Model                          Final Model
  Text Embedding       TF-IDF                                      text-embedding-004
  Hidden Layers        [500, 500, 2000]                            [350, 750, 1200]
  Latent Dimension     10                                          15
  Activation Function  ReLU                                        ELU
  Epochs               50,000 for layerwise pre-training;          160 with early stopping
                       100,000 for end-to-end fine-tuning
  Batch Size           256                                         64
  Loss Function        MSE                                         MSE
  Optimizer            SGD                                         Adam
  Learning Rate        0.1, divided by 10 every 20,000 iterations  0.0005
  Dropout              0.2                                         0.35
  Weight Decay         0                                           0.009

Phase 2: Clustering

  Parameter                            Original DEC Model  Final Model
  Convergence Threshold                0.1%                10%
  Batch Size                           256                 64
  Loss Function                        KL Divergence       KL Divergence + Reconstruction
                                                           Loss + Entropy
  Optimizer                            SGD                 Adam
  Learning Rate                        0.01                0.0015
  Target Distribution Update Interval  1 epoch             1 epoch
  Alpha Parameter                      1                   0.12
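The pieces of the Phase 2 configuration above fit together roughly as follows. This is a minimal PyTorch sketch, not the authors' code: the Student's-t soft assignment and target distribution are the standard DEC formulation, while `gamma` and `kappa` stand in for the tunable constants Γ and κ (their tuned values are not given in the report), and the sign convention of the entropy term is an assumption chosen so that minimizing the loss favors balanced clusters.

```python
import torch
import torch.nn.functional as F

def soft_assign(z, centroids, alpha=0.12):
    """Student's-t soft assignment Q used by DEC; alpha is the table's
    Alpha Parameter (1 in the original DEC, 0.12 in the final model)."""
    dist_sq = torch.cdist(z, centroids) ** 2          # (batch, n_clusters)
    q = (1.0 + dist_sq / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(dim=1, keepdim=True)             # rows sum to 1

def target_distribution(q):
    """Sharpened target distribution P, recomputed every epoch per Table 1."""
    weight = q ** 2 / q.sum(dim=0)
    return weight / weight.sum(dim=1, keepdim=True)

def combined_loss(q, p, x, x_recon, gamma=0.1, kappa=0.1):
    """KL Divergence + Γ·Reconstruction Loss + κ·Entropy (final loss).
    gamma/kappa here are illustrative placeholders, not the tuned values."""
    kl = F.kl_div(q.log(), p, reduction="batchmean")  # KL(P || Q)
    recon = F.mse_loss(x_recon, x)                    # limits embedding distortion
    # Negative entropy of the batch-mean assignment: minimizing it spreads
    # mass across clusters (exact sign convention is an assumption).
    mean_q = q.mean(dim=0)
    entropy_term = (mean_q * mean_q.log()).sum()
    return kl + gamma * recon + kappa * entropy_term

# Illustrative shapes only: latent dim 15 and batch size 64 match Table 1;
# the input dimension and cluster count are hypothetical.
torch.manual_seed(0)
z = torch.randn(64, 15)
centroids = torch.randn(8, 15)
x = torch.randn(64, 350)
x_recon = x + 0.05 * torch.randn_like(x)
q = soft_assign(z, centroids)
loss = combined_loss(q, target_distribution(q), x, x_recon)
```

Adding the reconstruction and entropy terms changes only the loss; the alternating update of Q and the once-per-epoch refresh of P proceed exactly as in standard DEC training.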

In addition to hyperparameter tuning, we modified the initial text embedding process to

