AAI_2025_Capstone_Chronicles_Combined
representations (embeddings) of all input conversations. These lower-dimensional representations are then given initial cluster assignments using a standard k-means algorithm. A Student's t-distribution is then employed to calculate soft assignment probabilities for each conversation belonging to each cluster. On each iteration, a target distribution is created: a sharpened version of the soft assignment distribution that reinforces high-confidence assignments, helping the model move samples more decisively toward their most probable clusters. During the iterative training of the clustering phase, the Kullback-Leibler (KL) divergence between the current soft assignments and the target distribution serves as the loss function, which is used to simultaneously update both the weights of the encoder and the positions of the cluster centroids.

LSTM-DEC Based Model

Our LSTM-based Deep Embedded Clustering (LSTM-DEC) model variant was developed to leverage the strengths of Long Short-Term Memory (LSTM) networks in handling the sequential, conversational nature of student-chatbot dialogues. Its architecture includes an input layer for raw text strings, a text vectorization layer to convert text into integer sequences (maximum length: 250 tokens), an embedding layer (64 dimensions, configured to ignore zero-padding values), two stacked LSTM encoder layers (128 LSTM units each), and a dense latent embedding layer (64 units, linear activation). The decoder component comprises a decoder input layer, a dense expansion layer (32,000 nodes, ReLU activation), a reshape layer, two stacked LSTM decoder layers (128 units each), and a time-distributed dense output layer (predicting vocabulary probabilities with softmax activation). The LSTM autoencoder was pretrained for 30 epochs (batch size: 32, 10% validation split) using the Adam optimizer and a reconstruction loss.
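The soft assignment, target distribution, and KL-divergence loss of the DEC clustering phase described earlier can be sketched as follows. This is a minimal NumPy illustration of the standard DEC formulation; the function and variable names (`soft_assignments`, `z`, `centroids`, `alpha`) are ours for illustration, not taken from our implementation.

```python
import numpy as np

def soft_assignments(z, centroids, alpha=1.0):
    # Student's t-distribution kernel: similarity between each latent
    # embedding z_i and each cluster centroid mu_j, normalized per sample.
    dist_sq = np.sum((z[:, None, :] - centroids[None, :, :]) ** 2, axis=2)
    q = (1.0 + dist_sq / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    # Sharpen q: square it and normalize by per-cluster frequency,
    # which reinforces high-confidence assignments.
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_divergence(p, q):
    # KL(P || Q), summed over all samples and clusters; this is the
    # clustering loss minimized with respect to encoder weights and centroids.
    return np.sum(p * np.log(p / q))
```

In a full training loop, gradients of this KL loss would flow back through `q` into both the encoder and the centroid positions; the target `p` is recomputed periodically and treated as fixed within each update.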
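The LSTM autoencoder architecture described above can be sketched in Keras as follows. This is a sketch under stated assumptions, not our exact implementation: the vocabulary size (`VOCAB`) is assumed, the text vectorization step is taken as already applied (the model consumes integer token sequences), and the 32,000-node expansion layer is reshaped to 250 timesteps of 128 features to feed the decoder LSTMs.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

MAX_LEN = 250      # maximum sequence length from the text vectorization layer
VOCAB = 10000      # assumed vocabulary size (not stated in the text)

# --- Encoder: integer token sequences -> 64-dim latent embedding ---
seq_in = layers.Input(shape=(MAX_LEN,), dtype="int32")
x = layers.Embedding(VOCAB, 64, mask_zero=True)(seq_in)   # ignore zero padding
x = layers.LSTM(128, return_sequences=True)(x)            # first encoder LSTM
x = layers.LSTM(128)(x)                                   # second encoder LSTM
latent = layers.Dense(64, activation="linear")(x)         # latent embedding
encoder = Model(seq_in, latent)

# --- Decoder: latent embedding -> per-timestep vocabulary probabilities ---
dec_in = layers.Input(shape=(64,))
y = layers.Dense(32000, activation="relu")(dec_in)        # 250 * 128 = 32,000
y = layers.Reshape((MAX_LEN, 128))(y)
y = layers.LSTM(128, return_sequences=True)(y)            # first decoder LSTM
y = layers.LSTM(128, return_sequences=True)(y)            # second decoder LSTM
out = layers.TimeDistributed(
    layers.Dense(VOCAB, activation="softmax"))(y)
decoder = Model(dec_in, out)

# --- Autoencoder: pretrained with a reconstruction loss that compares
# predicted word probabilities against the target token IDs ---
autoencoder = Model(seq_in, decoder(encoder(seq_in)))
autoencoder.compile(optimizer="adam",
                    loss="sparse_categorical_crossentropy")
```

Here `sparse_categorical_crossentropy` plays the role of the reconstruction loss, since it scores the predicted vocabulary distribution at each timestep against the actual target word ID.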
This loss penalizes discrepancies between the predicted word probabilities and the actual target word IDs. After pretraining, a clustering layer (a dense layer with softmax activation and one unit per distinct math level) was added on top of the encoder's latent embedding output. The