AAI_2025_Capstone_Chronicles_Combined
use the transformer-based text-embedding-004 model (Lee et al., 2025). This more modern embedding model captures contextual nuance in the source text better than the TF-IDF method used in the original model.

Human Evaluation

As a final check on our top-performing models, we performed a human evaluation of the clusters. We first used an LLM to write a summary of each cluster. A human reviewer then examined each conversation and judged whether it matched the description of the cluster it had been assigned to. The proportion of correct assignments served as a final accuracy score for the clustering. This step helped confirm that both the clustering and the subsequent LLM interpretation aligned with how a human user would view the conversations relative to their clusters.

Alternative Models Overview

We built two additional clustering models for comparison with our DEC model: a basic k-means model and an LLM-based classifier. The LLM-based classifier is based on the work of Huang and He (2024) and uses a three-phase approach.

In the first phase, the inputs are split into batches of 16 conversations, and each batch is passed to an LLM with instructions to create representative labels that cover every conversation in the batch. With each subsequent batch, the LLM also receives the list of existing labels and is prompted to add only the new labels needed to cover all of the conversations in that batch. This phase often produces a very large number of labels that are too narrow for useful classification.

In the second phase, the full list of labels is repeatedly provided to an LLM, which is prompted to merge the most similar labels and so shorten the list. This is repeated until the original labels have been consolidated down to the target number of clusters.

In the third and final phase, each input conversation is provided to an LLM along with the list of labels.
The LLM is prompted to assign one of the available labels to the conversation, and this assignment serves as the algorithm's final clustering result.
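The three phases of the LLM-based classifier can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in this project or by Huang and He (2024). The following are all assumptions: the `LLM` callable type, the prompt wording, the one-label-per-line reply format, the `max_rounds` guard in phase 2, and the rule that replies not matching a label are left unassigned. The helper names (`parse_labels`, `generate_labels`, `consolidate_labels`, `assign_labels`, `llm_cluster`) are hypothetical. `toy_llm` is a hypothetical keyword-matching stand-in that lets the pipeline run offline. In practice, `llm` would wrap a real model call.

```python
from typing import Callable, List, Optional

# Assumption: an LLM is any function that takes a prompt string and returns reply text.
LLM = Callable[[str], str]


def parse_labels(reply: str) -> List[str]:
    """Assumes one label per line; drops bullet markers and blank lines."""
    labels = []
    for line in reply.splitlines():
        label = line.strip().lstrip("-*").strip()
        if label:
            labels.append(label)
    return labels


def generate_labels(convs: List[str], llm: LLM, batch_size: int = 16) -> List[str]:
    """Phase 1: grow a label list batch by batch so every conversation is covered."""
    labels: List[str] = []
    for start in range(0, len(convs), batch_size):
        batch = convs[start:start + batch_size]
        prompt = (
            "Existing labels:\n" + "\n".join(labels)
            + "\n\nConversations:\n" + "\n---\n".join(batch)
            + "\n\nList any NEW labels needed to cover these conversations, one per line."
        )
        for label in parse_labels(llm(prompt)):
            if label not in labels:  # keep only genuinely new labels
                labels.append(label)
    return labels


def consolidate_labels(labels: List[str], llm: LLM, target_k: int,
                       max_rounds: int = 20) -> List[str]:
    """Phase 2: repeatedly ask the LLM to merge similar labels until target_k remain."""
    for _ in range(max_rounds):
        if len(labels) <= target_k:
            break
        prompt = (
            "Labels:\n" + "\n".join(labels)
            + f"\n\nMerge the most similar labels so the list gets shorter (goal: {target_k}). "
            "Return the new list, one label per line."
        )
        merged = parse_labels(llm(prompt))
        if not merged or len(merged) >= len(labels):
            break  # no progress from the LLM; stop instead of looping until max_rounds
        labels = merged
    return labels


def assign_labels(convs: List[str], labels: List[str], llm: LLM) -> List[Optional[str]]:
    """Phase 3: ask the LLM to pick one label for each conversation."""
    assignments: List[Optional[str]] = []
    for conv in convs:
        prompt = (
            "Labels:\n" + "\n".join(labels)
            + "\n\nConversation:\n" + conv
            + "\n\nReply with exactly one label from the list above."
        )
        reply = llm(prompt).strip()
        # Assumption: a reply that is not exactly one of the labels stays unassigned (None).
        assignments.append(reply if reply in labels else None)
    return assignments


def llm_cluster(convs: List[str], llm: LLM, target_k: int,
                batch_size: int = 16) -> List[Optional[str]]:
    """Runs all three phases and returns one label (cluster) per conversation."""
    labels = generate_labels(convs, llm, batch_size)
    labels = consolidate_labels(labels, llm, target_k)
    return assign_labels(convs, labels, llm)


def toy_llm(prompt: str) -> str:
    """Hypothetical keyword-matching stand-in for a real LLM, for offline testing only."""
    if "List any NEW labels" in prompt:  # phase 1 prompt
        text = prompt.split("Conversations:\n", 1)[1].lower()
        keywords = {"password": "Password reset", "locked": "Locked account",
                    "refund": "Refund request"}
        return "\n".join(label for word, label in keywords.items() if word in text)
    if "Merge the most similar labels" in prompt:  # phase 2 prompt
        return "Account access\nBilling"
    text = prompt.split("Conversation:\n", 1)[1].lower()  # phase 3 prompt
    return "Billing" if "refund" in text else "Account access"
```

As a usage example, consider four conversations processed in batches of 2 with a target of 2 clusters: "I forgot my password", "My account is locked", "I want a refund", and "Refund still missing". Calling `llm_cluster` with `toy_llm` first produces three narrow labels. These are merged into "Account access" and "Billing", and the call returns `["Account access", "Account access", "Billing", "Billing"]`. Keeping each phase as a separate function makes it easy to inspect the intermediate label lists. This matters because the phase 1 output is often too fine-grained to use directly.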