AAI_2025_Capstone_Chronicles_Combined

proved valuable for its ability to easily extract informative metadata and guide the clustering toward a desired focus. The greatest limitation of LLM-driven clustering is its scalability. The LLM-as-classifier technique mitigated this somewhat through conversation batching and iterative refinement of the cluster labels. However, the limited context window, slow processing speed, and monetary cost of LLMs make the approach impractical for large datasets. Future work on this task would focus on techniques for using LLMs in clustering in more scalable ways, and several recent lines of research address this issue. Anthropic's Clio platform uses alternating rounds of facet extraction and k-means clustering to handle very large datasets effectively (Tamkin et al., 2024), while techniques such as LLMEdgeRefine (Feng et al., 2024) and TECL (Wang et al., 2025) use LLMs to efficiently find cluster boundaries using only a small subset of the input data. Adopting these techniques would let the existing system analyze large datasets more quickly for users and more cost-effectively for application maintainers.
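As a minimal sketch of the scalable pattern described above (cheap numeric clustering over facet embeddings, with the LLM reserved for extracting facets and labeling clusters, in the spirit of Clio), the following illustrates the pipeline shape. Here `extract_facet` and `embed` are hypothetical stand-ins for an LLM call and an embedding model, not the actual system's API:

```python
import random

def extract_facet(conversation: str) -> str:
    # Hypothetical stand-in for an LLM facet-extraction call;
    # here we simply take the first word as the "topic" facet.
    return conversation.split()[0].lower()

def embed(facet: str, dims: int = 8) -> list[float]:
    # Toy deterministic embedding keyed on the facet text; a real
    # system would call a sentence-embedding model instead.
    random.seed(facet)
    return [random.random() for _ in range(dims)]

def kmeans(vectors: list[list[float]], k: int, iters: int = 20):
    # Plain k-means: only this cheap numeric step touches every
    # point, so the expensive LLM never sees the full dataset.
    centroids = vectors[:k]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
            clusters[nearest].append(v)
        centroids = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

conversations = [
    "refund request about a billing error",
    "bug report about the app crashing",
    "refund dispute over a duplicate charge",
    "bug when saving a file",
]
vectors = [embed(extract_facet(c)) for c in conversations]
centroids, clusters = kmeans(vectors, k=2)
```

In the full pattern, the LLM would then be shown only a few exemplars per cluster to produce human-readable labels, keeping the number of LLM calls proportional to the number of clusters rather than the number of conversations.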
