M.S. Applied Data Science - Capstone Chronicles 2025


Test Design

Community detection is an unsupervised, graph-based method; as such, train-test splits were not applicable. This evaluation framework instead focuses on validating the quality and robustness of the community outputs. To do so, we produced several graph configurations. We first built full graphs for each cross-shopping column, containing all nodes and every edge with a non-zero weight. We then applied edge-weight thresholds, removing edges whose weights fell below certain percentiles, to test whether low-magnitude edges influenced community and cluster detection. Lastly, we analyzed symmetrized graphs in which directed edges are converted to undirected edges. This allowed us to confirm whether directed edges materially affected downstream community modeling, as current literature suggests.

In our community evaluation, we applied two widely used community detection algorithms, Louvain and Leiden, across the graph variants. Louvain communities correspond to groups of stores that share many customers relative to the broader network, highlighting patterns of customer behavior that may not align with obvious categories (Blondel et al., 2008). Compared with Louvain, the Leiden algorithm produces more stable, internally connected, and interpretable communities, making it particularly useful for real-world networks such as cross-shopping networks, where loosely connected nodes and disconnected subgroups are common (Traag et al., 2019). Accordingly, Leiden is used as our primary model, while Louvain serves as a benchmark. Community quality was evaluated using the modularity score, which measures the extent to which a partition reflects strong within-community connectivity relative to the overall network structure. The final graph and community model were determined by selecting the variants that produced the highest modularity score and results that were interpretable relative to our expected clusters and known geographic patterns.
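The three graph configurations described above (full, percentile-thresholded, and symmetrized) can be sketched in plain Python. This is a minimal illustration, not the capstone's code: it assumes each cross-shopping column has already been reshaped into hypothetical `(source_store, target_store, weight)` rows with non-negative weights. The function names, the linear-interpolation percentile, and the choice to sum the two directions when symmetrizing are all our assumptions.

```python
from collections import defaultdict


def build_full_graph(rows):
    """Directed weighted graph from (source_store, target_store, weight) rows.

    Keeps every edge with a non-zero weight; weights are assumed to be
    non-negative cross-shopping counts or shares. Returns {(u, v): weight}.
    """
    graph = {}
    for src, dst, w in rows:
        if w > 0:
            graph[(src, dst)] = graph.get((src, dst), 0.0) + float(w)
    return graph


def percentile(values, pct):
    """Percentile (0-100) with linear interpolation, like NumPy's default."""
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile of an empty sequence")
    k = (len(xs) - 1) * pct / 100
    lo = int(k)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)


def threshold_graph(graph, pct):
    """Remove edges whose weight falls below the pct-th weight percentile."""
    cut = percentile(graph.values(), pct)
    return {edge: w for edge, w in graph.items() if w >= cut}


def symmetrize(graph):
    """Merge u->v and v->u into one undirected edge by summing weights.

    Summing is an assumption; the mean or max of the two directions are
    other common choices.
    """
    undirected = defaultdict(float)
    for (u, v), w in graph.items():
        key = (u, v) if u <= v else (v, u)
        undirected[key] += w
    return dict(undirected)


# Toy example with hypothetical stores A-D.
rows = [("A", "B", 5), ("B", "A", 3), ("A", "C", 1), ("C", "D", 4), ("D", "C", 0)]
full = build_full_graph(rows)         # D->C is dropped (zero weight)
trimmed = threshold_graph(full, 25)   # 25th percentile of [1, 3, 4, 5] is 2.5
undirected = symmetrize(full)         # {("A","B"): 8.0, ("A","C"): 1.0, ("C","D"): 4.0}
```

Running the same community detection on `full`, `trimmed`, and `undirected` is what lets the influence of low-weight edges and of edge direction be examined as separate factors.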

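The modularity-based selection step can be sketched as follows, assuming undirected weighted graphs such as the symmetrized variants. It computes Newman-Girvan modularity in its per-community form, $Q = \sum_c \left[\frac{L_c}{m} - \left(\frac{d_c}{2m}\right)^2\right]$, and returns the candidate with the highest score. In practice the partitions would come from a Louvain or Leiden implementation (for example NetworkX's `louvain_communities` or the `leidenalg` package); here they are hard-coded stand-ins, and the candidate keys are hypothetical. The interpretability review against expected clusters and geography is a manual judgment that this sketch does not capture.

```python
from collections import defaultdict


def modularity(edges, partition):
    """Newman-Girvan modularity of an undirected weighted graph.

    edges: {(u, v): weight}, each undirected edge listed once.
    partition: list of disjoint node sets covering every node.
    Q = sum_c [L_c / m - (d_c / 2m)^2], where m is the total edge weight,
    L_c the edge weight inside community c, and d_c the summed weighted
    degree (strength) of the nodes in c.
    """
    community_of = {node: i for i, comm in enumerate(partition) for node in comm}
    m = sum(edges.values())
    internal = defaultdict(float)
    strength = defaultdict(float)
    for (u, v), w in edges.items():
        cu, cv = community_of[u], community_of[v]
        strength[cu] += w
        strength[cv] += w
        if cu == cv:
            internal[cu] += w
    return sum(internal[c] / m - (strength[c] / (2 * m)) ** 2
               for c in range(len(partition)))


def select_best(candidates):
    """Return the candidate key with the highest modularity, plus all scores.

    candidates: {(graph_variant, partition_label): (edges, partition)}.
    """
    scores = {key: modularity(e, p) for key, (e, p) in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Two triangles joined by a single bridge edge (unit weights).
edges = {(1, 2): 1.0, (2, 3): 1.0, (1, 3): 1.0,
         (4, 5): 1.0, (5, 6): 1.0, (4, 6): 1.0,
         (3, 4): 1.0}
candidates = {
    ("symmetrized", "two_triangles"): (edges, [{1, 2, 3}, {4, 5, 6}]),
    ("symmetrized", "single_block"): (edges, [{1, 2, 3, 4, 5, 6}]),
}
best, scores = select_best(candidates)
# best == ("symmetrized", "two_triangles"), scoring 5/14 (about 0.357);
# putting every node in one community always scores 0.
```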
