M.S. Applied Data Science - Capstone Chronicles 2025


Constructing the Graphs

We constructed directed graphs from the physical cross-shopping data. Each business location is modeled as a single node with attributes (i.e., category, customer counts, transactions, total spend, and geographic coordinates), and weighted edges represent the percentage of customers who also shop at other brands. The local, same_category, and online cross-shopping data were evaluated but ultimately excluded due to limited practical value and technical constraints. Focusing exclusively on physical cross-shopping flows provided a cleaner, behavior-driven graph structure that more accurately reflects real-world interactions among brands.

We modeled the network of businesses as directed graphs so that we could distinguish between “senders” and “receivers” of business traffic, a distinction that an undirected graph would have obscured. To accomplish this, we used the networkx library, which provided the utilities for constructing and analyzing our graphs. For latent community discovery, we also used the igraph backend to take advantage of the Leiden algorithm, which is discussed further in the Methodology section.

The cross-shopping data was originally stored as JSON strings, so we first transformed it into a list of tuples in which the first value holds the name of the brand and the second value holds the percentage of customers who also shopped at that brand. The cross-shopping data attributed spend only at the brand level rather than to physical locations. To account for this, all locations with attributed brands (e.g., Target, Walmart, Starbucks) were assigned edges to all location nodes based on their assigned brand. When multiple locations were found for a brand, the maximum weight was assigned to all edges. After construction, the graphs were exported to GEXF format for external visualization in Gephi.
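
The construction steps described above can be illustrated with a minimal sketch. It is not our production code: the record layout (the `brand` and `cross_shopping` keys), the JSON shape (an object mapping brand name to percentage), the helper names, and the sample values are all hypothetical. The maximum-weight rule is read here as "keep the largest weight whenever the same ordered pair of locations is produced more than once," which is only one possible reading; skipping self-loops is likewise an assumption.

```python
import json
from collections import defaultdict


def parse_cross_shopping(raw):
    """Turn a JSON string such as '{"Target": 12.5}' into [(brand, pct), ...].

    Assumes the JSON is an object mapping brand name to percentage.
    """
    return [(brand, float(pct)) for brand, pct in json.loads(raw).items()]


def expand_to_location_edges(locations):
    """Map brand-level percentages onto directed location-to-location edges.

    `locations` maps a location id to a dict with the hypothetical keys
    "brand" (str or None) and "cross_shopping" (a JSON string).
    """
    # Index location ids by brand so a brand name resolves to its nodes.
    by_brand = defaultdict(list)
    for loc_id, rec in locations.items():
        if rec.get("brand"):
            by_brand[rec["brand"]].append(loc_id)

    edges = {}
    for src, rec in locations.items():
        for brand, pct in parse_cross_shopping(rec["cross_shopping"]):
            for dst in by_brand.get(brand, []):
                if dst == src:
                    continue  # assumption: no self-loops
                # Keep the largest weight if the same pair appears again.
                edges[(src, dst)] = max(pct, edges.get((src, dst), 0.0))
    return edges


def build_digraph(locations, edges):
    """Assemble a networkx DiGraph from the node records and edge weights."""
    import networkx as nx  # third-party; imported lazily so the helpers above need only the stdlib

    graph = nx.DiGraph()
    for loc_id, rec in locations.items():
        # GEXF export cannot serialize None, so fall back to an empty string.
        graph.add_node(loc_id, brand=rec.get("brand") or "")
    graph.add_weighted_edges_from((u, v, w) for (u, v), w in edges.items())
    return graph


# Made-up sample records for illustration only.
locations = {
    "target_1": {"brand": "Target", "cross_shopping": '{"Walmart": 30.0, "Starbucks": 12.0}'},
    "walmart_1": {"brand": "Walmart", "cross_shopping": '{"Target": 25.0}'},
    "walmart_2": {"brand": "Walmart", "cross_shopping": '{"Target": 40.0}'},
    "cafe_1": {"brand": None, "cross_shopping": '{"Starbucks": 5.0, "Target": 18.0}'},
}
edges = expand_to_location_edges(locations)
```

With networkx installed, `nx.write_gexf(build_digraph(locations, edges), "cross_shopping.gexf")` would write a file that Gephi can open; the file name is a placeholder. In the sample, Starbucks has no location node, so its percentages produce no edges.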
