M.S. Applied Data Science - Capstone Chronicles 2025


explainable and scalable frameworks that not only capture the complexity of market relationships but also empower stakeholders to act on these insights with confidence.

Methodology

We used Python 3.13.7 in Jupyter notebooks on local PCs to conduct this study. The major Python modules included networkx, pandas, geopandas, duckdb, matplotlib, shapely, and folium. We built our network graph from both the Spend Patterns and Global Places datasets, and the analysis that follows is based on this graph.

Data Acquisition and Aggregation

Together, the SafeGraph Spend Patterns and Global Places datasets describe anonymized debit and credit card transactions aggregated to individual places, or points of interest (POIs), in the United States. The study focused on a single monthly cross-section of the data (July 2025), restricted to POIs located in San Diego County. After geographic filtering, the dataset contained 8,537 POIs. Each POI in SPEND_PATTERNS was joined to GLOBAL_PLACES on the placekey ID, as shown in the entity-relationship diagram in Figure 1. Each record contained location-based attributes, including region, city, street address, postal code, latitude, longitude, and polygon shape. To capture business size and type, we also used the raw customer count, transaction count, and total amount spent. We further used the top category associated with the first four digits of the POI's NAICS code. A NAICS code is a six-digit number used by federal and state agencies to classify businesses by industry for statistical, administrative, regulatory, and contracting purposes.

To build the network of edges between POIs, we used cross-shopping data describing the percentage of customers who also shopped at other brands in the same month. We used these percentages as the weights of the directed edges connecting each

