ADS Capstone Chronicles Revised


The xBD dataset provides annotated high-resolution satellite imagery for assessing building damage and consists of JSON files and image files. Because this project focuses on analyzing pre- and post-disaster imagery related to hurricanes, concerns about bias, ethics, and privacy are not expected to arise.

4.3 Exploratory Data Analysis

Exploration of the hurricane disaster data began by combining hurricane_pre_df and hurricane_post_df into a single dataset, hurricane_df, resulting in 2,438 entries. Exploratory data analysis techniques were applied to the hurricane dataset prior to preprocessing to uncover insights into its structure, patterns, and relationships. Both univariate and multivariate analyses, encompassing graphical and non-graphical approaches, were used. Data distributions, variable relationships, and correlations were visualized through plots and heatmaps. This streamlined the identification of trends, patterns, and potential issues such as multicollinearity, outliers, and missing data, all of which were critical for guiding subsequent preprocessing steps.

4.3.1 Distribution Analysis

The analysis began by establishing how the damage distribution varies by disaster type. Strong emphasis was placed on post-disaster data to better understand the patterns of building impact. A bar plot depicted the number of buildings affected by the two primary disaster types, flooding and wind. The chart highlighted that building damage levels vary by disaster type, with flooding standing out as a severe hazard. Overall, this emphasizes the need for tailored strategies based on each disaster's characteristics.
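The combination step and the counts behind such a bar plot can be sketched as follows. This is a minimal illustration and not the project's actual code: the two small data frames are hypothetical stand-ins for hurricane_pre_df and hurricane_post_df, and the numeric codes assume the encoding described in this report (status: pre 0, post 1; disaster_type: flooding 0, wind 1; damage_type: 0 to 3).

```python
import pandas as pd

# Hypothetical miniature stand-ins for hurricane_pre_df and hurricane_post_df.
hurricane_pre_df = pd.DataFrame({
    "status": [0, 0, 0],          # 0 = pre
    "disaster_type": [0, 1, 1],   # 0 = flooding, 1 = wind
    "damage_type": [0, 0, 0],     # illustrative values only
})
hurricane_post_df = pd.DataFrame({
    "status": [1, 1, 1],          # 1 = post
    "disaster_type": [0, 1, 1],
    "damage_type": [3, 1, 1],     # 1 = minor-damage, 3 = destroyed
})

# Stack the two frames into one dataset with a fresh 0..n-1 index.
hurricane_df = pd.concat([hurricane_pre_df, hurricane_post_df], ignore_index=True)

# Restrict to post-disaster rows, then count buildings per
# (disaster type, damage level) pair.
post = hurricane_df[hurricane_df["status"] == 1]
damage_counts = pd.crosstab(post["disaster_type"], post["damage_type"])

print(len(hurricane_df))
print(damage_counts)
```

With matplotlib installed, calling damage_counts.plot(kind="bar") on the resulting table would draw one group of bars per disaster type, which is one way to obtain a chart of the kind described above.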

4.1.1 Data Preparation and Transformation

To access and process the stored JSON files, Python libraries such as json, pandas, and os are used to transform and aggregate the data into table-formatted data frames. The primary goal is to analyze conditions before and after the disaster, so the downloaded files are organized accordingly into pre- and post-disaster datasets. The JSON files contain an img_name attribute that distinguishes pre- and post-disaster data. During this stage, a mapping exercise is performed based on the column names in Table 1 to align the data structure. Each data frame, hurricane_pre_df and hurricane_post_df, consists of 1,219 entries with 20 columns, where xy and lng_lat contain polygon arrays representing coordinates. At this point, neither data frame includes the pre and post image file

4.2 Preliminary Feature Engineering

Converting the categorical data for disaster_type (flooding: 0, wind: 1), damage_type (no-damage: 0, minor-damage: 1, major-damage: 2, destroyed: 3), and status (pre: 0, post: 1) to numerical values simplifies data exploration and integration in ML tasks. Furthermore, the well-known text (WKT) string in the lng_lat column was extracted and saved in a new column, wkt_lnglat. Each WKT string was then converted into a shapely polygon object and stored in another new column, polygon_shape.
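The encoding and WKT conversion steps can be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: the sample rows are hypothetical, and each lng_lat cell is assumed to be a dictionary holding the WKT string under a "wkt" key, which the report does not confirm. The mapping values and the column names wkt_lnglat and polygon_shape come from the text.

```python
import pandas as pd
from shapely import wkt
from shapely.geometry import Polygon

# Hypothetical sample rows; real rows would come from the parsed JSON files.
# Assumption: each lng_lat cell is a dict with the WKT string under "wkt".
df = pd.DataFrame({
    "disaster_type": ["flooding", "wind"],
    "damage_type": ["no-damage", "destroyed"],
    "status": ["post", "post"],
    "lng_lat": [
        {"wkt": "POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))"},
        {"wkt": "POLYGON ((2 2, 4 2, 4 3, 2 3, 2 2))"},
    ],
})

# Category-to-integer mappings as listed in the text.
DISASTER_TYPE_MAP = {"flooding": 0, "wind": 1}
DAMAGE_TYPE_MAP = {"no-damage": 0, "minor-damage": 1, "major-damage": 2, "destroyed": 3}
STATUS_MAP = {"pre": 0, "post": 1}

df["disaster_type"] = df["disaster_type"].map(DISASTER_TYPE_MAP)
df["damage_type"] = df["damage_type"].map(DAMAGE_TYPE_MAP)
df["status"] = df["status"].map(STATUS_MAP)

# Pull the WKT string into its own column, then parse it into a shapely geometry.
df["wkt_lnglat"] = df["lng_lat"].apply(lambda cell: cell["wkt"])
df["polygon_shape"] = df["wkt_lnglat"].apply(wkt.loads)

print(df[["disaster_type", "damage_type", "status", "polygon_shape"]])
```

Note that Series.map leaves any category missing from a mapping as NaN, so a quick check with df.isna().sum() after encoding would reveal unexpected labels.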

4.3.2 Correlation Matrix

The two correlation matrices provide valuable information about various satellite observations related to the disaster types and statuses of natural disasters. The colors and values in Figure 2 show the correlations between the respective features. Red represents positive correlations, while blue represents negative correlations.
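A correlation matrix of this kind can be computed with pandas, as in the minimal sketch below. Several details are assumptions made for illustration only: the report does not state how the two matrices were split, so one matrix per disaster type is assumed here, and the column names off_nadir_angle and sun_elevation and their values are hypothetical.

```python
import pandas as pd

# Hypothetical numeric features; names and values are illustrative only.
df = pd.DataFrame({
    "disaster_type": [0, 0, 0, 1, 1, 1],   # 0 = flooding, 1 = wind
    "off_nadir_angle": [10.0, 20.0, 30.0, 10.0, 20.0, 30.0],
    "sun_elevation": [50.0, 60.0, 70.0, 70.0, 60.0, 50.0],
})

# Assumption: one Pearson correlation matrix per disaster type.
corr_by_type = {
    dtype: group.drop(columns="disaster_type").corr()
    for dtype, group in df.groupby("disaster_type")
}

for dtype, corr in corr_by_type.items():
    print(f"disaster_type = {dtype}")
    print(corr.round(2))
```

With seaborn installed, passing one of these matrices to sns.heatmap(corr, cmap="coolwarm", annot=True, vmin=-1, vmax=1) would render positive values in red and negative values in blue, matching the color convention described above.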
