ADS Capstone Chronicles Revised


damage categories. This balanced subset was then combined with the remaining samples from the other categories so that all classes were equally represented. This strategy mitigated the effect of class imbalance and allowed the model to train more effectively, improving its ability to classify the different damage categories accurately.
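The rebalancing step above can be sketched as random undersampling. The exact selection procedure is not fully stated here, so this sketch assumes that each over-represented class is reduced to a common cap by sampling without replacement, and that classes at or below the cap are kept whole. The helper name `balance_by_undersampling`, its `cap` and `seed` parameters, and the example category counts are hypothetical illustrations, not the authors' code or data.

```python
import random
from collections import defaultdict
from typing import Callable, Hashable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def balance_by_undersampling(
    samples: Sequence[T],
    label_of: Callable[[T], Hashable],
    cap: Optional[int] = None,
    seed: int = 0,
) -> List[T]:
    """Randomly undersample each class to at most `cap` samples (hypothetical helper).

    With cap=None, every class is reduced to the size of the smallest class,
    so all classes end up equally represented.
    """
    by_class = defaultdict(list)
    for sample in samples:
        by_class[label_of(sample)].append(sample)
    if not by_class:
        return []
    if cap is None:
        cap = min(len(items) for items in by_class.values())

    rng = random.Random(seed)  # fixed seed keeps the selected subset reproducible
    balanced: List[T] = []
    for items in by_class.values():
        if len(items) > cap:
            # Over-represented class: keep a random subset, sampled without replacement.
            items = rng.sample(items, cap)
        # Classes at or below the cap are kept whole.
        balanced.extend(items)
    rng.shuffle(balanced)  # mix classes so the output is not ordered by label
    return balanced


# Usage with hypothetical per-category counts (100 / 20 / 15 / 10 samples).
data = (
    [(f"img_{i}", "no-damage") for i in range(100)]
    + [(f"img_{i}", "minor-damage") for i in range(100, 120)]
    + [(f"img_{i}", "major-damage") for i in range(120, 135)]
    + [(f"img_{i}", "destroyed") for i in range(135, 145)]
)
subset = balance_by_undersampling(data, label_of=lambda s: s[1])
```

Capping at the size of the smallest class gives exactly equal class counts. A larger `cap` keeps more data at the cost of some residual imbalance.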

4.5.1 Building Localization Models

Fully Convolutional Network

A fully convolutional network (FCN) was chosen for building localization because it performs dense, pixel-wise prediction. This makes it well suited to semantic segmentation tasks such as identifying building regions in pre-disaster images. The FCN follows an encoder-decoder design in which the fully connected layers of a conventional convolutional network are replaced by convolutional layers, so the model can accept inputs of variable size (Akhyar et al., 2024). This flexibility, together with the network's ability to preserve spatial hierarchies without requiring a fixed-size input, drove the choice of FCN. As a result, the model captures both high-level contextual information and fine spatial detail, and both are essential for accurately localizing buildings in pre-disaster images. The encoder extracts hierarchical features through a series of convolutional layers. It starts with two 3×3 convolutional layers of 64 filters each, followed by downsampling with a stride-2 convolutional layer of 128 filters. Subsequent convolutional layers refine the feature representation, capturing contextual information while progressively reducing the spatial dimensions. The decoder upsamples the feature maps with transposed convolutions to restore the original spatial dimensions, and further convolutional layers then refine the features to improve segmentation accuracy. Finally, a 1×1 convolution with a sigmoid activation function outputs per-pixel probabilities that form a binary mask of the segmented regions, here buildings. Because the FCN contains no fully connected layers, spatial information is preserved throughout the network, which enables effective pixel-wise segmentation.

U-Net

The U-Net model was used for building localization because of its proven effectiveness in semantic segmentation, particularly in applications where object boundaries must be delineated precisely. U-Net has been widely applied in fields including satellite image analysis, automated industrial inspection, and intelligent disaster monitoring. It uses a U-shaped encoder-decoder architecture in which the encoder extracts hierarchical features and the decoder reconstructs spatial detail using upsampling layers (Akhyar et al., 2024). Skip connections link corresponding encoder and decoder layers, preserving the fine-grained detail that precise pixel-wise segmentation requires. The encoder extracts hierarchical features through repeated convolution and downsampling. Each encoder block consists of two convolutional layers with 3×3 kernels, followed by batch normalization and ReLU activation. Max-pooling after each block then halves the spatial resolution. The encoder has four levels, and the filter count doubles at each level, from 64 to 512. At the bottleneck, two additional convolutional layers with 1,024 filters capture high-level features and bridge the encoder and decoder. The decoder reconstructs the mask by upsampling the feature maps with transposed convolutions. Through the skip connections, it combines the encoder features with the refined upsampled maps. The final layer uses a 1×1 convolution with a sigmoid activation to generate a binary mask for building localization.
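The two localization architectures described above can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Several details are assumptions rather than statements from the text:

- the framework itself;
- the three-channel input;
- ReLU activations and the number of refinement layers in the FCN;
- "same" padding;
- 2×2, stride-2 transposed convolutions;
- batch normalization and ReLU after every convolution in each U-Net block.

The filter counts, kernel sizes, skip connections, 1,024-filter bottleneck and final 1×1 convolution with sigmoid follow the description. The names `FCN`, `UNet`, `conv_relu` and `double_conv` are illustrative.

```python
import torch
import torch.nn as nn


def conv_relu(in_ch: int, out_ch: int, stride: int = 1) -> nn.Sequential:
    """3x3 convolution + ReLU; padding=1 keeps H and W unchanged when stride=1."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1),
        nn.ReLU(inplace=True),
    )


class FCN(nn.Module):
    """FCN sketch: no fully connected layers, so input H and W may vary (even sizes)."""

    def __init__(self, in_ch: int = 3) -> None:
        super().__init__()
        self.encoder = nn.Sequential(
            conv_relu(in_ch, 64),
            conv_relu(64, 64),
            conv_relu(64, 128, stride=2),  # stride-2 convolution halves H and W
            conv_relu(128, 128),  # assumed refinement layer (count not given in text)
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2),  # restores H and W
            nn.ReLU(inplace=True),
            conv_relu(64, 64),  # assumed refinement layer
            nn.Conv2d(64, 1, kernel_size=1),  # 1x1 convolution -> one logit per pixel
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.decoder(self.encoder(x)))  # per-pixel probability


def double_conv(in_ch: int, out_ch: int) -> nn.Sequential:
    """U-Net block: two 3x3 convolutions, each followed by batch norm and ReLU (assumed)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class UNet(nn.Module):
    """U-Net sketch: four encoder levels (64 to 512 filters) and a 1,024-filter bottleneck."""

    def __init__(self, in_ch: int = 3, filters: tuple = (64, 128, 256, 512)) -> None:
        super().__init__()
        self.pool = nn.MaxPool2d(2)  # halves H and W after each encoder block
        self.encoders = nn.ModuleList()
        ch = in_ch
        for f in filters:
            self.encoders.append(double_conv(ch, f))
            ch = f
        self.bottleneck = double_conv(ch, ch * 2)  # 512 -> 1,024 filters
        ch *= 2
        self.ups = nn.ModuleList()
        self.decoders = nn.ModuleList()
        for f in reversed(filters):
            self.ups.append(nn.ConvTranspose2d(ch, f, kernel_size=2, stride=2))
            self.decoders.append(double_conv(f * 2, f))  # concatenation doubles channels
            ch = f
        self.head = nn.Conv2d(ch, 1, kernel_size=1)  # 1x1 convolution -> mask logits

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        skips = []
        for enc in self.encoders:
            x = enc(x)
            skips.append(x)  # saved for the matching decoder level
            x = self.pool(x)
        x = self.bottleneck(x)
        for up, dec, skip in zip(self.ups, self.decoders, reversed(skips)):
            x = up(x)
            x = dec(torch.cat([skip, x], dim=1))  # skip connection: concatenate features
        return torch.sigmoid(self.head(x))
```

The U-Net pools four times, so in this sketch its input height and width must be divisible by 16. The FCN accepts any even height and width, which illustrates the variable-input-size property. Thresholding either model's output, for example `model(x) > 0.5`, yields a binary mask; the 0.5 threshold is an assumption.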

