AAI_2025_Capstone_Chronicles_Combined
well suited for this project because they combine volumetric feature extraction, multi-scale representation learning, and automated configuration with the ability to integrate patient-space geometric metadata in a DICOM-faithful manner. Several studies demonstrate the suitability of similar architectures for voxel-level segmentation, a prerequisite for voxel-accurate volumetry, while the broader literature highlights persistent gaps in consistency, generalization, and volumetric evaluation that this project addresses through an end-to-end segmentation and tumor volume estimation pipeline.

4 Methodology

This project uses a three-dimensional convolutional neural network for semantic segmentation of lung tumors on CT, implemented with the nnU-Net v2 framework. nnU-Net is a self-configuring segmentation system that automatically determines preprocessing parameters, architectural depth, patch size, and training schedule from dataset-specific characteristics, enabling standardized and reproducible experimentation (Isensee et al., 2021). This design choice aligns with the project goal of building a repeatable volumetry framework: rather than hand-tuning a bespoke model, the segmentation component is anchored to a well-documented, automatically derived baseline that can be reproduced and compared against future model variants.

In this work, nnU-Net was configured in the 3d_fullres setting for the Lung1 dataset (Aerts et al., 2019). The planner selected a target voxel spacing of 3.0 mm in the cranio-caudal direction and 0.9766 mm in-plane, and an input patch size of 64 × 320 × 256 voxels (approximately 192 × 313 × 250 mm in patient space), with a batch size of 2 to balance field-of-view
coverage against GPU memory limits.

The resulting network is a seven-stage residual encoder–decoder U-Net with 3D convolutions, using 32, 64, 128, 256, and 320 feature maps across successive encoder stages (nnU-Net caps the width at 320 for the deepest 3D stages). The first stage uses anisotropic 1 × 3 × 3 kernels to respect the coarser through-plane resolution, and later stages use 3 × 3 × 3 kernels. Downsampling is implemented with strided convolutions in the encoder, and upsampling is performed in the decoder with symmetric skip connections that fuse coarse semantic features with higher-resolution spatial features. Each block uses InstanceNorm3d with affine parameters and LeakyReLU nonlinearities, without dropout, and the architecture employs deep supervision on intermediate decoder outputs to stabilize gradient flow (Isensee et al., 2021).

All CT images and associated RTSTRUCT contours were preprocessed using the DICOM-native pipeline described earlier in this report. CT series were loaded in Hounsfield units and resampled to the nnU-Net target spacing using trilinear interpolation for the images. nnU-Net's CTNormalization scheme was applied to each case; it normalizes intensities based on foreground intensity statistics suitable for thoracic CT (Isensee et al., 2021). RTSTRUCT tumor contours were converted into voxel-aligned binary masks in patient space and resampled to the same grid with SimpleITK, using first-order interpolation for the labels to preserve geometric fidelity. These masks served as the ground-truth segmentation labels for training and evaluation. All nnU-Net preprocessing components (resampling, normalization, and foreground-aware patch sampling) were left at their defaults to support consistency and reproducibility across experiments on Lung1 (Aerts et al., 2019).
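To make the normalization step concrete, the sketch below mirrors the behavior of nnU-Net's CTNormalization scheme: intensities are clipped to the 0.5th/99.5th percentiles of foreground (labeled) voxel values and then z-scored with the foreground mean and standard deviation. The function name and the toy volume are illustrative, and in nnU-Net proper these statistics are pooled over the entire training set rather than computed per case.

```python
import numpy as np

def ct_normalize(volume: np.ndarray, foreground: np.ndarray) -> np.ndarray:
    """Sketch of nnU-Net-style CT normalization.

    volume:     3D CT array in Hounsfield units.
    foreground: 1-D array of HU values sampled from labeled (tumor) voxels;
                nnU-Net pools these statistics over the whole training set.
    """
    lo, hi = np.percentile(foreground, [0.5, 99.5])      # robust clip bounds
    mean, std = foreground.mean(), foreground.std()       # foreground stats
    clipped = np.clip(volume, lo, hi)                     # suppress outliers
    return (clipped - mean) / max(float(std), 1e-8)       # z-score

# Toy example: synthetic lung CT patch with a brighter soft-tissue "tumor".
rng = np.random.default_rng(0)
ct = rng.normal(-700.0, 150.0, size=(8, 16, 16))          # lung parenchyma, HU
ct[2:6, 4:12, 4:12] = rng.normal(40.0, 30.0, size=(4, 8, 8))
mask = np.zeros(ct.shape, dtype=bool)
mask[2:6, 4:12, 4:12] = True

normed = ct_normalize(ct, ct[mask])
```

After normalization the tumor voxels are centered near zero while the clipped lung background collapses toward the lower clip bound, which is the intended effect for thoracic CT where the global intensity range is dominated by air.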