AAI_2025_Capstone_Chronicles_Combined
subsequent years. Determining the initial train–validation–test split required careful consideration: simple class-based stratification risked data leakage, because images from the same capture sequence could end up in different sets, while purely temporal splits posed other limitations. Ultimately, we adopted the split defined in the CCT Benchmark study, using the training and validation sets for model tuning, the test set for in-distribution evaluation, and first-year images from unseen locations for out-of-distribution evaluation.

We evaluated the models using both supervised metrics (accuracy, F1-score, recall, precision) and an unsupervised one (confidence scores). We also experimented with a custom loss function designed to improve prediction confidence by incorporating confidence margins, enforcing better separation between the most likely and second-most likely predictions.

Three primary image-classifier models were developed and evaluated in this project. The first is a pre-trained ResNet-18 fine-tuned on our training set of 14 classes. ResNet-18 is a common choice for wild-animal image classification, especially for images captured by camera traps, because of its strong performance, computational efficiency, and accessibility for transfer learning. The second model is another ResNet-18 customized to include temporal feature information, since animal habits and activities intuitively have time-of-day and seasonal dependencies; it has been shown that including a cyclical encoding of the "date captured" timestamp significantly improves wildlife image classification performance. Finally, a third model was built from scratch for further comparative analysis. This custom ScratchResNet was built entirely from the ground up to evaluate the performance of a fully original architecture without reliance on pre-trained weights or transfer learning.
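The confidence-margin idea above can be sketched minimally. The report does not give the exact formula, so the form below (cross-entropy plus a hinge penalty whenever the gap between the top two softmax probabilities falls below a chosen margin) and its parameter names (`margin`, `alpha`) are illustrative assumptions:

```python
import numpy as np

def margin_cross_entropy(logits, label, margin=0.5, alpha=0.1):
    """Cross-entropy plus a hinge penalty when the gap between the
    top and runner-up softmax probabilities is below `margin`.
    Illustrative sketch only; not the report's exact loss.
    """
    z = logits - logits.max()            # shift for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    ce = -np.log(probs[label])
    second_best, best = np.sort(probs)[-2:]
    # Penalize ambiguous predictions whose top-2 gap is too small.
    margin_penalty = max(0.0, margin - (best - second_best))
    return ce + alpha * margin_penalty
```

A confident prediction (large top-2 gap) incurs no penalty, while an ambiguous one is pushed toward a wider separation, which is the stated goal of the custom loss.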
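For the temporal ResNet-18, a cyclical encoding maps timestamps onto the unit circle so that, for example, 23:00 and 00:00 are close in feature space. The helper below is a hypothetical sketch of such an encoding (the report does not specify which time components it used):

```python
import math

def cyclical_time_features(hour: int, day_of_year: int):
    """Encode capture time cyclically via sine/cosine pairs.

    Returns (sin_hour, cos_hour, sin_day, cos_day); hypothetical
    helper illustrating the general technique.
    """
    hour_angle = 2 * math.pi * hour / 24
    day_angle = 2 * math.pi * day_of_year / 365
    return (math.sin(hour_angle), math.cos(hour_angle),
            math.sin(day_angle), math.cos(day_angle))
```

These features can then be concatenated with the image embedding before the classifier head, letting the model exploit diurnal and seasonal activity patterns.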
This convolutional neural network was designed specifically to address the challenges of wildlife image classification, where class variability and environmental factors (lighting, occlusion) introduce added complexity. The architecture consists of four residual stages, each with two residual blocks, enabling the network to progressively capture more complex spatial and semantic features at increasing levels of abstraction. Residual connections were chosen to mitigate the vanishing gradient problem and improve training stability.
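The identity shortcut at the heart of each residual block can be illustrated with a toy example. The sketch below substitutes dense layers for ScratchResNet's convolutions (dimensions and initialization are arbitrary assumptions) to show the `F(x) + x` pattern that keeps gradients flowing through deep stacks:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class ResidualBlock:
    """Toy residual block: y = relu(W2 @ relu(W1 @ x) + x).

    Dense layers stand in for the block's convolutions; the
    identity shortcut `+ x` is what eases gradient flow.
    """
    def __init__(self, dim, rng):
        self.W1 = rng.standard_normal((dim, dim)) * 0.01
        self.W2 = rng.standard_normal((dim, dim)) * 0.01

    def forward(self, x):
        residual = relu(self.W2 @ relu(self.W1 @ x))
        return relu(residual + x)   # shortcut: add the input back

rng = np.random.default_rng(0)
block = ResidualBlock(8, rng)
y = block.forward(rng.standard_normal(8))
```

Because the shortcut passes `x` through unchanged, a block whose weights contribute nothing still behaves as (rectified) identity, which is why stacking four such stages remains trainable.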