AAI_2025_Capstone_Chronicles_Combined
Lastly, it is worth noting that some researchers consider generated images may eventually become impossible to detect reliably, and instead pursue watermarking solutions that label an image as fake through an invisible identifier, such as Google DeepMind’s SynthID (Google DeepMind, 2023); such systems are out of scope for this paper.

Experimental Methods

Two different model architectures were trained: a CNN and a ViT (Dosovitskiy et al., 2020). The CNN design was based on the ResNet architecture (He et al., 2016): an initial stage applies a small 3x3 convolution, followed by Batch Normalization (Ioffe & Szegedy, 2015) and a ReLU activation. Three successive stages then each double the channel depth while halving the spatial size; these stages use strided convolutions, Batch Normalization, ReLU activations, and a skip connection. Finally, a classification head performs Global Average Pooling (Lin et al., 2013b), applies an optional Dropout layer (Srivastava et al., 2014), and uses a linear layer to reduce the embeddings to the desired output size. For the ViT model, I used the standard implementation provided in the PyTorch library (Paszke et al., 2019). Most of the parameters of these models were selected through the ablation testing process described below.

The original dataset was already divided into three splits: training (70%), test (5%) and evaluation (25%), with each split containing the same number of “Real” and “Fake” labeled samples. All the samples were then normalized to have channel means (0.485, 0.456, 0.406) and standard deviations (0.229, 0.224, 0.225) following
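The ResNet-style CNN described above can be sketched in PyTorch as follows. This is an illustrative reconstruction, not the exact model: the base channel width, dropout rate, input resolution, and the 1x1 projection on the skip path are assumptions, since the text does not specify them.

```python
# Hypothetical sketch of the described CNN: a 3x3 stem with BatchNorm and
# ReLU, three strided residual stages that double channels while halving
# spatial size, then a GAP + Dropout + Linear head. Hyperparameter values
# (base width, dropout) are assumptions for illustration.
import torch
import torch.nn as nn


class ResidualStage(nn.Module):
    """Strided-conv stage: doubles channels, halves spatial size,
    with BatchNorm, ReLU, and a skip connection."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3,
                              stride=2, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        # 1x1 strided projection so the skip path matches the new shape
        # (an assumption; the paper does not detail the skip connection).
        self.proj = nn.Conv2d(in_ch, out_ch, kernel_size=1,
                              stride=2, bias=False)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.relu(self.bn(self.conv(x)) + self.proj(x))


class SmallResNet(nn.Module):
    def __init__(self, num_classes: int = 2, base: int = 32,
                 dropout: float = 0.1):
        super().__init__()
        # Initial stage: small 3x3 convolution + BatchNorm + ReLU.
        self.stem = nn.Sequential(
            nn.Conv2d(3, base, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(base),
            nn.ReLU(inplace=True),
        )
        # Three stages, each doubling channels and halving the input size.
        self.stages = nn.Sequential(
            ResidualStage(base, base * 2),
            ResidualStage(base * 2, base * 4),
            ResidualStage(base * 4, base * 8),
        )
        # Classification head: Global Average Pooling, optional Dropout,
        # and a linear layer down to the output size.
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Dropout(dropout),
            nn.Linear(base * 8, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.stages(self.stem(x)))
```

A forward pass on a batch of 64x64 RGB images produces one logit per class ("Real" vs. "Fake"), e.g. `SmallResNet()(torch.randn(2, 3, 64, 64))` has shape `(2, 2)`.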