M.S. AAI Capstone Chronicles 2024


scores 15.7, 21.3, and 30.1 and the METEOR scores 15.3, 20, and 23 (Karpathy & Fei-Fei, 2015; Cornia et al., 2018; Zhou et al., 2019). Unfortunately, our scores cannot be compared directly to these, because BLEU and similar metrics can vary greatly with implementation details such as the choice of tokenizer and the normalization techniques applied (Papineni et al., 2002), and the papers for these models do not report this information. Because objective metrics do not allow a direct comparison with other published models, we made a subjective comparison instead: we ran a small set of images from the test set through both our DenseNet121 Visual Attention model and ViT GPT2, a popular pretrained model for image captioning (nlpconnect, 2023). Figure 10 shows a sample of the images tested and the predicted captions from each model.
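The sensitivity of BLEU to tokenization and normalization can be illustrated with a minimal sketch. The code below is not the scoring implementation used in this project or in the cited papers; it is a simplified sentence-level BLEU (uniform weights over 1- to 4-grams, no smoothing), and the function names and the example caption are hypothetical, chosen only for illustration. The same candidate and reference are scored twice, once with plain whitespace tokenization and once after lowercasing and stripping punctuation.

```python
import math
import re
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=4):
    """Simplified sentence-level BLEU: uniform weights, no smoothing."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        if clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty, using the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)


def tokenize_whitespace(text):
    """Split on whitespace only; case and punctuation are kept."""
    return text.split()


def tokenize_normalized(text):
    """Lowercase and keep only alphanumeric runs (punctuation dropped)."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Hypothetical caption pair, not taken from the project's test set.
candidate = "A dog runs on the beach."
reference = "a dog runs on the beach"

score_raw = bleu(tokenize_whitespace(candidate), [tokenize_whitespace(reference)])
score_norm = bleu(tokenize_normalized(candidate), [tokenize_normalized(reference)])

print(f"whitespace tokenizer: {score_raw:.3f}")
print(f"normalized tokenizer: {score_norm:.3f}")
```

In this sketch the two captions differ only in capitalization and a trailing period, yet the whitespace-tokenized score is roughly half of the normalized score. Differences of this kind are why scores computed under unreported preprocessing choices are not directly comparable.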

Figure 10

Predicted Caption Comparison: DenseNet121 Visual Attention vs. ViT GPT2

