M.S. AAI Capstone Chronicles 2024


Data Summary

The copy of the Flickr30k dataset used in this project was retrieved from Hugging Face (nlphuji, 2023). The dataset contains 31,014 images, each with 5 text captions, for a total of 155,070 unique image-caption pairs. The images span a wide range of subjects, including people, animals, and inanimate objects. Figure 1 shows an example of an image from the dataset and its captions.

Figure 1
Sample from the Flickr30k Dataset
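The pair count reported above follows from flattening each image's caption list into one row per caption. The sketch below illustrates this on a toy stand-in for the dataset. The record layout (`filename` and `caption` keys), the made-up captions, and the helper name `flatten_pairs` are assumptions for illustration, not the code used in this project.

```python
from typing import Dict, List, Tuple


def flatten_pairs(records: List[Dict]) -> List[Tuple[str, str]]:
    """Return one (filename, caption) pair per caption of every record."""
    return [(rec["filename"], cap) for rec in records for cap in rec["caption"]]


# Toy stand-in for the dataset: two images with five captions each (made-up data).
toy_records = [
    {"filename": "img_0.jpg", "caption": [f"caption {i} for image 0" for i in range(5)]},
    {"filename": "img_1.jpg", "caption": [f"caption {i} for image 1" for i in range(5)]},
]

pairs = flatten_pairs(toy_records)
print(len(pairs))  # 2 images x 5 captions = 10

# The same arithmetic at the scale reported in the text.
n_images, captions_per_image = 31_014, 5
total_pairs = n_images * captions_per_image
print(total_pairs)  # 155070
```

Working with flattened pairs rather than per-image caption lists is one possible design choice; it makes it straightforward to sample or filter individual captions while keeping a reference to the source image.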

Exploratory Data Analysis

To familiarize ourselves with the data, we drew small random samples of the images and examined them alongside their captions. We also defined a search function to find images by keywords in their captions. We found that the images vary widely, both in content and in attributes such as dimensions, orientation, and zoom level. The captions vary similarly in both length and quality. One of the challenges posed by this dataset is the ambiguity and uneven quality of the image captions. Take, for instance, the captions for the image in Figure 1 above. Two of the captions identify the child in the image as a girl, two do not reference the child's gender, and the remaining caption identifies the child as a boy. There are
