M.S. AAI Capstone Chronicles 2024

time likely hosted primarily personal photography shot with traditional cameras. MS COCO is periodically updated, so training the models on the updated data could rectify this problem.

Another approach that could boost the models' performance is augmenting the images to increase variation in the training data. One way to do this is to apply random transformations to the images before pre-computing the image features, such as flipping the images vertically or rotating them slightly. Another technique would be to add Gaussian noise to the images. The increased variation could help the models generalize to unseen images and avoid overfitting during training.

Finally, we could try other architectures not covered in this project, such as the one described in the 2016 paper Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (Xu et al.). That paper implements visual attention differently from this project, integrating the attention mechanism into the time step loop of an LSTM layer.
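As a rough illustration of the augmentation idea discussed above (random vertical flips, slight rotations, and Gaussian noise applied before the image features are pre-computed), the following is a minimal NumPy sketch. It is not the pipeline used in this project: the function name `augment_image`, its parameters, and their default values are hypothetical choices made for illustration, and images are assumed to be float arrays of shape (H, W, C) scaled to [0, 1].

```python
import numpy as np


def augment_image(image, rng, max_angle_deg=10.0, noise_std=0.02, flip_prob=0.5):
    """Return a randomly augmented copy of `image`.

    Assumes `image` has shape (H, W, C) with values in [0, 1].
    All names and default values here are illustrative assumptions.
    """
    out = image.astype(np.float32)

    # Random vertical flip: reverse the order of the rows.
    if rng.random() < flip_prob:
        out = out[::-1, :, :]

    # Slight random rotation about the image centre, using
    # nearest-neighbour sampling and edge clamping at the borders.
    angle = np.deg2rad(rng.uniform(-max_angle_deg, max_angle_deg))
    h, w = out.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: for each output pixel, locate its source pixel.
    src_y = cy + (ys - cy) * np.cos(angle) - (xs - cx) * np.sin(angle)
    src_x = cx + (ys - cy) * np.sin(angle) + (xs - cx) * np.cos(angle)
    src_y = np.clip(np.rint(src_y).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(src_x).astype(int), 0, w - 1)
    out = out[src_y, src_x]

    # Additive Gaussian noise, clipped back to the valid pixel range.
    out = out + rng.normal(0.0, noise_std, size=out.shape)
    return np.clip(out, 0.0, 1.0).astype(np.float32)


# Example usage on a synthetic image (a stand-in for a real training image).
rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))
augmented = augment_image(image, rng)
```

In practice, each training image would be passed through a function like this one or more times before feature extraction, so that the pre-computed features reflect the added variation. An established image library could replace the hand-written rotation with interpolated resampling.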

