M.S. AAI Capstone Chronicles 2024

Vision Encoder Decoder Models. (n.d.). Hugging Face. Retrieved December 7, 2024, from https://huggingface.co/docs/transformers/v4.47.0/en/model_doc/vision-encoder-decoder

nlpconnect. (2023). nlpconnect/vit-gpt2-image-captioning · Hugging Face. https://huggingface.co/nlpconnect/vit-gpt2-image-captioning

Wong, W. (2019, October 15). What is Teacher Forcing? Medium. https://towardsdatascience.com/what-is-teacher-forcing-3da6217fed1c

Howard, A., Sandler, M., Chu, G., Chen, L.-C., Chen, B., Tan, M., Wang, W., Zhu, Y., Pang, R., Vasudevan, V., Le, Q. V., & Adam, H. (2019). Searching for MobileNetV3 (No. arXiv:1905.02244). arXiv. https://doi.org/10.48550/arXiv.1905.02244

Simonyan, K., & Zisserman, A. (2015). Very Deep Convolutional Networks for Large-Scale Image Recognition (No. arXiv:1409.1556). arXiv. https://doi.org/10.48550/arXiv.1409.1556

He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep Residual Learning for Image Recognition (No. arXiv:1512.03385). arXiv. https://doi.org/10.48550/arXiv.1512.03385

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2023). Attention Is All You Need (No. arXiv:1706.03762). arXiv. https://doi.org/10.48550/arXiv.1706.03762

Zakka, C. (2023). Positional Embeddings - The Large Language Model Playbook. https://cyrilzakka.github.io/llm-playbook/pos-embed.html

Huang, G., Liu, Z., Maaten, L. van der, & Weinberger, K. Q. (2018). Densely Connected Convolutional Networks (No. arXiv:1608.06993). arXiv. https://doi.org/10.48550/arXiv.1608.06993

