This repository generates a caption for a single image using a Transformer
architecture and a pretrained VGG16 model.
In this source code, I use the self-attention mechanism to build my own Transformer
and use VGG16 to extract image features before passing them to the encoder component of the Transformer.
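
As a rough illustration of the self-attention mechanism mentioned above, here is a minimal, dependency-free sketch of scaled dot-product self-attention. It is not the code used in this repository: the function names (`softmax`, `matmul`, `self_attention`) and the weight matrices `w_q`, `w_k`, `w_v` are assumptions made for the example, and a real model would use a tensor library and learned weights.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention.

    x is a sequence of feature vectors (seq_len x d_model); w_q, w_k, w_v are
    hypothetical projection matrices (d_model x d_k). Returns the attended
    output and the attention weights.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    # scores[i][j] = (q_i . k_j) / sqrt(d_k)
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    # Each row of weights sums to 1 and says how much position i attends to j.
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights
```

In the setting described here, `x` would be the sequence of image features produced by VGG16, so each position in the encoder can weigh every other position when building its representation.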
- Crawling dataset
I use the above website and Python's Selenium library to crawl images and their titles.
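
The crawling step can be sketched roughly as follows. This is only an illustration under stated assumptions, not the crawler used in this repository: the function names, the `css_selector` default, and the idea that each title lives in the image's `alt` attribute are all hypothetical, and the target URL must be supplied by the caller. It also assumes Selenium 4 with Chrome and a matching driver installed.

```python
def clean_pairs(raw_pairs):
    """Keep only (image_url, title) pairs where both parts are non-empty."""
    cleaned = []
    for src, title in raw_pairs:
        if src and title and title.strip():
            cleaned.append((src, title.strip()))
    return cleaned


def crawl_images_and_titles(url, css_selector="img"):
    """Open `url` in Chrome and collect (image_url, title) pairs.

    Assumption: the title of each image is stored in its `alt` attribute.
    Adjust the selector and attribute to match the real page structure.
    """
    # Imported lazily so clean_pairs can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        elements = driver.find_elements(By.CSS_SELECTOR, css_selector)
        raw = [(el.get_attribute("src"), el.get_attribute("alt")) for el in elements]
    finally:
        # Always close the browser, even if the page fails to load.
        driver.quit()
    return clean_pairs(raw)
```

Filtering out pairs with a missing image URL or an empty title keeps the dataset consistent, since every image needs a caption to be useful for training.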