The full write-up is in Final_Report.pdf, if you're interested!
Based on the following: https://github.com/xuwangyin/pytorch-tutorial/tree/master/tutorials/03-advanced/image_captioning
YOLO v2 comes from: https://github.com/longcw/yolo2-pytorch
Google Transformer comes from: https://github.com/jadore801120/attention-is-all-you-need-pytorch