This repository contains the code used to translate roughly 13 million image-caption pairs.
The dataset was downloaded from the BLIP repository: CC3M+CC12M+SBU, filtered synthetic captions by ViT-L, here.
After downloading the dataset, I split it into chunks purely to make the data easier to manage; this step is entirely optional.
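If you want to do the same, the sketch below shows one way to chunk the caption file. It is a minimal sketch, not the repo's exact script: the input file name, the BLIP-style JSON list format, and the chunk size are all assumptions.

```python
import json
import os

CHUNK_SIZE = 100_000  # arbitrary; pick whatever is convenient for your storage

os.makedirs("chunks", exist_ok=True)

# "captions.json" is a placeholder name for the downloaded BLIP caption file,
# assumed to be a JSON list of {"url"/"image", "caption"} records
with open("captions.json") as f:
    records = json.load(f)

# write fixed-size slices to chunks/chunk_00000.json, chunk_00001.json, ...
for i in range(0, len(records), CHUNK_SIZE):
    out_path = f"chunks/chunk_{i // CHUNK_SIZE:05d}.json"
    with open(out_path, "w") as out:
        json.dump(records[i:i + CHUNK_SIZE], out)
```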
```bash
python3 -m venv .translation_venv
source .translation_venv/bin/activate
pip3 install -r requirements.txt
```
The main code used for translation lives in the nllb_multi_gpus_inference file. The code was initially adapted from here.
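For orientation, here is a minimal single-GPU sketch of NLLB translation with Hugging Face transformers. The checkpoint name, the language codes, and the generation settings are assumptions for illustration, not necessarily what nllb_multi_gpus_inference uses:

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# assumed checkpoint; the repo may use a larger NLLB variant
MODEL = "facebook/nllb-200-distilled-600M"

tokenizer = AutoTokenizer.from_pretrained(MODEL, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL).to("cuda:0").eval()

def translate_batch(captions, target_lang="arb_Arab"):
    # target_lang is a placeholder; swap in the actual target language code
    inputs = tokenizer(captions, return_tensors="pt", padding=True,
                       truncation=True, max_length=128).to("cuda:0")
    with torch.inference_mode():
        generated = model.generate(
            **inputs,
            # NLLB selects the output language via the forced BOS token
            forced_bos_token_id=tokenizer.convert_tokens_to_ids(target_lang),
            max_length=128,
        )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)

print(translate_batch(["a man riding a horse on the beach"]))
```

Padding and truncating to a fixed max_length keeps batch shapes bounded, which helps when fitting the model plus activations into a 24 GB card.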
- I used a cluster with 4 A10 GPUs, each with 24 GB of VRAM; see the sketch after this list for one way to shard the chunks across them.
- This code is released under the MIT LICENSE; for the dataset itself, refer to the BLIP LICENSE here.
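As an illustration of how the chunk files can be sharded across the 4 GPUs, here is a sketch using one worker process per device. `translate_chunk` is a hypothetical helper standing in for the per-chunk logic in nllb_multi_gpus_inference, and the chunks/*.json layout is an assumption:

```python
import glob
import torch.multiprocessing as mp

def translate_chunk(path, device):
    # hypothetical helper: load the chunk, run translate_batch-style
    # generation on `device`, and write the translated captions back out
    ...

def worker(rank, shards):
    device = f"cuda:{rank}"  # pin each worker process to one GPU
    for path in shards[rank]:
        translate_chunk(path, device)

if __name__ == "__main__":
    chunk_files = sorted(glob.glob("chunks/*.json"))  # assumed layout
    n_gpus = 4                                        # 4 x A10 in my setup
    # round-robin split so every GPU gets a similar number of chunks
    shards = [chunk_files[i::n_gpus] for i in range(n_gpus)]
    mp.spawn(worker, args=(shards,), nprocs=n_gpus)
```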