This repository contains the code used to translate roughly 13 million image-caption pairs.
The dataset was downloaded from the BLIP repository: CC3M+CC12M+SBU, filtered synthetic captions by ViT-L, here.
After downloading the dataset, I split it into chunks purely to make the data easier to manage; this step is entirely optional.
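If you want to do the same, the sketch below shows one way to chunk the caption file. It is a minimal sketch, not the repo's exact script: the input file name, the BLIP-style JSON list format, and the chunk size are all assumptions.

```python
import json
import os

CHUNK_SIZE = 100_000  # arbitrary; pick whatever is convenient for your storage

os.makedirs("chunks", exist_ok=True)

# "captions.json" is a placeholder name for the downloaded BLIP caption file,
# assumed to be a JSON list of {"url"/"image", "caption"} records
with open("captions.json") as f:
    records = json.load(f)

# write fixed-size slices to chunks/chunk_00000.json, chunk_00001.json, ...
for i in range(0, len(records), CHUNK_SIZE):
    out_path = f"chunks/chunk_{i // CHUNK_SIZE:05d}.json"
    with open(out_path, "w") as out:
        json.dump(records[i:i + CHUNK_SIZE], out)
```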
```bash
python3 -m venv .translation_venv
source .translation_venv/bin/activate
pip3 install -r requirements.txt
```
The main code used for translation lives in the nllb_multi_gpus_inference file. The code was initially adapted from here.
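For orientation, here is a minimal single-GPU sketch of NLLB translation with Hugging Face transformers. The checkpoint name, the language codes, and the generation settings are assumptions for illustration, not necessarily what nllb_multi_gpus_inference uses:

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# assumed checkpoint; the repo may use a larger NLLB variant
MODEL = "facebook/nllb-200-distilled-600M"

tokenizer = AutoTokenizer.from_pretrained(MODEL, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL).to("cuda:0").eval()

def translate_batch(captions, target_lang="arb_Arab"):
    # target_lang is a placeholder; swap in the actual target language code
    inputs = tokenizer(captions, return_tensors="pt", padding=True,
                       truncation=True, max_length=128).to("cuda:0")
    with torch.inference_mode():
        generated = model.generate(
            **inputs,
            # NLLB selects the output language via the forced BOS token
            forced_bos_token_id=tokenizer.convert_tokens_to_ids(target_lang),
            max_length=128,
        )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)

print(translate_batch(["a man riding a horse on the beach"]))
```

Padding and truncating to a fixed max_length keeps batch shapes bounded, which helps when fitting the model plus activations into a 24 GB card.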
- I used a cluster with 4 A10 GPUs, each with 24 GB of VRAM; see the sketch after this list for one way to shard the chunks across them.
- This code is released under the MIT LICENSE; for the dataset itself, refer to the BLIP LICENSE here.
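As an illustration of how the chunk files can be sharded across the 4 GPUs, here is a sketch using one worker process per device. `translate_chunk` is a hypothetical helper standing in for the per-chunk logic in nllb_multi_gpus_inference, and the chunks/*.json layout is an assumption:

```python
import glob
import torch.multiprocessing as mp

def translate_chunk(path, device):
    # hypothetical helper: load the chunk, run translate_batch-style
    # generation on `device`, and write the translated captions back out
    ...

def worker(rank, shards):
    device = f"cuda:{rank}"  # pin each worker process to one GPU
    for path in shards[rank]:
        translate_chunk(path, device)

if __name__ == "__main__":
    chunk_files = sorted(glob.glob("chunks/*.json"))  # assumed layout
    n_gpus = 4                                        # 4 x A10 in my setup
    # round-robin split so every GPU gets a similar number of chunks
    shards = [chunk_files[i::n_gpus] for i in range(n_gpus)]
    mp.spawn(worker, args=(shards,), nprocs=n_gpus)
```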