jacobswan1/ViTCAP
Implementation for CVPR 2022 paper "Injecting Semantic Concepts into End-to-End Image Captioning".
Python
Issues
- 0
- 0
How can I find the pt model named "Logit_Vilt_captioning_testing_batch-size_512_encoder_vit_base_patch16_384_lr_1e-4_iter_60_vitbfocal20_bert_tokenizer_tags_ENC-DEC_multiplier_0.1_expand_tag-classifier_emb.pt"?
#10 opened by Shunli-W - 0
Thank you for your code and paper; I have learned a lot. I have a question about how to implement distributed training. Could you please share your PyTorch DDP setup? Thank you very much.
#9 opened by Markkk111 - 0
Checkpoint model cannot be loaded
#8 opened by Faiail - 0
- 0
Freezing CTN and ViT during captioning
#7 opened by thilinen - 3
The training code of concept classification
#1 opened by ShiYaya - 2
- 3
Thanks for your code. Could you elaborate on the implementation details of the ViLT-CAP model you used as one of the baselines?
#4 opened by meiling-fdu - 2
Problem running loading script
#3 opened by letitiabanana - 0
What machine to use
#2 opened by Gary-code