jacobswan1/ViTCAP

Implementation for the CVPR 2022 paper "Injecting Semantic Concepts into End-to-End Image Captioning".

Python

Issues

I can't find where the negative log-likelihood function is used in the code
#11 opened a year ago by LT156
0
How can I find the .pt model named "Logit_Vilt_captioning_testing_batch-size_512_encoder_vit_base_patch16_384_lr_1e-4_iter_60_vitbfocal20_bert_tokenizer_tags_ENC-DEC_multiplier_0.1_expand_tag-classifier_emb.pt"?
#10 opened a year ago by Shunli-W
0
Thank you for your code and paper; I have learned a lot. How can I implement distributed training? Could you please share your PyTorch DDP setup? Thank you very much.
#9 opened a year ago by Markkk111
0
Checkpoint model cannot be loaded
#8 opened a year ago by Faiail
0
Use semantic segmentation map as channel as additional input
#6 opened 2 years ago by thilinen
0
Freezing CTN and ViT during captioning
#7 opened 2 years ago by thilinen
0
The training code of concept classification
#1 opened 2 years ago by ShiYaya
3
Thanks for your code. How can I make my own dataset in TSV format?
#5 opened 2 years ago by eenzeenee
2
Thanks for your code. Could you elaborate on the implementation details of the ViLT-CAP model you used as one of the baselines?
#4 opened 2 years ago by meiling-fdu
3
Problem running loading script
#3 opened 2 years ago by letitiabanana
2
What machine to use
#2 opened 2 years ago by Gary-code
0