salesforce/BLIP
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Jupyter Notebook · BSD-3-Clause license
Issues
blip_vqa error
#211 opened by AIWASS23 · 3 comments
ITM Loss Stuck at 0.63
#200 opened by bfan1256 · 1 comment
FileNotFoundError: [Errno 2] No such file or directory: 'export/share/datasets/vision/coco/images/val2014/COCO_val2014_000000184613.jpg'
#218 opened by Jingut · 5 comments
No scores of VQA evaluation
#181 opened by p1k0pan · 0 comments
How to train ITM from ITC
#217 opened by Raion-Shin · 1 comment
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group (train_caption.py)
#194 opened by Y-HuiMing-Y · 0 comments
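This error means torch.distributed collectives ran before any process group was created; the repo's training scripts assume a distributed launch. A minimal sketch of two workarounds, assuming single-GPU debugging of train_caption.py-style code:

```python
# Option 1: launch through the distributed launcher, which sets up the group:
#     python -m torch.distributed.run --nproc_per_node=1 train_caption.py
# Option 2 (sketch): create a one-process group manually before model setup.
import os
import torch
import torch.distributed as dist

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
backend = "nccl" if torch.cuda.is_available() else "gloo"
dist.init_process_group(backend=backend, rank=0, world_size=1)
```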
I have created a multimodal large model technology exchange group; welcome to join us.
#215 opened by feihuamantian · 0 comments
Caption on ImageNet-Dogs
#214 opened by LouisDong95 · 2 comments
knowledge distillation
#212 opened by sssssshf · 4 comments
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects
#176 opened by TheOneTrueGuy · 1 comment
Error while running Colab demo
#202 opened by staru09 · 18 comments
The size of tensor a (3) must match the size of tensor b (9) at non-singleton dimension 0
#165 opened by Peter-D-James · 0 comments
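A hedged note on this class of error: a shape mismatch against the 3-channel normalization constants often means the input image was not decoded as 3-channel RGB. A minimal sketch of the demo's preprocessing with an explicit RGB conversion (the root cause in this specific issue may differ):

```python
from PIL import Image
from torchvision import transforms
from torchvision.transforms.functional import InterpolationMode

image_size = 384
transform = transforms.Compose([
    transforms.Resize((image_size, image_size), interpolation=InterpolationMode.BICUBIC),
    transforms.ToTensor(),
    # CLIP-style statistics used by BLIP's preprocessing
    transforms.Normalize((0.48145466, 0.4578275, 0.40821073),
                         (0.26862954, 0.26130258, 0.27577711)),
])
raw = Image.open("demo.jpg").convert("RGB")  # force exactly 3 channels
image = transform(raw).unsqueeze(0)          # shape [1, 3, 384, 384]
```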
How to use roberta as the decoder
#209 opened by xiweideng · 7 comments
Can BLIP generate longer image captions?
#175 opened by uestcMeng · 0 comments
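Caption length is controlled by the max_length / min_length arguments of the caption model's generate() method; a short sketch (length values are illustrative):

```python
# Longer captions by raising the generation length bounds (values illustrative).
caption = model.generate(image, sample=False, num_beams=3,
                         max_length=40, min_length=20)
```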
Question or bug in blip_pretrain.py
#207 opened by LiGuo12 · 1 comment
stable-diffusion RuntimeError: Couldn't fetch BLIP.
#201 opened by saiheitor · 0 comments
How to retrieve the raw attention scores or logits from the BLIP model (image captioning)
#206 opened by umme17 · 0 comments
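One way to get per-step logits and attention maps is through the Hugging Face port of BLIP; using the port is an assumption, since the question concerns this repo (whose med.py layers also accept output_attentions=True), but the ported API makes it compact:

```python
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

inputs = processor(images=Image.open("demo.jpg").convert("RGB"), return_tensors="pt")
out = model.generate(**inputs,
                     output_scores=True,           # raw logits for each generated token
                     output_attentions=True,       # attention maps per layer and step
                     return_dict_in_generate=True)
print(len(out.scores), out.scores[0].shape)        # one logits tensor per decoding step
```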
I want to use an existing image-text pedestrian dataset to fine-tune the BLIP model. Should I use the pre-trained checkpoint weights or the fine-tuned checkpoint weights?
#205 opened by shams2023 · 0 comments
Image-Text Retrieval
#204 opened by mjjc111 · 0 comments
Does the LAION 115M dataset have 11164.tar?
#203 opened by jacob-kang · 1 comment
Convert BLIP model to TensorRT
#169 opened by Frostbite22 · 1 comment
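A hedged sketch of one common route: export a submodule to ONNX, then build an engine with `trtexec --onnx=blip_visual.onnx`. The autoregressive generate() loop does not export as a single graph, so only the vision encoder is shown, under that assumption:

```python
import torch

dummy = torch.randn(1, 3, 384, 384)                 # BLIP's 384x384 RGB input
torch.onnx.export(model.visual_encoder, dummy, "blip_visual.onnx",
                  input_names=["image"], output_names=["image_embeds"],
                  opset_version=14)
```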
Blip Replicate Interface Is Down
#198 opened by hashnimo · 0 comments
How to use the large retrieval model (model_large_retrieval_coco) for image-text prediction?
#199 opened by caydenwei · 0 comments
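A sketch following the repo's demo.ipynb, assuming the large retrieval checkpoint can be loaded into the ITM wrapper for pairwise scoring (the checkpoint URL follows the README's naming; verify it against the model zoo):

```python
import torch
from models.blip_itm import blip_itm

model_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_large_retrieval_coco.pth'
model = blip_itm(pretrained=model_url, image_size=384, vit='large')
model.eval()

# image: preprocessed tensor [1, 3, 384, 384]; caption: a string
itm_output = model(image, caption, match_head='itm')         # matching head logits
itm_score = torch.nn.functional.softmax(itm_output, dim=1)[:, 1]
itc_score = model(image, caption, match_head='itc')          # contrastive similarity
```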
web demo issue
#196 opened by hhzhao0525 · 5 comments
I am having trouble running evaluation code
#189 opened by jyrana · 0 comments
About the ViT of BLIP
#191 opened by LWShowTime · 0 comments
Need a clear understanding of each checkpoint
#190 opened by p1k0pan · 0 comments
Similar images generate identical captions; how can this be resolved?
#188 opened by shams2023 · 2 comments
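Beam search is deterministic, so near-duplicate images tend to map to the same caption; the demo's nucleus-sampling mode makes outputs diverge. A one-line sketch:

```python
# Nucleus sampling (as in demo.ipynb) instead of deterministic beam search.
caption = model.generate(image, sample=True, top_p=0.9, max_length=20, min_length=5)
```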
How long does fine-tuning COCO retrieval take on a single 3090 GPU?
#186 opened by shams2023 · 0 comments
Video subtitle generation
#187 opened by Levi-arch1 · 0 comments
Using the pre-trained BLIP model directly for captioning, but the generated captions are poor
#183 opened by shams2023 · 0 comments
New ViT findings via registers (2309.16588)
#184 opened by Infinitay · 0 comments
This error indicates that your module has parameters that were not used in producing loss
#180 opened by ericosmic · 0 comments
demo.ipynb : RuntimeError: The size of tensor a (3) must match the size of tensor b (9) at non-singleton dimension 0
#173 opened by Taiga10969 · 0 comments
What is the meaning of 'question_states += [question_output.last_hidden_state[b]]*n'?
#178 opened by ericosmic · 0 comments
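This line appears in the VQA answer-ranking code: `[x] * n` builds a list with n references to the same tensor, so each question's encoder states are tiled n times and the top-n candidate answers can be scored in one batched forward pass. An illustration with hypothetical shapes:

```python
import torch

n = 3                                        # candidate answers per question
last_hidden_state = torch.randn(2, 8, 768)   # [batch, seq_len, hidden]

question_states = []
for b in range(last_hidden_state.size(0)):
    question_states += [last_hidden_state[b]] * n   # repeat question b's states n times
question_states = torch.stack(question_states, dim=0)
print(question_states.shape)                 # torch.Size([6, 8, 768]) = [batch*n, seq, hidden]
```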
Retrieval output is not fixed
#177 opened by ltm920716 · 0 comments
Cosine similarity between image_features and text_features taken from BLIP_Extractor_Features gives bad results
#174 opened by aTunass · 0 comments
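A likely cause: the feature extractor's raw last_hidden_state outputs live in different spaces for image and text. BLIP's contrastive (ITC) space is only reached after the vision_proj / text_proj layers plus L2 normalization, as in models/blip_itm.py. A sketch (attribute names follow that file; treat exact paths as assumptions):

```python
import torch.nn.functional as F

# [CLS] embeddings projected into the shared ITC space, then L2-normalized
image_feat = F.normalize(model.vision_proj(image_embeds[:, 0, :]), dim=-1)
text_feat = F.normalize(model.text_proj(text_embeds[:, 0, :]), dim=-1)
cosine = (image_feat * text_feat).sum(dim=-1)   # comparable similarity scores
```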
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same
#168 opened by HWH-2000 · 0 comments
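This error means the input tensor is on CUDA while the model weights are still on CPU (or vice versa); moving both to one device fixes it. A minimal sketch:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)     # move weights to the same device as the input
image = image.to(device)
with torch.no_grad():
    caption = model.generate(image)
```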
Image-Text Matching result weird
#167 opened by jucic · 0 comments