mbzuai-oryx/Video-LLaVA

PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models

Python

Issues

Comparison between running the model with grounding and without grounding
#19 opened 8 months ago by sykuann · 0 comments
Weight link not available
#18 opened 8 months ago by xizaoqu · 0 comments
Requirements conflicts between whisper-at and torch 2.1.0 during installation
#9 opened a year ago by zhaozhenyu-newsbreak · 4 comments
License
#17 opened 9 months ago by Tortoise17 · 0 comments
Time codes
#16 opened 9 months ago by Tortoise17 · 0 comments
Are 8 RTX 4090 GPUs (24 GB each) enough to train your model?
#15 opened 9 months ago by longmalongma · 0 comments
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
#14 opened 9 months ago by meongeun · 0 comments
Demo on Gradio
#12 opened 9 months ago by Kamakshi8104 · 0 comments
CLI demo could be made much simpler by adding more instructions to the README.md
#13 opened 9 months ago by manishkumart · 0 comments
Flash Attention
#11 opened a year ago by ekazakos · 0 comments
Using ASR captions instead of a heavy audio encoder could be more efficient
#10 opened a year ago by lucasjinreal · 0 comments
Error while loading the tokenizer
#8 opened a year ago by mvish7 · 1 comment
Training Details
#5 opened a year ago by Tanveer81 · 1 comment
Segmentation Error
#7 opened a year ago by shrinivasait · 0 comments
Could you release the evaluation scripts for the Vicuna model early?
#1 opened a year ago by KerolosAtef · 6 comments
When will the code be available?
#2 opened a year ago by Dantong88 · 1 comment