DAMO-NLP-SG/VideoLLaMA2
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
Python · Apache-2.0
Issues
Can VideoLLaMA2.1-7B-AV perform inference on images?
#125 opened by sjghh - 0
Finetune model inference error
#124 opened by thisurawz1 - 7
The base model used from hugging face for Audio Visual question answering is not at all working
#119 opened by asmit203 - 9
Can't hear the audio
#112 opened by sjghh - 4
output strangeness
#123 opened by babyta - 1
Missing comparison
#75 opened by LiquidAmmonia - 27
ValueError: The following `model_kwargs` are not used by the model: ['images_or_videos', 'modal_list']
#90 opened by CaffeyChen - 2
Can I run VideoLLaMA 1 in this repo?
#94 opened by jun297 - 2
No module named 'transformers'
#117 opened by 0sATs0 - 2
vision_tower load error? How to correctly load ckpt?
#118 opened by Cece1031 - 1
Error When Running Multiple-model Version Demo
#120 opened by shmooel28 - 3
Error while loading custom finetuned QLoRA model in 4 bit : size mismatch for model.mm_projector.readout.0.weight: copying a param with shape torch.Size([4096, 4096]) from checkpoint, the shape in current model is torch.Size([8388608, 1]).
#71 opened by ApoorvFrontera - 2
Can I use a WAV file as input for inference?
#73 opened by FanBu02 - 4
Problem about processor in load_pretrained_model
#74 opened by ShuyUSTC - 5
Error while loading Mixtral based SFT MoE model VideoLLaMA2-8x7B: SafetensorError: Error while deserializing header: InvalidHeaderDeserialization
#77 opened by ApoorvFrontera - 2
Unable to load *ANY BASE MODEL* in 4bit
#78 opened by ApoorvFrontera - 3
train and fine tune for audio-video
#79 opened by trahman8 - 3
Model keeps output "there is no sound/ I can not hear anything" when there is actual sound
#80 opened by qixueweigitbub - 4
Deployment on huggingface endpoints
#86 opened by aliayub40995 - 6
Could you please advise when the checkpoint for the audio branch will be made public?
#87 opened by ymxyll - 4
Problem: Segmentation fault (core dumped)
#95 opened by CamellIyquitous - 1
code for batch inference
#97 opened by zhangjic22 - 3
When will the audio branch be released?
#99 opened by XuecWu - 6
Request for Inference Code on Custom Datasets
#101 opened by dongqi-me - 1
Forward pass of the model - how to pass videos?
#105 opened by esh04 - 1
QLoRA fine-tunes a custom model with 4-bits, and inference the video, then we got :
#106 opened by BUAACY - 2
AV ckpt inference error
#109 opened by kk94wang - 1
Inference code does not work for videos
#114 opened by marvlyngkhoi - 0
🚀 [Release Notes] 2024.10
#116 opened by lixin4ever - 2
Audio branch
#69 opened by Morgott-The-Omen-King - 1
Json files of the MVSD-QA dataset
#100 opened by Hou9612 - 1
audio information
#102 opened by sjghh - 1
What are the GPUs used in the fine-tuning stage?
#104 opened by BUAACY - 4
Demo Question
#103 opened by sjghh - 1
You are using a model of type mistral to instantiate a model of type videollama2_mistral. This is not supported for all configurations of models and can yield errors.
#107 opened by hufflepuff0596 - 4
Can we do the only text, image and text and video and text finetuning with lora in a one run
#84 opened by thisurawz1 - 2
What is the difference between the 'base' and 'chat' versions of a model type?
#85 opened by Lanbai-eleven - 0
Weird "Invalid base64-encoded" Error
#76 opened by lucasxu777 - 2
Maybe a bug on data preprocess
#70 opened by Weili-NLP - 1
Minor typo in arxiv paper
#68 opened by QAQdev