DAMO-NLP-SG/VideoLLaMA2

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Python · Apache-2.0

Pinned issues

🚀 [Release Notes] 2024.10

#116 opened 2 months ago by lixin4ever


Issues

Can VideoLLaMA2.1-7B-AV perform inference on images?
#125 opened a month ago by sjghh
1
Finetune model inference error
#124 opened a month ago by thisurawz1
0
An error occurred while loading video.json and audio.json
#122 opened a month ago by sjghh
7
The base model from Hugging Face for audio-visual question answering is not working at all
#119 opened a month ago by asmit203
11
Can't hear the audio
#112 opened 2 months ago by sjghh
9
System Message Not Affecting VideoLLaMA2-7B's Responses
#121 opened a month ago by yuripetralia
4
output strangeness
#123 opened a month ago by babyta
1
Missing comparison
#75 opened 4 months ago by LiquidAmmonia
1
Cannot reproduce results on vllava datasets
#81 opened 4 months ago by williamium3000
27
ValueError: The following `model_kwargs` are not used by the model: ['images_or_videos', 'modal_list']
#90 opened 3 months ago by CaffeyChen
2
Videochatgpt_gen link for Test_Human_Annotated_Captions is not valid
#91 opened 3 months ago by jun297
2
Can I run VideoLLaMA 1 in this repo?
#94 opened 3 months ago by jun297
1
No module named 'transformers'
#117 opened 2 months ago by 0sATs0
2
vision_tower load error? How to correctly load ckpt?
#118 opened 2 months ago by Cece1031
2
Error When Running Multiple-model Version Demo
#120 opened 2 months ago by shmooel28
1
Error while loading custom finetuned QLoRA model in 4 bit : size mismatch for model.mm_projector.readout.0.weight: copying a param with shape torch.Size([4096, 4096]) from checkpoint, the shape in current model is torch.Size([8388608, 1]).
#71 opened 5 months ago by ApoorvFrontera
3
Can I use a WAV file as input for inference?
#73 opened 2 months ago by FanBu02
2
Problem about processor in load_pretrained_model
#74 opened 4 months ago by ShuyUSTC
4
Error while loading Mixtral based SFT MoE model VideoLLaMA2-8x7B: SafetensorError: Error while deserializing header: InvalidHeaderDeserialization
#77 opened 4 months ago by ApoorvFrontera
5
Unable to load *ANY BASE MODEL* in 4bit
#78 opened 4 months ago by ApoorvFrontera
2
train and fine tune for audio-video
#79 opened 2 months ago by trahman8
3
Model keeps output "there is no sound/ I can not hear anything" when there is actual sound
#80 opened 2 months ago by qixueweigitbub
3
UnboundLocalError: local variable "video_path" referenced before assignment
#82 opened 2 months ago by acDante
4
how to do the inference with the finetune weights / model
#83 opened 2 months ago by thisurawz1
12
Deployment on huggingface endpoints
#86 opened 4 months ago by aliayub40995
2
Could you please advise when the checkpoint for the audio branch will be made public?
#87 opened 4 months ago by ymxyll
6
After fine-tuning, the model outputs repetitive phrases
#89 opened 3 months ago by Jackyzjz
4
Problem: Segmentation fault (core dumped)
#95 opened 2 months ago by CamellIyquitous
5
code for batch inference
#97 opened 3 months ago by zhangjic22
1
When will the audio branch be released?
#99 opened 2 months ago by XuecWu
3
Request for Inference Code on Custom Datasets
#101 opened 3 months ago by dongqi-me
6
Forward pass of the model - how to pass videos?
#105 opened 2 months ago by esh04
1
QLoRA fine-tunes a custom model in 4-bit, and on video inference we got:
#106 opened 2 months ago by BUAACY
1
AV ckpt inference error
#109 opened 2 months ago by kk94wang
2
How to load a model that was fine-tuned using QLoRA or LoRA?
#113 opened 2 months ago by marvlyngkhoi
1
Inference code does not work for videos
#114 opened 2 months ago by marvlyngkhoi
3
Audio branch
#69 opened 2 months ago by Morgott-The-Omen-King
2
VideoLLaMA2 performance gap on video benchmarks
#92 opened 2 months ago by zhuqiangLu
1
Json files of the MVSD-QA dataset
#100 opened 2 months ago by Hou9612
2
audio information
#102 opened 2 months ago by sjghh
1
What are the GPUs used in the fine-tuning stage?
#104 opened 2 months ago by BUAACY
1
Demo Question
#103 opened 2 months ago by sjghh
4
You are using a model of type mistral to instantiate a model of type videollama2_mistral. This is not supported for all configurations of models and can yield errors.
#107 opened 2 months ago by hufflepuff0596
1
Can we do text-only, image-text, and video-text fine-tuning with LoRA in one run?
#84 opened 4 months ago by thisurawz1
4
Can videollama2 continue finetuning on my own dataset using 32 frames?
#93 opened 3 months ago by zhengrongz
2
What is the difference between the 'base' and 'chat' versions of a model type?
#85 opened 4 months ago by Lanbai-eleven
2
Weird "Invalid base64-encoded" Error
#76 opened 4 months ago by lucasxu777
0
Maybe a bug on data preprocess
#70 opened 5 months ago by Weili-NLP
2
Minor typo in arxiv paper
#68 opened 5 months ago by QAQdev
1