Issues
About training.
#34 opened by EdenGabriel - 0
RuntimeError: The size of tensor a (147) must match the size of tensor b (293) at non-singleton dimension 3
#33 opened by Vijaysivadas - 0
Stage 2 features
#31 opened by simplewhite9 - 4
How much time does it take to extract features in stage 2 and what is the hardware used?
#25 opened by Maulog - 1
Feature extraction code requirement
#30 opened by L4zyy - 3
About ActivityNet eval process
#29 opened by lixuefenfen - 0
About evaluation
#28 opened by EdenGabriel - 2
ID correspondence
#27 opened by wayne3771 - 5
Low accuracy rate
#26 opened by wayne3771 - 1
Why did you use only the subset?
#23 opened by MSungK - 2
Linking id to DiDeMo video path
#21 opened by ZhangYuanhan-AI - 1
Training Warning
#20 opened by Tanveer81 - 0
Missing intern_clip_feat
#22 opened by Tanveer81 - 6
About lora duplication
#19 opened by yeppp27 - 2
Can I simply query the model to locate the `highlight moment or the best moment` in the video?
#17 opened by dragen1860 - 1
You are using a model of type llama to instantiate a model of type VTimeLLM. This is not supported for all configurations of models and can yield errors?
#18 opened by dragen1860 - 0
Moment Localization Evaluation
#16 opened by Tanveer81 - 1
Are you working on exposing an inference endpoint on huggingface or replicate?
#15 opened by nwaughachukwuma - 1
Running VTimeLLM inference offline
#14 opened by dengandong - 3
How good is chatglm3's Chinese comprehension?
#12 opened by lucasjinreal - 1
Main differences between VTimeLLM and LLaVA
#13 opened by itruonghai - 6
Tokenization mismatch
#10 opened by weiyuan-c - 1
13B model?
#8 opened by vhzy - 1
InternVID training dataset
#4 opened by LengSicong - 1
Question about missing features
#3 opened by vhzy - 2
When will training be available?
#1 opened by vhzy