Issues
About training.
#34 opened by EdenGabriel - 0
RuntimeError: The size of tensor a (147) must match the size of tensor b (293) at non-singleton dimension 3
#33 opened by Vijaysivadas - 0
Stage 2 features
#31 opened by simplewhite9 - 4
How much time does it take to extract features in stage 2 and what is the hardware used?
#25 opened by Maulog - 1
Feature extraction code requirement
#30 opened by L4zyy - 3
About ActivityNet eval process
#29 opened by lixuefenfen - 0
About evaluation
#28 opened by EdenGabriel - 2
ID correspondence
#27 opened by wayne3771 - 5
Low accuracy rate
#26 opened by wayne3771 - 1
Why did you use only the subset?
#23 opened by MSungK - 2
Linking id to DiDeMo video path
#21 opened by ZhangYuanhan-AI - 1
Training Warning
#20 opened by Tanveer81 - 0
Missing intern_clip_feat
#22 opened by Tanveer81 - 6
About lora duplication
#19 opened by yeppp27 - 2
Can I simply query the model to locate the `highlight moment or the best moment` in the video?
#17 opened by dragen1860 - 1
You are using a model of type llama to instantiate a model of type VTimeLLM. This is not supported for all configurations of models and can yield errors?
#18 opened by dragen1860 - 0
Moment Localization Evaluation
#16 opened by Tanveer81 - 1
Are you working on exposing an inference endpoint on huggingface or replicate?
#15 opened by nwaughachukwuma - 1
Running VTimeLLM inference offline
#14 opened by dengandong - 3
How good is chatglm3's Chinese comprehension?
#12 opened by lucasjinreal - 1
Main differences between VTimeLLM and LLaVA
#13 opened by itruonghai - 6
Tokenization mismatch
#10 opened by weiyuan-c - 1
13B model?
#8 opened by vhzy - 1
InternVID training dataset
#4 opened by LengSicong - 1
Question about missing features
#3 opened by vhzy - 2
When will training be available?
#1 opened by vhzy