RenShuhuai-Andy/TimeChat

[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

Python · BSD-3-Clause

Issues

Ask for reproducing
#40 opened 6 months ago by HYOJINPARK
6
How to resume training?
#45 opened 3 months ago by GroundMoRe
0
Experiments on ActivityNet-Captions
#44 opened 4 months ago by minjoong507
0
The Q-Former and LLaMA use different vocabularies: one is BERT's, the other is LLaMA's. Are they interchangeable?
#43 opened 4 months ago by guantao18
1
Question about the Text Input to the LLM
#42 opened 5 months ago by ShramanPramanick
2
Cannot answer in Chinese
#41 opened 6 months ago by wublubdubdaxml
1
Inference with audio
#29 opened 6 months ago by lakshya-frontera
2
Long video test results did not meet expectations
#38 opened 6 months ago by ffiioonnaa
2
Discussion : Steps for swapping Llama 2 with Llama 3
#37 opened 6 months ago by rahulkrprajapati
1
Data type not aligned
#39 opened 6 months ago by KKKLeon
1
When fine-tuning on my own dataset, how can I run eval during training?
#32 opened 7 months ago by changqinyao
1
Can this model do QA tasks?
#26 opened 7 months ago by leexinhao
2
Could you test TimeChat on the EgoSchema dataset?
#34 opened 7 months ago by EricLina
2
Asking for the Fine-tuned Checkpoint
#35 opened 7 months ago by minjoong507
2
Weight for QA benchmarks
#36 opened 7 months ago by NIneeeeeem
3
Why is the result of temporal video grounding always a multiple of 5?
#33 opened 7 months ago by zhengrongz
5
Question about the output of the time-aware frame encoder
#28 opened 7 months ago by Mingxiao-Li
2
Questions about the provided fine-tuning model parameters
#30 opened 7 months ago by LanXingXuan
1
Base transformers version needed for modifying models/modeling_llama.py
#31 opened 7 months ago by yeahjack
3
Question about fine-tuning
#25 opened 7 months ago by zhengxingmao
7
Question about prompt
#20 opened 8 months ago by Ironieser
5
What is the relationship between segment and timetoken?
#17 opened 9 months ago by sunwhw
3
When conducting SFT experiments, setting batch_size_train to 1 or 2 has the same memory usage.
#27 opened 8 months ago by tiesanguaixia
0
For different video datasets, is the frame density always drawn at intervals of 1 second?
#2 opened a year ago by DuoLong
5
the performance is very low on my own dataset.
#22 opened 8 months ago by onlyonewater
5
Subset of YT-Temporal
#24 opened 8 months ago by patrick-tssn
1
Question about batch size
#23 opened 8 months ago by gyxxyg
1
Question about the tokenizer
#19 opened 9 months ago by gyxxyg
5
Experiment-related question
#21 opened 9 months ago by zhaodongliang678
3
RAM and VRAM requirement
#13 opened 9 months ago by Coronal-Halo
2
Question about prompts.
#18 opened 9 months ago by gyxxyg
2
Inquiry on training cost
#16 opened 9 months ago by HenryHZY
2
Demo can't show the same result
#15 opened 9 months ago by xiaoxiaoli666
1
Bad performance of Charades
#14 opened 9 months ago by soyeonhong
1
Do we need to crop the HiREST videos?
#10 opened 10 months ago by yeliudev
14
Seeking Clarification about Fine-tuning Datasets
#12 opened 10 months ago by ShramanPramanick
2
Details of sliding qformer operation
#11 opened 10 months ago by jihwanp
1
the generalization performance is bad when testing on custom videos.
#8 opened 10 months ago by dragen1860
1
torch.load raises TypeError: 'strict' is an invalid keyword argument for Unpickler()
#9 opened a year ago by wwq66
4
Error in loading Video-LLaMA-2-7b_Finetuned
#7 opened a year ago by dragen1860
1
When will the checkpoint and demo scripts be released?
#5 opened a year ago by Hugh0120
2
How to evaluate on ActivityNet-DVC?
#6 opened a year ago by TXH-mercury
3
UnsatisfiableError
#4 opened a year ago by LarryLeeee
4
Checkpoints to run demo and dataset
#3 opened a year ago by fazliimam
1
A very good video-related work; would it be possible to open-source the dataset?
#1 opened a year ago by Xujianzhong
1