DAMO-NLP-SG/Video-LLaMA

Frame-aware?

jayavanth opened this issue · 1 comment

Hello! I wanted to know whether this model is frame-aware. Can I ask questions like "When does the person wearing a yellow jacket appear in this video?" The demo on Hugging Face is giving me inaccurate results for such queries.

Thank you for your attention. Technically, it is frame-aware, because we add absolute frame positional embeddings over the frame tokens. However, training data that teaches the model to tell different frames apart is scarce, so this capability is expected to be very weak.
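
For readers unfamiliar with the term, here is a minimal NumPy sketch of what adding absolute frame positional embeddings over frame tokens can look like. It is not taken from the Video-LLaMA code: the function name `add_frame_position_embeddings`, the tensor shapes, and the random `position_table` (which would be a learned parameter in a real model) are all assumptions made for illustration.

```python
import numpy as np


def add_frame_position_embeddings(frame_tokens, position_table):
    """Add one embedding per frame index to every token of that frame.

    frame_tokens:   (num_frames, tokens_per_frame, dim) visual tokens.
    position_table: (max_frames, dim) table indexed by absolute frame index.
                    Hypothetical stand-in for a learned embedding table.
    """
    num_frames, _, dim = frame_tokens.shape
    max_frames, table_dim = position_table.shape
    if num_frames > max_frames:
        raise ValueError("video has more frames than the position table covers")
    if dim != table_dim:
        raise ValueError("token and position embedding dimensions differ")
    # Row t of the table is broadcast across all tokens of frame t.
    return frame_tokens + position_table[:num_frames, None, :]


# Illustrative shapes only: 8 frames, 32 tokens per frame, 16-dim features.
rng = np.random.default_rng(0)
frame_tokens = rng.normal(size=(8, 32, 16))
position_table = rng.normal(size=(32, 16))  # assumed learned during training

out = add_frame_position_embeddings(frame_tokens, position_table)
print(out.shape)
```

The embedding only tags each token with its frame index. Whether the language model learns to use that tag to answer "when" questions depends on the training data, which is the limitation described above.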