haotian-liu/LLaVA

[Question] How do you fine-tune LLaVA-NeXT on video data?

Opened this issue · 1 comments

Question

I have a collection of videos and annotations. How do I fine-tune one of the LLaVA-NeXT models on them? I see the instructions for doing this with traditional LLaVA, but the directions for LLaVA-NeXT with video data are unclear. Thank you very much.

After spending some time digging around, I came across this tutorial, in case anyone else is searching for an answer: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/LLaVA-NeXT-Video/Fine_tune_LLaVa_NeXT_Video_with_HFTrainer.ipynb

I haven't tried it yet, but I will.