[Question] How do you fine-tune LLaVA-NeXT on video data?
Opened this issue · 1 comment
DrVictorBenjamin commented
Question
I have a collection of videos and annotations. How do I fine-tune one of the LLaVA-NeXT models on them? I see the instructions for doing this with traditional LLaVA, but the directions for LLaVA-NeXT with video data are unclear. Thank you very much.
DrVictorBenjamin commented
After spending some time digging around, I came across this tutorial, in case anyone else is searching for an answer: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/LLaVA-NeXT-Video/Fine_tune_LLaVa_NeXT_Video_with_HFTrainer.ipynb

I haven't tried it yet, but I will.
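For anyone following the same route: a key preprocessing step when fine-tuning on video is sampling a fixed number of frames per clip before handing them to the processor. A minimal sketch of uniform frame sampling (the function name and the frame count of 8 are illustrative assumptions, not taken from the notebook):

```python
# Uniform frame sampling for video preprocessing. LLaVA-NeXT-Video
# processors expect a fixed number of frames per clip; 8 here is an
# illustrative default, not a value mandated by the model.

def sample_frame_indices(total_frames: int, num_frames: int = 8) -> list[int]:
    """Pick `num_frames` evenly spaced frame indices from a video."""
    if total_frames <= num_frames:
        # Short clip: just use every frame that exists.
        return list(range(total_frames))
    step = total_frames / num_frames
    # Take the midpoint of each of the num_frames equal segments,
    # so sampling covers the whole clip rather than clustering early.
    return [int(step * i + step / 2) for i in range(num_frames)]

# Example: pick 8 evenly spaced frames from a 100-frame clip.
indices = sample_frame_indices(100, 8)
```

The selected indices would then be used to read frames (e.g. with decord or PyAV) and stacked into the array passed to the processor alongside the text prompt.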