PKU-YuanGroup/Video-LLaVA

Mismatch between repository and transformers config

orrzohar opened this issue · 3 comments

I get the following error:

AttributeError: 'LlavaConfig' object has no attribute 'mm_use_im_start_end'. Did you mean: 'mm_use_x_start_end'?

When running your newly updated repository.
Note that the config.json shipped with the Hugging Face checkpoint does not include 'mm_use_im_start_end':
https://huggingface.co/LanguageBind/Video-LLaVA-7B/blob/main/config.json.

It is unclear whether, under your setup, you intend to use the video start/end tokens or only the plain video tokens.

Also note that in the same function, you are using the now-deprecated DEFAULT_X_START_TOKEN.

def get_model_output(model, video_processor, tokenizer, video, qs, args):
    if model.config.mm_use_im_start_end:
        qs = DEFAULT_X_START_TOKEN['VIDEO'] + ''.join([DEFAULT_IMAGE_TOKEN] * 8) + DEFAULT_X_END_TOKEN['VIDEO'] + '\n' + qs
    else:
        qs = ''.join([DEFAULT_IMAGE_TOKEN] * 8) + '\n' + qs
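As a stopgap on my end, one way to avoid the AttributeError is to guard the missing config attribute with getattr and default to False. This is only a sketch, not the repo's actual fix; the token constants and build_prompt helper below are placeholders standing in for the repository's real definitions:

```python
from types import SimpleNamespace

# Placeholder stand-ins for the repo's token constants (hypothetical values)
DEFAULT_IMAGE_TOKEN = '<image>'
DEFAULT_X_START_TOKEN = {'VIDEO': '<vid_start>'}
DEFAULT_X_END_TOKEN = {'VIDEO': '<vid_end>'}

def build_prompt(config, qs, num_frames=8):
    # getattr with a default avoids AttributeError on configs
    # (like LanguageBind/Video-LLaVA-7B's config.json) that omit the attribute
    if getattr(config, 'mm_use_im_start_end', False):
        qs = (DEFAULT_X_START_TOKEN['VIDEO']
              + ''.join([DEFAULT_IMAGE_TOKEN] * num_frames)
              + DEFAULT_X_END_TOKEN['VIDEO'] + '\n' + qs)
    else:
        qs = ''.join([DEFAULT_IMAGE_TOKEN] * num_frames) + '\n' + qs
    return qs

# A config without the attribute falls through to the no-start/end branch
cfg = SimpleNamespace()
print(build_prompt(cfg, 'What happens in the video?', num_frames=2))
```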

Best,
Orr

Sorry, we fixed that. Could you try it again? We do not use the video start/end tokens.

Yeah, I made the same edit on my local repo.
When do you use the _act eval inference file?
Also, I am getting a module import error (videollava) that I am trying to resolve at the moment.
Have you tried running instruction tuning with the current repository?

When evaluating on the ActivityNet dataset, we use the _act.py script here.
Run pip install -e . to install videollava.
Yes, I have tested the training scripts.