PKU-YuanGroup/Video-LLaVA

Mismatch between repository and transformers config

orrzohar opened this issue · 3 comments

I get the following error:

AttributeError: 'LlavaConfig' object has no attribute 'mm_use_im_start_end'. Did you mean: 'mm_use_x_start_end'?

When running your newly updated repository.
Note that the config.json shipped with the Hugging Face checkpoint does not include 'mm_use_im_start_end':
https://huggingface.co/LanguageBind/Video-LLaVA-7B/blob/main/config.json.

It is unclear whether, under your setup, you intend to use the video start/end tokens or only the plain video tokens.

Also note that in the same function, you are using the now-deprecated DEFAULT_X_START_TOKEN.

def get_model_output(model, video_processor, tokenizer, video, qs, args):
    if model.config.mm_use_im_start_end:
        qs = DEFAULT_X_START_TOKEN['VIDEO'] + ''.join([DEFAULT_IMAGE_TOKEN] * 8) + DEFAULT_X_END_TOKEN['VIDEO'] + '\n' + qs
    else:
        qs = ''.join([DEFAULT_IMAGE_TOKEN] * 8) + '\n' + qs
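As a stopgap on my end, one way to avoid the AttributeError is to guard the missing config attribute with getattr and default to False. This is only a sketch, not the repo's actual fix; the token constants and build_prompt helper below are placeholders standing in for the repository's real definitions:

```python
from types import SimpleNamespace

# Placeholder stand-ins for the repo's token constants (hypothetical values)
DEFAULT_IMAGE_TOKEN = '<image>'
DEFAULT_X_START_TOKEN = {'VIDEO': '<vid_start>'}
DEFAULT_X_END_TOKEN = {'VIDEO': '<vid_end>'}

def build_prompt(config, qs, num_frames=8):
    # getattr with a default avoids AttributeError on configs
    # (like LanguageBind/Video-LLaVA-7B's config.json) that omit the attribute
    if getattr(config, 'mm_use_im_start_end', False):
        qs = (DEFAULT_X_START_TOKEN['VIDEO']
              + ''.join([DEFAULT_IMAGE_TOKEN] * num_frames)
              + DEFAULT_X_END_TOKEN['VIDEO'] + '\n' + qs)
    else:
        qs = ''.join([DEFAULT_IMAGE_TOKEN] * num_frames) + '\n' + qs
    return qs

# A config without the attribute falls through to the no-start/end branch
cfg = SimpleNamespace()
print(build_prompt(cfg, 'What happens in the video?', num_frames=2))
```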

Best,
Orr

Sorry, we fixed that. Could you try it again? We do not use the video start/end tokens.

Yeah, I made the same edit on my local repo.
When do you use the _act eval inference file?
Also, I am getting a module import error (videollava) that I am trying to resolve at the moment.
Have you tried running instruction tuning with the current repository?

When evaluating on the ActivityNet dataset, we use the _act.py script here.
Run pip install -e . to install videollava.
Yes, I have tested the training scripts.