showlab/UniVTG

model size mismatch

Xt117 opened this issue · 10 comments

Xt117 commented

Hi, I have finished running the inference code. I can run the first model (pt) successfully, but when I used the second model (pt+ft), I got several errors like the one below:

[Screenshot 2023-08-07 20 28 40]

Would you please update it? Thanks.

Hi @Xt117 ,
would you mind providing more details about your setup? e.g., which checkpoint are you using (clip or slowfast + clip)?
Which script did you run?
Thanks.

Xt117 commented

I used this checkpoint
[Screenshot 2023-08-07 20 58 06]
and ran with 'main_gradio.py'.

okay, I am investigating this issue now.

@Xt117 , I have found the reason: I uploaded a mismatched checkpoint for pt+ft. Sorry for the mistake; I have re-uploaded it accordingly.
It is in the same Google Drive folder, with download link: https://drive.google.com/file/d/1gy8wKqA9gcYbk3tHewXX5qZ9SQAFhk6J/view?usp=drive_link.

The checkpoint size is ~150 MB. Can you give it a try and let me know?

Xt117 commented

Hi, the model is ready.
But when I ran the code on your example video, it hit an error while extracting the video features, as shown below:

[Screenshot 2023-08-08 11 04 16]

and the terminal showed "FileNotFoundError: [Errno 2] No such file or directory: './tmp/vid.npz'", while everything works fine in your demo.

Hi @Xt117 , I have updated the repo with a tmp dir. Can you manually create this tmp directory under univtg and try again?
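For reference, a minimal sketch of the workaround: create the cache directory before anything tries to save features into it (the path ./tmp/vid.npz and the feature shape below are assumptions for illustration, not the repo's exact values):

```python
# Sketch: make sure the feature cache dir exists before np.savez writes to it,
# which avoids the FileNotFoundError above. Shapes are dummy/illustrative.
import os
import numpy as np

os.makedirs("./tmp", exist_ok=True)  # no-op if the dir already exists

# stand-in for the extracted clip video features
feats = np.random.rand(100, 512).astype("float32")
np.savez("./tmp/vid.npz", features=feats)
```

np.savez itself does not create missing parent directories, which is why the demo fails when ./tmp is absent.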

Xt117 commented

Thanks. Now it runs successfully.
I used my own video to generate the top-5 intervals, but the results show 5 similar shots:
[Screenshot 2023-08-08 20 09 30]
How can I get the top-5 distinct shots across the whole video, like in your visualization?

Also, when I used the slowfast+clip finetuned model, I got the error below; it seems the shapes are mismatched:
[Screenshot 2023-08-08 20 00 24]

@Xt117 glad you can run it successfully.
Currently, the top similar shots are not filtered by NMS. If you want to show five distinct shots, you should feed the predicted windows into an NMS function with a threshold. You can find my NMS function in the utils, i.e., https://github.com/showlab/UniVTG/blob/main/utils/temporal_nms.py,
or implement it yourself. I may update this detail later.
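The idea can be sketched in a few lines (this is a generic temporal-NMS sketch with an assumed (start, end, score) window format, not the exact interface of utils/temporal_nms.py):

```python
# Minimal temporal NMS sketch: keep the highest-scoring windows and drop any
# window whose temporal IoU with an already-kept window exceeds a threshold.
def temporal_iou(a, b):
    """IoU of two windows given as (start, end) in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0

def temporal_nms(windows, iou_threshold=0.5):
    """windows: list of (start, end, score). Returns the kept windows."""
    kept = []
    for w in sorted(windows, key=lambda x: x[2], reverse=True):
        if all(temporal_iou(w, k) <= iou_threshold for k in kept):
            kept.append(w)
    return kept

# Example: heavily overlapping windows collapse to distinct shots.
preds = [(0, 10, 0.9), (1, 11, 0.85), (30, 40, 0.8), (2, 12, 0.7), (50, 60, 0.6)]
top = temporal_nms(preds, iou_threshold=0.5)[:5]
```

Feeding the model's top-k windows through such a filter before visualization gives distinct shots instead of five near-duplicates.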

My current code only supports clip video features,
while the slowfast+clip finetuned model is a stronger model that takes slowfast + clip features as video input. Thus we need to extract slowfast features based on https://github.com/linjieli222/HERO_Video_Feature_Extractor,
which is not included so far; I plan to include this in the next phase.
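This also explains the shape mismatch above: the slowfast+clip checkpoint's input projection expects the concatenated feature width, so feeding clip-only features fails. A sketch of the fused input (the feature dimensions below are illustrative assumptions, not the repo's exact values):

```python
# Sketch: the slowfast+clip model consumes per-frame features concatenated
# along the channel axis, so its first layer expects the combined width.
# Dims here (512 for clip, 2304 for slowfast) are assumed for illustration.
import numpy as np

num_frames = 100
clip_feats = np.zeros((num_frames, 512), dtype="float32")       # clip features
slowfast_feats = np.zeros((num_frames, 2304), dtype="float32")  # slowfast features (assumed dim)

fused = np.concatenate([clip_feats, slowfast_feats], axis=1)
# Passing clip_feats alone (width 512) into a model trained on `fused`'s
# width would raise exactly the kind of shape-mismatch error shown above.
```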

Xt117 commented

OK, I will try the NMS function. Thanks.
Looking forward to your update.

Closing, since the mismatched-checkpoint issue has been resolved.