showlab/UniVTG

model size mismatch

Xt117 opened this issue · 10 comments

Xt117 commented

Hi, I have finished running the inference code. I can run the first model (pt) successfully, but when I used the second model (pt+ft), I got several errors like the one below:

[Screenshot 2023-08-07 20 28 40]

Would you please update it? Thanks.

Hi @Xt117 ,
would you mind providing more details about your setup? e.g., which checkpoint are you using (clip or slowfast + clip)?
Which script did you run?
Thanks.

Xt117 commented

I used this checkpoint
[Screenshot 2023-08-07 20 58 06]
and ran with 'main_gradio.py'.

okay, I am investigating this issue now.

@Xt117 , I have found the reason: I uploaded a mismatched checkpoint for pt+ft. Sorry for the mistake; I have re-uploaded it accordingly.
It is in the same Google Drive folder, with download link: https://drive.google.com/file/d/1gy8wKqA9gcYbk3tHewXX5qZ9SQAFhk6J/view?usp=drive_link.

The checkpoint size is ~150 MB. Can you give it a try and let me know?

Xt117 commented

Hi, the model is ready.
But when I ran the code on your example video, it hit an error while extracting the video features, as shown below:

[Screenshot 2023-08-08 11 04 16]

and the terminal showed "FileNotFoundError: [Errno 2] No such file or directory: './tmp/vid.npz'", while everything works fine in your demo.

Hi @Xt117 , I have updated the repo with a tmp dir. Can you manually create this tmp directory under univtg and try again?
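For reference, a minimal sketch of the workaround: create the cache directory before anything tries to save features into it (the path ./tmp/vid.npz and the feature shape below are assumptions for illustration, not the repo's exact values):

```python
# Sketch: make sure the feature cache dir exists before np.savez writes to it,
# which avoids the FileNotFoundError above. Shapes are dummy/illustrative.
import os
import numpy as np

os.makedirs("./tmp", exist_ok=True)  # no-op if the dir already exists

# stand-in for the extracted clip video features
feats = np.random.rand(100, 512).astype("float32")
np.savez("./tmp/vid.npz", features=feats)
```

np.savez itself does not create missing parent directories, which is why the demo fails when ./tmp is absent.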

Xt117 commented

Thanks. Now it runs successfully.
I used my own video to generate the top-5 intervals, but the results show 5 similar shots:
[Screenshot 2023-08-08 20 09 30]
How can I get the top-5 distinct shots across the whole video, like in your visualization?

Also, when I used the slowfast+clip finetuned model, I got the error below; it seems the shapes are mismatched:
[Screenshot 2023-08-08 20 00 24]

@Xt117 glad you can run it successfully.
Currently, the top similar shots are not filtered by NMS. If you want to show five distinct shots, you should feed the predicted windows into an NMS function with a threshold. You can find my NMS function in the utils, i.e., https://github.com/showlab/UniVTG/blob/main/utils/temporal_nms.py,
or implement it yourself. I may update this detail later.
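The idea can be sketched in a few lines (this is a generic temporal-NMS sketch with an assumed (start, end, score) window format, not the exact interface of utils/temporal_nms.py):

```python
# Minimal temporal NMS sketch: keep the highest-scoring windows and drop any
# window whose temporal IoU with an already-kept window exceeds a threshold.
def temporal_iou(a, b):
    """IoU of two windows given as (start, end) in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0

def temporal_nms(windows, iou_threshold=0.5):
    """windows: list of (start, end, score). Returns the kept windows."""
    kept = []
    for w in sorted(windows, key=lambda x: x[2], reverse=True):
        if all(temporal_iou(w, k) <= iou_threshold for k in kept):
            kept.append(w)
    return kept

# Example: heavily overlapping windows collapse to distinct shots.
preds = [(0, 10, 0.9), (1, 11, 0.85), (30, 40, 0.8), (2, 12, 0.7), (50, 60, 0.6)]
top = temporal_nms(preds, iou_threshold=0.5)[:5]
```

Feeding the model's top-k windows through such a filter before visualization gives distinct shots instead of five near-duplicates.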

My current code only supports clip video features,
while the slowfast+clip finetuned model is a stronger model that takes slowfast + clip features as video input. Thus we need to extract slowfast features based on https://github.com/linjieli222/HERO_Video_Feature_Extractor,
which is not included so far; I plan to include this in the next phase.
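This also explains the shape mismatch above: the slowfast+clip checkpoint's input projection expects the concatenated feature width, so feeding clip-only features fails. A sketch of the fused input (the feature dimensions below are illustrative assumptions, not the repo's exact values):

```python
# Sketch: the slowfast+clip model consumes per-frame features concatenated
# along the channel axis, so its first layer expects the combined width.
# Dims here (512 for clip, 2304 for slowfast) are assumed for illustration.
import numpy as np

num_frames = 100
clip_feats = np.zeros((num_frames, 512), dtype="float32")       # clip features
slowfast_feats = np.zeros((num_frames, 2304), dtype="float32")  # slowfast features (assumed dim)

fused = np.concatenate([clip_feats, slowfast_feats], axis=1)
# Passing clip_feats alone (width 512) into a model trained on `fused`'s
# width would raise exactly the kind of shape-mismatch error shown above.
```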

Xt117 commented

OK, I will try the NMS function. Thanks.
Looking forward to your update.

Closing, since the mismatched-checkpoint issue has been resolved.