How to use demo.py
Zhentao-Liu opened this issue · 8 comments
I downloaded your repo and the pretrained model mae_pretrain_vit_base.pth, and ran demo.py. However, after loading the model, the loaded dict contains neither a 'config' key nor a 'state_dict' key. How can I fix this?
Did you run the following command?
python3 demo.py --checkpoint=./weights/simpleclick_models/cocolvis_vit_huge.pth --gpu 0
It works fine for me. Could you provide the detailed error info?
@qinliuliuqin Can I use negative click points with this model?
Yes, of course.
@qinliuliuqin
Thank you. I just tested the demo; a right click is a negative point.
I have some other questions:
1. How do you create the positive clicks, negative clicks, Prev. Mask, and ground truth for training?
2. In the paper, the input is image + (Clicks + Prev. Mask). What is the Prev. Mask? Is it a binary image of the previous prediction?
3. What are ['NoBRS', 'RGB-BRS', 'DistMap-BRS', 'f-BRS-A', 'f-BRS-B', 'f-BRS-C']?
Hi @ThorPham, thanks for your questions.
- This line shows how to create positive and negative clicks during training (see the sketch after this list). This line shows that we concatenate the image and the previous mask (i.e., a probability map), along with the click masks, as the network input.
- See here. The previous mask is a probability map produced by the model in evaluation mode.
- We only tested the 'NoBRS' mode. You can ignore the other modes; they are inherited from RITM.
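For intuition, here is a minimal sketch of training-time click simulation: positive clicks come from still-missed foreground, negative clicks from falsely predicted background. The function name and the uniform sampling are assumptions for illustration only; the actual sampling in the linked lines is more elaborate.

```python
import numpy as np

def sample_clicks(gt_mask, prev_pred, num_clicks=1):
    """Sample (pos_points, neg_points) from the error regions.

    gt_mask:   (H, W) binary ground-truth mask
    prev_pred: (H, W) binarized previous prediction (all zeros for the first click)
    """
    false_neg = np.logical_and(gt_mask == 1, prev_pred == 0)  # missed object pixels
    false_pos = np.logical_and(gt_mask == 0, prev_pred == 1)  # wrongly labeled background
    pos_points, neg_points = [], []
    for _ in range(num_clicks):
        if false_neg.any():
            ys, xs = np.nonzero(false_neg)
            i = np.random.randint(len(ys))
            pos_points.append((int(ys[i]), int(xs[i])))
        if false_pos.any():
            ys, xs = np.nonzero(false_pos)
            i = np.random.randint(len(ys))
            neg_points.append((int(ys[i]), int(xs[i])))
    return pos_points, neg_points
```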
@qinliuliuqin Thank you for your support.
What is the preprocessing of a click point? Is it a binary mask, or do you create a distance map? And how do you feed it into the model?
In the paper, I see you concatenate the image and the prev mask. Do you also add the click points?
Hi @ThorPham, the clicks (i.e., coordinates) are encoded as a 2-channel binary mask: one channel for positive clicks and one for negative clicks. Each click is represented as a disk on the binary mask. We concatenate the click mask (2 channels), the prev mask (1 channel), and the RGB image (3 channels) to form a 6-channel input. Since we want to reuse the pretrained ViT, whose patch embedding layer only accepts 3-channel input, we add one more patch embedding layer and split the 6-channel input into two groups (each group has 3 channels, as shown in Fig. 1). In this way, we can turn the plain ViT backbone into an iSeg backbone with minimal changes.
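To make this concrete, here is a minimal sketch of the encoding with hypothetical names. How the two 3-channel groups are fused after their patch embeddings (summed here) is an assumption for illustration, not necessarily the exact SimpleClick implementation.

```python
import torch
import torch.nn as nn

def clicks_to_disk_masks(pos_points, neg_points, h, w, radius=5):
    """Encode click coordinates as a 2-channel binary mask (one disk per click)."""
    masks = torch.zeros(2, h, w)
    yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    for ch, points in enumerate([pos_points, neg_points]):
        for cy, cx in points:
            disk = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
            masks[ch][disk] = 1.0
    return masks

class TwoGroupPatchEmbed(nn.Module):
    """Split the 6-channel input into two 3-channel groups, each with its own
    patch-embedding layer, so the pretrained ViT patch embedding can be reused."""
    def __init__(self, patch_size=16, embed_dim=768):
        super().__init__()
        self.embed_rgb = nn.Conv2d(3, embed_dim, patch_size, stride=patch_size)  # pretrained weights
        self.embed_aux = nn.Conv2d(3, embed_dim, patch_size, stride=patch_size)  # newly added layer

    def forward(self, image, prev_mask, click_masks):
        # image: (B,3,H,W), prev_mask: (B,1,H,W), click_masks: (B,2,H,W)
        aux = torch.cat([click_masks, prev_mask], dim=1)      # second 3-channel group
        tokens = self.embed_rgb(image) + self.embed_aux(aux)  # fuse the two groups (assumed: sum)
        return tokens.flatten(2).transpose(1, 2)              # (B, num_patches, embed_dim)
```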
@qinliuliuqin Thank you so much.