Unable to reproduce open-vocabulary segmentation results on Pascal VOC
maxkulicki opened this issue · 3 comments
In your paper you report an mIoU of 96.2 for the X-Decoder (T) model on Pascal VOC. I tried to reproduce this result. I could not find the corresponding evaluation script, so I implemented one myself based on demo_semseg.py. I am using the BestSeg Tiny checkpoint, and for every image I pass in the class labels present in that image's ground-truth segmentation.
The mIoU I get this way is 51.6. In many cases the model fails to find the target class and segments everything as "background".
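For reference, I accumulate a standard confusion matrix over the whole dataset and average the per-class IoUs. A minimal sketch of that part (the helper names are mine, not from the repo; per-image prediction indices are remapped to global VOC class ids before accumulation):

```python
import numpy as np

def update_confusion(conf, pred, gt, num_classes):
    # Accumulate a (num_classes x num_classes) confusion matrix from flat
    # integer label arrays. gt uses global VOC class ids, so per-image
    # prediction indices must be remapped to those ids before this call.
    valid = (gt >= 0) & (gt < num_classes)
    conf += np.bincount(
        num_classes * gt[valid] + pred[valid],
        minlength=num_classes ** 2,
    ).reshape(num_classes, num_classes)
    return conf

def mean_iou(conf):
    # Per-class IoU = TP / (TP + FP + FN); mIoU averages over classes,
    # ignoring classes that never occur (0/0 -> NaN, skipped by nanmean).
    tp = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    with np.errstate(divide='ignore', invalid='ignore'):
        return np.nanmean(tp / union)
```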
Here is the function I use to segment an image, based on demo_semseg.py:
```python
import numpy as np
import torch
from PIL import Image
from torchvision import transforms
from detectron2.data import MetadataCatalog


def segment_image(model, image_ori, classes):
    with torch.no_grad():
        # Register this image's vocabulary (plus "background") as text embeddings.
        model.model.sem_seg_head.predictor.lang_encoder.get_text_embeddings(
            classes + ["background"], is_eval=True)
        metadata = MetadataCatalog.get('demo')
        model.model.metadata = metadata
        model.model.sem_seg_head.num_classes = len(classes)

        transform = transforms.Compose(
            [transforms.Resize(512, interpolation=Image.BICUBIC)])
        # PIL images expose size as (width, height).
        width, height = image_ori.size
        image = transform(image_ori)
        image = np.asarray(image)
        image = torch.from_numpy(image.copy()).permute(2, 0, 1).cuda()

        batch_inputs = [{'image': image, 'height': height, 'width': width}]
        outputs = model.forward(batch_inputs)
        # Per-pixel argmax over the class dimension gives the label map.
        sem_seg = outputs[-1]['sem_seg'].max(0)[1]
        classes_detected = [classes[i] for i in sem_seg.unique()]
    return sem_seg, classes_detected
```
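I then call it once per image like this (the path and class list below are placeholders; the real class list comes from the image's ground-truth mask):

```python
from PIL import Image

# Illustrative per-image call; 'present_classes' holds the class names
# found in this image's ground-truth segmentation (placeholder values).
image = Image.open('path/to/voc_image.jpg').convert('RGB')
present_classes = ['aeroplane', 'person']
sem_seg, detected = segment_image(model, image, present_classes)
print(detected)
```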
Am I doing something wrong here? Could you maybe share the code you used to obtain the reported result on VOC?
Thanks so much, I will take a look at the evaluation code.
Is there any update on this?
Hi @MaureenZOU, any update on this? The code is still not able to reproduce the performance reported in the paper.