microsoft/X-Decoder

Unable to reproduce open segmentation results for Pascal VOC

maxkulicki opened this issue · 3 comments

In your paper, you report an mIoU of 96.2 for the X-Decoder (T) model. I tried to reproduce this result but could not find the corresponding evaluation script, so I implemented my own based on the demo_semseg.py file. I'm using the BestSeg Tiny model, and for every image I pass in the labels present in that image's ground-truth segmentation.

The mIoU I get this way is 51.6. In many cases the model does not find the target class and segments everything as "background".
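
For reference, here is a minimal sketch of how a dataset-level mIoU is commonly computed: accumulate one confusion matrix over all images, then average the per-class IoU. This is not the evaluation code behind the paper's number. The function names, the `ignore_index=255` default (the usual VOC void label), and the choice to skip classes that never occur are assumptions made for illustration.

```python
import numpy as np


def confusion_matrix(pred, gt, num_classes, ignore_index=255):
    """Count (gt, pred) pairs; rows are ground-truth classes, columns are predictions.

    Assumes every value in `pred` and every non-ignored value in `gt`
    lies in [0, num_classes).
    """
    valid = gt != ignore_index
    flat = num_classes * gt[valid].astype(np.int64) + pred[valid].astype(np.int64)
    counts = np.bincount(flat, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def mean_iou(conf):
    """Mean of per-class IoU, skipping classes absent from both gt and pred."""
    tp = np.diag(conf).astype(np.float64)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    present = union > 0
    return float((tp[present] / union[present]).mean())


if __name__ == "__main__":
    # Tiny 2x2 example with one void pixel.
    gt = np.array([[0, 0], [1, 255]])
    pred = np.array([[0, 1], [1, 1]])
    conf = confusion_matrix(pred, gt, num_classes=2)
    print(conf)
    print(mean_iou(conf))
```

Averaging per-image IoU instead of accumulating a single matrix over the dataset can give a different number, so the two should not be compared directly.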

Here is the function that I use to segment the image, based on demo_semseg.py.

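import numpy as np
import torch
from PIL import Image
from torchvision import transforms
from detectron2.data import MetadataCatalog


def segment_image(model, image_ori, classes):
    with torch.no_grad():
        # Use the per-image class names (plus "background") as text queries.
        model.model.sem_seg_head.predictor.lang_encoder.get_text_embeddings(classes + ["background"], is_eval=True)
        metadata = MetadataCatalog.get('demo')
        model.model.metadata = metadata
        model.model.sem_seg_head.num_classes = len(classes)

        # Resize the shorter side to 512, as in demo_semseg.py.
        t = [transforms.Resize(512, interpolation=Image.BICUBIC)]
        transform = transforms.Compose(t)

        # PIL's Image.size is (width, height); the output is restored to this size.
        width = image_ori.size[-2]
        height = image_ori.size[-1]
        image = transform(image_ori)
        image = np.asarray(image)
        # HWC uint8 array -> CHW tensor on the GPU (expects a 3-channel image).
        image = torch.from_numpy(image.copy()).permute(2, 0, 1).cuda()

        batch_inputs = [{'image': image.squeeze(), 'height': height, 'width': width}]
        outputs = model.forward(batch_inputs)
        # Per-pixel argmax over the class dimension.
        sem_seg = outputs[-1]['sem_seg'].max(0)[1]
        classes_detected = sem_seg.unique()
        classes_detected = [classes[i] for i in classes_detected]

    return sem_seg, classes_detected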
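
Since each image is segmented with its own label list, the indices in sem_seg are local to that list and have to be mapped to dataset-wide class IDs before scoring. A minimal sketch of that mapping is below. `to_global_ids`, `names`, and `name_to_id` are hypothetical helpers, not part of the X-Decoder repository, and the IDs in the example are only illustrative.

```python
import numpy as np


def to_global_ids(local_pred, names, name_to_id):
    """Map per-image local indices to dataset-wide class IDs.

    local_pred: integer array of indices into `names`
                (for a torch tensor, pass sem_seg.cpu().numpy()).
    names:      class names in the exact order they were given to the model.
    name_to_id: hypothetical dict from class name to dataset-wide ID.
    """
    lookup = np.array([name_to_id[name] for name in names], dtype=np.int64)
    return lookup[local_pred]


if __name__ == "__main__":
    # Illustrative IDs; substitute the mapping used by the ground-truth masks.
    name_to_id = {"background": 0, "cat": 8, "person": 15}
    names = ["cat", "person", "background"]
    local_pred = np.array([[0, 1], [2, 0]])
    print(to_global_ids(local_pred, names, name_to_id))
```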

Am I doing something wrong here? Could you maybe share the code you used to obtain the reported result on VOC?

Thanks so much, I will take a look at the evaluation code.

Is there any update on this?

Hi @MaureenZOU, any update on this? The code still does not reproduce the performance reported in the paper.