Datasets for semi-supervised video object segmentation
z-jiaming opened this issue · 6 comments
Could you please tell me how to convert the JSON annotations (e.g. first_frame_annotations.json) into DAVIS-format masks?
And how to convert mask predictions back to JSON for evaluation?
If possible, this would greatly facilitate VOS research on BURST.
Thanks a lot!
Hi, you can inspect the format by opening the JSON file in Firefox; it will show you how the lists and dictionaries are organized. The format is also documented in ANNOTATION_FORMAT.md. To see how to parse an RLE-encoded mask into an image, see this method: https://github.com/Ali2500/BURST-benchmark/blob/main/burstapi/utils.py#L21
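For example, a rough sketch of both conversions might look like the following. It assumes the RLE strings are pycocotools-compatible (which is what the linked method suggests); the helper names are mine, and the exact JSON field layout should be taken from ANNOTATION_FORMAT.md rather than from this sketch:

```python
# Sketch: decode a BURST RLE segmentation into a DAVIS-style palette PNG,
# and encode a predicted binary mask back into an RLE counts string.
# Assumes pycocotools-compatible RLE; field names are illustrative.
import numpy as np
import pycocotools.mask as mask_utils
from PIL import Image


def davis_palette(num_colors=256):
    """Standard PASCAL VOC / DAVIS color map as a flat [r, g, b, ...] list."""
    palette = []
    for i in range(num_colors):
        r = g = b = 0
        c = i
        for j in range(8):
            r |= ((c >> 0) & 1) << (7 - j)
            g |= ((c >> 1) & 1) << (7 - j)
            b |= ((c >> 2) & 1) << (7 - j)
            c >>= 3
        palette.extend([r, g, b])
    return palette


def rle_to_binary_mask(rle_counts, height, width):
    """Decode a single pycocotools RLE counts string into an HxW uint8 mask."""
    rle = {"size": [height, width], "counts": rle_counts.encode("utf-8")}
    return mask_utils.decode(rle)


def save_davis_png(track_masks, height, width, out_path):
    """Merge per-track binary masks into one ID image and save as palette PNG.

    `track_masks` maps an integer object ID (1, 2, ...) to an HxW binary
    mask; 0 is left as background, matching the DAVIS convention.
    """
    id_image = np.zeros((height, width), dtype=np.uint8)
    for obj_id, mask in track_masks.items():
        id_image[mask > 0] = obj_id
    png = Image.fromarray(id_image, mode="P")
    png.putpalette(davis_palette())
    png.save(out_path)


def binary_mask_to_rle(mask):
    """Encode an HxW binary mask into a pycocotools RLE counts string."""
    rle = mask_utils.encode(np.asfortranarray(mask.astype(np.uint8)))
    return rle["counts"].decode("utf-8")
```

Going the other way for evaluation is the same in reverse: build one binary mask per track from your DAVIS-style ID images, call something like `binary_mask_to_rle`, and write the resulting counts strings into your predictions JSON following the documented schema.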
The test set contains some videos which do not have any ground-truth annotations. Consequently, these videos are not present in the first_frame_annotations.json file.
So if my predictions.json doesn't contain these removed videos, is the evaluation result still correct?
Yes it should be.
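If you want to be sure, a quick sanity check along these lines might help confirm that your predictions cover exactly the annotated videos (the field names "sequences" and "seq_name" are illustrative; check ANNOTATION_FORMAT.md for the exact schema):

```python
# Sketch: compare the set of videos in the first-frame annotations
# against the set of videos in your predictions file.
import json

with open("first_frame_annotations.json") as f:
    gt = json.load(f)
with open("predictions.json") as f:
    preds = json.load(f)

gt_videos = {seq["seq_name"] for seq in gt["sequences"]}
pred_videos = {seq["seq_name"] for seq in preds["sequences"]}

print("Missing from predictions:", gt_videos - pred_videos)
print("Extra in predictions:", pred_videos - gt_videos)
```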
Got it. Thanks!