

[ICCV 2023] LVOS: A Benchmark for Long-term Video Object Segmentation

[Home Page] [Open Access] [arXiv]

LVOS is a benchmark for long-term video object segmentation. LVOS consists of 220 videos: 120 for training (annotations public), 50 for validation (annotations public), and 50 for testing (annotations withheld). Each video comes with high-quality, dense pixel-wise annotations.

LVOS overview

Dataset

Download LVOS dataset from Google Drive ( Train | Eval | Test ), Baidu Drive ( Train | Eval | Test ), or Kaggle (Train | Eval | Test ).

After unzipping the image data, please download the meta jsons from Google Drive | Baidu Drive | Kaggle and put them under the corresponding folder.

For the language captions, please download the expression meta jsons from Google Drive | Baidu Drive | Kaggle and put them under the corresponding folder.

Organize as follows:

{LVOS ROOT}
|-- train
    |-- JPEGImages
        |-- video1
            |-- 00000001.jpg
            |-- ...
    |-- Annotations
        |-- video1
            |-- 00000001.png
            |-- ...
    |-- train_meta.json
    |-- train_expression_meta.json
|-- val
    |-- ...
|-- test
    |-- ...

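As a quick sanity check after downloading, the following minimal Python sketch (a hypothetical helper, not part of the official toolkit; the LVOS root path and split name are your own choices) pairs each JPEG frame with the annotation PNG of the same name under the layout above. Annotation PNGs may not exist for every frame in every split, so treat missing files as expected.

    import os

    # Hypothetical helper: walk one split of LVOS and pair each JPEG frame
    # with the annotation PNG of the same name (layout as shown above).
    def list_frame_pairs(lvos_root, split="train"):
        image_dir = os.path.join(lvos_root, split, "JPEGImages")
        anno_dir = os.path.join(lvos_root, split, "Annotations")
        pairs = {}
        for video in sorted(os.listdir(image_dir)):
            frames = sorted(f for f in os.listdir(os.path.join(image_dir, video))
                            if f.endswith(".jpg"))
            pairs[video] = [
                (os.path.join(image_dir, video, f),
                 os.path.join(anno_dir, video, f.replace(".jpg", ".png")))
                for f in frames
            ]
        return pairs

    # Example: count videos and frames in the training split.
    # pairs = list_frame_pairs("/path/to/LVOS", "train")
    # print(len(pairs), sum(len(v) for v in pairs.values()))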

x_meta.json
    {
        "videos": {
            "<video_id>": {
                "objects": {
                    "<object_id>": {
                        "frame_range": {
                            "start": <start_frame>,
                            "end": <end_frame>,
                            "frame_nums": <frame_nums>
                        }
                    }
                }
            }
        }
    }

x_expression_meta.json
    {
        "videos": {
            "<video_id>": {
                "objects": {
                    "<object_id>": {
                        "frame_range": {
                            "start": <start_frame>,
                            "end": <end_frame>,
                            "frame_nums": <frame_nums>
                        },
                        "caption": <caption>
                    }
                }
            }
        }
    }

# <object_id> is the same as the pixel value of the object in the annotated segmentation PNG files.
# <frame_id> is the 5-digit index of the frame in the video, and does not necessarily start from 0.
# <start_frame> is the start frame id of the target object.
# <end_frame> is the end frame id of the target object.
# <frame_nums> is the number of frames in which the target object exists.
# <caption> is the expression describing the target object.
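Purely as an illustration of the schema above, a minimal sketch for reading a split's meta json could look like the following (the load_meta helper and the paths are hypothetical; only the keys documented above are assumed):

    import json
    import os

    # Hypothetical helper: read a split's meta json and iterate over its objects.
    # Only the keys documented above ("videos", "objects", "frame_range", "caption")
    # are assumed.
    def load_meta(lvos_root, split="train", with_expressions=False):
        name = split + ("_expression_meta.json" if with_expressions else "_meta.json")
        with open(os.path.join(lvos_root, split, name)) as f:
            return json.load(f)["videos"]

    # videos = load_meta("/path/to/LVOS", "train", with_expressions=True)
    # for video_id, video in videos.items():
    #     for object_id, obj in video["objects"].items():
    #         fr = obj["frame_range"]
    #         print(video_id, object_id, fr["start"], fr["end"], fr["frame_nums"],
    #               obj.get("caption", ""))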

Evaluation

We use DDMemory as the example model to analyze LVOS. DDMemory is currently unavailable for some reasons and will be released soon. In the meantime, we use AOT-T as an alternative. You can download its results from Google Drive.

Please use our evaluation toolkit to assess your model's results on the validation set. See this repository for more details on the usage of the toolkit.

For the test set, please use the CodaLab server to evaluate your own algorithms.
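Official scores should come from the toolkit and the CodaLab server above; only to illustrate the idea, the per-frame region similarity (J) commonly reported in VOS evaluation is the intersection-over-union between a predicted and a ground-truth mask, e.g.:

    import numpy as np

    # Illustrative only (not the official evaluation code): region similarity (J)
    # for one frame and one object id. "pred" and "gt" are integer label maps whose
    # pixel values are object ids, matching the annotation PNG convention above.
    def region_similarity(pred, gt, object_id):
        pred_mask = (pred == object_id)
        gt_mask = (gt == object_id)
        union = np.logical_or(pred_mask, gt_mask).sum()
        if union == 0:  # object absent from both masks: count as a perfect match
            return 1.0
        inter = np.logical_and(pred_mask, gt_mask).sum()
        return inter / union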

APIs

We released the tools and test scripts in this repository. Click on this link for more information.

License

  • The data of LVOS is released for non-commercial research purposes only.
  • All videos and images are from VOT-LT 2019 and LaSOT, and are not the property of Fudan University. Fudan University is not responsible for the content or the meaning of these videos and images.

Citation

@InProceedings{Hong_2023_ICCV,
    author    = {Hong, Lingyi and Chen, Wenchao and Liu, Zhongying and Zhang, Wei and Guo, Pinxue and Chen, Zhaoyu and Zhang, Wenqiang},
    title     = {LVOS: A Benchmark for Long-term Video Object Segmentation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {13480-13492}
}