Task Definitions and Related Tasks

I am mainly gathering works on motion segmentation in autonomous driving, with the motivation of helping researchers better understand the task and its related tasks.

  • Motion Segmentation: pixel-wise classification of the scene into moving or static, and its extension to instance-wise segmentation.
  • Zero-shot Video Object Segmentation: segmentation of visually and motion-salient objects in a video sequence, as defined in the DAVIS benchmark. Zero-shot indicates that no prior initialization is required. It is also called unsupervised VOS or primary object segmentation in the literature.
  • Few-shot Video Object Segmentation: tracking segmented objects through a video sequence given an initialization mask. Few-shot/one-shot indicates that the tracking method needs an initialization; this task is also called semi-supervised VOS in the literature.
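To make the pixel-wise vs instance-wise distinction concrete, here is a minimal sketch assuming that a binary moving/static mask (the pixel-wise output) and a class-agnostic instance label map for the same frame are already available. The helper name `moving_instances` and the majority-vote threshold `min_moving_ratio` are hypothetical illustrations, not taken from any of the methods listed here.

```python
import numpy as np


def moving_instances(motion_mask: np.ndarray, instance_map: np.ndarray,
                     min_moving_ratio: float = 0.5) -> dict[int, bool]:
    """Label each instance as moving (True) or static (False).

    motion_mask:  HxW boolean array, True where a pixel is classified as moving
                  (pixel-wise motion segmentation).
    instance_map: HxW integer array, 0 = background, k > 0 = instance id.
    min_moving_ratio: hypothetical threshold; an instance counts as moving when
                  at least this fraction of its pixels are moving.
    """
    if motion_mask.shape != instance_map.shape:
        raise ValueError("motion_mask and instance_map must have the same shape")
    result = {}
    for inst_id in np.unique(instance_map):
        if inst_id == 0:  # skip background pixels
            continue
        pixels = instance_map == inst_id
        # Fraction of this instance's pixels that the pixel-wise model marks as moving.
        ratio = motion_mask[pixels].mean()
        result[int(inst_id)] = bool(ratio >= min_moving_ratio)
    return result


motion = np.array([[1, 1, 0, 0],
                   [1, 1, 0, 0]], dtype=bool)
instances = np.array([[1, 1, 2, 2],
                      [1, 0, 2, 2]])
print(moving_instances(motion, instances))  # {1: True, 2: False}
```

Aggregating per instance like this is only one simple way to lift a pixel-wise result to instance level; the instance-wise methods below predict moving instances directly.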

Each of these tasks has methods trained with full supervision and with self-supervision. Each can also be categorized into pixel-wise or instance-wise segmentation. I prefer the term Zero-shot VOS over Unsupervised VOS because the latter is ambiguous: it can mean either no labelled training data or no initialization in the video sequence.

In this paper collection, I mainly focus on:

  • Deep Motion Segmentation (specifically in Autonomous Driving application).
  • The related task of zero-shot segmentation (general-purpose video object segmentation).

Zero-shot Video Object Segmentation

Datasets and Benchmarks

  • SegTrack V2
  • DAVIS:
    • Pixel-wise segmentation: 2016 Unsupervised Benchmark
    • Instance-wise segmentation: 2017 Unsupervised Benchmark (following the 2019 paper with the updated unsupervised segmentation definition and annotations)
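Among its measures, DAVIS reports region similarity J: the Jaccard index (intersection over union) between a predicted mask and the ground-truth mask, averaged over the frames of a sequence. Below is a minimal sketch of that per-frame computation, assuming binary NumPy masks. It is not the official DAVIS evaluation code, which also computes contour accuracy F and, for the instance-wise 2017 benchmark, matches predicted instances to ground-truth objects. Treating two empty masks as a perfect match is an assumption made here.

```python
import numpy as np


def region_similarity(pred: np.ndarray, gt: np.ndarray) -> float:
    """Jaccard index (IoU) between two binary masks of the same shape."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Assumption: an empty prediction on an empty ground truth scores 1.0.
        return 1.0
    inter = np.logical_and(pred, gt).sum()
    return float(inter / union)


def mean_region_similarity(preds, gts) -> float:
    """Average per-frame J over a sequence of (prediction, ground truth) masks."""
    scores = [region_similarity(p, g) for p, g in zip(preds, gts)]
    return float(np.mean(scores))


pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
gt = np.array([[1, 0, 0],
               [0, 1, 1]])
empty = np.zeros((2, 3))
print(region_similarity(pred, gt))                            # 0.5
print(mean_region_similarity([pred, empty], [gt, empty]))     # 0.75
```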

Methods

Fully Supervised

Pixel-wise Segmentation

  • SFL: Joint Flow Estimation and Motion Segmentation.
  • MPNet: Learning motion segmentation from optical flow encoded as RGB.
  • FusionSeg: Two-stream Motion Segmentation
  • LVO: Two-stream with visual memory (bidirectional ConvGRU)
  • MotAdapt: Teacher-student adaptation
  • PDB:
  • LSMO:
  • COSNet: Co-Attention
  • Anchor Diffusion:
  • MatNet: Two-stream with attention fusion on multiple levels.
  • Epo-Net: Violation of epipolar constraints as an indication of motion-salient objects.
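The epipolar idea behind Epo-Net can be illustrated with a minimal sketch; this is not Epo-Net's implementation. For a static scene point seen at x1 in one frame and x2 in the next, the fundamental matrix F satisfies x2ᵀ F x1 = 0. Correspondences that violate this constraint by a large margin are therefore candidates for independently moving objects. The sketch assumes F and the point correspondences (e.g. pixels displaced by optical flow) are given. It uses the Sampson distance as the residual, and the threshold in `flag_moving` is a hypothetical choice.

```python
import numpy as np


def sampson_distance(F: np.ndarray, pts1: np.ndarray, pts2: np.ndarray) -> np.ndarray:
    """First-order geometric error of correspondences w.r.t. x2^T F x1 = 0.

    F:    3x3 fundamental matrix mapping frame-1 points to epipolar lines in frame 2.
    pts1: Nx2 pixel coordinates in frame 1.
    pts2: Nx2 matching pixel coordinates in frame 2 (e.g. pts1 + optical flow).
    """
    n = pts1.shape[0]
    x1 = np.hstack([pts1, np.ones((n, 1))])  # homogeneous coordinates, Nx3
    x2 = np.hstack([pts2, np.ones((n, 1))])
    Fx1 = x1 @ F.T   # row i is F @ x1[i]: epipolar line of x1[i] in frame 2
    Ftx2 = x2 @ F    # row i is F.T @ x2[i]: epipolar line of x2[i] in frame 1
    algebraic = np.sum(x2 * Fx1, axis=1)  # x2^T F x1 for each correspondence
    denom = Fx1[:, 0] ** 2 + Fx1[:, 1] ** 2 + Ftx2[:, 0] ** 2 + Ftx2[:, 1] ** 2
    return algebraic ** 2 / denom


def flag_moving(F, pts1, pts2, threshold=1.0):
    """Mark correspondences whose epipolar residual exceeds a hypothetical threshold."""
    return sampson_distance(F, pts1, pts2) > threshold


# Toy F for a camera translating purely along x (identity intrinsics assumed):
# static points may only move horizontally, i.e. the constraint reduces to v1 == v2.
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])
pts1 = np.array([[10.0, 20.0], [30.0, 40.0]])
pts2 = np.array([[15.0, 20.0], [32.0, 45.0]])  # second point also moves vertically
print(sampson_distance(F, pts1, pts2).tolist())  # [0.0, 12.5]
print(flag_moving(F, pts1, pts2).tolist())       # [False, True]
```

A hand-crafted residual like this is ambiguous for objects moving along epipolar lines; learning-based methods are typically motivated partly by such failure cases.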

Instance-wise Segmentation

  • RVOS:
  • AGS:

Self-Supervised

Instance-wise Segmentation

  • MUG-W

Deep Motion Segmentation in Autonomous Driving (AD)

Datasets and Benchmarks

Methods

Fully Supervised

Pixel-wise Segmentation

  • SMSNet - IROS'17 [ Paper, Code ]
  • MODNet - NeurIPSW'17, ITSC'18 [ Paper ]
  • Real-time Motion Segmentation - IROS'18 [ Paper ]
  • FuseMODNet - ICCVW'19 [ Paper ]

Instance-wise Segmentation

  • InstanceMotSeg - NeurIPSW'20 [ Paper ]
  • Video Class Agnostic Segmentation - arXiv [ Paper, Code ]

Self-Supervised

  • SFMNet [ Paper ]
  • Competitive Collaboration Framework [ Paper ]

Instance-wise Segmentation

  • Instance-wise Motion and Depth [ Paper ]

Notes:

If you want to add your paper, you can create an issue.