Task Definitions and Related Tasks

I am mainly gathering works on motion segmentation in autonomous driving, with the aim of helping researchers better understand the task and its related tasks.

Motion Segmentation: pixel-wise classification of the scene into moving and static regions, and its extension to instance-wise segmentation.
Zero-shot Video Object Segmentation: segmentation of visually and motion-salient objects in a video sequence, as defined on the DAVIS benchmark. Zero-shot indicates that no prior initialization is required. It is also called unsupervised VOS or primary object segmentation in the literature.
Few-shot Video Object Segmentation: tracking segmented objects through a video sequence using an initialization mask. Few/One-shot indicates that the tracking method needs an initialization; it is also called semi-supervised VOS in the literature.

Each of these tasks has methods that are trained fully supervised and methods that are self-supervised. Each can also be categorized into pixel-wise or instance-wise segmentation. I prefer the term Zero-shot VOS over Unsupervised VOS, because the latter is ambiguous: it could mean either no labelled training data or simply no initialization in the video sequence.
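As a rough illustration of the pixel-wise versus instance-wise distinction, the sketch below uses a toy label map. The label convention (0 for static, positive integers for individual moving instances) and the helper name `instance_to_pixelwise` are assumptions made for this example and are not taken from any dataset or method listed here.

```python
import numpy as np

def instance_to_pixelwise(instance_mask: np.ndarray) -> np.ndarray:
    """Collapse an instance-wise mask into a binary moving/static mask.

    Assumed convention (hypothetical): 0 = static, k > 0 = k-th moving instance.
    """
    return (instance_mask > 0).astype(np.uint8)

# Toy 3x4 frame with two moving instances (labels 1 and 2).
instance_mask = np.array([
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 2, 2],
], dtype=np.int32)

# Pixel-wise view: every pixel is either moving (1) or static (0).
pixelwise_mask = instance_to_pixelwise(instance_mask)

# Instance-wise view: count the distinct moving instances.
num_instances = len(np.unique(instance_mask[instance_mask > 0]))
```

The instance-wise mask carries strictly more information: the pixel-wise mask can always be derived from it, but not the other way around.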

In the paper collection I mainly focus on:

Deep Motion Segmentation (specifically for autonomous driving applications).
The related task of zero-shot segmentation (general-purpose video object segmentation).

Zero-shot Video Object Segmentation

Datasets and Benchmarks

SegTrack V2
DAVIS:
- Pixel-wise segmentation: 2016 Unsupervised Benchmark
- Instance-wise segmentation: 2017 Unsupervised Benchmark (using the 2019 paper with updated unsupervised segmentation definition and annotations)

Methods

Fully Supervised

Pixel-wise Segmentation

SFL: Joint Flow Estimation and Motion Segmentation.
MPNet: Use of Optical flow encoded as RGB for learning Motion Segmentation.
FusionSeg: Two-stream Motion Segmentation
LVO: Two-stream with visual Memory (bi-directional Conv-GRU)
MotAdapt: Teacher-student adaptation
PDB:
LSMO:
COSNet: Co-Attention
Anchor Diffusion:
MatNet: Two-stream with attention fusion on multiple levels.
Epo-Net: Epipolar constraint violations as an indication of motion-salient objects.
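The idea that an epipolar constraint violation can hint at independent motion can be sketched generically as follows. This is not Epo-Net's implementation: it only uses the textbook fact that a static point matched across two views satisfies x2^T F x1 = 0 for the fundamental matrix F. The matrix `F`, the toy correspondences, and the threshold `0.5` are all assumptions chosen for the example.

```python
import numpy as np

def epipolar_residual(F: np.ndarray, pts1: np.ndarray, pts2: np.ndarray) -> np.ndarray:
    """Algebraic epipolar residual |x2^T F x1| for each correspondence.

    pts1, pts2: (N, 2) pixel coordinates in the first and second frame.
    """
    ones = np.ones((pts1.shape[0], 1))
    x1 = np.hstack([pts1, ones])  # homogeneous coordinates, shape (N, 3)
    x2 = np.hstack([pts2, ones])
    # Row i of (x1 @ F.T) is (F x1_i)^T, so the row-wise dot product is x2_i^T F x1_i.
    return np.abs(np.sum(x2 * (x1 @ F.T), axis=1))

# Assumed toy setup: identity intrinsics, no rotation, camera translating along x,
# so F is the skew-symmetric matrix of t = (1, 0, 0).
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])

pts1 = np.array([[10.0, 20.0], [30.0, 40.0]])
pts2 = np.array([[15.0, 20.0],   # shifts along the epipolar line: consistent with a static point
                 [33.0, 42.0]])  # leaves the epipolar line: candidate moving point

residuals = epipolar_residual(F, pts1, pts2)
moving = residuals > 0.5  # hypothetical threshold for this toy example
```

Note that this cue alone misses objects moving along their own epipolar line, which is one reason it is typically combined with other evidence.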

Instance-wise Segmentation

RVOS:
AGS:

Self Supervised

Instance-wise Segmentation

MUG-W

Deep Motion Segmentation in AD

Datasets and Benchmarks

Methods

Fully Supervised

Pixel-wise Segmentation

SMSNet - IROS'17 [ Paper, Code ]
MODNet - NeurIPSW'17, ITSC'18 [ Paper ]
Real-time Motion Segmentation - IROS'18 [ Paper ]
FuseMODNet - ICCVW'19 [ Paper ]

Instance-wise Segmentation

InstanceMotSeg - NeurIPSW'20 [ Paper ]
Video Class Agnostic Segmentation - arXiv [ Paper, Code ]

Self Supervised

SFMNet [ Paper ]
Competitive Collaboration Framework [ Paper ]

Instance-wise Segmentation

Instance-wise Motion and Depth [ Paper ]

Notes:

If you want to add your paper, you can create an issue.

MSiam/Deep-Motion-Segmentation-AD-Papers
