Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild
Zhaoyuan Yin, Jia Zheng, Weixin Luo, Shenhan Qian, Hanling Zhang, Shenghua Gao.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
[arXiv] [Paper] [Supp. Material]
## Create the environment

```shell
# create conda env
conda create -n ivosw python=3.7
# activate conda env
conda activate ivosw
# install pytorch
conda install pytorch=1.3 torchvision
# install other dependencies
pip install -r requirements.txt
```
We adopt MANet, IPN, and ATNet as the VOS algorithms. Please follow their instructions to install the dependencies.

```shell
git clone https://github.com/yuk6heo/IVOS-ATNet.git VOS/ATNet
git clone https://github.com/lightas/CVPR2020_MANet.git VOS/MANet
git clone https://github.com/zyy-cn/IPN.git VOS/IPN
```
- DAVIS 2017 Dataset
  - Download the data and human-annotated scribbles here.
  - Place the `DAVIS` folder into `root/data`.
- YouTube-VOS Dataset
  - Create a DAVIS-like structure of YouTube-VOS by running the following command:

```shell
python datasets/prepare_ytbvos.py --src path/to/youtube_vos --scb path/to/scribble_dir
```
For evaluation, please download the pretrained agent model and quality assessment model, place them into `root/weights`, and run the following command:

```shell
python eval_agent_{atnet/manet/ipn}.py with setting={oracle/wild} dataset={davis/ytbvos} method={random/linspace/worst/ours}
```

The results will be stored in `results/{VOS}/{setting}/{dataset}/{method}/summary.json`.

Note: the results may fluctuate slightly across versions of `networkx`, which `davisinteractive` uses to generate simulated scribbles.
First, prepare the data used to train the agent by downloading the reward records and the pretrained experience buffer and placing them into `root/train`, or generate them from scratch:

```shell
python produce_reward.py
python pretrain_agent.py
```

To train the agent:

```shell
python train_agent.py
```

To train the segmentation quality assessment model:

```shell
python generate_data.py
python quality_assessment.py
```
```bibtex
@inproceedings{IVOSW,
  title     = {Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild},
  author    = {Zhaoyuan Yin and
               Jia Zheng and
               Weixin Luo and
               Shenhan Qian and
               Hanling Zhang and
               Shenghua Gao},
  booktitle = {CVPR},
  year      = {2021}
}
```
The code is released under the MIT license.