This is the official implementation of the CVMN paper:

Unsupervised Domain Adaptation for Referring Semantic Segmentation


First, clone the repo locally:

git clone https://github.com/FenriartS/CVMN

Then, install PyTorch 1.8 and torchvision 0.9:

conda install pytorch==1.8.0 torchvision==0.9.0
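To confirm that the environment picked up the pinned versions, a small check like the one below can help. It is a minimal sketch rather than part of the repository: the helper names `base_version` and `check_versions` are hypothetical, it assumes Python 3.8+ (for `importlib.metadata`), and it assumes the installed distributions are registered as `torch` and `torchvision`.

```python
from importlib import metadata

# Versions taken from the install command above.
REQUIRED = {"torch": "1.8.0", "torchvision": "0.9.0"}


def base_version(version):
    """Drop a local build suffix, e.g. '1.8.0+cu111' -> '1.8.0'."""
    return version.split("+", 1)[0]


def check_versions(required, get_version=metadata.version):
    """Return {package: (installed_version_or_None, matches_required)}."""
    report = {}
    for name, wanted in required.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        ok = installed is not None and base_version(installed) == wanted
        report[name] = (installed, ok)
    return report
```

Calling `check_versions(REQUIRED)` in the activated conda environment returns one entry per package, so a mismatch or a missing package shows up before the longer build steps.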

Install pycocotools:

conda install cython scipy
pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
pip install git+https://github.com/youtubevos/cocoapi.git#"egg=pycocotools&subdirectory=PythonAPI"

If you run into an error about a missing ytvos.py file, download the file manually from here and place it in the installed pycocotools folder.
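The installed pycocotools folder mentioned above can be located from Python instead of searching site-packages by hand. The sketch below uses only the standard library; `package_dir` and `has_module_file` are hypothetical helper names, not part of pycocotools or this repository.

```python
import importlib.util
from pathlib import Path


def package_dir(name):
    """Return the folder of an installed package, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])


def has_module_file(name, filename):
    """Check whether `filename` exists directly inside the package folder."""
    folder = package_dir(name)
    return folder is not None and (folder / filename).is_file()
```

For example, `package_dir("pycocotools")` gives the folder where ytvos.py should be placed, and `has_module_file("pycocotools", "ytvos.py")` reports whether it is already there.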

Compile the DCN module (requires GCC >= 5.3 and CUDA >= 10.0):

cd models/dcn
python setup.py build_ext --inplace
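If the build fails, it may be worth checking the toolchain against the stated minimums. The following is a rough sketch under a few assumptions: it inspects whichever `gcc` and `nvcc` are first on PATH (which may differ from the compiler and CUDA toolkit the build actually uses), it relies on `gcc -dumpversion` and on the "release X.Y" text printed by `nvcc --version`, and all function names are hypothetical.

```python
import re
import shutil
import subprocess

MIN_GCC = (5, 3)    # from the requirement stated above
MIN_CUDA = (10, 0)  # from the requirement stated above


def parse_version(text):
    """Turn '9.4.0' into (9, 4, 0); return None if no leading version is found."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        return None
    return tuple(int(part) for part in match.group(0).split("."))


def parse_nvcc_release(output):
    """Extract (major, minor) from the 'release X.Y' text of `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2)))


def tool_output(cmd):
    """Run cmd and return its stdout, or None if the tool is not on PATH."""
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, capture_output=True, text=True).stdout


def toolchain_report():
    """Return {tool: (version_or_None, meets_minimum)} for gcc and nvcc."""
    gcc_out = tool_output(["gcc", "-dumpversion"])
    nvcc_out = tool_output(["nvcc", "--version"])
    gcc = parse_version(gcc_out) if gcc_out else None
    cuda = parse_nvcc_release(nvcc_out) if nvcc_out else None
    return {
        "gcc": (gcc, gcc is not None and gcc >= MIN_GCC),
        "nvcc": (cuda, cuda is not None and cuda >= MIN_CUDA),
    }
```

Running `toolchain_report()` before `python setup.py build_ext --inplace` gives a quick indication of whether either tool is missing or older than required.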


Download and extract the 2021 version of the Refer-Youtube-VOS training images from RVOS. Follow the instructions here to download the A2D-Sentences dataset.


python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --backbone resnet101/50 --ytvos_path /path/to/ytvos --masks --pretrained_weights /path/to/pretrained_path --output_dir /path/to/output_dir
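In the command above, `--nproc_per_node=4` starts four worker processes, and `--use_env` makes the launcher pass each worker's local rank through the `LOCAL_RANK` environment variable instead of a `--local_rank` argument. The sketch below shows how a script can read those values; `read_launch_env` is a hypothetical helper, and whether main.py in this repository reads them the same way is an assumption.

```python
import os


def read_launch_env(environ=os.environ):
    """Read the rank information exported by torch.distributed.launch --use_env.

    Falls back to single-process defaults when the variables are absent,
    e.g. when the script is started directly with `python main.py`.
    """
    return {
        "rank": int(environ.get("RANK", 0)),
        "world_size": int(environ.get("WORLD_SIZE", 1)),
        "local_rank": int(environ.get("LOCAL_RANK", 0)),
    }
```

On a single machine with four processes, each worker would see `world_size` equal to 4 and a distinct `local_rank` from 0 to 3, which is typically used to select the GPU for that process.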


python inference.py --model_path /path/to/model_weights