Pixel-level Contrastive Learning of Driving Videos with Optical Flow
By Tomoya Takahashi, Shingo Yashima, Kohta Ishikawa, Ikuro Sato, Rio Yokota.
This repository is the official PyTorch implementation of "Pixel-level Contrastive Learning of Driving Videos with Optical Flow".
Introduction
In this work, we improve the accuracy of self-supervised learning on driving data by combining pixel-wise contrastive learning (PixPro) with optical flow. Unlike most self-supervised methods, PixPro is trained on pixel-level pretext tasks, which yields better accuracy on downstream tasks requiring dense pixel predictions. However, PixPro does not account for the large changes in object scale commonly found in driving data. We show that by incorporating optical flow into the pixel-wise contrastive pre-training, we can improve the performance of downstream tasks such as semantic segmentation on CityScapes. We found that using the optical flow between temporally distant frames helps the model learn invariance to large scale changes, which allows us to exceed the performance of the original PixPro method.
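To illustrate the core idea, the following is a minimal, hypothetical sketch of how optical flow can provide pixel-level correspondences between two temporally distant frames for a PixPro-style consistency loss. All names and shapes here are illustrative; this is not the repository's actual training code.

# Illustrative sketch only: flow-based pixel correspondence for a PixPro-style loss.
import torch
import torch.nn.functional as F

def warp_with_flow(feat2, flow):
    """Sample frame-2 features at the locations that frame-1 pixels map to under the flow.

    feat2: (B, C, H, W) features of the second (temporally distant) frame
    flow:  (B, 2, H, W) optical flow from frame 1 to frame 2, in pixels (dx, dy)
    """
    B, _, H, W = feat2.shape
    xs = torch.arange(W, device=feat2.device).float().view(1, 1, W).expand(1, H, W)
    ys = torch.arange(H, device=feat2.device).float().view(1, H, 1).expand(1, H, W)
    coords = torch.cat((xs, ys), dim=0).unsqueeze(0) + flow   # (B, 2, H, W): target positions in frame 2
    # Normalize to [-1, 1] for grid_sample (x first, then y)
    gx = 2.0 * coords[:, 0] / (W - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (H - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                      # (B, H, W, 2)
    return F.grid_sample(feat2, grid, align_corners=True)

def pixel_consistency_loss(feat1, feat2, flow):
    """Encourage flow-matched pixel features of the two frames to agree (cosine similarity)."""
    feat2_warped = warp_with_flow(feat2, flow)
    feat1 = F.normalize(feat1, dim=1)
    feat2_warped = F.normalize(feat2_warped, dim=1)
    return -(feat1 * feat2_warped).sum(dim=1).mean()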
Citation
@InProceedings{Takahashi_2023_CVPR,
author = {Takahashi, Tomoya and Yashima, Shingo and Ishikawa, Kohta and Sato, Ikuro and Yokota, Rio},
title = {Pixel-Level Contrastive Learning of Driving Videos With Optical Flow},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2023},
pages = {3179-3186}
}
Main Results
PixPro with Optical Flow pre-trained models
Epochs | Arch | Frames | Optical Flow | Download |
---|---|---|---|---|
2000 | ResNet-50 | 1 | | script \| model |
2000 | ResNet-50 | 2 | ✔️ | script \| model |
2000 | ResNet-50 | 6 | ✔️ | script \| model |
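If you only want to reuse a downloaded checkpoint as a plain backbone, the following is a minimal loading sketch. The checkpoint key layout ("model" key, "module.encoder." prefix) is an assumption based on common PixPro-style conventions and may differ from the actual files; inspect the keys and adjust the prefix as needed.

# Hypothetical loading sketch; checkpoint path and key names are assumptions.
import torch
import torchvision

ckpt = torch.load("pixpro_of_nframe6_2000ep.pth", map_location="cpu")   # illustrative path
state_dict = ckpt.get("model", ckpt)            # some checkpoints nest weights under "model"

# Strip the assumed pre-training wrapper prefix to match torchvision's key names
backbone_state = {k.replace("module.encoder.", ""): v
                  for k, v in state_dict.items() if k.startswith("module.encoder.")}

resnet50 = torchvision.models.resnet50()
missing, unexpected = resnet50.load_state_dict(backbone_state, strict=False)
print("missing keys:", len(missing), "| unexpected keys:", len(unexpected))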
CityScapes Semantic Segmentation
The results below show the average mIoU score over five training runs on downstream tasks.
- config: config
Method | Epochs | Arch | Pre-train Dataset | Frames | mIoU | Download |
---|---|---|---|---|---|---|
Supervised | - | ResNet-50 | ImageNet | - | 61.2 | - |
BYOL | 1000 | ResNet-50 | ImageNet | - | 60.0 | - |
PixPro | 100 | ResNet-50 | ImageNet | - | 58.4 | - |
PixPro | 2000 | ResNet-50 | BDD100k | 1 | 53.0 | - |
PixPro with OF (Ours) | 2000 | ResNet-50 | BDD100k | 6 | 53.4 | - |
Getting started
Requirements
We have not yet verified the code with other versions of these packages, so we recommend the following configuration.
- Python 3.8.6
- PyTorch == 1.8.2
- Torchvision == 0.9.2
- CUDA == 10.2
- NCCL == 2.7.3
- Open MPI == 4.0.4
- Other dependencies
(If you want to fully replicate our Python environment, please use requirements_all.txt.)
Prepare BDD100K and Optical Flow
- BDD100k Dataset
  See img dataset inst.
- Optical Flow
  Follow one of the options below depending on your environment. A minimal Python sketch of computing flow with the downloaded RAFT weights is shown after this list.
  - Enough storage space for the data
    If you have enough storage space for the data, we recommend preparing the flow in advance; see create optical flow dataset.
  - Not enough storage space for the data
    If you do not have enough storage space for the data, you can still run pre-training by simply downloading the pre-trained RAFT model with the following steps:
    cd ~
    git clone https://github.com/rioyokotalab/RAFT.git
    cd RAFT
    pyenv local pixpro-wt-of-cu102-wandb  # pyenv virtualenv for this repo
    bash scripts/download_models.sh
    mkdir ${BDD100k-Path}/pretrained_flow
    cp -ra models ${BDD100k-Path}/pretrained_flow
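The snippet below is a minimal sketch of computing optical flow between two frames with the downloaded RAFT weights. It follows the usage in the upstream RAFT demo.py; the exact module paths, constructor arguments, and weight filename are assumptions and may differ in the rioyokotalab fork.

# Illustrative RAFT usage, modeled on the upstream demo.py; run from inside the RAFT repo.
import argparse
import sys
import torch

sys.path.append("core")               # RAFT's modules live under core/
from raft import RAFT
from utils.utils import InputPadder

args = argparse.Namespace(small=False, mixed_precision=False, alternate_corr=False)
model = torch.nn.DataParallel(RAFT(args))
model.load_state_dict(torch.load("models/raft-things.pth"))   # downloaded by download_models.sh
model = model.module.cuda().eval()

# Two temporally distant frames as (1, 3, H, W) float tensors in [0, 255]
image1 = torch.full((1, 3, 368, 640), 128.0).cuda()           # placeholder frames
image2 = torch.full((1, 3, 368, 640), 128.0).cuda()

padder = InputPadder(image1.shape)                             # pads H, W to multiples of 8
image1, image2 = padder.pad(image1, image2)
with torch.no_grad():
    flow_low, flow_up = model(image1, image2, iters=20, test_mode=True)
print(flow_up.shape)                                           # (1, 2, H, W): per-pixel (dx, dy)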
Installation
We recommend using a pyenv virtual environment to set up the experimental environments.
# Create working directory
mkdir ~/pixpro_wt_of_pj
cd ~/pixpro_wt_of_pj
# Create environment
pyenv virtualenv 3.8.6 pixpro-wt-of-cu102-wandb
pyenv local pixpro-wt-of-cu102-wandb
# If you are managing with modulefiles, please do the following.
module load cuda/10.2 cudnn/8.2.1 nccl/2.7.3 openmpi/4.0.4
# Install PyTorch & Torchvision
pip install torch==1.8.2 torchvision==0.9.2 torchaudio==0.8.2 --extra-index-url https://download.pytorch.org/whl/lts/1.8/cu102
# Install apex
git clone https://github.com/NVIDIA/apex
cd apex
git checkout 8a7a332539809adcad88546f492945c4e752ff49
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
cd ..
# Clone repo
git clone https://github.com/rioyokotalab/PixPro-with-OpticalFlow.git
cd ./PixPro-with-OpticalFlow
# Create soft link for data
mkdir data
ln -s ${BDD100k-Path}/bdd100k ./data/bdd100k
ln -s ${BDD100k-Path}/pretrained_flow ./data/pretrained_flow
# Install other requirements
pip install -r requirements.txt
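After installation, you can sanity-check the environment with a short snippet (a quick check only, not part of the repository):

# Quick environment sanity check (not part of the repository).
import torch
import torchvision
print("torch:", torch.__version__, "| torchvision:", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available(), "| GPU count:", torch.cuda.device_count())
from apex import amp  # noqa: F401 -- basic check that apex is importable
print("apex imported OK")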
Pretrain with PixPro using Optical Flow
# Train with PixPro base for 2000 epochs.
bash ./tools/pretrain_bdd100k_job_2000ep_nframe6_gpu16.sh
Transfer to CityScapes Semantic Segmentation
cd ~/pixpro_wt_of_pj/PixPro-with-OpticalFlow
# Convert a pre-trained PixPro model to detectron2's format
cd transfer/detection
python convert_pretrain_to_d2.py ${Input-Checkpoint(.pth)} ./output.pkl
cd ~/pixpro_wt_of_pj
# Create environment
pyenv virtualenv 3.8.6 detectron2-cu102-wandb
pyenv local detectron2-cu102-wandb
# If you are managing with modulefiles, please do the following.
module load cuda/10.2 cudnn/8.2.1 nccl/2.7.3 openmpi/4.0.4
# Install PyTorch & Torchvision
pip install torch==1.8.2 torchvision==0.9.2 torchaudio==0.8.2 --extra-index-url https://download.pytorch.org/whl/lts/1.8/cu102
# Install Detectron2
git clone https://github.com/rioyokotalab/detectron2
cd detectron2
git checkout dev-v0.6
pyenv local detectron2-cu102-wandb
pip install -e .
pip install git+https://github.com/mcordts/cityscapesScripts.git
pip install wandb
# Train the segmentation model with the pre-trained PixPro model without finetune
cd projects/DeepLab
python train_net.py --config-file Cityscapes-SemanticSegmentation/deeplab_v3_R_50_myencoder_mg124_poly_40k_bs8.yaml --output ./output --model_path ./output.pkl --no_finetune --num-gpus 4
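For reference, the conversion step above typically just re-keys the backbone weights into detectron2's pickle format. The sketch below illustrates that idea; it is not the repo's convert_pretrain_to_d2.py, and the checkpoint key layout ("model" key, "module.encoder." prefix) is an assumption.

# Illustrative checkpoint-to-detectron2 conversion; NOT the repository's script.
import pickle
import sys
import torch

src, dst = sys.argv[1], sys.argv[2]                 # input .pth and output .pkl paths
ckpt = torch.load(src, map_location="cpu")
state_dict = ckpt.get("model", ckpt)                # assumed key layout

new_state = {}
for k, v in state_dict.items():
    if not k.startswith("module.encoder."):         # keep only the backbone weights (assumed prefix)
        continue
    new_state[k.replace("module.encoder.", "")] = v.numpy()

with open(dst, "wb") as f:
    pickle.dump({"model": new_state, "__author__": "pretrain", "matching_heuristics": True}, f)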
Evaluation code using detectron2
Evaluation is supported for CityScapes semantic segmentation, among other tasks.
Acknowledgement and Citing
Our testbed builds upon several existing publicly available codebases. Specifically, we have modified and integrated the following code into this project:
Please cite them with the following BibTeX entries.
- PixPro
@article{xie2020propagate,
title={Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning},
author={Xie, Zhenda and Lin, Yutong and Zhang, Zheng and Cao, Yue and Lin, Stephen and Hu, Han},
conference={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2021}
}
- RAFT
@article{raft,
author = {Zachary Teed and
Jia Deng},
title = {{RAFT:} Recurrent All-Pairs Field Transforms for Optical Flow},
journal = {CoRR},
volume = {abs/2003.12039},
year = {2020},
url = {https://arxiv.org/abs/2003.12039},
eprinttype = {arXiv},
eprint = {2003.12039},
timestamp = {Mon, 01 Feb 2021 18:33:24 +0100},
biburl = {https://dblp.org/rec/journals/corr/abs-2003-12039.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Contributing to the project
Pull requests and issues are welcome.