Pixel-level Contrastive Learning of Driving Videos with Optical Flow
By Tomoya Takahashi, Shingo Yashima, Kohta Ishikawa, Ikuro Sato, Rio Yokota.
This repository is the official PyTorch implementation of "Pixel-level Contrastive Learning of Driving Videos with Optical Flow".
Introduction
In this work, we improve the accuracy of self-supervised learning on driving data by combining pixel-wise contrastive learning (PixPro) with optical flow. Unlike most self-supervised methods, PixPro is trained on pixel-level pretext tasks, which yields better accuracy on downstream tasks requiring dense pixel predictions. However, PixPro does not account for the large changes in object scale commonly found in driving data. We show that by incorporating optical flow into the pixel-wise contrastive pre-training, we can improve the performance of downstream tasks such as semantic segmentation on CityScapes. We found that using the optical flow between temporally distant frames helps the model learn invariance to large scale changes, which allows us to exceed the performance of the original PixPro method.
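To illustrate the core idea, the following is a minimal, hypothetical sketch of how optical flow can provide pixel-level correspondences between two temporally distant frames for a PixPro-style consistency loss. All names and shapes here are illustrative; this is not the repository's actual training code.

# Illustrative sketch only: flow-based pixel correspondence for a PixPro-style loss.
import torch
import torch.nn.functional as F

def warp_with_flow(feat2, flow):
    """Sample frame-2 features at the locations that frame-1 pixels map to under the flow.

    feat2: (B, C, H, W) features of the second (temporally distant) frame
    flow:  (B, 2, H, W) optical flow from frame 1 to frame 2, in pixels (dx, dy)
    """
    B, _, H, W = feat2.shape
    xs = torch.arange(W, device=feat2.device).float().view(1, 1, W).expand(1, H, W)
    ys = torch.arange(H, device=feat2.device).float().view(1, H, 1).expand(1, H, W)
    coords = torch.cat((xs, ys), dim=0).unsqueeze(0) + flow   # (B, 2, H, W): target positions in frame 2
    # Normalize to [-1, 1] for grid_sample (x first, then y)
    gx = 2.0 * coords[:, 0] / (W - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (H - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                      # (B, H, W, 2)
    return F.grid_sample(feat2, grid, align_corners=True)

def pixel_consistency_loss(feat1, feat2, flow):
    """Encourage flow-matched pixel features of the two frames to agree (cosine similarity)."""
    feat2_warped = warp_with_flow(feat2, flow)
    feat1 = F.normalize(feat1, dim=1)
    feat2_warped = F.normalize(feat2_warped, dim=1)
    return -(feat1 * feat2_warped).sum(dim=1).mean()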
Citation
@InProceedings{Takahashi_2023_CVPR,
author = {Takahashi, Tomoya and Yashima, Shingo and Ishikawa, Kohta and Sato, Ikuro and Yokota, Rio},
title = {Pixel-Level Contrastive Learning of Driving Videos With Optical Flow},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2023},
pages = {3179-3186}
}
Main Results
PixPro with Optical Flow pre-trained models
Epochs | Arch | Frames | Optical Flow | Download |
---|---|---|---|---|
2000 | ResNet-50 | 1 | | script \| model |
2000 | ResNet-50 | 2 | ✔️ | script \| model |
2000 | ResNet-50 | 6 | ✔️ | script \| model |
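If you only want to reuse a downloaded checkpoint as a plain backbone, the following is a minimal loading sketch. The checkpoint key layout ("model" key, "module.encoder." prefix) is an assumption based on common PixPro-style conventions and may differ from the actual files; inspect the keys and adjust the prefix as needed.

# Hypothetical loading sketch; checkpoint path and key names are assumptions.
import torch
import torchvision

ckpt = torch.load("pixpro_of_nframe6_2000ep.pth", map_location="cpu")   # illustrative path
state_dict = ckpt.get("model", ckpt)            # some checkpoints nest weights under "model"

# Strip the assumed pre-training wrapper prefix to match torchvision's key names
backbone_state = {k.replace("module.encoder.", ""): v
                  for k, v in state_dict.items() if k.startswith("module.encoder.")}

resnet50 = torchvision.models.resnet50()
missing, unexpected = resnet50.load_state_dict(backbone_state, strict=False)
print("missing keys:", len(missing), "| unexpected keys:", len(unexpected))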
CityScapes Semantic Segmentation
The results below show the average mIoU score over five training runs on downstream tasks.
- config: config
Method | Epochs | Arch | Pre-train Dataset | Frames | mIoU | Download |
---|---|---|---|---|---|---|
Supervised | - | ResNet-50 | ImageNet | - | 61.2 | - |
BYOL | 1000 | ResNet-50 | ImageNet | - | 60.0 | - |
PixPro | 100 | ResNet-50 | ImageNet | - | 58.4 | - |
PixPro | 2000 | ResNet-50 | BDD100k | 1 | 53.0 | - |
PixPro with OF (Ours) | 2000 | ResNet-50 | BDD100k | 6 | 53.4 | - |
Getting started
Requirements
We have not yet verified the code with other versions of these packages, so we recommend the following configuration.
- Python 3.8.6
- PyTorch == 1.8.2
- Torchvision == 0.9.2
- CUDA == 10.2
- NCCL == 2.7.3
- Open MPI == 4.0.4
- Other dependencies
(If you want to fully replicate our Python environment, please use requirements_all.txt.)
Prepare BDD100K and Optical Flow
- BDD100k Dataset
  See img dataset inst.
- Optical Flow
  Follow one of the options below depending on your environment. A minimal Python sketch of computing flow with the downloaded RAFT weights is shown after this list.
  - Enough storage space for the data
    If you have enough storage space for the data, we recommend preparing the flow in advance; see create optical flow dataset.
  - Not enough storage space for the data
    If you do not have enough storage space for the data, you can still run pre-training by simply downloading the pre-trained RAFT model with the following steps:
    cd ~
    git clone https://github.com/rioyokotalab/RAFT.git
    cd RAFT
    pyenv local pixpro-wt-of-cu102-wandb  # pyenv virtualenv for this repo
    bash scripts/download_models.sh
    mkdir ${BDD100k-Path}/pretrained_flow
    cp -ra models ${BDD100k-Path}/pretrained_flow
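The snippet below is a minimal sketch of computing optical flow between two frames with the downloaded RAFT weights. It follows the usage in the upstream RAFT demo.py; the exact module paths, constructor arguments, and weight filename are assumptions and may differ in the rioyokotalab fork.

# Illustrative RAFT usage, modeled on the upstream demo.py; run from inside the RAFT repo.
import argparse
import sys
import torch

sys.path.append("core")               # RAFT's modules live under core/
from raft import RAFT
from utils.utils import InputPadder

args = argparse.Namespace(small=False, mixed_precision=False, alternate_corr=False)
model = torch.nn.DataParallel(RAFT(args))
model.load_state_dict(torch.load("models/raft-things.pth"))   # downloaded by download_models.sh
model = model.module.cuda().eval()

# Two temporally distant frames as (1, 3, H, W) float tensors in [0, 255]
image1 = torch.full((1, 3, 368, 640), 128.0).cuda()           # placeholder frames
image2 = torch.full((1, 3, 368, 640), 128.0).cuda()

padder = InputPadder(image1.shape)                             # pads H, W to multiples of 8
image1, image2 = padder.pad(image1, image2)
with torch.no_grad():
    flow_low, flow_up = model(image1, image2, iters=20, test_mode=True)
print(flow_up.shape)                                           # (1, 2, H, W): per-pixel (dx, dy)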
Installation
We recommend using a pyenv virtual environment to set up the experimental environments.
# Create working directory
mkdir ~/pixpro_wt_of_pj
cd ~/pixpro_wt_of_pj
# Create environment
pyenv virtualenv 3.8.6 pixpro-wt-of-cu102-wandb
pyenv local pixpro-wt-of-cu102-wandb
# If you are managing with modulefiles, please do the following.
module load cuda/10.2 cudnn/8.2.1 nccl/2.7.3 openmpi/4.0.4
# Install PyTorch & Torchvision
pip install torch==1.8.2 torchvision==0.9.2 torchaudio==0.8.2 --extra-index-url https://download.pytorch.org/whl/lts/1.8/cu102
# Install apex
git clone https://github.com/NVIDIA/apex
cd apex
git checkout 8a7a332539809adcad88546f492945c4e752ff49
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
cd ..
# Clone repo
git clone https://github.com/rioyokotalab/PixPro-with-OpticalFlow.git
cd ./PixPro-with-OpticalFlow
# Create soft link for data
mkdir data
ln -s ${BDD100k-Path}/bdd100k ./data/bdd100k
ln -s ${BDD100k-Path}/pretrained_flow ./data/pretrained_flow
# Install other requirements
pip install -r requirements.txt
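After installation, you can sanity-check the environment with a short snippet (a quick check only, not part of the repository):

# Quick environment sanity check (not part of the repository).
import torch
import torchvision
print("torch:", torch.__version__, "| torchvision:", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available(), "| GPU count:", torch.cuda.device_count())
from apex import amp  # noqa: F401 -- basic check that apex is importable
print("apex imported OK")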
Pretrain with PixPro using Optical Flow
# Train with PixPro base for 2000 epochs.
bash ./tools/pretrain_bdd100k_job_2000ep_nframe6_gpu16.sh
Transfer to CityScapes Semantic Segmentation
cd ~/pixpro_wt_of_pj/PixPro-with-OpticalFlow
# Convert a pre-trained PixPro model to detectron2's format
cd transfer/detection
python convert_pretrain_to_d2.py ${Input-Checkpoint(.pth)} ./output.pkl
cd ~/pixpro_wt_of_pj
# Create environment
pyenv virtualenv 3.8.6 detectron2-cu102-wandb
pyenv local detectron2-cu102-wandb
# If you are managing with modulefiles, please do the following.
module load cuda/10.2 cudnn/8.2.1 nccl/2.7.3 openmpi/4.0.4
# Install PyTorch & Torchvision
pip install torch==1.8.2 torchvision==0.9.2 torchaudio==0.8.2 --extra-index-url https://download.pytorch.org/whl/lts/1.8/cu102
# Install Detectron2
git clone https://github.com/rioyokotalab/detectron2
cd detectron2
git checkout dev-v0.6
pyenv local detectron2-cu102-wandb
pip install -e .
pip install git+https://github.com/mcordts/cityscapesScripts.git
pip install wandb
# Train the segmentation model with the pre-trained PixPro model without finetune
cd projects/DeepLab
python train_net.py --config-file Cityscapes-SemanticSegmentation/deeplab_v3_R_50_myencoder_mg124_poly_40k_bs8.yaml --output ./output --model_path ./output.pkl --no_finetune --num-gpus 4
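For reference, the conversion step above typically just re-keys the backbone weights into detectron2's pickle format. The sketch below illustrates that idea; it is not the repo's convert_pretrain_to_d2.py, and the checkpoint key layout ("model" key, "module.encoder." prefix) is an assumption.

# Illustrative checkpoint-to-detectron2 conversion; NOT the repository's script.
import pickle
import sys
import torch

src, dst = sys.argv[1], sys.argv[2]                 # input .pth and output .pkl paths
ckpt = torch.load(src, map_location="cpu")
state_dict = ckpt.get("model", ckpt)                # assumed key layout

new_state = {}
for k, v in state_dict.items():
    if not k.startswith("module.encoder."):         # keep only the backbone weights (assumed prefix)
        continue
    new_state[k.replace("module.encoder.", "")] = v.numpy()

with open(dst, "wb") as f:
    pickle.dump({"model": new_state, "__author__": "pretrain", "matching_heuristics": True}, f)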
Evaluation code using detectron2
Evaluation is supported for CityScapes semantic segmentation, among other tasks.
Acknowledgement and Citing
Our testbed builds upon several existing publicly available codebases. Specifically, we have modified and integrated the following code into this project:
Please cite them with the following BibTeX entries.
- PixPro
@article{xie2020propagate,
title={Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning},
author={Xie, Zhenda and Lin, Yutong and Zhang, Zheng and Cao, Yue and Lin, Stephen and Hu, Han},
conference={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2021}
}
- RAFT
@article{raft,
author = {Zachary Teed and
Jia Deng},
title = {{RAFT:} Recurrent All-Pairs Field Transforms for Optical Flow},
journal = {CoRR},
volume = {abs/2003.12039},
year = {2020},
url = {https://arxiv.org/abs/2003.12039},
eprinttype = {arXiv},
eprint = {2003.12039},
timestamp = {Mon, 01 Feb 2021 18:33:24 +0100},
biburl = {https://dblp.org/rec/journals/corr/abs-2003-12039.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Contributing to the project
Pull requests and issues are welcome.