(News) Our paper received the Outstanding Paper Award at ICLR 2023!
This repository contains the official code for Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching (ICLR 2023 oral).
- Download the Taskonomy dataset (tiny split) from the official GitHub page: https://github.com/StanfordVL/taskonomy/tree/master/data.
- You may download data of the following tasks: `depth_euclidean`, `depth_zbuffer`, `edge_occlusion`, `keypoints2d`, `keypoints3d`, `normal`, `principal_curvature`, `reshading`, `segment_semantic`, and `rgb`.
- (Optional) Resize the images and labels to (256, 256) resolution.
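If you resize, note that discrete labels (`segment_semantic`) and continuous data need different interpolation. A minimal sketch using Pillow (the directory names are placeholders, not the repository's preprocessing script):

```python
from pathlib import Path
from PIL import Image

SRC = Path("taskonomy_raw")  # placeholder input directory
DST = Path("taskonomy_256")  # placeholder output directory

for src_file in SRC.rglob("*.png"):
    dst_file = DST / src_file.relative_to(SRC)
    dst_file.parent.mkdir(parents=True, exist_ok=True)
    img = Image.open(src_file)
    # Nearest-neighbor preserves class ids in segment_semantic; bilinear is
    # reasonable for rgb and continuous labels (depth, normals, ...).
    resample = Image.NEAREST if "segment_semantic" in src_file.parts else Image.BILINEAR
    # Note: 16-bit PNGs (e.g., depth) may need img.convert("I") before a
    # bilinear resize, depending on your Pillow version.
    img.resize((256, 256), resample).save(dst_file)
```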
- To reduce the I/O bottleneck of the dataloader, we store the data from all buildings in a single directory per task (a flattening sketch is given after the tree below). The directory structure looks like:
```
<root>
|-- <task1>
|   |-- <building1>_<file_name1>
|   |   ...
|   |-- <building2>_<file_name1>
|   |   ...
|
|-- <task2>
|   |-- <building1>_<file_name1>
|   |   ...
|   |-- <building2>_<file_name1>
|   |   ...
|
|-- ...
```
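A minimal flattening sketch, assuming the raw download keeps one sub-directory per building (`<task>/<building>/<file_name>`; adjust the paths and layout to how you extracted the tarballs):

```python
import shutil
from pathlib import Path

# Placeholder directories -- adjust to your actual locations.
RAW = Path("taskonomy_raw")  # assumed layout: <task>/<building>/<file_name>
ROOT = Path("taskonomy")     # target layout: <task>/<building>_<file_name>

for task_dir in filter(Path.is_dir, RAW.iterdir()):
    out_dir = ROOT / task_dir.name
    out_dir.mkdir(parents=True, exist_ok=True)
    for building_dir in filter(Path.is_dir, task_dir.iterdir()):
        for f in building_dir.iterdir():
            # Prefix each file with its building name, matching the tree above.
            shutil.copy(f, out_dir / f"{building_dir.name}_{f.name}")
```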
- Create a `data_paths.yaml` file and write the root directory path (`<root>` in the structure above) as `taskonomy: PATH_TO_YOUR_TASKONOMY`.
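For example, the file could look like this (the path is a placeholder for your own location):

```yaml
# data_paths.yaml
taskonomy: /path/to/your/taskonomy  # <root> in the directory structure above
```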
- Install the prerequisites with `pip install -r requirements.txt`.
- Create a `model/pretrained_checkpoints` directory and download the BEiT pre-trained checkpoint into it.
- We used the `beit_base_patch16_224_pt22k` checkpoint for our experiments.
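As a quick sanity check that the checkpoint is in place and readable (the file name and the `"model"` key are assumptions based on the official BEiT release format, not requirements of this repository):

```python
import torch

# File name is an assumption -- match it to whatever you downloaded.
ckpt_path = "model/pretrained_checkpoints/beit_base_patch16_224_pt22k.pth"

ckpt = torch.load(ckpt_path, map_location="cpu")
# Official BEiT releases typically wrap the weights under a "model" key.
state_dict = ckpt.get("model", ckpt)
print(f"loaded {len(state_dict)} tensors")
```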
Train the model on a fold of training tasks (stage 0):

```
python main.py --stage 0 --task_fold [0/1/2/3/4]
```

Fine-tune the trained model on a target task (stage 1):

```
python main.py --stage 1 --task [segment_semantic/normal/depth_euclidean/depth_zbuffer/edge_texture/edge_occlusion/keypoints2d/keypoints3d/reshading/principal_curvature]
```

Evaluate the fine-tuned model on the target task (stage 2):

```
python main.py --stage 2 --task [segment_semantic/normal/depth_euclidean/depth_zbuffer/edge_texture/edge_occlusion/keypoints2d/keypoints3d/reshading/principal_curvature]
```
After the evaluation, you can print the test results by running `python print_results.py`.
Our code refers to the following repositories:
- Taskonomy
- timm
- BEiT: BERT Pre-Training of Image Transformers
- Vision Transformers for Dense Prediction
- Inverted Pyramid Multi-task Transformer for Dense Scene Understanding
- Hypercorrelation Squeeze for Few-Shot Segmentation
- Cost Aggregation with 4D Convolutional Swin Transformer for Few-Shot Segmentation
If you find this work useful, please consider citing:
```
@inproceedings{kim2023universal,
  title={Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching},
  author={Donggyun Kim and Jinwoo Kim and Seongwoong Cho and Chong Luo and Seunghoon Hong},
  booktitle={International Conference on Learning Representations},
  year={2023},
  url={https://openreview.net/forum?id=88nT0j5jAn}
}
```
The development of this open-source code was supported in part by the National Research Foundation of Korea (NRF) (No. 2021R1A4A3032834).