This repository contains code and models for our paper:
Vision Transformers for Dense Prediction
René Ranftl, Alexey Bochkovskiy, Vladlen Koltun
- [March 2021] Initial release of inference code and models
- Download the model weights and place them in the weights folder:
Monodepth:
Segmentation:
- Set up dependencies:
conda install pytorch torchvision opencv
pip install timm
The code was tested with Python 3.7, PyTorch 1.8.0, OpenCV 4.5.1, and timm 0.4.5.
- Place one or more input images in the folder input.
- Run a monocular depth estimation model:
python run_monodepth.py
Or run a semantic segmentation model:
python run_segmentation.py
- The results are written to the folders output_monodepth and output_segmentation, respectively.
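To compare an environment against the tested versions listed in the setup step, a small check like the following can help. This is a minimal sketch and not part of the repository: the helper name collect_versions is hypothetical, and it assumes each package exposes a __version__ attribute, which torch, cv2, and timm do.

```python
import importlib

# Versions the README reports testing with (import name -> version).
TESTED = {"torch": "1.8.0", "cv2": "4.5.1", "timm": "0.4.5"}


def collect_versions(module_names):
    """Return {import name: version string, or None if the import fails}.

    Hypothetical helper for illustration; not part of the DPT code base.
    """
    found = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The package is not installed in this environment.
            found[name] = None
            continue
        # Fall back to "unknown" if the module has no __version__ attribute.
        found[name] = getattr(module, "__version__", "unknown")
    return found


if __name__ == "__main__":
    for name, version in collect_versions(TESTED).items():
        print(f"{name}: installed={version}, tested={TESTED[name]}")
```

A mismatch does not necessarily mean the code will fail; the listed versions are only the ones the code was tested with.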
Use the flag -t to switch between different models. Possible options are dpt_hybrid (default) and dpt_large.
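The steps above can also be driven from Python. The sketch below only assembles the command lines described in this README (the two scripts and the -t flag with its two options). The helper names build_command and list_inputs and the set of image extensions are assumptions made for illustration, not part of the repository.

```python
import sys
from pathlib import Path

# Script names and model types as given in this README.
SCRIPTS = {
    "monodepth": "run_monodepth.py",
    "segmentation": "run_segmentation.py",
}
MODEL_TYPES = ("dpt_hybrid", "dpt_large")


def build_command(task, model_type="dpt_hybrid"):
    """Return the argument list for one run (hypothetical helper)."""
    if task not in SCRIPTS:
        raise ValueError(f"unknown task: {task!r}")
    if model_type not in MODEL_TYPES:
        raise ValueError(f"unknown model type: {model_type!r}")
    # Use the current interpreter so the active environment is reused.
    return [sys.executable, SCRIPTS[task], "-t", model_type]


def list_inputs(folder="input", extensions=(".png", ".jpg", ".jpeg")):
    """List image files in the input folder.

    The accepted extensions are an assumption; the README does not
    specify which image formats the scripts support.
    """
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(p for p in root.iterdir() if p.suffix.lower() in extensions)
```

From the repository root, the returned list can be passed to subprocess.run, for example subprocess.run(build_command("monodepth", "dpt_large"), check=True), after confirming that list_inputs() is not empty.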
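import torch
import torch.hub
import cv2

# Load the pretrained model through torch.hub and move it to the GPU
model = torch.hub.load('Tord-Zhang/DPT', 'DPT', source='github', pretrained=True)
model = model.cuda()

# Fetch the preprocessing transform and the image reader from the same hub repository
transform = torch.hub.load('Tord-Zhang/DPT', 'transforms', source='github')
img_reader = torch.hub.load('Tord-Zhang/DPT', 'read_image', source='github')

# Read the input image and apply the preprocessing transform
img = img_reader("input/test.png")
img_input = transform({"image": img})["image"]

with torch.no_grad():
    # Add a batch dimension and run inference without tracking gradients
    sample = torch.from_numpy(img_input).cuda().unsqueeze(0)
    prediction = model.forward(sample)

    # Resize the prediction to the input image resolution and convert it to a NumPy array
    prediction = (
        torch.nn.functional.interpolate(
            prediction.unsqueeze(1),
            size=img.shape[:2],
            mode="bicubic",
            align_corners=False,
        )
        .squeeze()
        .cpu()
        .numpy()
    )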
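The snippet above ends with prediction as a floating-point NumPy array. One common way to store such a map as an image is to rescale it linearly to the 16-bit range first. The following is a minimal sketch under that assumption; the function name depth_to_uint16 is hypothetical, and this is not necessarily how the repository's own scripts write their output.

```python
import numpy as np


def depth_to_uint16(prediction):
    """Linearly rescale a float map to the full uint16 range [0, 65535].

    Hypothetical helper for illustration only.
    """
    prediction = np.asarray(prediction, dtype=np.float64)
    lo, hi = prediction.min(), prediction.max()
    if hi - lo <= np.finfo(np.float64).eps:
        # A constant map has no usable range; return all zeros.
        return np.zeros(prediction.shape, dtype=np.uint16)
    scaled = (prediction - lo) / (hi - lo) * 65535.0
    return scaled.round().astype(np.uint16)
```

The result could then be saved as a 16-bit PNG, for example with cv2.imwrite("depth.png", depth_to_uint16(prediction)), where the output file name is only a placeholder.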
Please cite our papers if you use this code or any of the models.
@article{Ranftl2021,
author = {Ren\'{e} Ranftl and Alexey Bochkovskiy and Vladlen Koltun},
title = {Vision Transformers for Dense Prediction},
journal = {ArXiv preprint},
year = {2021},
}
@article{Ranftl2020,
author = {Ren\'{e} Ranftl and Katrin Lasinger and David Hafner and Konrad Schindler and Vladlen Koltun},
title = {Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
year = {2020},
}
Our work builds on and uses code from timm and PyTorch-Encoding. We'd like to thank the authors for making these libraries available.
MIT License