PyTorch implementation of ACAN for monocular depth estimation.
More details in the arXiv paper.
- The first and second rows show the attention maps trained with and without the Attention Loss, respectively.
- The third and fourth columns show the results of soft inference and hard inference, respectively.
torch==0.4.1
torchvision
tensorboardX
pillow
tqdm
h5py
scikit-learn
opencv-python (cv2)
This code was tested with PyTorch 0.4.1, CUDA 9.1, and Ubuntu 18.04.
Training takes about 48 hours with the default parameters on the KITTI dataset on an NVIDIA GTX 1080Ti machine.
There are two main datasets available:
We use the Eigen split of the data, amounting to approximately 22k training samples; you can find the file lists in the kitti_path_txt folder.
Download the raw dataset, which is about 428 GB. We use the NYU v2 toolbox to sample around 12k training samples; you can find the scripts in the matlab folder and use Get_Dataset.m
to produce the training set, or download the processed dataset from BaiduCloud.
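Once you have the data, a minimal sketch of reading one sample with h5py (assuming the NYU v2 labeled .mat layout, an HDF5 container with `images` and `depths` datasets; `load_nyu_sample` is a hypothetical helper, not part of this repo):

```python
import h5py
import numpy as np

def load_nyu_sample(path, index):
    """Return (rgb, depth) for one sample from an NYU v2 labeled file."""
    with h5py.File(path, "r") as f:
        rgb = np.asarray(f["images"][index])    # stored as C x W x H
        depth = np.asarray(f["depths"][index])  # stored as W x H, in metres
    # Transpose to the usual H x W x C image and H x W depth map.
    return rgb.transpose(2, 1, 0), depth.T
```

The transposes are needed because MATLAB stores arrays column-major, so the HDF5 datasets come out with reversed axis order.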
Warning: the input sizes need to be multiples of 8.
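If your images do not already satisfy this, one simple option is to zero-pad each side up to the next multiple of 8 before feeding the network (a sketch; `pad_to_multiple_of_8` is a hypothetical helper, not part of this repo):

```python
import numpy as np

def pad_to_multiple_of_8(img):
    """Zero-pad an H x W x C image so both H and W are multiples of 8."""
    h, w = img.shape[:2]
    pad_h = (8 - h % 8) % 8  # rows to add at the bottom
    pad_w = (8 - w % 8) % 8  # columns to add on the right
    return np.pad(img, ((0, pad_h), (0, pad_w), (0, 0)), mode="constant")

img = np.zeros((375, 1242, 3), dtype=np.float32)  # a raw KITTI resolution
padded = pad_to_multiple_of_8(img)
print(padded.shape)  # (376, 1248, 3)
```

Remember to crop the prediction back to the original size before evaluation.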
bash ./code/train_nyu_script.sh
bash ./code/test_nyu_script.sh
If you want to obtain the task-specific attention maps, first train the model from scratch, then fine-tune it with the attention loss by setting
BETA=1
RESUME=./workspace/log/best.pkl
EPOCHES=10