ACAN: Attention-based Context Aggregation Model for Monocular Depth Estimation.
PyTorch implementation of ACAN for monocular depth estimation.
More details: arXiv
Architecture
Visualization of Attention Maps
- The first and second rows show the attention maps trained with and without Attention Loss, respectively.
Soft Inference VS Hard Inference
- The third and fourth columns show the results of soft inference and hard inference, respectively.
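To make the distinction concrete, here is a minimal per-pixel sketch. It assumes an ordinal-regression style output, where each entry of `probs` is the predicted probability that the depth exceeds one of K ordered thresholds. The function names, the 0.5 cut-off, and the output format are assumptions for illustration and may differ from what this repository implements.

```python
def hard_inference(probs, threshold=0.5):
    """Count the thresholds whose probability reaches the cut-off.

    The result is an integer label, so predictions are quantized.
    """
    return sum(1 for p in probs if p >= threshold)


def soft_inference(probs):
    """Sum the probabilities directly.

    The result is a continuous label, which avoids quantization steps.
    """
    return sum(probs)


# Hypothetical probabilities for one pixel over five ordered thresholds.
probs = [0.9, 0.8, 0.6, 0.3, 0.1]

print("hard label:", hard_inference(probs))
print("soft label:", soft_inference(probs))
```

In this sketch, both labels would still need to be mapped back to a metric depth through the discretization used at training time; that step is omitted here.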
Quick start
Requirements
torch==0.4.1
torchvision
tensorboardX
pillow
tqdm
h5py
scikit-learn
cv2
This code was tested with PyTorch 0.4.1, CUDA 9.1 and Ubuntu 18.04.
Training takes about 48 hours with the default parameters on the KITTI dataset on an NVIDIA GTX 1080 Ti machine.
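Before launching a long training run, it can help to confirm that the requirements listed above are importable. The script below is a small sketch using only the standard library. The helper name `find_missing` is hypothetical, and the import names are the usual ones for these packages (pillow imports as `PIL`, scikit-learn as `sklearn`).

```python
import importlib.util

# Import names for the required packages
# (pillow imports as PIL, scikit-learn as sklearn).
REQUIRED_MODULES = [
    "torch", "torchvision", "tensorboardX", "PIL",
    "tqdm", "h5py", "sklearn", "cv2",
]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

This only checks that each module can be found; it does not verify versions such as PyTorch 0.4.1.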
Data
There are two main datasets available:
KITTI
We used the Eigen split of the data, which amounts to approximately 22k training samples. You can find them in the kitti_path_txt folder.
NYU v2
We downloaded the raw dataset, which weighs about 428GB, and used the NYU v2 toolbox to sample around 12k training samples. You can use Get_Dataset.m in the matlab folder to produce the training set, or download the processed dataset from BaiduCloud.
Training
Warning: The input sizes need to be multiples of 8.
bash ./code/train_nyu_script.sh
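If your images do not already satisfy the multiple-of-8 constraint, one option is to crop or resize them to the nearest smaller valid size. The helper below is a sketch of that calculation. The function names are hypothetical and not part of this repository, and the image sizes in the usage lines are example values.

```python
def floor_to_multiple(value, base=8):
    """Round value down to a multiple of base, never going below base."""
    return max(base, (value // base) * base)


def valid_input_size(height, width, base=8):
    """Return (height, width) snapped down to multiples of base."""
    return floor_to_multiple(height, base), floor_to_multiple(width, base)


# Example sizes: one that needs adjusting and one that is already valid.
print(valid_input_size(375, 1242))
print(valid_input_size(480, 640))
```

Rounding down rather than to the nearest multiple means the result can always be obtained by cropping, without padding or upsampling.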
Testing
bash ./code/test_nyu_script.sh
Attention Map
To obtain task-specific attention maps, first train your model from scratch, then fine-tune it with the attention loss by setting:
BETA=1
RESUME=./workspace/log/best.pkl
EPOCHES=10