
Learning Depth from Focus in the Wild

[ECCV2022] Official PyTorch implementation of "Learning Depth from Focus in the Wild"

Requirements

  • Python == 3.7.7
  • imageio==2.16.1
  • h5py==3.6.0
  • numpy==1.21.5
  • matplotlib==3.5.1
  • opencv-python==4.5.5.62
  • OpenEXR==1.3.7
  • Pillow==7.2.0
  • scikit-image==0.19.2
  • scipy==1.7.3
  • torch==1.6.0
  • torchvision==0.7.0
  • tensorboard==2.8.0
  • tqdm==4.46.0
  • mat73==0.58
  • typing-extensions==4.1.1
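
All of the pinned versions above can be installed with pip in a Python 3.7 environment; collecting them into a requirements.txt first is a suggestion on our part, not a file shipped with the repository:

pip install imageio==2.16.1 h5py==3.6.0 numpy==1.21.5 matplotlib==3.5.1 opencv-python==4.5.5.62 OpenEXR==1.3.7 Pillow==7.2.0 scikit-image==0.19.2 scipy==1.7.3 torch==1.6.0 torchvision==0.7.0 tensorboard==2.8.0 tqdm==4.46.0 mat73==0.58 typing-extensions==4.1.1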

Depth Estimation Network

  • Our depth estimation network is implemented based on the code released by PSMNet [1] and CFNet [2].

1. Download Datasets

2. Use the pretrained model

Or train the model with the training code.

  • Put the datasets in folders under 'train_codes/Datasets/'.
  • Run the training code in 'train_codes':

python train_code_[Dataset].py --lr [learning rate]
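
For example, a run with a learning rate of 1e-4 (the value is illustrative, not a recommended setting from the paper) would look like:

python train_code_[Dataset].py --lr 1e-4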

3. Run test.py

python test.py --dataset [Dataset]

Simulator

1. Download the NYU v2 dataset [4]

2. Run the code.

python synthetic_blur_movement.py
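
The simulator renders defocused frames from NYU v2 RGB-D data. Below is a minimal sketch of the thin-lens circle-of-confusion model that this kind of focal-stack simulation builds on; the function name, parameter names, and the camera values in the example are ours for illustration and are not taken from synthetic_blur_movement.py.

    import numpy as np

    def coc_radius_px(depth_m, focus_dist_m, focal_len_mm, f_number, px_size_mm):
        """Thin-lens circle-of-confusion radius in pixels for each depth value.

        depth_m      : per-pixel scene depth in meters (e.g. an NYU v2 depth map)
        focus_dist_m : distance the lens is focused at, in meters
        focal_len_mm : focal length of the lens, in millimeters
        f_number     : aperture f-number (focal length / aperture diameter)
        px_size_mm   : sensor pixel pitch in millimeters
        """
        f = focal_len_mm
        s1 = focus_dist_m * 1000.0            # focus distance in mm
        s2 = np.asarray(depth_m) * 1000.0     # scene depth in mm
        aperture = f / f_number               # aperture diameter in mm
        # Standard thin-lens CoC diameter: c = A * |s2 - s1| / s2 * f / (s1 - f)
        coc_mm = aperture * np.abs(s2 - s1) / s2 * f / (s1 - f)
        return 0.5 * coc_mm / px_size_mm      # blur radius in pixels

    # Illustrative focal stack of 10 focus distances over a flat scene 2 m away
    depth = np.full((480, 640), 2.0)
    for fd in np.linspace(0.5, 5.0, 10):
        r = coc_radius_px(depth, fd, focal_len_mm=50.0, f_number=2.0, px_size_mm=0.006)
        print("focus at %.2f m -> blur radius %.2f px" % (fd, r.mean()))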

End-to-End Network

1. Upload your dataset.

  • Put your focal stacks and metadata in 'End_to_End/Datasets/folder'.
  • The metadata files should be named 'focus_distance.txt' and 'focal_length.txt' for the focus distances and the focal length of the lens, respectively.
  • The pretrained weights were trained on a synthetic dataset whose focal stacks consist of 10 images generated by our simulator.
  • Please refer to the uploaded example dataset (folder 'ball'); a loading sketch follows this list.
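
As referenced above, here is a minimal sketch of how a focal stack and its metadata laid out this way could be read; the PNG extension, the one-value-per-line text format, and the helper name are assumptions about the example data, not code from the repository.

    import glob, os
    import numpy as np
    from PIL import Image

    def load_focal_stack(folder):
        """Load a focal stack and its metadata from one End_to_End/Datasets folder."""
        # Frames are sorted so their order matches the focus-distance list
        paths = sorted(glob.glob(os.path.join(folder, '*.png')))    # extension is an assumption
        stack = np.stack([np.array(Image.open(p)) for p in paths])  # (N, H, W, 3)

        # One focus distance per frame, plus a single focal length for the lens
        focus_dists = np.loadtxt(os.path.join(folder, 'focus_distance.txt')).reshape(-1)
        focal_length = float(np.loadtxt(os.path.join(folder, 'focal_length.txt')))

        assert len(focus_dists) == len(paths), "expected one focus distance per image"
        return stack, focus_dists, focal_length

    stack, focus_dists, focal_length = load_focal_stack('End_to_End/Datasets/ball')
    print(stack.shape, focus_dists, focal_length)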

2. Run test_real_scenes.py in 'End_to_End'

python test_real_scenes.py

Results

  • Before alignment network (input focal stack)

  • After alignment network (aligned focal stack)

  • After depth estimation network (depth map)

Limitations

  • Our network handles only 3 basic motions, which cannot cover all motions, so performance degrades on complex motions.
  • There is a gap between the synthetic dataset generated by the simulator and real datasets. Please note that adjusting the simulator's parameters and tuning the number of training epochs may yield better results than ours.

Sources

[1] Chang, Jia-Ren, and Yong-Sheng Chen. "Pyramid stereo matching network." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018. [code] [paper]

[2] Shen, Zhelun, Yuchao Dai, and Zhibo Rao. "CFNet: Cascade and fused cost volume for robust stereo matching." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021. [code] [paper]

[3] Abuolaim, Abdullah, et al. "Learning to reduce defocus blur by realistically modeling dual-pixel data." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021. [code] [paper]

[4] Silberman, Nathan, et al. "Indoor segmentation and support inference from RGBD images." European Conference on Computer Vision. Springer, Berlin, Heidelberg, 2012. [page]

[5] Hazirbas, Caner, et al. "Deep depth from focus." Asian Conference on Computer Vision. Springer, Cham, 2018. [code] [paper]

[6] Maximov, Maxim, Kevin Galim, and Laura Leal-Taixé. "Focus on defocus: Bridging the synthetic to real domain gap for depth estimation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020. [code] [paper]

[7] Wang, Ning-Hsu, et al. "Bridging unsupervised and supervised depth from focus via all-in-focus supervision." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021. [code] [paper]

[8] Scharstein, Daniel, et al. "High-resolution stereo datasets with subpixel-accurate ground truth." German Conference on Pattern Recognition. Springer, Cham, 2014. [page]

[9] Honauer, Katrin, et al. "A dataset and evaluation methodology for depth estimation on 4D light fields." Asian Conference on Computer Vision. Springer, Cham, 2016. [page]

[10] Herrmann, Charles, et al. "Learning to autofocus." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020. [page]