qinenergy/corda

Confusion about the 'depth' of Cityscapes

ganyz opened this issue · 6 comments

ganyz commented

Hello, nice work, but I have a couple of questions.

In 'data/cityscapes_loader.py', lines 181-183:

depth = cv2.imread(depth_path, flags=cv2.IMREAD_ANYDEPTH).astype(np.float32) / 256. + 1.
if depth.shape != lbl.shape:
    depth = cv2.resize(depth, lbl.shape[::-1], interpolation=cv2.INTER_NEAREST)

Monocular depth: in disparity form, 0 - 65535

(1) Why is the depth calculated as x/256 + 1?
(2) Is it depth or disparity? The official Cityscapes documentation says disparity = (x - 1) / 256; see my sketch below.
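For reference, here is how I read the two formulas (a minimal sketch of my own, not repo code, using the decoding from the official README):

import numpy as np

# Comparing the official Cityscapes decoding with the loader's formula.
p = np.array([0, 1, 256, 65535], dtype=np.uint16)  # raw 16-bit values

# Official decoding: p > 0 -> disparity = (p - 1) / 256; p == 0 is invalid
disparity = np.where(p > 0, (p.astype(np.float32) - 1.) / 256., 0.)

# The loader's formula under discussion:
depth = p.astype(np.float32) / 256. + 1.

print(disparity)  # [0., 0., ~0.996, ~255.99]
print(depth)      # [1., ~1.004, 2., ~256.996]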

Thank you!

qinenergy commented

Hi, thanks for your interest in our work.

  • The main results we reported in the paper are based on the stereo depth, which was generated by this paper. This corresponds to lines 163-173, so if you are trying to reproduce our main results, the lines you are referring to have no impact.

  • Lines 181-183 refer to the setup where we use the monocular depth, which was an ablation study in the paper. The monocular depth was generated by ourselves by training a monodepth2 model; you can find more details in #5 (How to obtain your depth datasets?). It is the raw output (normalized, disparity-like) of the self-supervised monocular network, with values ranging from 0 to 65535. The x/256 + 1 part is simply a rescaling to make it more consistent with SYNTHIA's inverse depth; see the sketch at the end of this reply. We did not use the official Cityscapes disparity dataset for the monocular experiments, because it is based on stereo images. You won't need it in any of our experiments.

Added comments in the data tree to avoid confusion.
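For intuition, a minimal numpy sketch (assumed values, not the repo's code) of what the x/256 + 1 rescaling does to the raw 0-65535 monocular output:

import numpy as np

# Hypothetical illustration: effect of x/256 + 1 on the raw uint16 output.
raw = np.array([0, 256, 65535], dtype=np.uint16)  # disparity-like, 0..65535

mono = raw.astype(np.float32) / 256. + 1.
# raw = 0     -> 1.0     (coincides with ignore_index = 1 in the BerhuLoss)
# raw = 256   -> 2.0
# raw = 65535 -> ~257.0  (nearest objects, largest inverse-depth value)
print(mono)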

ganyz commented


Very useful for me!
I can also find the data preparation in the SFSU_synthetic repo (though in MATLAB >.<).
Thank you!

ganyz commented


I have another question, about lines 170-173:

170 depth = io.loadmat(depth_path)["depth_map"]
171 depth = np.clip(depth, 0., 655.35)
172 depth[depth<0.1] = 655.35
173 depth = 655.36 / (depth + 0.01)

I don't understand the clip and 1/depth operations; could you explain them?
Thank you!

qinenergy commented

  • 1/depth: From the paper: "Following [3, 41], inverse depth is adopted for the depth learning losses."
  • clip: There are "inf" values in the Cityscapes stereo depth. We clip them and process them so that they land exactly on the ignore value (ignore_index = 1 defined in the BerhuLoss): an invalid pixel becomes 655.35, and 655.36 / (655.35 + 0.01) = 1. See the sketch below.
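To make the arithmetic concrete, a small sketch (assumed values, not the repo's exact code) of how the steps in lines 171-173 send invalid pixels to the ignore value:

import numpy as np

# Hypothetical check of the clip-and-invert steps in lines 171-173.
depth = np.array([np.inf, 0.0, 10.0, 100.0])  # meters; inf and 0 are invalid

depth = np.clip(depth, 0., 655.35)   # inf -> 655.35
depth[depth < 0.1] = 655.35          # zero / missing -> 655.35 as well
inv = 655.36 / (depth + 0.01)        # invalid: 655.36 / 655.36 ≈ 1.0

print(inv)  # [1.0, 1.0, ~65.47, ~6.55]; the first two hit ignore_index = 1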
ganyz commented

Thank you! Nice man!