zhyever/Monocular-Depth-Estimation-Toolbox

I'm a little confused about an inconsistency in the number of training samples

BayMaxBHL opened this issue · 5 comments

About the NYUv2:
Paper: “We train our network on a 50K RGB-Depth pairs subset following previous works.”
dataset_prepare.md: “Following previous work, I utilize about 50K image-depth pairs as our training set and standard 652 images as the validation set.”
nyu_train.txt: only 24,231 pairs of data.

After running python utils/download_from_gdrive.py 1AysroWpfISmm-yRFGBgFTrLy6FjQwvwP sync.zip:
sync.zip contains only 72,792 files across 284 folders.
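(Not from the toolbox, just a quick sanity check. It assumes sync.zip is extracted to ./sync and follows the usual BTS-style layout of rgb_*.jpg and sync_depth_*.png per scene folder; adjust the patterns if the archive differs.)

```python
# Count scene folders and RGB/depth files in the extracted archive.
# Layout assumption (not confirmed by the toolbox docs):
#   sync/<scene>/rgb_*.jpg and sync/<scene>/sync_depth_*.png
from pathlib import Path

root = Path("sync")  # wherever sync.zip was extracted
scenes = [d for d in root.iterdir() if d.is_dir()]
rgb = list(root.glob("*/rgb_*.jpg"))
depth = list(root.glob("*/sync_depth_*.png"))
print(f"scenes: {len(scenes)}, rgb: {len(rgb)}, depth: {len(depth)}")
# Note: the raw file count is not the training-set size; nyu_train.txt
# selects which pairs are actually used for training.
```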

I am not sure whether the amount of training data reported in the paper matches what is actually listed in nyu_train.txt.

It's interesting that everyone's papers report 50K training samples. Maybe everyone uses sync.zip.

Thanks for finding the typo in our paper. It is true that everyone uses sync.zip. :D
As you can see in the log file, we use 24,231 pairs for training.
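If you want to verify the count locally, a one-off check is enough (assuming nyu_train.txt lists one image-depth pair per non-empty line):

```python
# Count the image-depth pairs listed in the training split file.
with open("nyu_train.txt") as f:
    pairs = [line for line in f if line.strip()]
print(len(pairs))  # expected: 24231
```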

From “From Big to Small: Multi-Scale Local Planar Guidance for Monocular Depth Estimation”:
“The NYU Depth V2 dataset [42] contains 120K RGB and depth pairs having a size of 480 × 640 acquired as video sequences using a Microsoft Kinect from 464 indoor scenes. We follow the official train/test split as previous works, using 249 scenes for training and 215 scenes (654 images) for testing. From the total 120K image-depth pairs, due to asynchronous capturing rates between RGB images and depth maps, we associate and sample them using timestamps by even-spacing in time, resulting in 24231 image-depth pairs for the training set. Using raw depth images and camera projections provided by the dataset, we align the image-depth pairs for accurate pixel registrations. We use κ = 10 for this dataset.”
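For intuition, here is a rough sketch of the even-spacing idea quoted above. This is not the BTS preprocessing code (which also aligns depth via the camera projections); the function and variable names are hypothetical.

```python
# Illustrative sketch: pick n_pairs target times evenly spaced over a
# sequence, then match each target to the nearest RGB frame and that RGB
# frame to the nearest depth frame by timestamp. Hypothetical names; NOT
# the actual BTS preprocessing.
import bisect

def nearest(times, t):
    """Index of the timestamp in the sorted list `times` closest to t."""
    i = min(bisect.bisect_left(times, t), len(times) - 1)
    if i > 0 and abs(times[i - 1] - t) <= abs(times[i] - t):
        i -= 1
    return i

def sample_pairs(rgb_times, depth_times, n_pairs):
    """Evenly spaced timestamp association; both lists must be sorted."""
    t0, t1 = rgb_times[0], rgb_times[-1]
    if n_pairs == 1:
        targets = [t0]
    else:
        targets = [t0 + (t1 - t0) * k / (n_pairs - 1) for k in range(n_pairs)]
    pairs = []
    for t in targets:
        ri = nearest(rgb_times, t)
        di = nearest(depth_times, rgb_times[ri])
        pairs.append((ri, di))
    return pairs

# Toy usage: 10 RGB timestamps, 8 depth timestamps, sample 4 pairs.
rgb_ts = [0.0, 0.4, 0.9, 1.3, 1.8, 2.2, 2.7, 3.1, 3.6, 4.0]
depth_ts = [0.1, 0.6, 1.1, 1.7, 2.3, 2.9, 3.4, 3.9]
print(sample_pairs(rgb_ts, depth_ts, 4))
```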