zhyever/Monocular-Depth-Estimation-Toolbox

I'm a little confused about an inconsistency in the number of training samples

BayMaxBHL opened this issue · 5 comments

About the NYUv2:
Paper: “We train our network on a 50K RGB-Depth pairs subset following previous works.”
dataset_prepare.md: “Following previous work, I utilize about 50K image-depth pairs as our training set and standard 652 images as the validation set.”
nyu_train.txt: only 24,231 pairs of data.

After running python utils/download_from_gdrive.py 1AysroWpfISmm-yRFGBgFTrLy6FjQwvwP sync.zip:
sync.zip contains only 72,792 files across 284 folders.
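(Not from the toolbox, just a quick sanity check. It assumes sync.zip is extracted to ./sync and follows the usual BTS-style layout of rgb_*.jpg and sync_depth_*.png per scene folder; adjust the patterns if the archive differs.)

```python
# Count scene folders and RGB/depth files in the extracted archive.
# Layout assumption (not confirmed by the toolbox docs):
#   sync/<scene>/rgb_*.jpg and sync/<scene>/sync_depth_*.png
from pathlib import Path

root = Path("sync")  # wherever sync.zip was extracted
scenes = [d for d in root.iterdir() if d.is_dir()]
rgb = list(root.glob("*/rgb_*.jpg"))
depth = list(root.glob("*/sync_depth_*.png"))
print(f"scenes: {len(scenes)}, rgb: {len(rgb)}, depth: {len(depth)}")
# Note: the raw file count is not the training-set size; nyu_train.txt
# selects which pairs are actually used for training.
```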

I am not sure whether the amount of training data reported in the paper matches what is actually listed in nyu_train.txt.

It's interesting that everyone's papers report 50K training samples. Maybe everyone uses sync.zip.

Thanks for finding the typo in our paper. It is true that everyone uses sync.zip. :D
As you can see in the log file, we use 24,231 pairs for training.
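If you want to verify the count locally, a one-off check is enough (assuming nyu_train.txt lists one image-depth pair per non-empty line):

```python
# Count the image-depth pairs listed in the training split file.
with open("nyu_train.txt") as f:
    pairs = [line for line in f if line.strip()]
print(len(pairs))  # expected: 24231
```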

From “From Big to Small: Multi-Scale Local Planar Guidance for Monocular Depth Estimation”:
“The NYU Depth V2 dataset [42] contains 120K RGB and depth pairs having a size of 480 × 640 acquired as video sequences using a Microsoft Kinect from 464 indoor scenes. We follow the official train/test split as previous works, using 249 scenes for training and 215 scenes (654 images) for testing. From the total 120K image-depth pairs, due to asynchronous capturing rates between RGB images and depth maps, we associate and sample them using timestamps by even-spacing in time, resulting in 24231 image-depth pairs for the training set. Using raw depth images and camera projections provided by the dataset, we align the image-depth pairs for accurate pixel registrations. We use κ = 10 for this dataset.”
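For intuition, here is a rough sketch of the even-spacing idea quoted above. This is not the BTS preprocessing code (which also aligns depth via the camera projections); the function and variable names are hypothetical.

```python
# Illustrative sketch: pick n_pairs target times evenly spaced over a
# sequence, then match each target to the nearest RGB frame and that RGB
# frame to the nearest depth frame by timestamp. Hypothetical names; NOT
# the actual BTS preprocessing.
import bisect

def nearest(times, t):
    """Index of the timestamp in the sorted list `times` closest to t."""
    i = min(bisect.bisect_left(times, t), len(times) - 1)
    if i > 0 and abs(times[i - 1] - t) <= abs(times[i] - t):
        i -= 1
    return i

def sample_pairs(rgb_times, depth_times, n_pairs):
    """Evenly spaced timestamp association; both lists must be sorted."""
    t0, t1 = rgb_times[0], rgb_times[-1]
    if n_pairs == 1:
        targets = [t0]
    else:
        targets = [t0 + (t1 - t0) * k / (n_pairs - 1) for k in range(n_pairs)]
    pairs = []
    for t in targets:
        ri = nearest(rgb_times, t)
        di = nearest(depth_times, rgb_times[ri])
        pairs.append((ri, di))
    return pairs

# Toy usage: 10 RGB timestamps, 8 depth timestamps, sample 4 pairs.
rgb_ts = [0.0, 0.4, 0.9, 1.3, 1.8, 2.2, 2.7, 3.1, 3.6, 4.0]
depth_ts = [0.1, 0.6, 1.1, 1.7, 2.3, 2.9, 3.4, 3.9]
print(sample_pairs(rgb_ts, depth_ts, 4))
```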