SPengLiang/DID-M3D

The Monocular setting and performance comparison

Closed this issue · 2 comments

Hi, thanks for your great work! But I'm confused about the monocular 3D detection setup in your paper. You generate sparse depth maps by projecting the LiDAR point clouds into the image frames. After depth completion and a series of conversions, these depth maps serve as the labels for visual depth and attribute depth learning. This means you introduce extra LiDAR data, so the setting is not purely monocular 3D detection. I therefore wonder whether the performance comparison with other monocular 3D detection methods that use no extra data is fair. (For concreteness, a sketch of the projection step I mean is below.)
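To make the question concrete, here is a minimal sketch of how LiDAR points are typically projected into the image plane to produce a sparse depth map, assuming KITTI-style calibration matrices (`P2`, `R0_rect`, `Tr_velo_to_cam`). The function name and exact matrix handling are illustrative assumptions, not the repository's actual code:

```python
import numpy as np

def lidar_to_sparse_depth(points, P2, R0_rect, Tr_velo_to_cam, h, w):
    """Project LiDAR points into the image plane to form a sparse depth map.

    points: (N, 3) xyz in the LiDAR (velodyne) frame.
    P2 (3x4), R0_rect (3x3), Tr_velo_to_cam (3x4): KITTI-style calibration.
    h, w: image height and width.
    """
    # LiDAR coordinates (homogeneous) -> rectified camera coordinates.
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])  # (N, 4)
    cam = (R0_rect @ Tr_velo_to_cam @ pts_h.T).T                # (N, 3)
    cam = cam[cam[:, 2] > 0]                                    # keep points in front of the camera

    # Rectified camera coordinates -> pixel coordinates via P2.
    cam_h = np.hstack([cam, np.ones((cam.shape[0], 1))])        # (N, 4)
    uvz = (P2 @ cam_h.T).T                                      # (N, 3)
    u = (uvz[:, 0] / uvz[:, 2]).astype(np.int64)
    v = (uvz[:, 1] / uvz[:, 2]).astype(np.int64)
    z = cam[:, 2]

    # Keep only projections that land inside the image.
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, z = u[valid], v[valid], z[valid]

    # Scatter depths into the map; where several points hit one pixel,
    # write far-to-near so the nearest depth wins. 0 means "no measurement",
    # which is why a depth completion step is applied afterwards.
    depth = np.zeros((h, w), dtype=np.float32)
    order = np.argsort(-z)
    depth[v[order], u[order]] = z[order]
    return depth
```

The resulting map is sparse (most pixels stay 0), which is why the pipeline described in the paper runs depth completion before using these maps as depth labels.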

Thanks for your interest in this work! Yes, you are right: our method needs LiDAR data during training, because the visual depth labels are derived from LiDAR, so the method cannot be trained without it.
In practice, most training data in self-driving applications comes with LiDAR, since the 3D box annotation process itself is performed on LiDAR points. Many previous monocular methods also employ LiDAR data, e.g., CaDDN, DD3D, PCT, AutoShape, DDMP-3D, PatchNet ..., and we follow their evaluation settings.

OK, I got it. Thanks for your reply.