Strange (incorrect) depth image on Waymo dataset
Hi there,
I've been working with the Waymo dataset using the neurad method's configuration settings, following the guidelines outlined in the associated paper. However, I'm encountering unexpected results that need further investigation.
Here is the comparison with Pandaset seq 011:
Strange depth image: the generated depth image appears to be incorrect. While it correctly depicts the sky as distant, it also portrays the ground as "far," which doesn't match expectations.
Blurry right side in rendered RGB images: in both scenes, the right side of the rendered RGB images appears very blurry.
Here is the training loss profile:
I've double-checked my implementation against the paper's specifications, but I'm still unable to pinpoint the source of these discrepancies.
Any hints or ideas you could provide to help diagnose and resolve these issues would be appreciated.
Thanks for your attention to this matter.
Hi,
The depth images indeed look very weird. This reminds me of the behavior when running image-only training: the model often cheats and puts the ground very far away, since the low-contrast road surface offers little to no depth information. Have you tried increasing the depth loss multiplier, or oversampling the lidar rays (increasing the number of lidar rays per batch)?
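Something along these lines, for instance (a sketch only — the attribute names below are assumptions, so check the actual NeuRAD method config in neurad-studio for the exact field names):

```python
# Sketch only: the attribute names are assumptions, verify them against the
# NeuRAD method config in neurad-studio before using.
from nerfstudio.configs.method_configs import method_configs

config = method_configs["neurad"]
config.pipeline.model.depth_loss_mult *= 10             # weight lidar depth higher (assumed name)
config.pipeline.datamanager.lidar_rays_per_batch *= 2   # oversample lidar rays (assumed name)
```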
Hi @georghess,
Thank you for the kind reply.
This issue has recently been solved: we found an error in the lidar points when exporting/converting Waymo into the NeuRAD dataparser format.
I'm closing this as solved.
Thank you again for your great work!
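For anyone hitting similar symptoms, a quick range check on the converted lidar points can surface this kind of conversion bug. A rough sketch (the helper and its arguments are illustrative, not from the actual parser):

```python
import numpy as np

def check_lidar_ranges(points_world: np.ndarray, lidar_to_world: np.ndarray) -> None:
    """Illustrative sanity check: transform converted points back into the sensor
    frame and print their range statistics. Ranges far outside the sensor spec
    (roughly 75 m for Waymo's top lidar) suggest a frame or units mixup."""
    world_to_lidar = np.linalg.inv(lidar_to_world)
    pts_h = np.concatenate([points_world, np.ones((len(points_world), 1))], axis=1)
    ranges = np.linalg.norm((pts_h @ world_to_lidar.T)[:, :3], axis=1)
    print(f"range min={ranges.min():.2f} m, max={ranges.max():.2f} m")
```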
Great! Feel free to open a PR with your parser btw :)
@Crescent-Saturn Did you happen to account for Waymo's rolling shutter? As far as I know, the Waymo dataset uses a column-by-column rolling shutter, but it seems like the rolling shutter compensation in the neurad code compensates for row-by-row rolling shutter, so it would require some minor modifications.
@MartinEthier Thank you for this kind reminder. Yes, it was one of the reasons for the poor performance. In addition to Waymo's column-by-column rolling shutter, it turns out that the readout direction depends on the sequence: in some sequences it is LEFT_2_RIGHT, in others RIGHT_2_LEFT. We made the corresponding modifications to account for this.
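For reference, the per-camera readout direction can be read from the calibration proto during export; a minimal sketch, assuming the field names in the waymo_open_dataset protos (verify against your library version):

```python
from waymo_open_dataset import dataset_pb2

def readout_directions(frame: dataset_pb2.Frame) -> dict[int, str]:
    """Map camera id -> rolling shutter readout direction name, e.g.
    'LEFT_TO_RIGHT' or 'RIGHT_TO_LEFT' (field names assumed from the
    waymo_open_dataset protos)."""
    enum = dataset_pb2.CameraCalibration.RollingShutterReadOutDirection
    return {
        calib.name: enum.Name(calib.rolling_shutter_direction)
        for calib in frame.context.camera_calibrations
    }
```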
Yeah I also noticed that the direction can change based on the camera. Would you mind providing your modification to the rolling shutter compensation code? I'm also working with the Waymo dataset and it would save me some trouble lol
From here: neurad-studio/nerfstudio/cameras/cameras.py, lines 922 to 929 at 445f41d, we modified L924-L926 for column-by-column rolling shutter into:
```python
# Per-camera start/end readout times stored by the dataparser.
offsets = self.metadata["rolling_shutter_offsets"][cam_idx]
duration = offsets.diff()
# Interpolate the time offset along the column (x) coordinate instead of the row.
width, cols = self.width[cam_idx], coords[..., 1:2]
time_offsets = cols / width * duration + offsets[..., 0:1]
```
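A direction-aware variant could additionally flip the column coordinate for RIGHT_2_LEFT sequences. A sketch, assuming the readout direction is stored per camera in the metadata (the "rolling_shutter_direction" key is hypothetical, not part of the stock neurad dataparser):

```python
# Hypothetical direction-aware variant: "rolling_shutter_direction" is an
# assumed metadata key, not part of the stock neurad dataparser.
offsets = self.metadata["rolling_shutter_offsets"][cam_idx]
duration = offsets.diff()
width, cols = self.width[cam_idx], coords[..., 1:2]
if self.metadata["rolling_shutter_direction"][cam_idx] == "RIGHT_2_LEFT":
    cols = (width - 1) - cols  # last-read column gets the largest time offset
time_offsets = cols / width * duration + offsets[..., 0:1]
```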
Hope this helps.
Perfect, thanks!