Misalignment in unprojected RGBD images
ayushjain1144 opened this issue · 5 comments
Hi @pengsongyou ,
Thank you for your excellent work, and especially for preprocessing the data into a common format!
I am using your pre-processed 2D Matterport data and unprojecting it to 3D using the provided intrinsics and extrinsics. This gives me the following point cloud, which does not look right to me:
wandb link: https://wandb.ai/ayushjain1144/m3d/runs/kdbjp0xg?workspace=user-ayushjain1144
I did some further analysis and found that all images unprojected from the same camera id are well aligned, but there seems to be some translation misalignment when combining images from different cameras. For example, in this wandb link, unprojected is the point cloud from a single camera uuid, and unprojected_all is the combined point cloud from all of these cameras.
I am fairly certain that my unprojection code is ok, because it works well on ScanNet. Do you have any idea what might be wrong? Do the visualizations look ok on your side?
Thank you!
Hi @ayushjain1144, thanks for your interest in our work! I am not sure how exactly you do your unprojection from 2D to 3D, but there might be some differences between the ScanNet and Matterport3D camera poses. I am not sure what the problem is, but can you try one thing:
pose[:3, 1] *= -1.0
pose[:3, 2] *= -1.0
Maybe this makes a difference. If not, you should check whether there are some differences between the Matterport3D and ScanNet poses.
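(For context, a flip like this usually corresponds to converting between an OpenGL-style camera frame, with y up and z pointing backward, and the OpenCV pinhole convention, with y down and z forward, that most unprojection code assumes. A minimal sketch of what I mean, assuming the poses are 4x4 camera-to-world numpy arrays:)

import numpy as np

def flip_camera_yz(pose):
    # Sketch only: negate the camera y and z axes of a camera-to-world pose.
    # Whether this is needed depends on which convention the poses were
    # exported in (OpenGL-style y-up/z-backward vs. OpenCV y-down/z-forward).
    pose = np.asarray(pose).copy()
    pose[:3, 1] *= -1.0  # flip camera y axis
    pose[:3, 2] *= -1.0  # flip camera z axis
    return pose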
Best
Songyou
Hi,
Thank you for your reply. I am not using the raw Matterport data but your processed data, where I think you have already applied that transformation to the poses. I also tried applying it again to your data, which just recovers the original poses, and the misalignment is still there.
My unprojection code is pretty standard, I think, and has worked for several other datasets too. Do the unprojected point clouds look ok on your side?
import torch

def unproject(intrinsics, poses, depths, mask_valid=True):
    """
    Inputs:
        intrinsics: B x V x 3 x 3 (torch.tensor)
        poses: B x V x 4 x 4 camera-to-world matrices (torch.tensor)
        depths: B x V x H x W (torch.tensor)
    Outputs:
        world_coords: B x V x H x W x 3 world coordinates;
        pixels with zero depth are set to -10, so (depths > 0)
        can be used to index into the RGB images and recover the
        N x 3 valid points / RGB values
    """
    B, V, H, W = depths.shape
    # per-view pinhole parameters, each B x V x 1
    fx = intrinsics[..., 0, 0][..., None]
    fy = intrinsics[..., 1, 1][..., None]
    px = intrinsics[..., 0, 2][..., None]
    py = intrinsics[..., 1, 2][..., None]
    # pixel grid, flattened to B x V x (H*W)
    y = torch.arange(0, H).to(depths.device)
    x = torch.arange(0, W).to(depths.device)
    y, x = torch.meshgrid(y, x, indexing="ij")
    x = x[None, None].repeat(B, V, 1, 1).flatten(2)
    y = y[None, None].repeat(B, V, 1, 1).flatten(2)
    z = depths.flatten(2)
    # back-project pixels into camera coordinates
    x = (x - px) * z / fx
    y = (y - py) * z / fy
    cam_coords = torch.stack([x, y, z, torch.ones_like(x)], -1)
    # camera -> world via the camera-to-world poses
    world_coords = (poses @ cam_coords.permute(0, 1, 3, 2)).permute(0, 1, 3, 2)
    world_coords = world_coords[..., :3] / world_coords[..., 3][..., None]
    world_coords = world_coords.reshape(B, V, H, W, 3)
    if mask_valid:
        world_coords[depths == 0] = -10
    return world_coords
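For reference, I call it roughly like this (the shapes and values below are just illustrative, not my actual data loading):

B, V, H, W = 1, 2, 256, 320
K = torch.tensor([[500.0, 0.0, W / 2],
                  [0.0, 500.0, H / 2],
                  [0.0, 0.0, 1.0]])
intrinsics = K[None, None].repeat(B, V, 1, 1)        # B x V x 3 x 3
poses = torch.eye(4)[None, None].repeat(B, V, 1, 1)  # B x V x 4 x 4 camera-to-world
depths = torch.rand(B, V, H, W)                      # B x V x H x W
world_coords = unproject(intrinsics, poses, depths)  # B x V x H x W x 3
points = world_coords[depths > 0]                    # N x 3 valid world points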
I did try it and, weirdly, it makes things worse:
To be precise, this is what I did:
poses = torch.from_numpy(np.array(poses)).float().cuda()
poses[..., :3, 1] *= -1.0
poses[..., :3, 2] *= -1.0
Earlier the images were aligned per camera uuid, but multiplying by -1 breaks that too. Here is also a wandb link to the visualizations: https://wandb.ai/ayushjain1144/m3d/runs/nvq21cwp?workspace=user-ayushjain1144
Thank you Songyou for your reply and continued help!
That's quite strange... Can you first try to run our feature fusion code to see whether it works correctly? We project from 3D to 2D to obtain the per-point features. If it works, maybe you can check whether you can adapt your 2D-to-3D unprojection accordingly?
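If it helps, the 3D-to-2D direction is essentially the inverse of your unproject, roughly like this (a simplified sketch, not our exact fusion code):

import torch

def project_to_image(points, intrinsic, pose, H, W):
    # points: N x 3 world coordinates, intrinsic: 3 x 3, pose: 4 x 4 camera-to-world.
    # Returns pixel coordinates and a mask of points that land inside the image.
    world2cam = torch.inverse(pose)
    pts_h = torch.cat([points, torch.ones_like(points[:, :1])], dim=1)  # N x 4
    cam = (world2cam @ pts_h.T).T[:, :3]                                # N x 3 camera coords
    uv = (intrinsic @ cam.T).T                                          # N x 3
    z = uv[:, 2]
    u, v = uv[:, 0] / z, uv[:, 1] / z
    inside = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    return u, v, inside

If the scene vertices project onto the right objects in each frame with this, the poses and intrinsics are consistent, and the issue is likely a convention mismatch in the unprojection.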
Best
Songyou